a abordagem poesia para a integraç ˜ao de dados e serviços na

164
A Abordagem POESIA para a Integrac ¸˜ ao de Dados e Servic ¸os na Web Sem ˆ antica Renato Fileto Tese de Doutorado

Upload: ngodung

Post on 08-Jan-2017

214 views

Category:

Documents


1 download

TRANSCRIPT

A Abordagem POESIA para a IntegracaodeDadoseServicosna Web Semantica

RenatoFileto

Tesede Doutorado

InstitutodeComputac¸aoUniversidadeEstadualdeCampinas

A AbordagemPOESIA para a Integracaode Dadose Servicosna WebSemantica

RenatoFileto1

Outubrode2003

BancaExaminadora:

� Profa. Dra. ClaudiaBauzerMedeiros(Orientadora)

� Profa. Dra. AnaCarolinaSalgadoCentrodeInformatica—UFPE

� Prof.Dr. CaetanoTrainaJuniorICMSC—USP

� Prof.Dr. CaltonPuCollegeof Computing,Georgia Tech(USA)

� Prof.Dr. Celio CardosoGuimaraes

� Prof.Dr. EdmundoMadeira

� Prof.Dr. JoaoCarlosSetubal

1Estetrabalho recebeuapoioda Embrapa, CAPES,FuncampedosprojetosMCT/PRONEX-SAI eWeb-Maps do CNPq. Osco-orientadoresdo Georgia Techforam parcialmentesuportadospelosprogramasOpe-ratingSystemseITR(divisaoCISE/CCR) da NSF, por um contrato do programaSciDACda DoEeum contratodo programaPCES(IXO) da DARPA.

ii

Substituapelaficha catalografica

iii

A AbordagemPOESIA para a Integracaode Dadose Servicosna WebSemantica

Esteexemplarcorrespondearedacaofinal daTesedevidamentecorrigidaedefendidaporRenatoFi-letoeaprovadapelaBancaExaminadora.

Campinas,1 deDezembrode2003.

Profa. Dra. ClaudiaBauzerMedeiros(Orientadora)

Tese apresentadaao Instituto de Computac¸ao,UNICAMP, comorequisitoparcialparaaobtencaodo tıtulo deDoutoremCienciadaComputac¸ao.

iv

Substituapela folha coma assinaturada banca

v

c�

RenatoFileto,2003.Todososdireitosreservados.

vi

Quiero vivir enunmundoenelquelosseresseansolamentehumanos,sinmastıtulosqueesse,sindarseenla cabezaconunaregla,conunapalabra, conunaetiqueta...

Quiero quela granmayorıa, la unicamayorıa,todospuedanhablar, leer, escuchar, florecer.Noentendı la luchasinopara queestatermine.Noentendı nuncael rigor,sinopara queel rigor noexista.

He tomadouncaminoporqueesecaminonosllevaa todosa la felicidadduradera...Luchopor esabondadubicua,extensa,inexhaustible.

De todolo vivido,mequedaunafeabsolutaenel destinohumano.Unaconviccion cadavezmasconscientedequenosacercamosa unagran ternura.

Enesteminutocrıtico,enesteparpadeodeagonıa,sabemosqueentrar la luz definitivapor losojosentreabiertos.

Nosentenderemostodos.Progresaremosjuntos.Yestaesperanzaesirr evocable.

PabloNeruda– Confiesoquehevivido

vii

Aosmeusavos,Lazaro (in memorian)eBenedita,cujo exemplodeamor, serenidadeededicacao,

permite-nosaomenoscontemplara formamaissublimedesabedoria.

viii

Agradecimentos

Aosmeuspais, minhairma e todosos meusfamiliarese amigos,quesouberam amar-medaformacomosou,apoiando-mepara atingir meuspropriosobjetivoseorgulhando-sedoqueeupossarealizardebom.

A Profa. ClaudiaBauzerMedeiros,pessoadeadmiravelcapacidadeedomaiselevadocarater.Obrigado,decoracao,pelaatencao,pacienciae oportunidadesconcedidosa todososprivile-giadosemte-la comoorientadora.

Aosmeusco-orientadores,CaltonPu e Ling Liu, pelaatencao e estımuloquemedispensaramdurantea estadanoGeorgia Tech eateosdiasatuais.

AosfuncionariosdaUnicampedoGeorgia Tech,quesaocapazesdedesempenharsuasfuncoescomgentilezasuficienteatepara dissolverasminhasinquietudes.

Aoscolegasda Unicampe do Georgia Tech, quecompartilhamseusconhecimentos,empre-gam muito do seutempoem benefıcio de todose tornam essesambientesacademicosmaisagradaveisedivertidos.

Aosamigoscomquemsempre possocontar: Evandro, Fatima, Nilza, Dna. Terezinha,Nair,Daniel, Mirja, Sandro, Ze Luiz, Arnold, Marcos,Felipe, Eric, Jean,Yoko, Emmanuel,Ellen,Claudio,Ana,Hend,... Todostemmeensinadomuito,tornadominhavidamaisbelaedemons-tradoa grandezae a similaridadeda essenciae dasaspiracoeshumanas,independentementederaca, cultura, idadeeoutrosatributos.

A Embrapa, pelo apoio e confianca, incluindo a manutenc¸ao dos vencimentose a ajuda decolegascompetentes,especialmenteno entendimentodasnecessidadesdasaplicacoes.A CA-PESpeloauxılio e bomatendimentoduranteo doutoradosanduıche. Ao CNPq,queviabilizoua participacao emdiversoseventoscientıficose contribui para a manutenc¸ao da boa infrae-strutura do laboratorio LIS, atravesdosprojetosMCT/PRONEX-SAI(SistemasAvancadosdeInformacao) eWebMaps.

Ao povo brasileiro, quecontinuasustentandotudoisso;especialmenteaosmaishumildes,queasvezesnemtemconscienciadisso.Queestanacao aprendaa distribuir oportunidadesparatodos. Que va alemdo desenvolvimentoeconomico, tecnologico, cientıfico e cultural, paradar suasingularcontribuicao a humanidade– mostrar caminhospara o convıvio afetuosodaspessoas,dentro detodadiversidade.

ix

Resumo

POESIA(Processesfor Open-EndedSystemsfor InformationAnalysis), a abordagempropostanestetrabalho,visa a construc¸ao de processoscomplexos envolvendointegracao e analisededadosdediversasfontes,particularmenteemaplicacoescientıficas.A abordagemecentradaemdoistiposdemecanismosdaWebsemantica:workflows cientıficos,paraespecificare comporservicosWeb;e ontologiasdedomınio, paraviabilizar a interoperabilidadee o gerenciamentosemanticosdosdadoseprocessos.

As principaiscontribuicoesdestatesesao: (i) um arcabouc¸o teorico paraa descricao, lo-calizacao e composic¸ao de dadose servicos na Web,com regrasparaverificar a consistenciasemanticade composic¸oesdessesrecursos;(ii) metodosbaseadosem ontologiasde domınioparaauxiliar a integracao de dadose estimara provenienciade dadosem processoscoopera-tivos na Web; (iii) implementac¸ao e validacao parcial daspropostas,em uma aplicacao realno domınio deplanejamentoagrıcola,analisandoosbenefıciose aslimitacoesdeeficienciaeescalabilidadedatecnologiaatualdaWebsemantica,faceagrandesvolumesdedados.

x

Abstract

POESIA(Processesfor Open-EndedSystemsfor InformationAnalysis),theapproachproposedin this work, supportsthe constructionof complex processesthat involve the integrationandanalysisof datafrom several sources,particularly in scientificapplications.This approachiscenteredin two typesof semanticWebmechanisms:scientificworkflows, to specifyandcom-poseWebservices;anddomainontologies,to enablesemanticinteroperabilityandmanagementof dataandprocesses.

Themaincontributionsof this thesisare: (i) a theoreticalframework to describe,discoverandcomposedataandservicesontheWeb,includingrulesto checkthesemanticconsistency ofresourcecompositions;(ii) ontology-basedmethodsto helpdataintegrationandestimatedataprovenancein cooperative processeson the Web; (iii) partial implementationandvalidationof the proposal,in a real applicationfor the domainof agriculturalplanning,analyzingthebenefitsand scalability problemsof the currentsemanticWeb technology, when facedwithlargevolumesof data.

xi

Conteudo

Resumo x

Abstract xi

1 Intr oducao 11.1 Motivacaoe Contexto do Trabalho . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Organizac¸aodaTese . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 A Survey on Inf ormation SystemsInteroperability 72.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72.2 InformationSystemsInteroperability. . . . . . . . . . . . . . . . . . . . . . . 8

2.2.1 Viewpointsof SystemsInteroperability . . . . . . . . . . . . . . . . . 92.2.2 TechnologiesaddressingInteroperability . . . . . . . . . . . . . . . . 10

2.3 DatabaseSystemsInteroperability . . . . . . . . . . . . . . . . . . . . . . . . 112.3.1 CentralizedDatabaseSystems . . . . . . . . . . . . . . . . . . . . . . 112.3.2 HeterogeneousDatabaseSystems . . . . . . . . . . . . . . . . . . . . 122.3.3 IntegratedAccessto Multiple Databases. . . . . . . . . . . . . . . . . 132.3.4 WebDatabases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.4 DataIntegration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4.1 DataStructuring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.4.2 CharacterizingDataHeterogeneity. . . . . . . . . . . . . . . . . . . . 162.4.3 SolvingSyntacticandStructuralConflicts . . . . . . . . . . . . . . . . 162.4.4 ReconcilingSemantics. . . . . . . . . . . . . . . . . . . . . . . . . . 172.4.5 TheDataIntegrationSteps . . . . . . . . . . . . . . . . . . . . . . . . 17

2.5 Building Blocksto IntegrateDatain CooperativeSystems. . . . . . . . . . . . 182.5.1 Gateways . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182.5.2 WrappersandMediators . . . . . . . . . . . . . . . . . . . . . . . . . 192.5.3 DataWarehouses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202.5.4 TheView Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

xii

2.6 TheSemanticWeb . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212.6.1 XML . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232.6.2 RDF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252.6.3 Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7 WebServices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.7.1 ArchitectureandBasicStandards . . . . . . . . . . . . . . . . . . . . 302.7.2 CooperativeDistributedProcessesenabledby WebServices . . . . . . 32

2.8 ApplicationsandSupportingEnvironments . . . . . . . . . . . . . . . . . . . 342.8.1 ScientificWorkflows . . . . . . . . . . . . . . . . . . . . . . . . . . . 352.8.2 GeographicInformationSystemsInteroperability . . . . . . . . . . . . 35

2.9 Conclusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

3 POESIA: An Ontological Workflow Approachfor ComposingWebServicesin Agricultur e 383.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383.2 Applicationscenario . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.2.1 Agricultural zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.2.2 Casestudy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413.2.3 Technicalchallenges. . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.3 Ontologicaldelineationof utilizationscopes. . . . . . . . . . . . . . . . . . . 443.3.1 Semanticrelationshipsbetweenwords . . . . . . . . . . . . . . . . . . 443.3.2 POESIAontologiesandontologicalcoverages . . . . . . . . . . . . . 47

3.4 ThePOESIAactivity model . . . . . . . . . . . . . . . . . . . . . . . . . . . 493.4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493.4.2 Activity pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513.4.3 Activity patternaggregation . . . . . . . . . . . . . . . . . . . . . . . 533.4.4 Activity patternspecialization . . . . . . . . . . . . . . . . . . . . . . 553.4.5 Thecombinedrefinementmechanism. . . . . . . . . . . . . . . . . . 573.4.6 Processframework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.5 Implementationissues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623.5.1 Checkingspecifications . . . . . . . . . . . . . . . . . . . . . . . . . 623.5.2 ComposingWebservices:animplementationperspective . . . . . . . . 643.5.3 Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.6 Relatedwork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673.7 Conclusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

4 Using Domain Ontologiesto Help Track Data Provenance 714.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714.2 MotivatingExample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

xiii

4.3 POESIAOntologiesandOntologicalCoverages. . . . . . . . . . . . . . . . . 744.4 OntologicalEstimationof DataProvenance . . . . . . . . . . . . . . . . . . . 754.5 OntologicalNetsfor DataIntegration . . . . . . . . . . . . . . . . . . . . . . 77

4.5.1 DataIntegrationOperators. . . . . . . . . . . . . . . . . . . . . . . . 784.5.2 DataReconcilingthroughArticulation of Ontologies . . . . . . . . . . 814.5.3 SemanticWorkflows . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4.6 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834.7 Conclusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

5 Applying SemanticWebTechnologyin Agricultural Sciences 865.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 865.2 Motivation:AgriculturalZoning . . . . . . . . . . . . . . . . . . . . . . . . . 875.3 SolutionContext . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

5.3.1 ThePOESIAApproach. . . . . . . . . . . . . . . . . . . . . . . . . . 905.3.2 POESIAOntologiesasWebServices . . . . . . . . . . . . . . . . . . 91

5.4 An Ontologyfor theAgricultureRealm . . . . . . . . . . . . . . . . . . . . . 925.4.1 TheOntologyDesign. . . . . . . . . . . . . . . . . . . . . . . . . . . 925.4.2 TheOntologyon Protege . . . . . . . . . . . . . . . . . . . . . . . . . 93

5.5 Exploiting OntologicalRelationships. . . . . . . . . . . . . . . . . . . . . . . 945.5.1 OntologicalCoveragesto ExpressandInterrelateScopes. . . . . . . . 945.5.2 RepresentingOntologicalRelationships. . . . . . . . . . . . . . . . . 985.5.3 DefiningOntologyViews . . . . . . . . . . . . . . . . . . . . . . . . . 99

5.6 EngineeringConsiderationsandSystemsEvaluation . . . . . . . . . . . . . . 1005.6.1 ArchitectureandDesignTradeoffs . . . . . . . . . . . . . . . . . . . . 1005.6.2 ConstructingOntologyViews . . . . . . . . . . . . . . . . . . . . . . 1025.6.3 ExperimentalEvaluation . . . . . . . . . . . . . . . . . . . . . . . . . 103

5.7 RelatedWork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065.8 Conclusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

6 Conclusoes 1106.1 Contribuicoes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1106.2 Extensoes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

Bibliografia 113

I Formal Definitions and Propertiesfor POESIA 134

II POESIA Ar chitecture and Implementation Issues 148

xiv

Capıtulo 1

Intr oducao

“The onlyabnormalityis theincapacityto love.”

AnaısNin

1.1 Moti vacaoe Contextodo Trabalho

A motivacaodestetrabalhoe a construc¸ao de sistemascomputacionaisparaa coleta,integra-caoe processamentodedados,visandoa extracaode informacaoemaplicacoescientıficasnaagricultura.A aplicacaoutilizadacomoestudodecasoeo zoneamentoagrıcola– determinac¸aodasterrasmaisapropriadasparao cultivo dediferentesculturasemumadadaregiaogeografica.Um processodezoneamentoagrıcolaclassificaasterrasemparcelasdeacordocomo seugraudeaptidaoparaumadeterminadaculturae asepocasdo anomaisindicadasparaa realizacaodostratosculturais(taiscomo,plantioeadubac¸ao).O objetivo edeterminarasmelhoresopcoesparao usoprodutivo esustentavel dasterras.

As informacoesresultantesdo zoneamentoagrıcolasaofundamentaisparao planejamentoe gerenciamentode toda a logıstica da producao e distribuicao. Orgaosgovernamentaiseinstituicoesfinanceiras,por exemplo,baseiam-senessasinformacoesparadefinir e executarpolıticasdeconcessaodeemprestimosagrıcolas.Essaspolıticasvisamdirecionarosfazendei-rosparapraticasquecontribuamparaminimizarosriscose aumentara produtividadedeseusempreendimentos.Experienciasemdiversossetoresdaagriculturabrasileiranosultimosanoscomprovamosbenefıciosdessetipo deabordagem.

O zoneamentoagrıcolaenvolveaanalisedediversosfatorestaiscomoclima,relevo e tiposde solo,de modoa compatibilizarasnecessidadesdasculturas,nasdiversasfasesdo seude-senvolvimento,comascondicoesambientaisesperadasnasdiferentesregioesaolongodo ano.Osdadosnecessariosaoprocessamentoe analisedeinformacoessaoobtidosdefonteshetero-

1

1.1. MotivacaoeContexto doTrabalho 2

geneas,incluindosensoresparacoletardadosdefenomenosfısicose biologicos(por exemplo,estac¸oesmeteorologicas,satelitese dispositivos de automac¸ao laboratorial). Muitas vezesenecessario integrar dadosoriundosde sistemaslegadose de diferentesinstituicoes,a fim deminimizaroscustoscomcoletadedadoseconseguir volumeeamostragemespaciale temporalsuficientesparaaobtencaoderesultadosconfiaveis.

Um processodezoneamentoagrıcolaenvolveacooperac¸aodediversasespecialistas,traba-lhandoemorganizac¸oesdistintaseutilizandoumagrandevariedadedeplataformascomputaci-onaise ferramentasdeanalisededados.Porexemplo,agronomoscontribuemcomtecnicasdeplantioemodelosdegerenciamentodelavourase biologosfornecemosrequisitosnutricionaisparao bomdesenvolvimentodasplantas.Estatısticosfazemaanalisederiscosdeperdasnasla-vouras(porexemplo,devido asecaougeada).Ambientalistasavaliamo impactodaselecaodeculturaagrıcolasobreo meioambiente,a curtoe longoprazo.Emsuma,diversosespecialistascombinamasuaexperienciaeumagamaderecursoscomputacionaisparaconstruirmodelosdezoneamentoagrıcola. Essesmodelose osprocessoscomputacionaisqueelesoriginamvariamcomaculturaagrıcola,regiaogeograficaepraticasdosespecialistase instituicoesenvolvidos.

O desafio,do pontodevistadesistemasdeinformacao,e organizare conectarosrecursoscomputacionais(dadose servicos) necessarios. Al em disso,e fundamentalpromover o reusode tais recursos,permitindotambem suacomposic¸ao. A importanciado reusonestetipo dedomınio podeseravaliadausandoum exemplosimples.Considereo desenvolvimentodepro-cessosdezoneamentoagrıcolaparaas20principaisculturasagrıcolasnoBrasil,e10variedadesdistintasdecadacultura(comdiferentesrequisitosclimaticose nutricionais).Dividir o Brasildeacordocomasfronteirasestaduais(27 estados)resultaem5400modelos.Todavia, grandepartedosrecursoscomputacionaisutilizadose mesmoda estruturadosprocessosresultantespodesercompartilhada.A dificuldadeempromover o reusoresideem gerenciaro acervo demodelose recursoscomputacionais,demodoa promover suacomposic¸aoemprocessoscadavezmaissofisticados.Metodossistematicoseautomatizadosparagerenciartaisrecursosepro-cessossaocruciais,poiso gerenciamentomanuale caroesujeitoaerros.Pararesponderaestedesafio,saonecessariosresultadosemintegracaodedados,interoperabilidadeecomposic¸aodeservicosnaWeb.

Integracao de dadosconsisteem produzirumavisao unificadade dadosheterogeneos,demodoa permitir o seuintercambioe processamentoconjunto. Propostasparasolucionaresseproblema,na maioria das vezes,partemdo pressupostode mundo fechado,e requeremaestipulac¸aodeum esquemaunico,paracompatibilizarasnecessidadesdedadosdeumaorga-nizacao.Visoesdo esquemaglobalpermitemrestringiro acessoecontemplarnecessidadeses-pecıficas.No entanto,o pressupostodemundofechadofrequentementesemostraimpraticavel,especialmenteno contexto deaplicacoesdistribuıdasnaInternet.

EstateseapresentaPOESIA (Processesfor Open-EndedSystemsfor InformationAnaly-sis) parafazerfrente a tais desafios. POESIA e uma abordagemparaa composic¸ao de da-

1.2. Organizac¸aodaTese 3

dos e servicos em processoscooperativos na Web semantica. Em POESIA, o intercambiode informacoese a cooperac¸ao de sistemasautonomosno processamentode dadosenvolveintegracaodedadosemdiversospontose emmultiplos estagios.A abordagemPOESIAcom-binaontologiasdedomınio, modelosdeatividadese workflows paraa composic¸aodeservicosna Web. Estaabordagemcomplementaoutraspropostasparaa recuperac¸ao, selecao e com-posicao de servicos, com novasfacilidadesparao gerenciamentodosrecursosutilizadosemprocessoscooperativos.

As principaiscontribuicoesdatesesao:

� descricaodosrequisitosdeprocessosdezoneamentoagrıcolae elaborac¸aodepropostasparacontempla-los;

� desenvolvimentode um arcabouc¸o teorico, baseadoem ontologiasde domınio, mode-los de atividadese workflows cientıficos, paraa descricao, organizac¸ao, recuperac¸ao ecomposic¸ao de servicos na Web, com regrasparaverificar a consistenciasemanticadecomposic¸oesderecursos;

� combinac¸aodeumaontologiadedomınio e descricoesdefluxosdedadosparaavaliar aprovenienciadedadoseauxiliara integracaodedadosemprocessosdistribuıdosnaWeb;

� validacao parcial do arcabouc¸o teorico, atraves da implementac¸ao de alternativasparalidar com grandesvolumesde dadosem um domınio especıfico, demonstrandoasdefi-cienciasda tecnologiaatualda Web semanticae propondoalternativas,que incluemacombinac¸aodetal tecnologiacommetodosconvencionaisdegerenciamentodedados.

1.2 Organizacaoda Tese

Os capıtulos centraisdestatesesao artigospublicadosou submetidosparapublicacao. Asdefinicoese a notacaoutilizadasemcadaartigo foramasquemelhorseenquadravamaosre-sultadosapresentadose/outrabalhosrelacionados.Assim,o leitor deve ficar atentoa algumasvariacoes.

OCapıtulo2 eumarevisaobibliograficasobreinteroperabilidadedesistemasdeinformacao,submetidaao corpoeditorial da serie relatorios tecnicosdo IC/Unicamp. Ela cobretrabalhosem interconexaode bancosde dadosrelacionais,classificac¸ao de problemasde integracao dedados,principaispadroesearquiteturas,alemdosmaisrecentesprogressosemWebsemantica,servicosWeb e workflows cientıficos. Estarevisaodescreve algunsdosproblemasemabertoabordadospelatese.Al emdisso,detalhaconceitosteoricosapenasmencionadosnoscapıtulossubsequentes,facilitandodestaformaa leitura.

O Capıtulo 3 (POESIA:An Ontological Workflow Approach for ComposingWeb Servicesin Agriculture) [83], salvo por pequenascorrecoesefetuadasnestaversaorevisadaparaa tese,

1.2. Organizac¸aodaTese 4

correspondea um artigo aceitoparapublicacao no VLDB Journal, volume12, numero4, de2003.Esteartigodescreve osfundamentosdaabordagemPOESIA.Ele mostracomoumaon-tologiadedomınio podeserutilizadaparaorganizarvastosrepertoriosdepadroesdeatividades,quedescrevem a composic¸ao de dadose servicos na Web parao processamentode dadosci-entıficos. EstapropostadePOESIAdefinesuaarquiteturae abordaa questaodeontologiadeformateorica. Oscapıtulossubsequentesabordamaspectosespecıficosdo desenvolvimentoemanipulac¸aodeontologiasnaimplementac¸aodeaplicacoes.

Umaontologiadedomınio emPOESIAe organizadaemmultiplasdimensoes(por exem-plo, espac¸o, tempo,instituicao,produtoagrıcola). Coberturasontologicas– tuplasde termostomadosdaontologia– descrevemo escopodeutilizacaodedadoseservicos,isto e,o contextoespecıfico emqueversoesdistintasdosservicospodemserutilizadas.Correlacoessemanticasentreescoposde aplicacao definemmeiospararecuperare comporrecursos,bemcomove-rificar a consistenciasemanticadascomposic¸oes. O artigo transcritono Capıtulo 3 defineoperac¸oes– agregacao,especializac¸aoe instanciac¸ao– paraapoiara composic¸aodeservicos.Essasoperac¸oes,aplicadasa padroesdeatividades,permitemdefinir frameworksdeprocessoscooperativose adapta-losdeacordocomnecessidadesespecıficas.Um framework deprocessoeconstituıdodeumconjuntodepadroesdeatividades,implementadasporservicosWeb,quesecomunicamparaatingir algumobjetivo comum(por exemplo,determinara aptidaoagrıcola).Cadapadraodeatividadeesta associadoa umacoberturaontologica,quedefineo seuescopodeutilizacao,deacordocomconceitosespecıficosdo domınio. A adaptac¸aodeum frameworkconsisteemselecionarversoesdepadroesdeatividade,deumahierarquiadeatividadese sub-atividadespararealizarumadadatarefa, referentesa um escopode utilizacao especıfico (porexemplo,determinaraaptidaoagrıcolaparacafeno Centro-Suldo Brasil).

OCapıtulo4 (UsingDomainOntologiesto HelpTrackDataProvenance) [84], foi publicadono SBBD 2003. Ele apresentaum metodobaseadoem ontologiade domınio, estruturadadamaneiraprescritanaabordagemPOESIA,paraestimaraprovenienciadedados,i.e.,adescricaodasorigensdeumdadoedoprocessoutilizadoparaproduzi-lo.O metodoapresentadoderivaaprovenienciadedadose capturaa semanticaoperacionaldeprocessosdeintegracaodedados,usandoaontologiaparadescreverecorrelacionarescoposegranularidadesdedados.

Osestudosdecasoutilizadosnesseartigoreferem-sea duasdatawarehouses:(i) atributosclimatologicose(ii) producaodefrutasnoBrasil. Ambasorganizamseusdadossegundoasdi-mensoestempoeterritorio, sendoqueaprimeiratambeminclui umadimensaoparaespecificarasorganizac¸oesresponsaveispelacoletadosdados,e a segundautiliza umacategorizacaodeprodutosagrıcolas,paraclassificaros tiposde frutasproduzidos.O artigomostracomoessasdimensoespodemserdescritaspor umaontologiamultidimensional,tal comoprescritopelaabordagemPOESIA.O processodecargadawarehouse(i), por exemplo,envolve diversosre-positoriosintermediarios,queproveemservicosdeacessoadadosclimatologicosprovidospordiferentesinstituicoes. Essesservicos tem diferentescoberturasespaciais.O artigo propoe a

1.2. Organizac¸aodaTese 5

utilizacao de coberturasontologicasparadelimitaro escopodosdiferentesservicos e estimarasfontesdedadosquecontribuemparaum dadofornecidopor um servico. O artigo tambemsugerecomocoberturasontologicaspodemauxiliarnaintegracaodedados,noscasosemqueafaltadeumidentificadorcomumpodesersanadapeladescricaodoescopodosdados,utilizandocoberturasontologicas.

O Capıtulo 5 (ApplyingSemanticWebTechnology in Agricultural Sciences), submetidoaoInformationSystemsJournal – SpecialIssueon SemanticWebandWebServices, reportaumaexperiencianaconstruc¸aoemanipulac¸aodeumaontologiaparao domınio agrıcola.Eleanalisaaslimitacoesdospadroese ferramentasatuaisdaWebsemanticaparalidar comgrandesvolu-mesdedadosdeontologiasreais.O artigoapresentaumasolucaoescalavel, baseadanacriacaodevisoesdaontologia,paraacarga,apresentac¸aoemanipulac¸aodessaontologia.Estasolucaoe implementadaemumabiblioteca,denominadaOntoCover, queconjugaa tecnologiadaWebsemanticacomtecnicastradicionaisdemanipulac¸aodedados.

A especificac¸aodaontologiamanipuladapeloOntoCoverpodeserproduzidacomumafer-ramentadeedicaodeontologiase exportadavia RDF. A estruturageraldaontologia(analogaa um diagramade classes)e semprecarregadade umaespecificac¸ao em RDF Schema,con-tida em um arquivo texto. As instanciaspodemsercarregadasde um arquivo texto contendosuasespecificac¸oesem RDF ou de um bancode dadosrelacional,mantendotriplas RDF ouinstanciasde entidadesde um modelode dadosconvencional(por exemploEstado,Cidade,etc.). O sistemadebancodedadosrelacionalprove acessoeficienteaosdados.O OntoCovercria a visao da ontologia,conformeespecificadopelo desenvolvedorda aplicacao, e permitevisualiza-la,navegarsobresuaestrutura,selecionartermosparacomporcoberturasontologicase compararessascoberturasontologicas. O OntoCover foi desenvolvido em Java e podeseracopladoa aplicacoesondeessasfacilidadesbasicassejamnecessarias. O artigo apresentao resultadode experimentosmostrandoquea carga e a criacao de visoesde uma ontologiapodemser realizadasmuito maiseficientementeutilizandobancosde dadosrelacionaiscommodelagemconvencional,do quemanipulandotriplasRDF representandoaspropriedadesdasinstanciasdeclassesdaontologia.

Finalmente,o Capıtulo 6 concluia tese,evidenciandosuascontribuicoeseextensoes.O Anexo I inclui um conjuntodedefinicoese demonstrac¸oes,descrevendoformalmenteas

propriedadesfundamentaisdasontologiasdedomınio e do modelodeatividadespropostosnaabordagemPOESIA.

O Anexo II apresentaa arquiteturageral de sistemasparaPOESIA,e descreve como aimplementac¸aorealizada,particularmenteo OntoCover, seinserenessaarquitetura.

Osoutrostrabalhospublicadosduranteo doutoradosaobrevementedescritosaseguir.

1. TheDesignof DecisionSupportSystemsfor EffectiveUseof Spatio-Temporal Data [85]foi apresentadono EDBTPh.D.Workshopde2000,e constituium esboc¸o do projetodetesenaquelemomento.

1.2. Organizac¸aodaTese 6

2. An XML-Centered Warehouseto Manage Information of the Fruit SupplyChain [86],publicadona World Conferenceon Computers in Agriculture and Natural Resources(WCCA)de2001,descreveumadatawarehousesobreaproducaodefrutasnoBrasil.

3. Issueson Interoperability of Heterogeneousand GeographicalData [82], publicadonoSimposioBrasileirodeGeoinformatica(GeoInfo)de2001,e umaresenhasobreintegra-caodedados,sobo enfoquedegeoprocessamento.

4. QueryingMultiple BioinformaticsInformation Sources: Can SemanticWeb ResearchHelp? [34], cujo autorprincipal e David Buttler, colega do Georgia Techdurantea es-tadia naquelainstituicao, foi publicadona revista SIGMOD Record 31(4) 2002. EsseartigodiscuteaspotenciaiscontribuicoesdaWebsemanticaabioinformatica.

5. AplicandoOntologiasdeObjetosGeograficospara Facilitar a NavegacaoemGIS[236],cujo autorprincipal e o alunode iniciacao cientıfica Lauro R. Venancio,foi aceitoparapublicacao no GeoInfo 2003. Esseartigo descreve o OntoCarta, uma ferramentaqueutiliza umaontologiade domınio parafacilitar a navegacao em mapase possibilitaraintegracaodeobjetosgeograficosnaWeb. O OntoCartaexecutasobrenavegadoresWeb,eaderenteaospadroesatuaisdaWebsemanticaeutiliza ferramentasdedomınio publico(inclusiveo OntoCover) nasuaimplementac¸ao.A ontologiadedomınio empregadaparanavegacaoemmapasdirigidaporconhecimentoeaqueladesenvolvidaparaapoiaraabor-dagemPOESIAnaareadeagricultura.

Chapter 2

A Surveyon Inf ormation SystemsInter operability

2.1 Intr oduction

The traditionalparadigmfor informationsystemsdevelopmentis basedon the cycle model-ing-design-implementation,andconsidersasingledatabaseframework, with oneschemausingonedatamodel. Theadventof heterogeneoussystemsand,morerecently, theWeb, is chang-ing this picture. Large amountsof dataareavailablein distinct formatsandplatforms. Datarepositoriesvariesfrom structureddatabasemanagementsystemsto unstructuredfiles. Thelack of agreementon datarepresentationandsemanticsacrossheterogeneoussystemsmakestheinteroperabilityproblemverycomplex.

Web systemsare in permanentevolution, with new devices, new datasourcesand newrequirements.Thepossibilityof dynamicconnectionsamongsystemscomponentson theWebaddscomplexity to thesituation.Thedemandfor interoperabilityhasboostedthedevelopmentof standardsandtools to facilitatedatatransformationandintegration.Nevertheless,therearestill many challengesto bemet,especiallythoseconcernedwith datasemanticsandbehavior ofcooperativesystems.

This work surveys someresultsfrom the literaturerelatedwith interoperabilityand,morespecifically, dataintegration. Our goal is theconstructionof datawarehouses(or materializedviews) integratingseveralkindsof datasources,particularlyfor scientificapplicationsin agri-culture. Data warehousesare a suitablestartingpoint for researchandexperimentson dataintegration. Themaintenanceof consolidateddataat thewarehouseconfersgreaterversatilityto datarepresentationandmanipulation.The unidirectionalflow of datafrom the sourcestothe warehouse,aswell asthe warehouseupdatepolicy which doesnot requireon-line accessto datasources,simplifiesdataprocessing.The problemcanbe decomposedinto two steps(i) extractingdatafrom the sourcesto feedthe warehouse,and(ii) integratingthesemultiple

7

2.2. InformationSystemsInteroperability 8

sourcedatainto thewarehouse.Theemphasisof this work is on thesecondstep.Thefocusison representationalandsemanticissues,andthefundamentaldataintegrationproblems.

Distinctdatasourcesmaybemaintainedindependently. In fact,autonomousmanagementofdatabasesis frequentlya prerequisitefor informationsystems.However, valuableinformationmay be extractedwhencollectionsof dataobtainedfrom differentdatasourcesareanalyzedasa whole. The integratedanalysisof datafrom differentsourcestriggersa wide variety ofdataheterogeneityproblems.Furthermore,connectionof autonomousheterogeneousdatabasescomplicatesclassicaldatabaseproblemssuchasconsistency maintenance,concurrency control,transactionsanddistributedqueryprocessing,andoptimization.Our researchis not concernedwith any of theseproblems.Only consistency maintenanceis consideredin somedegree.Thecoreof ourresearchis semanticdataheterogeneity, especiallywhenscientificdataareinvolved.

Insteadof trying to coerceall datainto a singleunifiedview in onestep,we considerinte-grationof smallcollectionsof data,in severalpointsof distributedandcooperative processes.Integratedviews of selecteddatasets,materializedor not, definethe inputsof dataprocessingactivities of distributedprocesses.The outputsof suchan activity, regardedasa datasetorservice,canbe the input of anotherone. Thus,complex processesinvolving dataintegrationcanbebuilt by composingdatasetsandservicesin anopenenvironmentlike theWeb.

The remainderof this paperis organizedin the following way. Section2.2 presentsbasicconceptsrelatedwith informationsystemsinteroperability. Section2.3analyzesinteroperabilityin thecontext of databasesystems.Section2.4focusesondatarepresentation,dataheterogene-ity conflicts,anddataintegration,establishinga framework to analyzerelatedproblemsandproposedsolutions.Section2.5 presentsthemosttypical apparatusfor dataintegration. Sec-tion 2.6 describesthe the major standardsandtechnologiesof thesemanticWeb. Section2.7outlinesthe Web servicestechnologyandhow it canbe usedto build cooperative distributedsystems.Section2.8 refer to applicationsdemandingtechnologyto supportinteroperability,particularlyin scientificrealms.Finally, Section2.9presentstheconclusions.

2.2 Inf ormation SystemsInter operability

Interoperability is theability of two systemsto exchangeinformation,andcorrectinterpretandprocessthis information[144, 118, 105, 9]. It requiressomedegreeof compatibilitybetweensystems,to enabledataexchangeandcorrectinterpretation.Ideally, cooperativesystemsshouldbecompliantwith computationalandapplicationdomainstandards.However, this levelof stan-dardizationmaybeimpossibleto attainin practice,dueto therateof technologicalchanges,thelack of universallyacceptedstandards,the existenceof legacy systems,or just for reasonsofautonomyof eachinformationsystem.Thus,in many cases,theonly wayto reachinteroperabil-ity is by publishingtheinterfaces,schemasandformatsusedfor informationexchange,makingtheir semanticsasexplicit aspossible,sothat they canbeproperlyhandledby thecooperative

2.2. InformationSystemsInteroperability 9

systems.

2.2.1 Viewpointsof SystemsInter operability

Hasselbring[120] shows that informationsystems’interoperabilitymustbe consideredfromthreeviewpoints:applicationdomain,conceptualdesignandsoftwaresystemstechnology. Fig-ure2.1 illustratesthestructureof asetof informationsystemsandtheir interoperabilityin eachoneof theseviewpoints.

Figure2.1: Theviewpointsof informationsystemsinteroperability

Theuser’sviewpointconcernsthedistinctviewsandspecializationsof domainexperts.Thedesigner’s viewpoint refersto requirementsmodelingandsystemsdesign.Theprogrammer’sviewpointrefersto thesystemsimplementation.

Conflictsmayappearin eachof thosethreeviewpoints.On theotherhand,interoperabilitymustbe achieved in all theseviewpoints,i.e., usersof a systemmustunderstandinformationcomingfrom anothersystem,thesystemdesignmustaccommodatethe“foreign” data,andthecomputerprogramsmustautomateinformationexchange(i.e., thedatatransfersandtransfor-mations).Thehardestproblemsof datainteroperabilityoccurat theapplicationandconceptualviewpoints[2].

Furthermore,eachviewpoint hasthe instancelevel (solutions,projects,applicationpro-grams),themeta-level (with approachesandmodelsusedto describethecharacteristicsof the

2.2. InformationSystemsInteroperability 10

instances),and,maybe,themeta-metalevel, wherethemodelsaredefined.Hence,heterogene-ity canalsobeconsideredat successive levelsof abstraction.

2.2.2 TechnologiesaddressingInter operability

Thegrowth of computernetworkshaspushedthedevelopmentof systemscommunicationtech-nologiesbeyond protocolsfor messagepassing. Several paradigmsrelatedwith distributedheterogeneoussystemsinteroperabilitycanbe singledout in the literature. Someof the mostprominentof theseparadigmsin theInterneteraaredescribedin thefollowing.

Distributedobjectsis theparadigmonthecoreof technologieslikeCORBA andDCOM [194].Eachobjecthasanobjectid, thecodeto implementits behavior, anda statedeterminedby the valueassociatedwith a numberof internalvariables. An objectencapsulatesitinternalstateandcodeandprovidesan interfacebasedon methodsto externally accessandmodify its state. Distributedobjectscommunicatewith eachother throughremotemethodinvocation.CORBA (CommonObjectRequestBroker Architecture)[52, 194] isthe architectureof OMG (ObjectManagementGroup)for distributedobjects. CORBAobjectscanbe anywherein a network andareaccessedby remoteclients, via methodinvocations,without having to know whereeachserver object resides,what operatingsystemit executesonandhow theobjectis implemented.Thelanguageandthecompilerusedto createCORBA serverobjectsaretransparentto clients.

Infopipes[207] arebuilding blocks to implementstreamdataprocessing.An infopipe is alanguageandplatformindependentabstractionfor a dataflow from a producerto a con-sumer. It includesdataprocessing,buffering andfiltering. The infopipemodelincludesfacilitiesfor managingqualityof serviceproperties(e.g.,performance,availability, secu-rity), composingandrestructuringdataflows duringexecution.This modelhasinherentparallelismandembracescontentsemanticsanduserrequirements,allowing informationflow controlandresourceuseoptimization.

Peer-to-Peer [179] refer to a classof systemsthatemploy resourcesdistributedacrossa net-work to perform somefunction in a decentralizedfashion. The resourcesencompassprocessingpower, data,storagemeansand network bandwidth. The function can bedistributedcomputing,contentssharing,communicationor collaboration.Thekey char-acteristicof a peer-to-peersystemis that, in oppositionto theclient-server architecture,eachpeercanprovide someserviceto otherpeers,at thesametime that it benefitsfromtheservicesprovidedby otherpeersof its community. Peer-to-peersystems,suchasNap-ster, andKazaa,becamepopularfor allowing peopleto shareaudioandvideofileson theWeb.

2.3. DatabaseSystemsInteroperability 11

CompositeWebServices[234, 250] useWebservices– i.e., self-describingandindependentsoftwaremodulesaccessiblethroughthe Internet– as the building blocks to constructinter-institutionalcooperativeprocesses.Webservicescommunicatevia messages,usingstandardWeb protocols. Theseservicesencapsulateautonomoussystemscomponentswith Web-basedinterfaces,takingadvantageof theubiquity of theWebto provide wideaccessto thosecomponents.Thefundamentalproblemsof this paradigmarethediscov-eryof theservicesavailableontheWebto fulfill aparticularneed;andthecoordinationofservicesin distributedprocessesto achieve specificgoals. Webservicestechnologyhasbeendevelopedandappliedin areaslike electroniccommerceandfinance.Our researchcombinesWebservices,workflows, andsemanticWebtechnology, to solve problemsofscientificapplicationsinvolving dataintegrationandcooperativework on theWeb.

XML andJava arealsoexpectedto play an importantrole in the implementationof inter-operabledistributedinformationsystems[45, 193]: theformerasa syntacticstandardfor datarepresentation(Section2.6.1),andthe latterasa portablelanguage,allowing the transferenceof sourcecodedobjects’behavior from oneplatformto another.

2.3 DatabaseSystemsInter operability

Informationsystemsare characterizedby the flow consistingof “data input, processingandoutput”. Theuncoordinatedcreationof heterogeneousfilesto storedataof autonomoussystemsleadsto problemswhendifferentapplicationshave to accessshareddata. Databasesystemswereproposedto solve theseproblemsin centralizedenvironments[152].

2.3.1 Centralized DatabaseSystems

Databaseanddatabasemanagementsystems(DBMS) [72, 73, 5] areamongthemostcommonmeansof managingdata.A centralizeddatabasesystemaccommodatesall thedataof anorga-nizationin auniqueinternalschema.Views[24, 243, 92,225],or externalschemas,aredistinctlogical databaseimages,allowing (groupsof) usersto accessa centraldatabaseaccordingtotheirspecificneeds.A view is usuallybuilt by usingadatabasequerylanguageto write aquerydefininganimageof a limited amountof data.

Databaseviewsareassignedto particularapplicationsaccordingto users’requirementsandprivacy concerns. A view can be materializedor non-materialized.Materializedviews arecopiesof datato supportdifferentdatabaseimages.Non-materializedviews, on theotherhand,are just abstractions,producedby translatingrequeststo the abstractviews into requeststoactualdatabaseor lower level views.

The userof a database(or view) mustknow the datamodelemployed andthe (external)schema,in orderto accessthedatabasedirectly throughtheDBMS. An alternativeapproachis

2.3. DatabaseSystemsInteroperability 12

theconstructionof applicationprogramsatoptheDBMS to helpusersin their daily activities.The developmentof systemsintegratingdifferentdatabasesdemandsconsiderablecoordina-tion of the teamsresponsiblefor thedistinctdatabases,views andapplicationprograms.Thiscoordinationis very difficulty to be achieved, even whenthe integrationinvolvesonly a fewdepartmentswithin thesameorganization.

2.3.2 HeterogeneousDatabaseSystems

Heterogeneousdatabasesystems(HDBS) [72, 219, 152, 126, 5] are software packagesthatintegratevariouspreexistingdatabasesystems(DBSs)or HDBSscalledcomponents.Thesamecomponentcanparticipatein variousHDBSs. Componentscanbe developedindependentlyandwithoutany concernaboutsubsequentintegration.

ShethandLarson[219] characterizeHDBSsusing threeorthogonalaxes: heterogeneity,distribution, andautonomy. The heterogeneityof a HDBS dependson thenumberandsever-ity of discrepanciesamongits constituentDBSs,with respectto their schemas,datamodels,query languages,transactionmanagementcapabilities,DBMS, hardware, operatingsystemsandcommunicationprotocols.Discrepanciescanappearatany abstractionlevel (datainstances,schema,datamodel). The heterogeneitycanbe reflectedin the datarepresentationor be justa matterof interpretation. Distribution refersto the locationof the HDBS’ components.Inprinciple,distribution is orthogonalto heterogeneity. A distributedsystemcaninvolve differ-enthardware,softwareandcommunicationplatforms. Autonomyrefersto the freedomof theHDBS’ componentsto defineandmanagetheirdatabases.Theneedfor maintainingautonomyandthedemandfor sharingdataareoftenconflictingrequirements.Theintegrationof differentdatabasescannotcompletelyblock the capacityof eachcomponentDBS to manageits datawithout interferenceof the HDBS generalmanager[5]. Autonomycanbe classifiedin fourcategories[219,5]:

1. Designautonomyrefersto theindependenceof eachcomponentDBS to designits data-base.

2. Communicationautonomyrefersto the ability of a componentDBS to decidewhetherto communicatewith othercomponentDBSs. A componentDBS with communicationautonomyis ableto decidewhenandhow it respondstoarequestfromanothercomponentDBS.

3. Executionautonomymeansthata componentDBS is independentto executeoperations(requestedbothlocally andexternally),with full controlof transactionprocessing.

4. Associationautonomyassertsthat componentDBSscanindependentlydecidewhat in-formationthey wantto sharewith theHDBS, to which requeststhey reply, whento startandwhento finish theirparticipationin theHDBS.

2.3. DatabaseSystemsInteroperability 13

2.3.3 Integrated Accessto Multiple Databases

Theapproachesto enableintegratedaccessto multiple physicaldatabasescanberoughlyclas-sifiedin two categories:schemaintegration[18,72] andthefederatedapproach[151, 219, 152].Theformerconsistsin providing someunifiedschemathroughwhich theusersaccesstheinte-grateddata.The latter, on theotherhand,canjust supplysomemeansfor accessingexportedviews of the heterogeneousdatabases,leaving muchof thedataintegrationonusto the users.Figure2.2 illustratesthedifferencesbetweentheseapproaches.In thedistributedapproach(onthe left), the schemaof eachdistributeddatabaseis a view of theunified schema.In the fed-eratedapproach,on the otherhand,the export/importschemasof the federateddatabasesareexternallyhandled.Theschemaintegrationapproachmakesdataheterogeneitytransparenttotheusers,while thefederatedapproachconcedemoreautonomyto thecomponentdatabases.

Figure2.2: Distributedandfederateddatabasesystems

Thereareseveraloptionsfor implementingHDBSs,with varyingcouplingdegreesamongthecomponentDBSs,andofferingdifferenttrade-offs betweencooperationandautonomy. El-magarmidandPu[72] giveanintroductionto suchsystems,classifyingthemasfollows.

� Distributeddatabasesystem(DDBS)[72, 5, 73,197] consistsof asinglelogicaldatabasethat is physicallydistributed. Despitethe physicalfragmentationof data,a DDBS sup-portsasingledatamodelandquerylanguage,with oneschemaintegratingall its contents.

� Federateddatabasesystem(FDBS) [219] (alsocalledheterogeneousdatabasesystem–HDBS) is a distributeddatabasesystemallowing heterogeneouscomponentswith differ-entdatamodels,querylanguagesor schemas.

� Multidatabasesystem(MDBS) [151, 152] is a collectionof looselycoupleddatabases.The key propertiesof a MDBS are the autonomyof the participantdatabasesand the

2.4. DataIntegration 14

absenceof a globally integratedschema.MDBSsareemployedwhenuserswantto pre-serve their autonomy, evento thepoint of refusingto participatein a globally integratedschema.

All thesedatabasesystemsarchitecturesrely on someintegratedor export/export schema.However, they do not addressthe resolutionof dataheterogeneityconflicts to build suchanschema.They eitherconsiderthatthis problemhasbeensolvedor leave it to theuser.

2.3.4 Web Databases

WebDatabases[60, 240, 106, 187, 36] make datastoredin local databasesaccessiblethroughthe Web, enablingapplicationslike on-line storesand digital libraries. The most commoninterfacesfor queryingWebdatabasesareformsandnavigationmenuson Webbrowsers.Thequeryspecificationresultingfrom a userinteractionwith suchaninterfaceis encodedandsentto a Web Server, which submitsthe queryto the DBMS. The result is convertedinto HTMLformatto bereturnedvia theInternetandshowedin thebrowser. Optionsfor implementingtheinteractionbetweentheWebServer andtheDBMS aredescribedin [143,71].

Thechallengeof thequeryingWebdatabasesresearchis theconstructionof a unifiedandsimpleinterface.Themostcommonapproachto solve this problemis thegenerationof wrap-persandmediatorsto integratedatafrom Webpagesprovidedby Webdatabases[240, 36, 35,158]. Thesesolutionstendto becomplex, inefficient andunsuitablein many cases,dueto thedynamicsof the sourcesinterfaceandavailability. Othersolutionsavailable in the literatureinclude[187, 106, 60]. Neiling et al. [187] presentautomatedmeansto recover andintegratethecontentsof relatedWebdatabases(e.g.,movie databases).Gravanoet al. [106] describeasystemto organizeWebdatabasesin hierarchiesof classes,accordingtheir contents.Silva etal. [60] usekeywordsspecifiedby theuserto derive structuredqueriesto besubmittedto oneor moreDBMSs.

2.4 Data Integration

Heterogeneousdata arethosedatapresentingdifferencesin their representationor interpreta-tion,althoughreferringto thesamereality [151]. Dataheterogeneityconflictsaretheincompat-ibilities thatmayoccuramongdistinctdatasets.Theinteroperabilityproblemconsideredin thissectionis dataintegration[69, 200], i.e.,providingasingleview for asetof heterogeneousdata,with unified syntax,structureandsemantics.Data integration involvesthe resolutionof het-erogeneityconflictsandtransformationsof sourcedatato accommodatethemin theintegratedview.

In orderto make dataintegrationpossible,it is necessary, at first, to categorizethekindsofdatato beintegratedandtheheterogeneityconflicts.Then,conflictscanbesolvedin asequence

2.4. DataIntegration 15

determinedby their categories.Therestof this sectiondiscussestheproposalsavailablein theliteratureanddefinesa framework to analyzeandhandledataintegrationproblems.

2.4.1 Data Structuring

Structured Data

Conventionaldatabasesystemstake advantageof ratherstrict datastructuring,expressedviaa databaseschemausinga datamodel, to provide datamanagementfacilities, with efficientdataaccessandconsistency maintenance.That is the caseof the classicalrelationaldatabasemanagementsystemsandeventheobject-orientedsystems.

Datastructuringpresentsvirtuesanddrawbackswith respectto dataintegration. On theonehand,structuregrantsuniformity for dataprocessingandhelpsmaintainingconsistency.On theotherhand,anstructuredintegratedview from two or moreheterogeneousdatasetsissometimesverydifficult to obtain.

Semanticdatamodels[18], suchas the entity-relationshipdatamodel, allow datato bedescribedin an abstractand intelligible manner, at the conceptuallevel. Thus, thesemodelscan facilitate dataintegration. However, semanticdatamodelsare not versatileenoughandinformationcanbelost on convertingdataamongheterogeneousdatabaseschemasusingthesedatamodels.Theautomationof thedataconversionprocessis alsodifficult, becauseof thegapbetweentheimplementationandtheconceptualviewpoints.

Semi-structured Data

Semi-structureddata[2, 1, 32, 117, 199]arethosedatawhosestructureis irregularandpartiallyknown. In orderto allow theidentificationof thedataelementsin theirregularstructure,semi-structureddatahaveto beself-describing.Thus,thedataandbasicdescriptionsof theirstructureandmeaning(metadata)areassembledtogether. Dif ferentlyfrom structureddata,wherestruc-ture(typeandschema)aredefinedprior to thecreationof datainstances,semi-structureddatainstancescanbecreatedat thesametime their structureis defined.

Semi-structuredschemasanddatamodelsareusuallyformalizedasgraphs,whosenodesrepresentdataelementsandwhoseedgesrepresentnestingandreferencerelationshipsbetweendataelements[2, 199]. This datastructuringis suitablefor dataintegrationandWebsystems.Currentresearchin databasesincludeshow to model,query, restructure,storeandmanagesemi-structureddata[2, 66, 1]. Otherresearchthemesincludeextractingsomestructurefrom datain formatssuchasthoseprevalentin theWeb[2, 89,36,35,158, 189], text documents[4] andspreadsheets[145], in orderto integratethesedata.

2.4. DataIntegration 16

2.4.2 Characterizing Data Heterogeneity

Themostwidespreadway to characterizedataheterogeneityis to separaterepresentationfrominterpretationconcerns[219]. Representationalconflictsreferto syntacticor structuraldiscrep-anciesin the portrayalof heterogeneousdata. Semanticconflictsrefer to disagreementaboutthemeaning,interpretationor intendeduseof thesameor relateddata.

The solution of representationalconflicts usually requiresthe analysisof their semanticcounterpart,i.e., establishingcorrespondences(perfector not) betweenthe meaningsof dataitemsfrom heterogeneoussources.Semanticmatchesareoftenachievedonly for specificdo-mains.

Bothrepresentationalandsemanticconflictsmayoccurin any level of abstraction:instance,schema,datamodel. Thusdataheterogeneityconflictscanalsobe classifiedaccordingto thefollowing categories[118,178,137, 136]:

� Dataconflictsarediscrepanciesin therepresentationor interpretationof instantiateddatavalues,whichcandiffer in their measurementunit, precisionandspelling.

� Schemaconflictsaredifferencesin schemasdueto alternativesto depictthesamereality,suchasusingdistinct namesfor thesameentitiesor modelingattributesasentitiesandvice-versa.

� Data versusschemaconflictsaredisagreementsaboutwhat is dataandmetadata;e.g.,adatavalueunderoneschemacanbethelabelof anentity or attributein anotherschema.

� Datamodelconflictsresultfrom theuseof differentdatamodels.

2.4.3 Solving Syntacticand Structural Conflicts

Theearliersolutionsfor representationalheterogeneity[144, 178,142] arerestrictedto there-lationaldatamodel.They extendSQL to allow theconversionof tableandattributelabelsintodatavaluesandvice-versa.Otherworksexplore languageswith logical foundations,aggrega-tion andrestructuringcapabilities[99, 100].

Proposalsfor integratingsemi-structuredandotherdiversedatasourcesaresurveyedin [210,89]. Severalproposalsconcerntheestablishmentof astandardsyntaxanddatamodel.Someofthemarecenteredin objectmodels[118, 209], while othersusesemi-structureddatato representheterogeneousdataat a moreabstractlevel [47, 48, 199,117]. Theuseof semi-structureddataconfersversatility to datarepresentation,enablingdatatransformationsandmappingsamongirregular structures.On the otherhand,asdatamodelingconstructsfrom typical datamod-els often carry semantics,informationcanbe lost on convertingdatafrom sucha datamodelinto semi-structureddata.Theinformationlossproblemcanbehandledby maintainingpropermetadataassociatedwith thetransformedsemi-structureddata.

2.4. DataIntegration 17

2.4.4 ReconcilingSemantics

Thesolutionof semanticconflictsrelieson thestandardizationof themeaningof theconcepts,terminology, andstructuringconstructsfound in sourcedata[218, 195]. It involvesmetadataenrichmentto supportthe investigationof semanticmatchingamongdataitemsfrom distinctdatasets.

The first stepis to semanticallydescribedata,by associatingconsensualdescriptionstopublishedandexchangeddata[134]. At this stage,the establishmentof an accordis usuallypossibleonly for smallcommunities[105]. Commonsemanticscanbeexpandedto widercom-munities,asinformationis betterunderstoodandappropriatelevelsof abstractionaredevisedto makepossibledataexchangewith minimal lossof meaning.

2.4.5 The Data Integration Steps

Dataintegrationcanberegardedasa sequenceof steps,involving transformationsandinvesti-gationof correspondencesamongdataelements,in orderto producea unifiedview of hetero-geneousdata. Figure2.3, adaptedfrom [200], illustratesthe informationflow alongthe dataintegrationsteps.

Figure2.3: Thedataintegrationsteps

Heterogeneousdataarefirst convertedto a homogeneousformat (e.g. XML), usingtrans-formationrulesthatexplainhow to transformdatafrom thesourcedatamodelto thetargetdatamodel. The translateddataandschemasaresemanticallypoor for integrationpurposes.Thusthey mustbeenrichedwith semanticinformation(e.g.,measurementunits,meaningof theterms

2.5. Building Blocksto IntegrateDatain CooperativeSystems 18

appearingin tagsanddatavalues).Then,thecorrespondencesbetweenelementsfrom heteroge-neoussourcesareinvestigated,usingthesemanticdescriptionsandsimilarity rules,to producea collectionof correspondenceassertions.Finally, the correspondenceassertionsandintegra-tion rulesareusedto producean integrationspecification,which describeshow dataelementsfrom heterogeneoussourcesmustbetransformedandmixedto produceaunifiedview.

Even thoughdataintegrationultimately requireshumanintervention,it is crucial to auto-mateor at leastassistsomelaborioustasks,in orderto make dataintegrationpracticable.Thegoal of automatedfacilities is to make dataintegrationeasierandrepeatable,while allowingusersto makedecisionsalongtheintegrationprocess.

2.5 Building Blocks to Integrate Data in Cooperative Sys-tems

Thissectiondescribessomecategoriesof softwareapparatusthathavebeenproposedto supportintegrateddataviews. Suchapparatusallow the interconnectionof heterogeneousdatareposi-tories,programs,materializedandnon-materializedviews, in suchaway thattheoutputof onesoftwaremodulecansupplytheinput to anothermodule.

2.5.1 Gateways

Figure2.4: A databasegateway

A Gateway is asoftwarecomponentthatallowsaDBMS and/oranapplicationprogramdi-rectly connectedto this DBMS to accessdatamaintainedby anotherDBMS, using the datamodel and datamanipulationlanguageof the former. It is necessaryto develop one spe-cific gateway for eachDBMS pair. Gatewaysdo not provide transparency for heterogeneous

2.5. Building Blocksto IntegrateDatain CooperativeSystems 19

databaseschemaandinstances.Hence,gatewaysdo not offer supportto establisha unifyingview of heterogeneousdata.Figure2.4presentsagatewayproviding accessto database“Y” foranapplicationprogramandits directlyconnecteddatabase“X”.

2.5.2 Wrappers and Mediators

Wrappers andmediators [244, 97] provide datamanipulationservicesover a reconciledviewof heterogeneousdata.Wrappersencapsulatedetailsof eachdatasource,allowing dataaccessundera homogeneousdatarepresentationandmanipulationstyle (commondatamodel and,sometimes,standardizedschema).Mediatorsoffer anintegratedview of thedatasetsof severaldatasourcesthatcanincludewrappersandothermediators.Somesystemsadoptmultiplelevelsof mediatorsin order to modularizethe datatransformationand integrationalongsuccessivelevelsof abstraction.

Figure2.5: Wrappersandmediators

Figure2.5shows two wrappersandonemediatorproviding integratedaccessto two differ-entdatasources.Themediatorbrokersthe requestsfrom theapplicationsinto requeststo thewrappersof the particularsourcesinvolved in the request.On receiving the repliesfrom thesourcewrappers,themediatorcomposesthe resultsto returnan integratedresultto theappli-cation. Datatransformationandmappingspecificationsdrive thefunctioningof wrappersandmediators.Wrappergeneratorsanddatamappingspecificationlanguages[97] enablethespec-ificationof dataintegrationin amoreintelligible mannerthanusingconventionalprogramminglanguagesto hardcodewrappersandmediators.

2.5. Building Blocksto IntegrateDatain CooperativeSystems 20

2.5.3 Data Warehouses

A data warehouse[205, 127, 163, 46, 159] is a separateddatabasebuilt specificallyfor de-cision support. It provides the basisfor analysisof large amountsof data,collectedfrom avarietyof possiblyheterogeneousdatasources.A datawarehousereplicatesandintegratesdatafrom sourcessuchasrelationaldatabasesmaintainedby on-linetransactionprocessingsystems(OLTP), spreadsheetsandtextual data. Thesesourcestypically run in theoperationallevel oforganizations,while datawarehousesareintendedfor thestrategic level.

Datawarehousingis the activity of collecting, transformingand integratingdatafor con-solidatedanalysis.This canbeperformedoff-line with periodicalupdates,perhapsovernight.Theseparationbetweenthedatawarehouseandthedatasourcespreventsthewarehousefrominterferingin the functioningof thesystemsat theoperationallevel andconfersflexibility fordataorganizationandprocessingin the warehouse.Data from the sourcesis first processedbeforebeingstoredat thewarehouse.

Therearespecificmethodsfor modelingandorganizingdatain a warehouse– e.g. multi-dimensional,star, andsnowflake style schemas[125] – andalsofor dataprocessinganduserinteraction– e.g.,on-lineanalyticalprocessing(OLAP) [98,107, 46,119,64]. Figure2.6showstheloadingof datafrom thesourcesinto awarehouseandtheirusefor dataanalysispurposes.

Figure2.6: A datawarehouse

2.5.4 The View Approach

Wrappersandmediatorssupportnon-materialized(i.e., abstract)integratedviews for hetero-geneousdata,while datawarehousesprovide materializedviews (i.e., concretesetsof copied,

2.6. TheSemanticWeb 21

transformedand integrateddata). In datawarehouses,the unidirectionaldataflow, from thedatasourcesto the warehouserepositorysimplifiesthe view updateproblem[113, 208, 261].The datawarehousecannotbeupdatedby endusers.Updatesdoneto the sourceshave to beperiodicallyloadedin thewarehouseto reflectthemin theunifiedview. Figure2.7 illustratesageneralview-baseddataintegrationsystem.In this case,updatesposedon theexportingviewsare difficulty to be performedin the lower levels, especiallythe original datasources. Thetransformationsappliedfor dataanalysispurposes(e.g.,dataaggregation)canleadto complexproblemsof datalineageandview updating[55, 56].

Figure2.7: Theview approach

Many of the techniquesdevelopedfor views in heterogeneousdatabasesystemscan beemployed for the constructionof wrappers,mediatorsand datawarehouses.Unfortunately,integratinghighly heterogeneousdataand exporting them to specificdataanalysistools areharderproblems. They demanddatatransformationandmanagementfacilities beyond thoseprovidedby thecurrentDBMSs.Viewsstoredin warehousesalsoinvolvehistoricalinformationthatmaynotremainin theoriginalsources.Nevertheless,severalworkstaketheview approachfor theintegrationof heterogeneousdata[18, 112,231] anddatawarehouses[159].

2.6 The SemanticWeb

The semanticWeb [215, 80, 63, 260, 68, 22] is an emerging researchareawhosegoal is toachieve informationsystemsinteroperabilityandenableavarietyof sophisticatedapplications,by taking advantageof semanticdescriptionsof Web resources(dataandservices). It is aninfrastructureon which differentapplicationscanbe developed[76]. It intendsto enrichthe

2.6. TheSemanticWeb 22

currentWebwith formalizedknowledgeanddata,thatdifferenthumanbeingsand/orcomputerscanexchangeandprocess.

The key requirementfor the semanticWeb is interoperability. Data and metadatamustcomply to consensualformatsandconceptualizations,in order to enabletheir exchangeandproperprocessing.Therefore,standardsfor expressingdataandmetadataarecrucial for thesemanticWeb. Figure2.8,adaptedfrom [140], illustratesthesemanticWeblayersof standardsandtechnologies.

Figure2.8: Layersof semanticWebstandardsandtechnologies

Thelowestlayer, characterencoding+ URI, providesaninternationalstandardfor codingcharactersets(Unicode)anda meansto uniquely identifying resourcesin the semanticWeb(the URI specification[232]). The XML [256] layer, which includesnamespaces[185] andschemadefinitions[256, 257], constitutesa standardsyntax,with an underlyingdatamodel,to expressinterchangeabledataandschemas.In the RDF + RDFSlayer, RDF [147] allowsstatementsassociatingresourceswith their properties.RDFS(RDF Schema)[29] enablethedefinition of vocabulariesthat can be referredto by the URIs in which they are published.Thesevocabulariescanbe usedto associatetypesto resourcesandproperties.The Ontologylayerenrichesvocabulariesandsupportstheirevolution,by extendingtherepertoryof conceptsandsemanticrelationshipsamongthem.Severallanguagesfor describingontologiesin theWebhavebeenproposedto fulfill theneedsof this layer[80, 101, 109,165, 196, 61,191].

The top layers: Logic, Proof andTrust arestill underdevelopment. The Logic layer ex-pressesknowledgeby rules,while the Proof layer usestheserulesto infer otherknowledge.TheTrust layerprovidesmechanismsto determinethedegreeof truston inferredknowledge.Digital Signature permeatesseveral layersto ensuresecurity, by usingmeanslike encryptionanddigital signatures.

Theremainderof thissectiondescribestheXML, RDFandontologylayersof thesemanticWebin moredetail,analyzingthemajorstandardsandtechnologiesandhow they interrelate.

2.6. TheSemanticWeb 23

2.6.1 XML

XML (eXtensibleMarkup Language)[256, 2] is a syntaxstandard,with a graph-baseddatamodel,to representandexchangesemi-structureddata. XML derivesfrom the ISO standardSGML (StandardGeneralizedMarkupLanguage)[128]. Theselanguagesareknow asmeta-markuplanguagesbecausethey allow thedefinitionof specificmarkuplanguages.LikeHTML,XML employs tagsandattributesof tagsto structuredata.However, thestructureandtagsofanXML documentareuserdefined.In XML, tagsandstructureareintendedto describedatameaning,notdatapresentationasin HTML. Webservers,browsersandcertainapplicationsareableto processXML-encodeddata.

Figure2.9 presentsa fragmentof a XML documentcontainingclimatedata,specificallywaterbalancedata(measurementsof climatedata,soil moistureandevaporationof this mois-ture). Thesedatarefer to a particularpoint in the earthsurface,denotedby its geographiccoordinatesandthenameof thecity wherethatpoint is located.Themajordataelementcon-tainedin this XML document,WaterBal , expressesthegeographicpositionby meansof theXML attributeslocation , latitude and longitude , attachedto its openingtag. Thisdataelementincludesseveral climatemeasuresfor eachmonth. Eachmeasureis representedby anatomicdataelement.Thevalueof eachmeasureappearsbetweentheelement’s openingandclosing tags. For example,the valueof the averagetemperaturein Januaryis enclosedby the tags<Temperature> and</Temperature> . This atomicdataelementis nestedin the compositeelementcongregatingall themeasuresfor January, delimitedby the<Jan>

and</Jan> tags. The default namespaceassociatedwith this XML documentpointsto thedescriptionof its schema(presentedin Figure2.10),via ahttpaddress.

Figure2.9: An XML documentfor climatedata(waterbalance)

The emergenceof XML posesmany challengesto academiaand industry [67, 44, 242,

2.6. TheSemanticWeb 24

150]. Leadingsoftwarevendorsaremoving toward adoptingXML, eitherasan internaldatarepresentationmodelfor their softwareor just for dataexchangeamongdifferentapplicationsandplatforms. The publicationof datain XML format canmake the Web a hugeXML datasourcefor all sortsof information.

Therearemany technologiesbeingdevelopedto explorethepotentialof XML (e.g.,XMLquery languages[2, 258, 25]). The useof XML asa datarepresentationstandardcanbringmany benefitsfor dataintegration[2, 150]. Furthermore,sinceXML is a semi-structureddatamodel,it canlendversatilityandopennessto datarepresentationandintegration.

However, XML alonedoesnot solve all the dataheterogeneityconflicts. XML datasetsfrom independentsourcescan presentschemaand semanticconflicts, even if thesesourcesprovide dataaboutthesamedomainfor thesameapplication.Theresolutionof theseconflictsrequiresconsensualsemanticsto be associatedwith XML contentsandtags. This cannotbedonein onestep. Interoperabilityrequiresmultiple agreementson XML datamodelingandterminology.

Common Schemasand Metadata Standards

DTD andXML Schemaareschemalanguagesfor XML [148]. Schemaspecificationscanbestoredwith XML data,or in a separatedocument,that canbe referencedto by several XMLdocuments.DTD (DocumentTypeDescription)[256] is partof theXML specificationitself. Itdefinesthestructureof XML documentsusingalist of elementdeclarations.Thesedeclarations,in thestyleof regularexpressions,definethetypesof atomicXML componentsandthenestedstructureof compositeelements.

XML Schema[257] offersanXML-basedsyntaxto describethestructureandconstrainingthe contentsof XML documents.XML SchemareconstructsandextendsDTD capabilities.Figure2.10presentsan XML Schemadescriptionfor theclimatedatadocumentpresentedinFigure2.10. The first line of this descriptiondeclaresthe namespacefor the XML Schemavocabulary. The secondonestatesthat a documentconformingto this schemamusthave anelementcalledWaterBal (thestringusedin its tags)of the typeWaterBalType . An ele-mentof typeWaterBalType includestwelvenestedelementsof thetypeAggregValues ,to hold theclimatemeasurementsfor eachmonthof theyear. WaterBalType alsoincludesattributesto specifythegeographiclocationto which theclimatedatarefer.

Note that the schemadescriptionis not enoughto ensurethe correctinterpretationof theXML dataandsupportdataintegration. Much semanticinformationis missing.For example,thereis no indicationof themeasurementunitsandthegeographiccoordinatesystemusedinthe XML andXML Schemafragmentsof climatedata. In addition, the meaningof the dataelementsis not clearlyspecifiedby their tags.For example,Temperature probablyreferstotheaveragetemperaturein themonth,while AvgRainfall refersto theaverageaccumulatedrainfall duringtheparticularmonth(theseaveragesarederivedfrom temporalseriesof weather

2.6. TheSemanticWeb 25

Figure2.10:An XML schemafor climatedata(climatedata)

data).Themeaningof certainattributeslikePotET andRealET (potentialevapotranspirationandrealevapotranspiration,respectively) areevenharderto infer, andrequireexpertknowledgeto befully understood.

This exampleillustratesthe needto associateconsensualsemanticswith XML dataandtheir markup.Theuseof standardschemasandmetadatastandards,with well documentedandwidely agreedmeaning,candecreasethisproblem.GeneralmetadatastandardssuchasDublinCore[65] definevocabulariesandtheprecisemeaningof termsfor generaluse,while metadatastandardsandstandardschemasdevelopedfor specificfieldshelpto establishsomeconsensusinsidethesefields[237]. However, thesestandardsandformatsarenotenoughbecause:(i) theyhindertheautonomyof informationsystems,(ii) they donotcontemplatetheevolutionof thesesystems,(iii) they donotcoverall typesof data,and(iv) they areunsuitableto providedifferentviewsof thesamedata.

2.6.2 RDF

RDF [147, 80] is the major format for machine-processablemetadatain the semanticWeb.RDF is basedon knowledgerepresentationformalismssuchasframes[180] anddescriptionlogics [15]. The basicconstructof the RDF model is the statement– a triple of the formsubject-predicate-object, wheresubjectrefersto a resource(anything thatcanbedenotedby aURI), predicateis apropertyof thatresource,andobjectis thevalueof thatproperty. Theobjectcanbea literal (e.g.,a string)or anotherresource.An RDF statementdeclaresa propertyof a

2.6. TheSemanticWeb 26

resourceandcanalsoberegardedasa resource-property-valuetriple, whereresourceis usedasa synonym for subject, propertyfor predicateandvaluefor object. Thus,onecanstipulateanRDF triple (http://www.Embrapa.br, PART OF, http://www.Brazil.gov.br) to indicatethat theorganizationwhosehomepageis accessibleby theURI http://www.Embrapa.bris partof theBraziliangovernment.

RDFS(RDF Schema)extendsRDF with classesof resources,values,andproperties.AnRDFS specificationdefinesa structureof classes,propertiesand subclassesfor a particulardomainor application,similar to anobject-orientedclassdiagram.

Figure 2.11, adaptedfrom [133], illustratesthe useof RDF and RDFS to describeWebresources.Two differentRDFschemas,onthetopof thefigure,describeresourcesfor gatheringweatherdata(e.g.weatherstations).TheRDFschemaontheleft describestheseresourcesfromthe point of view of scientistswho areinterestedin analyzingweatherdata. Thesescientistsconnecttheirapplicationsto datacollectingdevicesavailableontheWeb(e.g.via Webservices)to obtainsuchdata.Their applicationsareconcernedwith thegeographiclocationof thedatacollecting devices and how different land parcels(e.g., states,counties)are interrelated. Acompany responsiblefor themaintenanceof thedatacollectingdevices,on theotherhand,hasa differentview of thesameresources.For sucha company, eachdevice is anequipment,withcategoryandmodel.Eachequipmentis associatedwith oneclient.

Eachresourcein theunifiedRDF specificationon thebottomof Figure2.11is an instanceof someclass(i.e., anotherresourcedescribingits type) of oneor both RDF schemason thetop. For example,theweatherstation&ws1 is aninstanceof WeatherStation in theRDFschemaontheleft andof Equipment in theRDFschemaontheright. &ws1 is ashorthandfortheURI http://www.embrapa.br/Weathe rSta tionX .Statementsinvolving resourceinstancesmustmatchstatementsdefinedat the RDFSlevel. For example,&ws1 belongsto&Embrapa and is locatedin &Rio , a county of &RJ State. The URIs of land parcelsandclientsareomittedfor simplicity.

In additionto their usein providing differentviews of thesameresources,RDF/RDFSalsohelp to defineunifiedviews of heterogeneousresources.For example,theweatherstationsofFigure2.11, having different technicalcharacteristicsandbelongingto different institutions,canbeoriginally describedandhandledin differentways. Furthermore,their positionscanbedefinedin distinctsystemsof geographiccoordinates,andthearrangementof landparcelscandiffer acrossinstitutions(e.g.,watersupplycompaniesdivide landin hydrologicalbasins).Thedataprovidedby differentweatherstationscanalsodiffer in theirstructuringandrepresentation(e.g.,measurementunits). Several layersof RDF/RDFSdescriptionsprovide the solutionfortheseconflicts.

TheRDF/RDFSstandardsplay thefollowing fundamentalrolesin thesemanticWeb:

� denoterelationshipsinvolving resourcesandresourcedescriptions;

2.6. TheSemanticWeb 27

Figure2.11:RDFdescriptionsof resourcesfor collectingscientificdata

� providedistinctviewsof thesameresources,tailoredfor differentdomainor applications;

� build unifiedviews for collectionsof heterogeneousresources;

� describeknowledgein termsof vocabulariesof conceptsandthesemanticrelationshipsamongtheseconcepts.

The XML/RDF Mismatch

RDF/RDFScanbeexpressedusingXML syntax.However, many XML handlingfacilitiesarenot appropriatefor handlingRDF. XML andRDF/RDFSareboth basedon directedgraphs,but have differentmodels.The RDF/RDFSmodelis a directedgraphin which labelednodesrepresentresourcesor literalsandlabeleddirectededgesrepresentpropertieslinking resourcesto thevaluesof their properties.Theedgesof theRDF graph-basedmodelareunorderedandtheir labelsdefineproperties.TheXML semi-structureddatamodel,on theotherhand,is morehierarchical.Thelabelednodesof theXML modelrepresentdataelementsor attributes,anditsdirectededgesrepresentnestingandreferencerelationshipsbetweendataelements.In theXML

2.6. TheSemanticWeb 28

modeledgesareunlabeledandtheoutgoingedgesof anodehavea totalorder. Patel-SchneiderandSimeon[202,201] pointoutproblemsresultingfrom thismismatchbetweentheXML andRDF/RDFSmodels.They proposea semanticfoundationfor theWeb,basedon modeltheory,to reconcileXML andRDF informationsources.

Handling RDF/RDFS

XML query languages,suchasXQuery [258, 2], arenot suitablefor RDF, due to the mod-els’ mismatch.Thus,several languagesandtoolshave beendevelopedspecificallyfor query-ing RDF metadata.Jena[132, 248] is a populartoolkit for handlingRDF triples. It allowsnavigation in RDF triples throughanapplicationprograminterface(API) or theRDQL querylanguage,an implementationof SquishQL[177]. Nevertheless,procedurallanguagesfor han-dling RDF triples andtheir componentsarecumbersome.For many applications,a template-baseddeclarative languagewould bemoreappropriate.RQL (RDF QueryLanguage)[133] isadeclarative languagefor queryingRDFaccordingto its graphmodel.RQL adaptsfunctional-ity of querylanguagesfor semi-structuredandXML data[2], to provide functionalconstructs,in the style of OQL [40], for uniformly queryingRDF/RDFS.Sesame[30] is a server-basedarchitecturefor storingandqueryinglargequantitiesof metadatain RDF/RDFS,with supportfor RQL andconcurrency control. Most of thecurrentfacilitiesfor handlinglargeRDF repos-itories, includingJenaandSesame,rely on relationalor object-orienteddatabasemanagementsystemsto providepersistenceandscalability[132,248, 74, 162,133, 30]

2.6.3 Ontologies

Ontologies [233, 109, 110, 172] aresharedconceptualizationsof knowledgeaboutdelimiteddomains.An ontologyorganizesdefinitionsandinterrelationshipsinvolving a setof concepts(e.g.,entities,attributes,processes).It capturesthe meaningof classesand instancesfrom auniverseof discourse,by arrangingthe symbols(e.g.,words,expressions,signs)referringtothem,accordingto semanticrelationships[247].

An ontologyentailsor embodiesa particularviewpoint of a givendomain.This viewpointmustbesharedby a groupof individuals,formedaccordingto factorslikegeographicproxim-ity, cultural background,profession,interestsor involvementin particularenterprises.Thesepeopleestablishagreementswith respectto their views of the world andthe symbolsusedtocommunicatetheir views. Ontologiescanbeexplicit or implicit, formal or informal. However,they mustbeexplicit andformal, to berepresentedandprocessedby computers.

There is no convention with respectto the form of a machine-processableontology. Asimpletypehierarchy, specifyingclassesandtheir subsumptionrelationships,likea taxonomy,is anontology. Evena relationalschemacanserve asanontology, by specifyingthepossiblerelationshipsandintegrity constraintsin adatabase.

2.6. TheSemanticWeb 29

Ontologiesconstitutea meansto structureknowledgeto supportinformationretrieval andinteroperability[109]. The sharedknowledgecarriedin ontologiesenableprecisestipulationandresolutionof queries[111, 216,121, 13, 7, 184,134] andinformationbrokering[135,173]in openenvironments.Ontologiesalsohelp dataintegration,particularly the investigationofcorrespondencesbetweenelementsof heterogeneousdatasources[13, 171, 21, 172]. Relatedresearchproposesthedevelopmentof informationsystemscomponentsby translatingontolo-giesinto object-orientedhierarchiesto implementthesesystems,giving rise to theconceptofOntology-DrivenInformationSystems[110, 90].

The following paragraphsdescribethecurrentlyproposedmeansto describe,developandmanageontologiesin thesemanticWeb. Sections2.7and2.8includemorespecificdiscussionsof theuseof ontologiesin semanticWebapplications.

Ontology SpecificationLanguages

Severallanguagesandformalismshavebeenproposedto expressknowledgein ontologies[101,109]. DAML+OIL andOWL aresomeof themostprominentontologylanguagesfor theseman-tic Web. They extendtheRDF/RDFSvocabulary andenrichexpressivenessfor delineatingon-tologies(e.g.,to expressdisjunctionof classesandotherconstraints).DAML+OIL [165] com-binesthebasicconstructsandsyntaxof DAML-ONT (DARPA AgentMarkupLanguage)[61,80] with OIL’s (OntologyInferenceLayer) [191] frame-basedmodelingprimitives[180] andformalsemanticsandreasoningservices,basedondescriptionlogics[15].

OWL (WebOntologyLanguage)[196] is a W3C candidatestandardrecommendation.It isintendedto describeclassesandrelationsthatareinherentin Webdocumentsandapplications.OWL carriesinfluencesof DAML+OIL, amongother languagesandformalisms. Like OIL,OWL comesin threedifferentflavors,with increasingexpressivenessandcomplexity.

Descriptionsof otherontology languagesappearin [80, 101, 201]. The relationshipandintegrationof XML with ontologyrepresentationlanguagesandformalismsis addressedin [13,7, 202,201, 6, 139].

OntologiesDevelopmentand Management

The developmentof ontologiesis a laboriousanderror pronetask,especiallyif it is donebyhand.Ontologyengineeringtools [190, 227,103] canautomatepartsof this taskandhidetheidiosyncrasiesof the ontologyspecificationlanguagesandformalisms. Thesetools canoffergraphicalinterfaces,facilities for knowledgeacquisition(e.g., legacy datasetconversionandincorporationin theontology),remoteaccessto knowledgerepositoriesandmeansto checkthequalityandconsistency of thespecificationsproduced.

Protege [190,227, 103, 78] is anexampleof anopen-sourcegraphictool for ontologyedit-ing andknowledgeacquisition.It canbeextendedwith pluginsto incorporatenew functionality.

2.7. WebServices 30

Availablepluginsallow, for example,thedevelopmentandexchangeof ontologyspecificationsin avarietyof formats,includingDAML+OIL andOWL.

Methodologiesandguidelinesfor developingontologiesappearin [109,226,19]. They helpto enhanceproductivity andto improve thequality of theontologiesdeveloped.Methodsandtoolsfor automaticallyextractingontologiesfrom text documentsandsemi-structureddataareproposedin [94, 62,181,182, 184].

Thespreadingof ontologiesfor differentdomainandapplicationsleadsto interoperabilityproblemsamongdiverseontologies.Proposedsolutionsfor thisprobleminvolveontologycom-positionalgebrasandgraph-basedmodelsfor ontologiesarticulation[79, 183, 131, 247, 245,246].

Finally, Jess[91] andAlgernon[122] areexamplesof inferenceenginesfor the semanticWeb. TheseengineshandleRDF/RDFSspecificationsandrelatedformatsasrulesformalizingdeclarative knowledge.They apply inferenceto derive otherknowledgefrom thebaseknowl-edgepresentin ontologyspecifications.Theseenginescanbe pluggedto an ontologyeditorsuchasProtegeor simply processRDF/RDFSexportedby sucha tool.

2.7 WebServices

A Web service[81, 222, 39, 253] is a softwaremoduleaccessiblethroughthe Internet. Webservicesareusuallyself-describingandindependent.They communicatewith clientsandotherservicesvia messages,over standardWebprotocols.EachWebservicecanbe identifiedby aURI andexposesaXML interfaceto allow its discoveryandinvocationacrosstheWeb.

TheWebservicestechnologyis basedon thenotionof building new applicationsby com-biningnetwork-availableservices.Theservicesparticipatingin distributedprocessescooperateto achievesomegoal,by exchangingmessagesandcoordinatingtheirexecutions.It enablesin-teroperabilityof informationsystems,while allowing decouplingandjust-in-timeapplicationsintegration. The resultingcooperative systemsare potentially self-configuring,adaptive androbust,becausethey canallow thedynamicincorporationof alternativeservicesandavoid sin-gle pointsof failure.Furthermore,implementingsystemscomponentsasWebservicesreducescomplexity, asapplicationdesignersdo not have to worry aboutplatformandimplementationdetails,whichareencapsulatedby theWebservicesinterfaces.

2.7.1 Ar chitecture and BasicStandards

A serviceorientedarchitecturepostulatescooperationof softwarecomponentswith threedis-tinct roles: serviceproviders,servicerequestersandservicebrokers. A serviceprovider holdstheimplementationof oneor moreservicesandmanagesthepublic interfacesthatmake theseservicesavailableon the Web. A servicerequesteris the party that hasa needto be fulfilled

2.7. WebServices 31

by somepublishedservice. It canbe a humanuseraccessingservicesthrougha consoleorWeb browser, an applicationprogramor anotherWeb service. The servicebroker providesasearchablerepositoryof servicedescriptions,whereserviceproviderspublishtheirservicesandservicerequestersfind descriptionsandbinding informationto accessservicescontemplatingtheirparticularneeds.

Serviceproviders,requestersandbrokerscommunicateusingstandardtechnologies.Therearemany standardscurrentlyunderdevelopmentto allow languageandplatform independentimplementationof Web services[141, 229]. Figure2.12outlinesthe layersof standardsandtechnologiessupportingWebservices-basedapplications.

Figure2.12:Layersof Webservicesstandardsandtechnologies

TheNetworkProtocolslayerprovidesthebasiccommunicationfacilitiesandprotocols(e.g.,HTTP).SOAP [28] is a lightweightprotocolfor servicesto exchangeXML-encodedmessagesandmake procedurecalls over the Internet. Messagescanbe routedalong a messagepath.SOAP providesenvelopingfacilities to describethe intentof a messageandhow to processit,a setof encodingrulesfor expressinginstancesof application-defineddatatypes,anda con-ventionfor representingremoteprocedurecalls andresponses.ThoughSOAP wasoriginallydesignedto useHTTP asthetransportprotocol,it canrun on othernetwork protocolssuchasFTP, SMTPor evenraw TCP/IPsockets. SOAP is extensible,allowing differentcommunica-tion modelssuchasone-way, request-responseandmulticast.In addition,SOAP is not tied toany languageor componenttechnology.

WSDL(Web ServicesDefinition Language)[254] is a XML-basedformat for describingWeb services. WSDL specifieswhat a Web servicedoes,whereit is locatedand how it isinvoked. In WSDL, a serviceis regardedasa setof relatedendpointscalledports. Theportsof a servicecancommunicatewith portsof otherservicesvia messages,thatcancontaineitherdocument-orientedor procedure-orientedinformation. The abstractdefinitionsof ports andmessagesareseparatedfrom their network deploymentanddataformatbindings.This allowsthe reuseof abstractdefinitions: port typesthat definesetsof operationssupportedby ports,and data types that definethe databeing exchanged. A concretedataformat and protocol

2.7. WebServices 32

specificationfor a port type constitutesa reusablebinding. WSDL canwork in conjunctionwith SOAP, HTTP GET/POSTor MIME.

UDDI (UniversalDescription,Discovery andIntegration)[230] is a setof standardXMLschemas,SOAP messagesandAPI specificationsto build catalogsfor findingspecificWebser-vices. UDDI providesinformationaboutbusiness(e.g.,name,description,contact),servicesofferedandparticularstandardsusedto bind with theseservices. It alsoprovides identifiersandvarioustaxonomiesto describebusiness(e.g.,relatedindustry, productsandservices,geo-graphicalregion). A UDDI registryis itself aWebservice,providing facilitiesto create,modify,deleteandqueryservicedescriptions.Theseregistriescanbepublic or private. IBM andMi-crosoftprovide public UDDI registries.Serviceprovidersonly have to registerto oneof thesepublic registries,sinceupdatesto any of themarereplicatedin theothersonadaily basis.

The two top layersof Figure2.12refer to thesemanticandfunctionalaspectsof Webser-vicesintegration.Theselayersarestill underdevelopmentwith many proposalsfrom industryand academia.The semanticWeb serviceslayer employs semanticWeb technologies,suchasontologies,to supportWeb servicesdiscovery, selectionandcomposition,accordingto theneedsof specificdomainsor applications.The CooperativeProcesseslayer concernsthe co-ordinatedexecutionof Webservicesin cooperativeprocessesacrossorganizationalboundaries.Finally, AccessControl andSecurityPoliciescanbeenforcedin any Webservicesimplementa-tion layer.

2.7.2 CooperativeDistrib uted Processesenabledby WebServices

SemanticWebServices

SemanticWebservices[166] areassociatedwith well-definedsemanticsto expresstheir func-tionalproperties,capabilities,applicabilityandontologicalrelationships,in orderto enabletheirutilization in cooperativeprocessesoveranopenanddistributedenvironment.Researchin thisarearely on semanticWebideasandtechnologies[124, 260, 108,220, 164,213, 221, 14,198,38, 166,37].

The capabilitiesof registriessuchasUDDI and languageslike WSDL arenot enoughtosupportservicesdiscovery [198]. DAML-S (or DAML-services)[14] is an extensionof theDAML ontologyspecificationlanguagefor Webservices.It includesmechanismsto describ-ing,discover, select,activate,compose,andmonitorWebresources.Theworkof [198] employsDAML-S for servicesdiscovery, presentingan algorithm to matchservicerequestswith theprofileof advertisedservices,basedon theminimumdistancebetweenconceptsin a taxonomytree. CardosoandSheth[38] presentmetricsto selectWebservicesfor composingprocesses.Thesemetricstake into accountfunctionalandoperationalfeaturessuchasthepurposeof theservices,quality of service(QoS)attributes,andtheresolutionof structuralandsemanticcon-flicts. McIlraith et al. [166] useagentprogrammingto definegenericproceduresinvolving the

2.7. WebServices 33

interoperationof Webservices.Theseprocedures,expressedin termsof conceptsdefinedwithDAML-S, donotspecifyconcreteservicesto performthetasksor theexactwayto useavailableservices.Suchproceduresareinstantiatedby applyingdeductionin thecontext of aknowledgebase,which includespropertiesof theagent,its user, andtheWebservices.

Topologyandmodelshave beenproposedto enablecooperationandcompositionof ser-vices.Schlosseretal. [213] proposeagraphtopology, determinedby agloballyknow ontology,to speedup communicationof Web servicesin a peer-to-peersystem.Maximilien andSingh[164] presenta modelfor gatheringandassessinginformationrelative to the useof Web ser-vicesto determinetheir trustfulness.Sirin et al. [220] presentsa prototypeto guidea userinthedynamiccompositionof Webservices.Finally, Gruninguer[108] show how anontologyforprocessspecificationlanguagescanserveasasemanticfoundationfor thecompositionof Webservices.

WebServicesCoordination

Nowadays,thereis a myriadof proposalsconcerningthe interoperabilityandsynchronizationof Webservices[234,116, 20,87,204, 250,175]. Examplesof Webservicescompositionlan-guagesincludeBPEL4WS(BEA, IBM, Microsoft) [250], WSFL(IBM) [255], BPML (BPMI),XLANG (Microsoft), WSCI (BEL, Intalio, SAP, Sun), XPDL (WfMC), EDOC (OMG) andUML 2.0 (OMG). Somechallengesof thesetechnologiesare: (i) reducingtheamountof low-level programmingnecessaryfor theinterconnectionof Webservices(e.g.,throughdeclarativelanguages),(ii) providing flexibility to establishinteractionsamonggrowing numbersof con-tinuouslychangingWebservicesduringrun time,and(iii) devisingmechanismsfor thedecen-tralizedandscalabletransactioncontrolfor cooperativeprocessesrunningontheWeb. Muchofthecurrenttechnologyfor syncronizingprocessesarebasedon centralizedcontrol,evenif thetheexecutionis distributed. This centralizationis inappropriatefor Web systems,for reasonsof autonomyandscalability. Thus, in oppositonto techniquesto orchestrateservices,Web-basedworkflows requiretechnologyto allow serviceto choreographytheir executions,basedonagreeduponprotocols.

VanderAalst [234] comparesthemajorcandidatestandardsfor Webservicescompositionandsynchronization.He pointsout problemsrelatedwith the lack of formal semantics,ex-pressiveness,complexity andadequacy of theseproposals.[234] suggeststheincorporationofwell-establishedprocessmodelingtechniquesin a singlestandardfor Web servicescomposi-tion. Theuseof Petri-netsfor this purposeis consideredin [116, 235,186]. Activity modelsappearin [93, 157,156,155, 154].

2.8. ApplicationsandSupportingEnvironments 34

2.8 Applications and Supporting Envir onments

SemanticWeb applicationstake advantageof knowledge,representedin proposedstandardslike RDF, to leverageautomatedmeansto describe,organize,discover, selectand composeWebresourcesfor thesolutionof a varietyof problems.Themostusualapproachis to definesemanticmarkupbasedonsomeontology, andusethemto integrateandprovideunifiedaccessto dataandservices,typically via Web portals. Therearemany examplesof this approachintheliterature[121,216,111, 13].

Someexperimentalsystemspossessdistinctive features.Edutella[188] is a Peer-to-PeerinfrastructureusingRDF metadatato facilitateaccessto educationalresources.In Edutella,eachpeerholdsa setof resourcesandhasanRDF repositoryof resourcedescriptions,to allowqueryingits contentsat the storagelayer (e.g.,SQL) or userlayer (e.g.,RQL). Peerscanbeheterogeneousin their internalorganizationandthe query languagethey provide. The com-mondatamodelandtheexchangelanguageof Edutellaenablesa standardinterfacefor posingqueriesto specificpeersor communitiesandfind resourcesacrossthenetwork.

Piazza[115] is an infrastructureto provide interoperabilityof datasourcesin theWeb,bymappingtheircontentsat thedomainlevel (RDF)andthedocumentstructurelevel (XML), andaddressingthe interoperationbetweentheselevels. The mappingsarespecifieddeclarativelyfor smallsetsof nodes.A queryansweringalgorithmchainsthesemappingstogetherto obtainrelevantdatafrom acrossthenetwork.

Papersfocusingspecificallyscientificapplicationsof the semanticWeb andWeb servicesinclude[224,160,174,102, 43]. Somescientificapplicationsrefer to particularfieldssuchasbioinformatics[34, 153,41, 223,114,104], earthsciences[17, 241] andtheenvironment[16,42, 161]. Thegrid – a platformfor coordinatedresourcesharingthroughtheInternet,increas-ingly usedfor scientificdataprocessing– andthe semanticWeb have mutualcharacteristicsandgoals[102]. Both operatein aglobal,distributedanddynamicenvironment,andbothneedcomputationallyaccessibleandsharablemetadatato supportautomatedinformationdiscovery,integrationandaggregation.

POESIA(Chapter3) introducestheconceptof ontologicalcoverages– tuplesof termstakenfrom amultidimensionalontology– whichareusedto describetheutilizationscopeof dataandprocessingresources,particularly in agriculturalsciences.The partial orderingamongthesedescriptorsenablethe organization,discovery, andreuseof resources.POESIAalsoincludesmechanisms,basedon ontologies,workflows andactivity models,to semanticallyorient thecompositionof Webservicesin cooperativedistributedprocesses(Chapter3) andhelpto tracetheinformationflow acrosstheseprocesses(Chapter4).

Webservicesdevelopmentandexecutionplatformsaredescribedin [88,53,138,249, 176].Bandholtz[16] proposethe useof Web servicesto shareontologiesanddescribesthe imple-mentationof aservicenetwork for this purpose.

2.8. ApplicationsandSupportingEnvironments 35

2.8.1 Scientific Workflows

Scientificwork is typically basedin experiments[43]. Sometimesscientistsrely on simplifiedmodelsof realworld phenomenato found their investigation,andusevastamountsof datatocorroboratetheir results.The technologicaldevelopmenthasgenerateda greatavailability ofdata,from a variety of heterogeneoussources,that scientistscanuseto enhancetheir experi-ments.Moreover, scientistscanexchangemodelsandcomputerprogramsimplementingthesemodels.Althoughscientificwork canvaryamongdiversepeople,disciplinesandorganizations,it canbenefita lot from dataandsystemsinteroperability.

ScientificWorkflows[41, 12, 168, 239,8] useworkflow technology[130, 123, 59] to man-agescientificwork. They regardscientificexperimentsascomplex processeswith intricatedatatransformationsand informationflow. Theseprocessesmay encompassautomaticandman-ual activities. The dataandexecutiondependenciesamongtheseactivities canbe very com-plex, yielding interoperabilityand synchronizationproblems. Many scientific processesaredistributed,in orderto enablecooperationof differentgroupsandfosterreuseof partialresults.Therefore,semanticWebservicetechnologiesarefundamentalto implementtheseprocessesinanopenenvironmentencompassingdifferentplatforms.

Scientificprocessesdiffer from businessprocessesin several aspects.Scientificwork de-mandsfreedomto try alternative waysof doing things. The sequenceof steps(andeven thegoal,sometimes)is not totally known in advance.Thescientistperformsometaskanddecideson thefurtherstepsonly afterevaluatingthepreviousones.Specificsubjectsin scientificpro-cessesmanagementincludedocumentation[238] andreorganization[156] of theseprocesses.

The exploitation of the workflows paradigmfor managingscientific processeshasbeenexploitedin specificdomainssuchasbioinformatics[34,41,223, 170] andgeoinformatics[214,241, 169, 11, 129, 259, 10]. For instance,Cavalcantiet al. [41] combinesmetadatasupportwith Webservicesin a framework to supportscientificworkflows andapplythis framework tostructuralgenomics.Seffino etal. [214], on theotherhand,usescientificworkflowsto describeandreusepatternsof geographicdataprocessingin agriculturalandenvironmentalapplications.

2.8.2 GeographicInf ormation SystemsInter operability

Geographicinformationsystems(GIS)[3, 167, 49] managedatareferringto geographicentitiesor phenomena.Thesedataaregeo-referenced,i.e.,they carrysomeindicationof thegeographiclocation. A GIS providesspecializedbasicfacilities to processgeographicdata,beingusefulfor informationextraction,planninganddecisionsupport,amongotherkindsof applications.

TheGIS market is characterizedby proprietaryformatsthatmake interoperabilityhardtoachieve. Many formatshavebeenproposedfor exchanginggeographicdata[192, 217, 9]. How-ever, scientistshaveprogressively foundout thatstandardformatsarenotenoughto strengthenGIS interoperability[105]. The conversionof datathroughtheseformatsoften resultsin in-

2.9. Conclusions 36

formationloss, incorrectinterpretationof dataandpoor informationquality [51]. It happensbecauseformatsfor geographicdataexchangearemainly concernedwith syntax,structureandthegeometryof geographicobjects.EvenGML (GeographyMarkupLanguage)[192] do notensurethecorrectinterpretationof data,becauseit doesnot take into accountthesemanticsandthebehavior of geographicobjects.

Theimportanceof establishingasemanticbasisfor geographicdatarepresentationandman-agementhasbeenrecognizedin severalpapers[70, 203, 54,90,161, 252]. Corcoleset al. [54]describesanapproachfor integratinggeographicdata,basedon mappingsbetweenontologiesandXML schemas.They presentanontologyto supportthecreationandexchangeof semanticdescriptorsfor geographicresources(XML documentscontaininggeographicdata). The de-scriptorsandthelinks amongthemandtheresourcesthemselvesarebothexpressedin RDF. Itenablesauniquelanguagefor queryingGML documents,withoutknowledgeof theirstructure.

Ontologiesfor the integration of geographicdataappearin [90, 161, 252]. Fonsecaetal. [90] employs ontologiesto defineclassesfor developinggeographicapplications. Theirapplicationsrely on ontologyserversandmediatorsto accesstheir datasources.It allows, forexample,loadingdatainstancesfrom heterogeneousdatasources,usinga schemadefinedbyoneontology.

GIS interoperabilityalsorequiresadditionallevels of integrationsuchascommonalityofsystemsbehavior and system-userinteraction. The adoptionof a commongeographicdatamodel [228, 26] or at leasta framework to unify heterogeneousmodels[50] constitutesoneingredientto achieve this goal.

2.9 Conclusions

Integrationof heterogeneousdatahasbeenoneof thegreatestchallengesin databaseresearch.Theadventof theWebis pushingthedemandfor solutions,andreformulatingthisproblemintoa morecomplex setting– thediscovery, selectionandcompositionof dataandservices.Solu-tions for all theseproblemsinvolve versatilestandardsandenrichingtheWebwith semantics,in orderto allow interoperabilitywhile embracingdiversity.

TheWebis becomingthecommonplatformfor implementingcooperative distributedsys-tems.ThesemanticWebandworkflows basedon thecollaborationof servicesacrosstheWeb,areexpectedto expandtheroleof computersto supporthumanactivitiesin avarietyof fields.Inthisopendistributedenvironment,dataprocessingandsemanticscannotbedissociated,becausethe meaningof datadependson the whole processemployed to producethem. Technologyto supportthe idealizedsystemsis underfastdevelopment,in areasrangingfrom knowledgemanagementto Web servicesdevelopmentandcomposition. Concreteapplicationsmust bedevelopedin thenearfutureto fulfill endusers’expectations.

This survey hasoutlinedthe researchon informationsystemsinteroperability, from work

2.9. Conclusions 37

on interconnectionof relationaldatabases,to the mostrecentdevelopmentsin semanticWebservices.The major contributionsare: (1) describingandcomparingproposedstandardsandarchitectures;(2) categorizing heterogeneityand proposedsolutions; (3) discussingspecificneedsrelatedwith dataandservicesintegration,particularlyfor scientificapplications.

Acknowledgments

This work waspartially supportedby Embrapa,CAPES,CNPqandthe MCT/PRONEX-SAIandtheCNPqWebMapsprojects.Thanksto professorsAna CarolinaSalgado,CaetanoTrainaJunior, Celio CardosoGuimaraesandEdmundoMadeira,whoprovidedseveralsuggestionsfortheimprovementof thiswork.

Chapter 3

POESIA: An Ontological WorkflowApproachfor ComposingWebServicesin Agricultur e

3.1 Intr oduction

Webservices[253] arecomponentsfor constructingnext-generationWebapplications.ThesecompositeWebapplicationsarebuilt by establishingmeaningfuldataandcontrolflowsamongindividualWebservices.Thesedataandcontrolflows form workflowsconnectingcomponentsdistributedover theInternet.However, therehasbeenvery limited researchon thecompositionof Webservicesusingworkflow conceptsandtechniques.This is partiallydueto thelimitationsof centralizedcontrol in traditionalworkflow managementsystems,which areinadequateforthescalabilityandversatilityrequirementsof Webapplications(e.g.,dynamicrestructuringofprocesses[168] andactivities [157]).

This paperbridgesthis gapby applyingadvancedworkflow andactivity conceptsin thecompositionof Web servicestoward the constructionof sophisticatedSemanticWeb applica-tions. Our approachis calledPOESIA(Processesfor Open-EndedSystemsfor InformationAnalysis),an openenvironmentfor developingWeb applicationsusingmetadataandontolo-giesto describedataprocessingpatternsdevelopedby domainexperts.Thesepatternsspecifythecollection,analysis,andprocessingof datafrom avarietyof Internetsources,thusprovidingbuilding blocksfor next-generationSemanticWebapplications.

Themaincontribution of thepaperis POESIA’s supportof Webservicecompositionusingdomainontologieswith multiple dimensions(e.g.,space,time,andobjectdescription).Tuplesof termstakenfrom theseontologies,calledontological coverages, formally describeandorga-nize theutilization scopesof Webservices.A utilization scopeis a context in which differentdatasetsandspecificversionsof arepertoireof servicescanbeused.In POESIA,Webservices

38

3.2. Applicationscenario 39

arecomposedunderthesescopesthroughwell-definedoperationssuchasspecializationandaggregation.Rulesbasedon thecorrelationof utilization scopesandtheir ontologicalrelation-shipsenablesystematicmeansto verify thesemanticandstructuralconsistency of Webservicescompositions.In addition,POESIAontologiesareusedin thedeterminationof thegranularitiesfor selectingandintegratingdataandprocessesaswell ashelpingto describetheir semantics.

Thesecondmaincontributionof thispaperconsistsin showing how POESIAresolvessomeopenissuesin Web servicescomposition.This is donethroughthemodelingof a substantialapplicationof practicalimpactusingPOESIA.Our applicationis in theareaof environmentalinformationsystems,specifically, agriculturalzoning– thedeterminationof landsuitability forimportantcrops. Agricultural zoning is a challengingapplicationfor several reasons.First,several kinds of heterogeneousscientificdatastreams,suchasmeteorologicalmeasurements,aregatheredcontinuouslyin largevolumesandcorrelatedfor specifictemporalandspatialcon-ditions. Second,thesedatasourcesare distributedover the Web, increasinglythroughWebservices.Third, agriculturalzoningis a cooperative (distributed)decision-makingprocessin-volving expertsfrom severalfields.Finally, it requirescontinuousprocessingsincethesituationis frequentlyreevaluateddependingon temporal(seasonal)changes.

POESIA is a contribution toward the realizationof the vision of the SemanticWeb forscientificapplications.It allowsthepartialautomationof someexpertreasoningfor organizing,reusing,and composingnot only databut also the Web servicesthat provide accessto andprocessthesedata.

The remainderof this paperis organizedas follows. Section3.2 describesour exampleapplication.Section3.3 definesthe domainontologiesandontologicalcoveragesthat arethebasisof our approach.Section3.4 presentsthe POESIAapproachto specifyandreuseWebservices.Section3.5 outlinesthemain technicalissuesin the implementationof thePOESIAenvironment.Section3.6discussesrelatedwork, andSection3.7concludesthepaper.

3.2 Application scenario

3.2.1 Agricultural zoning

Agricultural zoningis a scientificprocessto determinelandsuitability in a geographicregionfor a collectionof crops. This processclassifiesthe land into parcelsaccordingto their suit-ability for aparticularcropandthebesttimeof yearfor key cultivationtasks(suchasplanting,harvesting,pruning,etc). Thegoal of agriculturalzoningis to determinethebestchoicesfora productive andsustainableuseof the land while minimizing the risks of failure. However,someconstraintsmay imposeinevitable trade-offs that leadto compromises(e.g.,short-termproductivity vs. long-termsustainability). Typically, agriculturalzoning requireslooking atmany factorssuchasregionaltopography, climate,soil properties,andcroprequirements.Ad-

3.2. Applicationscenario 40

Figure3.1: Determininglandsuitability for Coffeaarabica in Brazil’sCenter-South

ditional concernsinclude interactionswith wildlife, environmentalpreserves,andsocialandmarket impact.

As illustratedin Section3.2.2,agriculturalzoning is a complex processconsistingof in-tricate interactionsamonga variety of datasources. The processis built by cooperationofexpertsfrom many scientificandengineeringdisciplines.For example,agronomistscontributewith planting techniquesandcrop managementmodels. Biologistsprovide crop growth andnutrientrequirements.Statisticiansprovide risk managementanalysisfor potentialcrop fail-ures(e.g.,dueto severeweather).Environmentalscientistsanalyzetheimpactof cropselectionover theenvironmentfor boththeshortandlong term.Theseandotherscientistsandengineersbring togethertheir expertiseanda varietyof computationalanddataanalysistoolsto build anagriculturalzoningmodel.

At run time,anagriculturalzoningprocessobtainsrelevantdatafrom avarietyof heteroge-neoussources,primarily sensorsthatcollectdataon physicalandbiologicalphenomena(e.g.,weatherstations,satellites,laboratoryautomationequipment).Sincegatheringandprocessingreal-timedatacanbecostly, databasesystemsandexisting documentsin differentformatsarefrequentlyusedasalternativesources.In any case,largeamountsof fine-graineddataareusu-ally requiredfor extractingthe neededinformation. Both dataanddataprocessingtools canbe encapsulatedandprovided throughWeb services. In summary, agriculturalzoning com-binestoolsandservicesdevelopedby a diversesetof scientistsandintegratesdatafrom many

3.2. Applicationscenario 41

heterogeneoussourcesthroughcoordinatedactivities,asdescribedby POESIA.Agricultural zoninghasbeena labor-intensive processthat is both expensive andslow to

developdueto thecomplexitiesmentionedabove. This is aseriousissuesinceit is anextremelyimportantproblemfor a country with many commercialcropssuchas Brazil. Supposewewant to producean agriculturalzoning model for the top 20 cropsfor eachregion. Let usconsiderthe10 majorvarietiesof eachcrop(thesevarietiesusuallyhavedifferentweatherandsoil requirements).Simply dividing Brazil accordingto stateboundaries(27 states)will resultin morethan5000models. It is clearthatwe needa systematicway to developandmaintainthesemodelssincemanualprocesseswill betooexpensiveanderrorprone.

3.2.2 Casestudy

Figure3.1 illustratesa specificagriculturalzoningprocess,namely, land suitability for Cof-feaarabica in theCenter-Southregion of Brazil. Coffeaarabica is themainspeciesof coffeeproducedby Brazil. Althoughcoffee is no longerthecountry’s numberoneexport product,itremainsoneof themajor farmexport productsdueto thehigh commercialvalueof goodcof-fee. Thezoningprocessfor Coffeaarabica is composedof severaldistributedandcooperatingactivities, representedby ellipses.Datafrom severalsourcesareprocessedby theseactivities,andtheresultsgeneratedby eachactivity aretransferredto otheractivitiesor datarepositories.

Accordingto domainexperts[75,262], themostinfluentialenvironmentalfactorsfor Coffeaarabicaare: (1) soil wateravailability, (2) air temperature,and(3) the risk of freezing.Thesefactorsarereflectedin the structureof the land suitability processin Figure3.1, which relieson a datawarehouseof climateattributesto obtainaggregatedvaluesof measurements,suchasmaximum,minimum,andaveragetemperature,andtotal rainfall, in appropriatetime gran-ularities. This warehouseis a compositeWeb serviceencompassingresourcesfor collectingandmaintainingclimatedatafrom several regionsandinstitutions. It servesasinput to threeactivities that canbe executedin parallel– EstimateWater Balance, AssessAir Temperature,andAssessFreezingRisk. Theactivity EstimateWater Balancetakestheexpectedrainfall andtheaverageair temperaturefor eachmonthof theyear, thewaterretentioncapacityof thesoils,andsomephenologicalcoefficientsof coffee plants(collectedfrom legacy databasesystemsandscientificpublicationsin agronomy)to estimatethewaterbalance– a measurementof theexpectedamountof moistureavailablein thegroundthroughtheyear. EstimateWaterBalanceis followedby AssessWaterDeficit, whichcomparesthedatafrom waterbalancewith thewaterdemandsof theplantsduring their successive phenologicalstages,producingthewaterdeficitindex (WDI) – ameasurementof theexpecteddeficit of waterfor thecropthroughouttheyear.

In a similar way, theactivities AssessAir Temperature andAssessFreezingRiskuseotherclimatedataandtopographicdatato producetheaverageair temperature,theprobabilityof airtemperatureexceeding34� C, andtheprobabilityof freezing.Thesepartialresults(indicesand

3.2. Applicationscenario 42

Figure3.2: Landsuitabilitymapfor Coffeaarabica in Parana State

probabilities)are visualizedasmaps,showing the distribution of the relevant measurementsor estimationsacrossthe region. Whenall theseactivities finish anddeliver their results,theactivity ClassifyParcelsfusesthesepartial resultsto determinethesuitability of theexpectedenvironmentalconditionsacrossthelandsfor thecrop.

The datasourcesandactivities for agriculturalzoningmay be dispersedacrossdifferentsitesover the Internet. Furthermore,theseprocessesaresensitive to crop, location,andtime,i.e., they dependon thespeciesandvarietyof thecrop,theenvironmentalcharacteristicsof theregion,andtheopinionof theexpertsinvolved.Thegranularitiesfor which theseprocessesaredefinedareusuallynotuniform. Indeed,for somecropsit is possibleto deviseagenericzoningprocess,while othercropsrequirespecificprocessesfor eachplant variety. Similarly, certainzoningprocessesaredefinedfor vastregionsandothersfor specificlandparcels.

The mapof Figure3.2, borrowed from [75], shows the land suitability resultsfor Coffeaarabica in the stateof Parana. It shows, for instance,that in the southernareaof the state,onefreezingevent happenson averageevery 2 years. Freezingscanimpair the productivityandeven kill coffee trees,renderingthat areaunsuitablefor coffee cultivation. Governmentsandfinancial institutionsrely on this kind of information,for instance,to defineandenforceadequateloan grantingpolicies. Thesepolicies direct farmersto choicesand practicesthat

3.2. Applicationscenario 43

contribute to lessenrisks and increasethe productivity of their enterprises.Experiencesinsectorsof Brazilianagriculture[212] in thelastfew yearscorroboratetheeconomicadvantagesof adoptingthis scientificapproachto agriculturalzoning.

3.2.3 Technicalchallenges

In ourapplicationexample,thesemanticsof dataareinterrelatedwith theprocessesthatmanip-ulatethem,sothatdataandprocessescannotbecompletelydecoupled.Interconnectedactivi-tiescooperatewith eachotherto processdatacollectedfrom severalheterogeneousdistributedsources,giving riseto distributedprocesseswhosecomplexity requirestheirorganizationin sev-eralabstractionlevels.Theoutputsof a processcancontributeto theinputsof otherprocesses.The datasourcesto be taken into accountandthe resultinginformation,for eachspecificap-plication,aredynamicallydefinedby userrequirementsandcontingenton climatic conditions.Theanalysisof theresultsgivesfeedbackto improvetheprocessor devisenew ones.However,despitethenumerousvariantsof theseprocesses,somepatternscanberecognized.

Thesescientific processesare in fact vastanddistributedefforts for dataintegrationandfusion. By data integration we meanthe transformationsappliedto heterogeneousdatasothat they canbeanalyzedtogetherfor somespecificpurpose.It doesnot imply thatdatamustbe coercedand congealedinto a global schema. What mattersis the correct interpretationanduseof the data. Data fusion consistsin applying somefunction to a collection of datavaluesto produceothermeaningfulvalues(e.g.,fusetheexpectedenvironmentalconditionsofa landparcelto determineits suitability for acrop).Our experiencewith scientificapplicationsshows thatdataintegrationandfusionarescatteredacrosstheconstituentactivitiesof complexprocessesat distinct abstractionlevels. Expertsin this kind of context facemany challenges,someof whicharedescribedbelow.

IdentifyingResources Lack of catalogsandinspectionmechanismsto find andreuseavailableWebresourcesto solveeachparticularproblem.

SystemsInteroperability Domain expertsand technicianswastetime converting dataamongformatsof differenttools.Thiseffort shouldbespenton application-specificissues.

DataTraceability Thereis no meansto track dataprovenance,i.e., their original sourceandtheway they wereobtainedandprocessed.This hamperstheevaluationof whetherthequalityof adataitemsatisfiestherequirementsof aparticularapplication.

ProcessDocumentationandExecutionProcessesarerarely documented.Whenthis is done,thespecificationsproducedareeithernot broadenoughfor giving a generalview of theprocessesor not formal enoughto allow the automaticrepetitionof the processwithdifferentdatasets.

3.3. Ontologicaldelineationof utilizationscopes 44

ProcessVersatility Thereshouldbeschematicmeansto reformulateprocesseson thefly. Thiskind of decisionsupportsystemrelieson continuousfeedbackto improve theprocesses– asdatakeeparriving andresultsareproduced,theprocessesmayevolve.

AdaptationandReuseMechanismsfor adaptationandreuseof Webservicescouldboostpro-ductivity andenhancethequality of theresults.

Theseissuesarecommonto severalkindsof applicationsinvolving distributedprocessesoverthe Web. The following sectionsdescribethe POESIAapproachfor handlingsomeof theseissues.

3.3 Ontological delineationof utilization scopes

Ontologies[110] describethemeaningof termsusedin aparticulardomain,basedonsemanticrelationshipsobserved amongtheseterms. In the POESIAapproach,they play a crucial rolein composingWebservices.Concretely, ontologiesdelineatetheutilization scopesof datasetsandprocessesandorienttherefinementandcompositionof Webservices.A utilization scope,or scopefor short,is a context in which differentdatasetsandspecificversionsof a repertoireof servicescan be used. In this section,we describethe structureof our multidimensionalontologiesandhow they delineateandcorrelateutilization scopes.Thesearethe foundationsof our schemeto catalogand reusecomponentsand ensurethe semanticconsistency of theresultingWebservicescompositions.

3.3.1 Semanticrelationshipsbetweenwords

Let � be a setof simpleand/orcompositewordsreferringto objectsor conceptsfrom a uni-verseof discourse� . Objectsarespecificinstances(e.g.,Brazil ). Conceptsareclassesthatabstractlydefineandcharacterizea setof instances(e.g.,Country ) or classes.Theuniverseof discoursegivesacontext wherethemeaningof eachword ����� is stableandconsistent.

Thefield of linguisticsdefinesseveralsemanticrelationshipsbetweenwords. We considerthefollowing subsetin this work:

SynonymTwo wordsaresynonymsof eachotherif they refer to exactly thesameconceptsorobjectsin � .

Hypernym/hyponymA word � is a hypernymof anotherword � (conversely �� is a hyponymof � ) if � refersto a conceptthat is a generalizationof theconceptreferredto by � in� . Hyponym is theinverseof hypernym.

3.3. Ontologicaldelineationof utilizationscopes 45

Holonym/meronym A word � is aholonymof � (conversely� is ameronymof � ) if �� refersto aconceptor objectthatis partof theonereferredto by � in � . Meronym is theinverseof holonym.

Roughlyspeaking,synonym standsfor equivalenceof meaning,hypernym for generaliza-tion (IS A), andholonym for aggregation(PART OF). For example,in the agriculturerealm,Cultivar is asynonymof Variety of Plant andCrop is ahypernymof Cultivar .

A setof words � is saidto besemanticallyconsistentfor theuniverseof discourse� anda setof semanticrelationships� if at mostonesemanticrelationshipof � holdsbetweenanypair of wordsin � . Thisensuressomecoherencefor themeaningsof thewordsin � for � .

Thesemanticrelationshipsdefinedabove preserve certainproperties.Let � , � , and �� beany threewordsand denoteoneof thesemanticrelationshipsconsidered.Then,for a givenuniverseof discourse� , thefollowing conditionshold:

� ����������������� (reflexivity)

� �� ������� �� �� �� �� (transitivity)

� ���������������������� �� �� � �� (transitivity wrt synonyms)

Thesepropertiesenabletheorganizationof asetof semanticallyconsistentwords � accord-ing to theirsemanticrelationshipsin agivenuniverseof discourse� . Thesynonymrelationshippartitions � into a collectionof subsetssuchthat the wordsof eachsubsetareall synonyms.Thetransitivenessof thehypernymandholonymrelationshipscorrelatesthesemanticsof wordsfrom differentsubsetsof synonyms, inducinga partial orderamongthe wordsof � . The re-sultingarrangementof semanticallyconsistentwords is a directedgraph !#" thatexpressestherelativesemanticsof thewordsof � for theuniverseof discourse� (seeproof in Annex I). Thenodesof !#" arethesubsetsof synonymsof � . Thedirectededgesof !#" representthesemanticrelationshipsamongthewordsof differentsubsets.Thereis a directededgefrom vertex $ tovertex $% of !#" if andonly if eachword of $ is thehypernymof all thewordsof $& or eachwordof $ is theholonymof all thewordsof $ .

Considerthe casewhereall the wordsof � representconcepts.Thenan arrangementofsemanticallyconsistentwords is called an arrangementof semanticallyconsistentconcepts.Figure3.3illustratesanarrangementof conceptsfor territorialsubdivisions.It is anextractfromaverylargesetof ontologicalconceptsusedby expertsfor developingagriculturalapplications.

Theconceptsappearin therectangles.Theedgesrepresentinghypernym relationshipsaredenotedby a diamondcloseto thespecificconcept,andtheedgesrepresentingholonym rela-tionshipsaredenotedby ablackcirclecloseto thecomponentconcept.Thisgraphdenotesthata Country is composedof a setof States or, alternatively, a setof Country Regions .A Country Region maybeaMacro Region , anOfficial Region , or anotherkind

3.3. Ontologicaldelineationof utilizationscopes 46

of region. Macro andOfficial Regions arecomposedof States , but a region of typeMetro Area is composedof Counties . Eco Region andMacro Basin defineotherpartitionsof spacebasedon ecologicalandhydrologicalissues,respectively. Thereis no con-strainton the geometryof the land parcelsmodeledaccordingto theseconcepts,except forthe containmentrelationshipsimplied by the hypernymandholonymrelationships(e.g.,eachstate mustbeinsideonecountry ).

Figure3.3: An arrangementof conceptsrelative to territorial subdivisions

Givenanarrangement!'" for a semanticallyconsistentsetof words � , we saythata word�(�)� encompassesanotherword �*�)� , denotedby �,+-.�� , if andonly if � and �� areinthesamevertex of !'" (i.e., �(-/� or �0�������������1� ) or thereis a pathin !'" leadingfromthevertex containing� to thevertex containing� (i.e., thereis asequenceof hypernymand/orholonymrelationshipsrelatingthe meaningof � to the morerestrictedmeaningof �� ). Theencompassrelationshipis transitive(seeproof in Annex I). Accordingto Figure3.3,Country

+ - State , Country + - County , andsoon.Now considerthe instantiationof theconceptsfrom Figure3.3. For example,theconcept

Country can be instantiatedto Brazil , State to its states,and so on. Let us call theinstancesof conceptsterms. If thereis a semanticrelationshipbetweentwo conceptsof anarrangementof concepts,the samerelationshipholdsbetweentermsinstantiatedfrom theseconcepts.Therefore,thearrangementof semanticallyconsistentconceptsplaysa role like thatof a schemafor the correspondingsetof terms,inducinga similar structure(direct graph)toarrangethesemanticallyconsistentterms.Figure3.4aillustratesasubgraphof thearrangementof conceptsfrom Figure3.3andonecorrespondingarrangementof termsreferringto Brazilianregions,states,andsoon.

Termsarenot restrictedto instancesof objects. Figure3.4b illustratesan arrangementof

3.3. Ontologicaldelineationof utilizationscopes 47

Figure3.4: Arrangementsof semanticallyconsistentterms

conceptsand one correspondingarrangementof termsreferring to cropsand their varieties.Grains,beans,rice, corn,etc. do not refer to specificobjectsbut to concepts(or classes).Thisis anexampleof a specializationrelationshipbetweenthetermsandtheir respective concepts.Furtherformalizationof thesenotionsis outsidethescopeof this paperandappearsin (AnnexI).

3.3.2 POESIA ontologiesand ontological coverages

A POESIAontology is a collectionof arrangementsof semanticallyconsistentterms.Eachar-rangementdescribesa particulardimensionof the domain. For instance,Figure3.4 presentsfragmentsof arrangementsof termsfor the (a) spaceand(b) productdimensions,with there-spective arrangementof conceptson the left of eachhierarchy. On referringto a termof sucha hierarchy, one must qualify the term with the correspondingconceptof the respective ar-rangementof conceptsby usingtheexpressionconcept(term)in orderto avoid ambiguity. ThusState(RJ) refersto the Brazilian state calledRio de Janeiro (RJ is an acronym),while County(RJ) refersto thecounty of thesamename.

An entirepath in the hierarchymay be requiredto preciselyindicatea term (e.g., if thesamecounty nameappearsin differentstates ). An unambiguousreferenceto a term ofan ontology 2 is a pathin oneof the arrangementsof termsof 2 . This pathis expressedbythe concatenatedsequenceof concept(term)verticesvisited within it. This sequence,whentaken as a string, must be uniqueacrossall the dimensionsof the ontology. For instance,State(RJ).County(Campos) is anunambiguousreferenceto thecounty calledCam-

pos in thestate calledRio de Janeiro . ThetermCrop(beans) is anunambiguous

3.3. Ontologicaldelineationof utilizationscopes 48

reference,too,becausethereis only onecrop calledbeans .Finally, we arereadyto defineontologicalcoveragesandtheir properties.An ontological

coverage is a tupleof unambiguousreferencesto termsof aPOESIAontology. Someexamplesof ontologicalcoveragesare:

[Country(Brazil)] ,

[Crop(beans)] ,

[Country(Brazil),Crop(beans) ] , and

[Country(Brazil),Crop(beans) ,Cro p(ri ce)] .

Eachof theseontologicalcoveragesexpressesoneutilizationscope, or scopefor short,i.e.,acontext in whicha datasetor servicecanbeused.

An individual termof anontologicalcoverageexpressesa utilization scopein a particulardimension.For instance,the termCountry(Brazil) , definedin thespacedimension,ex-pressesthe utilization scope“the whole country calledBrazil ”. The universalcoverage(denotedby 3 ) is theemptytuple. It doesnot restricttheutilization scopein any dimension.Thescopeexpressedby termsreferringto thesamedimensionis a restrictionof theuniversalscopeto theunionof thescopesexpressedby theindividualterms.For instance,theontologicalcoverage[State(RJ),State(SP)] expressesascopeobtainedby theunionof thescopesindividually expressedby the termsState(RJ) andState(SP) . The scopeexpressedbytermsreferring to differentdimensionsrestrictsthe universalscopeto the intersectionof thescopesexpressedby the individual terms. For example,[State(RJ),Crop(orange)]

restrictsthe scopeto the intersectionof the scopesdefinedby the spatial dimensiontermState(RJ) andthe agriculturalproductdimensionterm Crop(orange) . To narrow thescopein aparticulardimension,onehasto chooseamorespecifictermin theontology(e.g.,gofrom State(RJ) to County(Campos) ). Theabsenceof termsfor a particulardimensionmeansthatthescopeis not restrictedto thatdimension.

The semanticrelationshipsamongthe termsof a POESIA ontology inducesemanticre-lationshipsamongontologicalcoverages. Given two ontologicalcoverages,4 and 4 , de-fined with respectto the sameontology 2 , 4 encompasses4# , denotedby 4 + -546 , if andonly if for eachterm � �74 there is anotherterm � �54 such that � + - � (where� and � are in the samedimensionof 2 ). For example, [Country(BR)] + - [Coun-

try(BR).Region(CS)] , i.e., thewholecountryencompassesits Center-Southregion.Theencompassrelationshipbetweenontologicalcoveragesis transitive, inducinga partial

order amongcoveragesreferring to the sameontology (seeproof in Annex I). The univer-sal coverageencompassesany other. Thus, 3 + - [Country(BR)] , [Country(BR)] + -

3.4. ThePOESIAactivity model 49

Figure3.5: A schemafor POESIAontologiesandontologicalcoverages

[Country(BR),Crop(beans)] , andsoon. Onecanalsoevaluatetheequivalenceof on-tological coverages. Two ontologicalcoverages4 and 46 areequivalent(denotedby 4�8�46 )if andonly if they encompasseachother(i.e., 49+-:4# and 46;+ -.4 ). This occursif eachtermin 4 hasa synonym in 46 andvice versa.For example,[Country(Brazil)] 8 [Coun-

try(BR)] becauseBRcanbeusedasasynonym of Brazil.Figure3.5presentsanentity-relationshipdiagramfor POESIAontologiesandtheontologi-

calcoveragesdefinedaccordingto suchontologies.It showsthataPOESIAontologyhasoneormoredimensions.Thedomain-specifictermsfor eachdimensionareorganizedin anarrange-mentof semanticallyconsistentterms.Thequalifiersof theseterms,i.e., theconceptsdefiningthe classesof terms,areorganizedin an arrangementof semanticallyconsistentconceptsforeachdimension.An ontologicalcoverageis atupleof termstakenfrom oneor moredimensionsof anontology.

3.4 The POESIA activity model

3.4.1 Overview

Thebasicconstructof themodelis theactivitypattern. It mayreferto any kind of dataprocess-ing task– computationaland/ormanual. Thesetasksareperformedin an openenvironment,comprisingseveralplatforms.In POESIA,activity patternsareimplementedasWebservices.

An activity patternhasa setof communicationports,calledparameters, to exchangedata

3.4. ThePOESIAactivity model 50

with otheractivity patternsanddatarepositories.Eachparameterof an activity patternrefersto a Web serviceencapsulatinga datasourceor sink for that particularpattern. Eachinputparameteris associatedwith outputsof anotheractivity patternor with a datarepository. Con-versely, eachoutputparameteris associatedwith inputsof anotheractivity patternor with adatarepository.

POESIAemploys aggregation,specialization,and instantiationof activity patternsto or-ganizeandreusethe componentsof processesasproposedin [157, 154]. Thesemechanismsdeterminehow processescanbe composedandadapted.Activity patterncompositionis de-pictedby a hierarchicalgraph,whereintermediatenodesarecompositepatternsandleavesareatomicor simplepatterns.Thelattermustbespecializedbeforethey aredecomposed.

A hierarchyof activity patterns,i.e., of Webservices,is calleda processframework. Eachactivity patternof aprocessframework is associatedwith anontologicalcoveragethatexpressesits utilization scopein orderto drive theselectionandreuseof components.A processframe-work mustberefined,adaptedto aparticularsituation,andinstantiatedbeforeexecution.POE-SIA providessomerulesto checkthesemanticconsistency of processframeworksandinstan-tiatedprocessesbasedon correlationsof the ontologicalcoveragesof their constituents.Forexample,theontologicalcoveragesof all thecomponentsof aprocessframework mustbecom-patiblewith (encompassor beencompassedby) theontologicalcoverageof thehighestactivityin thehierarchy.

Let usillustratethesenotionswith asimpleexample.Figure3.6presentsasimplifiedframe-work for agriculturalzoning. It shows that the major componentsof Agricultural ZoningareCalculateClimateAttributesandDetermineLandSuitability. Theformer, which is composedof CollectWeatherIndicatorsandConsolidateClimateData, collectsweatherdatafrom avari-etyof Webservicesandconsolidatestheminto theWebservicesof landclimateattributes.Theactivity patternDetermineLand Suitability takesthe climateattributes,alongwith otherdatarelevantfor onespecificcrop,to determinethemostappropriatelandsfor thatcrop.

This framework appliesto the zoning of any crop. To obtain instantiatedprocessesforspecificcrops,onemustadapttheconstituentactivities to thepeculiaritiesof thatcrop.For ex-ample,therelevantenvironmentalconditionsfor zoningcoffee(discussedin Section3.2.1)aredifferentfrom thosefor zoningrice. ThusDetermineLandSuitabilityandits two constituentsmustbespecializedfor eachcrop. In addition,aspecificactivity mustbedefinedto assesseachrelevantenvironmentalconditionfor eachcrop. On theotherhand,theactivities thatcalculateclimateattributesdonot requireadaptation,asonegeneralWebservicecansupplyclimatedatato severalspecificservicesfor determininglandsuitability for differentcrops.Theontologicalcoveragesassociatedwith the Web servicesenableautomatedmeansto checktheir compat-ibility for compositionwith respectto their utilization scopes.This helpsdomainexpertstoorganizeandcomposetheservicesnecessaryfor their applicationsandfactortheir solutionstoreducecostsaccordingto domain-specificconceptsandreasoning.

3.4. ThePOESIAactivity model 51

Figure3.6: Processframework for agriculturalzoning

3.4.2 Activity pattern

An activity pattern is an abstractionthat definesthe structureandbehavior of a collectionofinstancesof dataprocessingactivities implementedasWebservices,muchlikeaclassdoesforinstancesof objects[154]. Activity patternsalsoresemblesoftwaredesignpatterns[96] in thesensethat eachactivity patternis designedto solve a well-definedcategory of problemsin aparticularutilizationscope.Definition3.4.1depictsthestructureof anactivity pattern.

Definition 3.4.1Anactivity pattern < is a five-tuple:

=?>A@CB0DFE 4�GIH DIJKEML�>NE GO��P E P @CQSRUTwhere:>A@CB0D

is thestringusedasthenameof <4'GKH D�J is theontological coverageof <

i.e., expressesits utilizationscopeLV>is thelist of input parametersof <

GO��P is thelist of outputparametersof <P @6QWR describestheprocessingchoresthat < does>A@CB0D

, 4�GIH D�J ,LV>

, and GX��P representtheexternal interfaceor signatureof thepat-tern. P @CQSR specifiesthebehavioral semanticsof theactivity patternincludingthecompositionsemanticsandtheexecutiondependenciesbetweencomponentpatterns.

Figure3.7presentsthetextualspecificationof anactivity patternto determinelandsuitabil-ity for an arbitrarycrop whose

>A@CB0Dis DetLandSuitability , ontologicalcoverage,

3.4. ThePOESIAactivity model 52

#DEFINE RNA "http://www.agric.go v.br /rn a/p ub_d ocs "

ACTIVITY_PATTERN DetLandSuitability [Country(BR), Cons(RNA)]

INPUTSClimAttr: "RNA/clim_info.wsd";LandsInfo: "RNA/lands_info.wsd";CropInfo: "RNA/crops_info.wsd";

OUTPUTSZoning: "RNA/agric_zoning.ws d";

LOCALEnvCond: "RNA/env_cond.wsd";

BEGIN TASKCOMPOSITION

AssessEnvCond (IN: ClimAttr, LandInfo, CropInfo;OUT: EnvCond);

ClassifyParcels(IN: EnvCond; OUT: Zoning);EXECUTION DEPENDENCIES

AssessEnvCond PRECEDESClassifyParcels;END TASK;

END ACTIVITY_PATTERN;

Figure3.7: Activity patternto DetermineLandSuitabilityfor anunspecifiedcrop

4'GKH DIJ , is [Country(BR),Cons(RNA)] , i.e., Brazil, accordingto the methodologyofRNA,1 the

LV>and GX�P parametersarespecifiedas

LV>ZY �P Q and GO��P Y �P Q , and P @6QWRis composedof two activity patterns– AssessEnvCond andClassifyParcels – invokedwithin DetLandSuitability . Thesecomponentpatternsareassumedto bedeclaredelse-where.Figure3.7alsoshows a few specialkeywords.The#DEFINE clausespecifiesanaliasfor a URI thatis frequentlyusedin thepatternspecification.LOCALdeclarestheinternalvari-ablesof the pattern. The delimitersBEGIN TASKandEND TASKenclosethe specificationof the P @6QWR . COMPOSITIONenumeratesthe constituentpatternsof a compositepattern.EXECUTION DEPENDENCIESestablishesthe relative orderof executionof the constituentpatterns.EXECUTION DEPENDENCIESandTASK DESCRIPTIONareoptional. Anotherexampleof taskdescriptionis providedin Section3.4.4.

An activity patternimplementedasa Webserviceis uniquelyidentifiedby theURI of thesiteholdingit, its name,andits ontologicalcoverage.All thedataexchangedbyactivity patternscanbeviewedin XML. Eachparameteris associatedwith somedescriptionof thecapabilitiesof thecorrespondingWebservice– likethe.wsd (WebServiceDescription)filesreferencedin

1RNA standsfor RedeNacionaldeAgrometeorologia (NationalAgro-meteorologicalNetwork), a consortiumof Brazilianinstitutionslinkedto agriculturalresearch.

3.4. ThePOESIAactivity model 53

Figure3.7. Theservicedescriptionsmustprovide links to DTD or XML-schemaspecificationsthatdefinethe typesof all dataelementsthatcanbeexchangedvia therespective parameters.Links aredefinedasURIs.

Thedescriptionof eachactivity patternparameterincludesthedescriptionof theinterfaceoftheservicesthatcanbeboundto thatparameterto supportmoresophisticatedcommunicationthan just transferringpackets of semistructureddata. For example,the servicethat suppliesclimatedatato DetLandSuitability , denotedby the parameterClimAttr , allows thetarget to posequeries(e.g.,OLAP operators)specifyingfilters andgranularitiesfor thedatatobe transferred(e.g.,to get the averagetemperaturein a certainregion for eachmonth). Notethatdatafilters andgranularitiescanalsobeexpressedby ontologicalcoverages.This makesPOESIAontologiescentralnot only asa meansof organizingdataandservicesbut also fordefiningthe communicationinterfacesfor Web services.The designerof a processcanreferto publishedWeb serviceandschemadescriptionsor develop his own descriptionsto fulfillspecificdemands.This encouragesstandardizationandat the sametime confersflexibility toWebservicesanddatarepresentation.

The following subsectionspresentthe operationsfor composingactivity patterns(imple-mentedasWebservices)andsomerulesto checkthesemanticconsistency of thesecomposi-tions.Thespecificationsof activity patternsandtheircompositions(Figures3.7,3.10and3.12)arewrittenin alanguagethatwearedevelopingfor thispurpose.Thislanguagetakesadvantageof ontologicalcoveragesto describe,organizeandensuresemanticcorrectnessof Webservicecompositions.Someaspectsof our workflow specificationlanguage,suchas synchronizingmechanisms,areoutsidethescopeof this work. In thefuture,we cansubstituteour languagefor somestandardfor Webservicescomposition(e.g.,WSFL[255], BPEL4WS[250]). Weplanto extendsucha standardwith ontologicalcoveragesandassociatedrulesto expressthecom-positionof Webservices,by aggregationandspecializationof the respective activity patterns,emphasizingthecorrelationsof theservices’utilizationscopes.

3.4.3 Activity pattern aggregation

In POESIA, a complex activity patternis definedas an aggregation of a set of componentactivity patterns.A componentactivity patterncanitself be a complex activity patternor anelementaryactivity pattern.Figure3.8 shows theactivity patternDetermineLand Suitability,which is anaggregationof theactivity patternsAssessEnvironmentalConditionsandClassifyParcels.

Whendecomposingan activity patterninto its constituents(or, conversely, composinganactivity patternfrom the components),we have to make surethat thereis no conflict amongnamesandontologicalcoveragesof the activity patternsinvolved andthat all parametersareconnected.

3.4. ThePOESIAactivity model 54

Figure3.8: An aggregationof two activity patterns

Definition 3.4.2Activitypattern < is anaggregationof theactivitypatterns[�\ E�]^]^]_E [a` = ��b:c Tif thefollowingconditionsareverified(let c#dfe Ehg df�jiMek- g for each condition):

1. lCmonqpsrutwv�x�y{zq|%}~ rFt�v�xIy�mon?|I�0����&x��#y{zq|%}~ ����%x�#y�mon�|2. lCm n�� m���p�rutwv�x�y�m n |�}~ rFt�v�xIy�m���|��0����&x�#y�m n |�}~ ����%x�#y�m���|3. lCmonqp������x��#y{zq|S� ~ ����&x�#y�mon�|�������%x�#y�mon�|S� ~ �����x��#y{zq|4. lw�F����r�y{z�|jp��%mon����������h�o�����F����r�y�mon�|5. lw�F�u�� ¢¡Cy{z�|jp���mon����������h�o�����F�u�� ¢¡Cy�mon{|6. lCmon � � �O��r�y�mon�|£p¤� �O��r�y{zq|I��y¥�&m � �_���¤���¦�V���o� �u�� ¢¡Cy�m � |h|7. lCm n�� � �F�� §¡�y�m n |jp�� �F�� §¡�y{zq|���y¥�&m��W�����¤���¦�V����� �O��rZy�m¨�©|h|We call < an aggregated(or composite)activity patternandeach [ n a constituent(or com-

ponent)activitypattern.

Definition3.4.2statesthatanactivity pattern< is definedasanaggregationof � componentactivity patterns[�\ E�ª�ª�ª�E [a` if they satisfytheabove-mentionedsevenconditions.Condition1saysthatthenameandtheontologicalcoverageof eachconstituentpattern[ n mustbedifferentfrom the nameandcoverageof theaggregatedactivity pattern.Condition2 specifiesthat thenameandcoverageof a constituentactivity patterncanuniquelydistinguishitself from otherconstituentpatternsof < . Condition 3 statesthat the ontologicalcoverageof the compositepattern< mustencompassthecoverageof eachconstituentpattern[ n or viceversa,i.e., thein-tersectionof their utilizationscopesis not null. Condition4 ensuresthatevery input parameterof < is connectedto aninput parameterof someconstituent[ n . Similarly, condition5 ensuresthateachoutputparameterof < is connectedto anoutputparameterof some[ n . Finally, condi-tions6 and7 statethatall parametersof constituentpatternsmustbeconnectedto a parameterof otherconstituentor theaggregatedpattern.

3.4. ThePOESIAactivity model 55

3.4.4 Activity pattern specialization

Thedescriptorsof anactivity patterncanberefinedwhenspecializingthatactivity patternfor aparticularsituation.Figure3.9illustratesaspecializationof theactivity patternClassifyParcelsfor thecropC. arabica.

Figure3.9: A specializationof ClassifyParcels

Thespecializationof anactivity patterncanbeformally definedby relationshipssimilar tothoseusedto definetheaggregationabstraction.

Definition 3.4.3Activity pattern [ is a specializationof theactivity pattern < (conversely < isa generalizationof [ ) if thefollowingconditionsareverified:

1. rFt�v�xIy{z�|&}~ rutwv�x�y�m�|K������%x�#y{z�|&}~ ����%x��#y�mq|2. �����&x��#y{zq|¢� ~ ����&x��#y�m�|3. lw�F����r�y{z�|jp��;� �O��rZy�m�|��_���¤���h�o�����K«� 4. lw�F�u�� ¢¡Cy{z�|jp��S� �F�C ¢¡�y�m�|��_���¤���¦�V���o�K«� Wecall < thegeneralizedactivitypatternof [ and [ a specializedactivitypattern(version)

of < .

Condition1 of definition3.4.3statesthatthenameand/orontologicalcoverageof thegen-eralizedactivity pattern< mustbedifferentfrom thoseof its specializedversion [ . Condition2 statesthat the ontologicalcoverageof < mustencompassthatof [ . The notation ¬)­®¬a inconditions3 and4 meansthateachparameter¬a of [ mustreferto aWebservicethatis arefine-mentof theWebservicereferredto by thecorrespondingparameter¬ of < . This refinementofWebservicescanreferto their capabilitiesor datacontents.Theexactrelationshipbetweenthe

3.4. ThePOESIAactivity model 56

genericandtherefinedparametersis definedin thedescriptionof thecorrespondingWebser-vices.Ontologicalcoveragescanbeassociatedwith theseWebservicesto expressandcorrelatetheirutilizationscopes.

#DEFINE IAPAR "http://www.pr.gov.br/ iap ar/ pub_ doc s"

ACTIVITY_PATTERNClassifyParcels [Crop(Coffee).Group(ara bic a),

Country(BR).Region(CS) .St ate( PR),Cons(RNA).Inst(IAPAR)]

REFINES ClassifyParcels [Country(BR), Cons(RNA)]

INPUTSEnvCond->WDI: "IAPAR/wdi.wsd";EnvCond->AvgAT: "IAPAR/avg_at.wsd";EnvCond->ProbHeat: "IAPAR/prob_heat.wsd";EnvCond->ProbFreeze: "IAPAR/prob_freeze.wsd ";

OUTPUTSZoning->Zon_Coffee: "IAPAR/zoning_coffee.w sd" ;

BEGIN TASKDESCRIPTION

OVERLAYIF WDI <= 150 THEN "OK" ELSE "Water restriction";IF ProbHeat <= 30 THEN "OK"

ELSE "Thermal restriction";IF AvgAT <= 24 THEN "OK" ELSE

IF WDI <= 100 THEN "OK"ELSE "Thermal restriction";

IF ProbFreeze <= 25 THEN "Low risk of freeze" ELSEIF ProbFreeze <= 50 THEN "Medium risk of freeze";

ELSE "High risk of freeze";END TASK;

END ACTIVITY_PATTERN;

Figure3.10:ClassifyParcelsfor Coffeaarabica in Parana

Figure3.10shows thespecializedversionof theactivity patternClassifyParcelsfor Coffeaarabica, accordingto themethodologyof Parana Agricultural Institute(IAPAR) [75], a mem-berof RNA. TheclauseREFINES indicatesthat this patternis onespecializationof thepat-ternClassifyParcels with awiderscopeexpressedby [Country(BR),Cons(RNA)] .Eachparameterdeclaredin thespecializedversionis explicitly relatedto thecorrespondingoneof the generalizedpattern. The notationEnvCond->WDI indicatesthat the parameterWDI

of thespecializedversionis derivedfrom theparameterEnvCond (theexpectedenvironmen-tal conditions)of the generalizedversionof ClassifyParcels . The other input parame-

3.4. ThePOESIAactivity model 57

tersof the specificversionof ClassifyParcels alsoderivesfrom the genericparameterEnvCond . The outputparameterZonCoffee of the specializedversionis a refinementoftheparameterZoning of thegeneralizedactivity pattern.TheTASK DESCRIPTIONclauseoverlayslogical conditionsinvolving the measurementsof the relevant environmentalcondi-tionsfor thecrop.

3.4.5 The combinedrefinementmechanism

Theaggregationandspecializationof activity patternscanbecombinedto defineacomplex ac-tivity patternwhoseconstituentsdependon theutilization scopeto which thecomplex patternis specialized.Thedefinitionof sucha complex activity patternmustconformto boththecon-ditionsof aggregationandtheconditionsof specialization.Figure3.11illustratesa refinementof theactivity patternAssessEnvironmentalConditionsfor C. arabica.

Figure3.11:Combiningspecializationandaggregation

Specializationandaggregationof activity patternsareintertwined. The specializationde-tails theparametersandconstituentsof a patternfor a particularutilization scope,establishingaflat view ataparticularabstractionlevel to expressthecooperationof theconstituentpatterns.Problemsrelatedto parameterpassing– typechecking,parameteruniqueness,anddisambigua-tion – aresolvedby definingparameterscopesjustasin programminglanguages:aparameter’sscopeis local to thespecificationof activity patternwhereit is defined.

Figure 3.12 shows the specializedversionof AssessEnvCond (AssessEnvironmentalConditions). The input parameterClimAttr appearsin both the generalizedand the spe-cializedversion. The LandsInfo parameterof the generalizedversionunfoldsin Relief

andWaterRetSoil in thespecialization.CropInfo unfoldsin CropCoef andWater-

Demands. TheoutputEnvCond of thegeneralizedversionunfoldsin WDI, AvgAT, Prob-

Heat , andProbFreeze . TheLOCALparameterWaterBal is usedto transferdatabetween

3.4. ThePOESIAactivity model 58

ACTIVITY_PATTERNAssessEnvCond [Crop(Coffee).Group(Co ffe a arabica),

Country(BR).Region(CS ), Cons(RNA)]

REFINES AssessEnvCond [Country(BR), Cons(RNA)]INPUTS

ClimAttr: "RNA/clim_info.wsd";LandsInfo->Relief: "RNA/relief.wsd";LandsInfo->WaterRetSoi l: "RNA/water_ret_soil.wsd ";CropInfo->CropCoef: "RNA/coffee_water_coef. wsd" ;CropInfo->WaterDemands : "RNA/coffee_water_dem.w sd";

OUTPUTSEnvCond->WDI: "RNA/wdi.wsd";EnvCond->AvgAT: "RNA/avg_at.wsd";EnvCond->ProbHeat: "RNA/prob_heat.wsd";EnvCond->ProbFreeze: "RNA/prob_freeze.wsd";

LOCALWaterBal: "RNA/water_bal.wsd";

BEGIN TASKCOMPOSITION

EstWaterBal (IN: ClimAttr,WaterRetSoi l,Cr opCoef ;OUT: WaterBal);

AssessWaterDeficit (IN: WaterBal,WaterDemands;OUT: WDI);

AssessAirTemp(IN: ClimAttr; OUT: AvgAT,ProbHeat);AssessFreezeRisk (IN: ClimAttr,Relief; OUT: ProbFreeze);

EXECUTION DEPENDENCIESEstWaterBal PRECEDESAssessWaterDeficit;(AssessWaterDeficit AND AssessAirTemp

AND AssessFreezeRisk)PRECEDESClassifyParcels;

END TASK;END ACTIVITY_PATTERN;

Figure3.12:AssessEnvironmentalConditionsfor Coffeaarabica in Brazil’s Center-South

3.4. ThePOESIAactivity model 59

Figure3.13: Hierarchiesof activity patternsfor determininglandsuitability for Coffeaarabia:(a)decompositionhierarchy;(b) multi-fold hierarchyor processframework

EstWaterBal andAssessWaterDeficit . The binding of theseparametersexpressesthedataflow illustratedin Figure3.1. TheclauseEXECUTION DEPENDENCIESstatesthatEstWaterBal precedesAssessWaterDeficit , andClassifyParcels initiatesafterall theotherconstituentshavefinished.

3.4.6 Processframework

In POESIA,activity patternscanbe definedin termsof otheractivity patternsthroughaggre-gationandspecializationof activity patterns.As a result,ahierarchyof activity patternscanbeformed.Wecall suchahierarchyaprocessframework of therootactivity pattern.Figure3.13ashows a processframework to determineland suitability for Coffea arabica, presentingonlycompositionsof activity patterns.Figure3.13bextendsFigure3.13aby addingthehierarchiesof specializationsof someactivity patternsin the hierarchy. We saythata hierarchylike thatshown in Figure3.13bis multifold becauseeachof its activity patterns(nodes)canhave twokindsof immediatesubordinates:its constituentpatternsandits specializedversions.

Definition 3.4.4A processframework is a directedgraph ¯6°�±³² E¤D ²³´ satisfyingthefollowingconditions:

1. ±³² is thesetof verticesof ¯2.D ² is thesetof edgesof ¯

3.4. ThePOESIAactivity model 60

3. µF¶F·U±³²A¸�¶ is anactivity pattern

4. °¹¶ E ¶�º»´�· D ².¼ ¶�º�½_¾�¿qÀ�ÁÃÂ�ÁÃÄÆÅ�¿�Á*¶ Ç�¶oº�ÀMȳÅ^½_Â�ÉoÊ¥Â�Ë�É�Áæ¾s¿Ì¶

5. ¯ is acyclic

6. ¯ is connected

Definition 3.4.4establishesthe structuralpropertiesof a processframework – a directedgraph ¯6°�±³² E¤D ²Æ´ whosenodesrepresenttheactivity patternsandwhosedirectededgescorre-spondto the aggregationandspecializationrelationshipsamongthesepatterns.Condition4statesthat thereis a directededge °�¶ E ¶�º»´ from vertex ¶ to vertex ¶�º in ¯ if andonly if ¶oº is aconstituentof ¶ or ¶�º is aspecializationof ¶ . Condition5 statesthatnosequenceof aggregationsand/orspecializationsof patternsin ¯ canleadfrom onepatternto itself. Thisrestrictionis nec-essarybecauseaggregationandspecializationcanintermingle. In sucha case,anaggregationmaybreakthegradualnarrowing of the utilization scopesachievedby specialization.Condi-tion 6 guaranteestheconnectivity of theactivity patternsparticipatingin theprocessframework¯ .

Adaptation of a processframework

A processframework capturesthepossibilitiesfor reusingandcomposingWebservicesto buildconsistentprocessesfor differentsituationsin termsof utilization scopes,datadependencies,andexecutiondependenciesamongcomponents.Theadaptationof a processframework for aparticularscopeconsistsin choosing(anddevelopingif necessary)componentsto composeaprocesstailoredfor thatscope.

Definition 3.4.5A processspecification ÍK°?±³Î E¤D Îq´ associatedwith a utilization scopeex-pressedby an ontological coverage Ï is a subgraph of a processframework satisfyingtheproperties:

1. µ�°¹¶ E ¶ º ´�· D Î�¸�¶ º ½_¾�¿qÀ�ÁÃÂ�ÁÃÄÆÅ�¿�Áж

2. µF¶F·U±ÑÎ�¸°_Ò ÓK¶�º³·U±ÑÎÔÀ�Äƽ©ÕKÁÖÕÑÉ�Á§°

¹¶ E ¶�º»´w· D Îq´W× ¶6¦À&É�ÁؾsÙ̦½

3. µF¶F·U±ÑÎ�¸oÏ�ÚI± D�Û °�¶�´�Ü ÝÞÏDefinition 3.4.5statesthata processspecificationÍ is a subgraphof a processframework.

Condition1 statesthat Í is a decompositionhierarchy, i.e., all its edgesrefer to aggregationsof activity patterns.Condition2 statesthatall theleavesof Í areatomicpatterns,otherwiseÍ

3.4. ThePOESIAactivity model 61

would bemissingsomeconstituentsfor its execution.Condition3 ensuresthattheontologicalcoverageof eachpatternparticipatingin Í encompassesthecoverageÏ associatedwith Í , i.e.,the intersectionof the utilization scopesof all the constituentsof Í areequivalentor containtheutilizationscopeof Í .

Refinementandadaptationof processframeworks canalternatein practice. Frameworks,specificprocesses,or individual activity patternscanalwaysbe reusedto producenew or ex-tendedframeworks. Additionally, when adaptinga framework, the developmentof activitypatternsto contemplatespecificneedsalsocontributesto enrich the repertoireof specializedpatternsof a framework.

Processinstantiation

Note that all the elementsof the POESIAmodelpresentedabove areat the conceptuallevel.Thus,after adaptinga processframework to producea processspecificationfor a particularsituation,this processhasto beinstantiatedfor execution.Instantiatinga processspecificationÍ consistsin assigningconcreteWebservicesto handletheinputsandoutputsof eachactivitypatternof Í , allocatingsitesto executethecorrespondingtasksanddesignatingagents(humansor programswith theappropriateabilitiesandroles)to performthem.

The locationof theconcreteresourcesassignedto executea processis independentof thelocationsof their descriptions.Theselectionof theconcreteresourcesto performtheprocessduring its instantiationconfersan extra level of executionindependenceto POESIA. Onceparticularresourceshave beenassigned,the specificformatsand protocolsusedto connectthemcan be defined. This may be doneby using the binding mechanismsof Web servicesspecificationlanguageslikeWSDL [254].

POESIA metamodel

Figure3.14shows the POESIAmetamodel,which is an extensionof the workflow referencemodelof theWfMC [123]. It summarizes,in bold,our extensions:(1) associateanontologicalcoveragewith eachactivity pattern;and(2) associatea resourcedescriptionwith eachport (pa-rameter)of eachactivity pattern.A resourcedescriptionalsoincludesanontologicalcoverageto describeits utilizationscope.This allows theorganizationof a repertoireof activity patternsaccordingto their utilization scopesandhelpsto determinethe servicesfor reusein specificsituationsandtherulesto connectthem.

3.5. Implementationissues 62

Figure3.14:ThePOESIAprocessdefinitionmetamodel

3.5 Implementation issues

A numberof issuesareimportantin the implementationof thePOESIAapproachto Webser-vicescomposition:(1) correctnessof thecompositionsemantics,(2) mechanismsfor compos-ing Web servicesthroughontologyconstructionandontologyreasoning,and(3) an efficientandscalableimplementationarchitecture. In this section,we discusshow POESIA handlestheseissues.

3.5.1 Checkingspecifications

Hierar chy of activity patterns

The aggregationsandspecializationsof activity patternsmust be checked for the propertiesexpressedin definitions3.4.2 and3.4.3. The direct graphscorrespondingto processframe-worksmustbeacyclic andconnectedasstatedin definition3.4.4.Furthermore,theconditionsexpressedin definition3.4.5mustbecheckedwhenadaptinga framework for a particularuti-lization scope.

Figure 3.15 illustratesa processfor zoning C. arabica in Parana State. All the activitypatternsin this structure,startingwith its root, have compatibleontologicalcoverages.Theontologicalcoverageof Agricultural Zoningencompassesthatof CalculateClimateAttributes,

3.5. Implementationissues 63

Figure3.15:ZoningCoffeaarabica in Parana State

DetermineLandSuitability, andsoon. Theactivity patternEstimateWaterBalancehasawidercoverageincludingcoffeeandorange,i.e., thesamepatternfor calculatingthewaterbalanceisusedfor bothcrops.

Executionand data dependencies

Thecollectionof executiondependenciesamongactivity patternscanberepresentedin adepen-dency graph.Figure3.16presentsthedependency graphfor theprocessframework for zoningC. arabica. It shows that the executionof the activity patternConsolidateClimateAttributescanbe initiatedonly aftersuccessfullyfinishingtheexecutionof IntegrateWeatherIndicatorsor Extract WeatherIndicators, which provide data(from weatherstationsor remotesensing,respectively) for updatingtheclimateattributes.WhenConsolidateClimateData hasdoneitswork, EstimateWater Balance, AssessAir Temperature, andAssessFreezingRiskcanexecutein parallel. Theconclusionof EstimateWater Balancetriggerstheexecutionof AssessWaterDeficit. ClassifyParcelscanonly startexecutingafterasuccessfulexecutionof all thepreviousactivities.

A similar dependency graphfor the datadependenciesis inferredfrom the connectionofparametersamidprocessframeworks.Thesetwographsmustbecompatible.Individually, thesegraphsmustbeacyclic andconnected.Propertiesrelative to thestructureandthedynamicsoftheexecutionanddatadependenciesamongactivity patternscanbeevaluatedwith algorithmsbasedonPetriNet formalisms.For example,[235] proposesanalgorithmto translateworkflow

3.5. Implementationissues 64

Figure3.16:Executiondependenciesamongactivity patternsfor zoningCoffeaarabica

graphsinto WF-Nets,a classof Petri Netstailoredto workflow analysis.The verificationofthepropertiesof WF-Netsallows theautomaticdetectionof designerrorsin thecorrespondingworkflow specifications.The absenceof deadlocksin a workflow, for instance,is associatedwith thesoundnesspropertyof thecorrespondingPetriNet. Roughlyspeaking,thesoundnesspropertystatesthatfor every reachablestateof thePetriNet theremustbea sequenceof stepsleadingto thefinal state.

3.5.2 ComposingWebservices:an implementation perspective

A POESIA Web servicecanaccessa collectionof existing Web servicefunctioningasdatasourcesfor its processesandpublish its own processesanddatasetsasWeb services.EachPOESIA-enabledWebsiteorganizesits servicedescription,composition,andinterconnectionapparatusaccordingto therepresentationlayersof theSemanticWeb[80, 215]. In thebottomlayer, XML wrapping,sourcedataareconvertedinto XML, thusproviding a syntaxstandardfor semistructureddatain theextensionallevel. TheXML-relatedstandardsconferversatilityandexpressionpower for representingandinterrelatingdocumentson theInternet.Thesecondlayeris theschemasandprocesseslayer. It usesDTDsor XML schemato representdatasetsattheintentionallevel to factortheproblemsrelatedto dataheterogeneity. POESIAframeworksappearat the top of the secondlayer andprovide specificcriteria basedon utilization scopesto selectservicesand checkthe semanticconsistency of their connections.The third layeris the semanticdescriptionlayer, which describesthe services,at a higherabstractionlevel,usingRDFstatementsandprocessdescriptionstandardslikeDAML-S [61,14]. Theseresourcedescriptionsmustconformto metadatastandardsandvocabularies,includingdomain-specificones.The vocabulary usedin the first, second,andthird layersis definedin the fourth layer,which maintainsa dictionary. The top layersof the SemanticWeb infrastructure– namely,logic, proof,andtrust– arenotcontemplatedat this moment.

POESIAservicesin differentsitescanbelogically arrangedin successiveabstractionlevels.Figure3.17illustratessuchasituation.Theprocessspecificationstoredin serverA is composed

3.5. Implementationissues 65

Figure3.17:Themulti-tier distributedinfrastructurefor compositionof Webservices

of two cooperatingactivity patterns,X and Y. Activity patternX accessesthe Web servicesdescribedby B1 andB2 to take its inputs, processthem, and pushits outputsinto the Webservicedescribedby B3 (considerthatB1, B2, andB3 arepublishedin server B). ThenY takesits datainputsfrom theWebservicesdescribedby C1, C2, C3 (all publishedin C), andB3 togeneratetheoutputspushedin theWebservicedescribedby A2 (maintainedandpublishedbyA itself).

3.5.3 Ar chitecture

Figure3.18presentsthearchitectureof apeer-to-peersitesupportingPOESIAservices,outlin-ing thecommunicationwith externalsitesandservicebrokers.TheServicesSpecificationToolallowsthedomainexpertto build solutionsfor particularneeds.Thistool supportsbrowsingtheresourcesavailablelocally or remotelyin orderto discover componentsto reuse.Thedescrip-tionsandformal specificationsof thelocal servicesarestoredin theLocal Servicesrepository.Oneservicemayencapsulateoneor moredatasets.TheLocal Data repositorymaintainsthedataandmetadataassociatedwith local services.All theconstituentsof a servicespecificationstoredin thesiteareindexedby oneontologyof theLocalOntologiesrepository. TheExternalResourcesLocator providesaccessto the descriptionsof external resources.The Catalog ofExternalResourcesfunctionsasa cachefor the descriptionsof externalresourcesfrequently

3.5. Implementationissues 66

accessed.Eachlocalserviceandontologycanbepublishedandusedby externalWebservices.

Figure3.18:Thearchitectureof aPOESIA-enabledpeerto peerWebsite

TheServicesExecutionEngineinterpretstheservicespecificationsto properlymanagethecorrespondingfragmentsof distributedprocesses.A servicecanbeactivatedlocally or by someexternalconnection.A locally runningservicecanalsoactivateremoteservicesto obtainitsinputs or sendits outputs. The External ConnectionsManager controlsthe communicationwith remotecomponentsandusersat run time. It relieson theExternalResourcesLocator toretrieve thedescriptionsof externalresourceswhenever necessary. The thicker doublearrowsconnectingtheLocalDatarepositorywith theServicesExecutionEngine, andthelatterwith theExternalConnectionsManager, which is linkedto theExternalResourcesGateway, representthe dataexchangebetweena local serviceandremoteresourcesduring the executionof thedistributedprocesses.A POESIAsite alsohastwo kindsof human-computerinterfaces.TheUser Interfaceallows the domainexpertsto specifyandactivateservices;the AdministrationInterfaceservesconfigurationpurposes.

The architectureof a POESIA-enabledWeb site contemplatestwo typesof external re-sources:RemoteSitesandServiceBrokers, thoughit doesnot rule out connectionswith otherkinds of resources.A RemoteSitehasthe internalstructuredescribedfor our POESIAsite.ServiceBrokers arespecialsitesthatcatalogthedescriptionsof the resourcesavailableacrosstheWebto supportthediscoveryandselectionof resources.

3.6. Relatedwork 67

3.6 Relatedwork

TheSemanticWeb[80, 215] intendsto extendthecapabilitiesof thecurrentWebto copewithproblemssuchasfindingpreciseinformationin thevastamountof resourcesavailableandsup-porting interinstitutionalapplicationslike electroniccommerce.Themeansfor achieving thisare: standardsfor expressingmachine-processablemetainformation(e.g.,RDF, DAML+OIL),developmentanddisseminationof terminologiesusing thesestandards(e.g.,domainontolo-gies),andnew toolsandarchitecturesbasedon this apparatusto build applicationsempoweredwith semanticsandautomatedreasoningcapabilities.POESIAreliesontheinfrastructureof theSemanticWebto implementcertaintechniques,basedon domainexpertise,to organize,select,andreusedataandservicesin theWeb.

ThePOESIAapproachto composeWebservicesthroughactivity aggregationandspecial-izationwasinspiredby theneedsof ourapplicationdomainandis foundedby earlierwork donein transactionalactivity modelingby Liu [157, 154], wherea setof mechanismsareproposedandformalizedfor specificationandreuseof activities. Other researchareasdirectly relatedto POESIAaretheuseof metadataandontologiesfor Webservicesdescription,discoveryandcomposition[14, 61, 198, 38, 166, 37], andworkflow techniquesfor scientificprocessesandWebservicecomposition[123,234, 235]. Descriptionsof themeaning,properties,capabilities,andontologicalrelationshipsamongWebservices,expressedin languageslikeDAML services[14,61],supportmechanismsto discover, select,activate,compose,andmonitorWebresources.Relatedwork coversvariousaspects,rangingfrom theoreticalstudiesto implementationefforts,from architectureissuesto conceptualmodels[124,260].

Concretely, Paolucciet al. [198] show that thecapabilitiesof registriessuchasUDDI andlanguageslike WSDL are not enough to support services discovery. They employDAML-S for this purposeand presentan algorithm to matchservicerequestswith the pro-file of advertisedservicesbasedon the minimum distancebetweenconceptsin a taxonomytree. CardosoandSheth[38], on the otherhand,presentmetricsto selectWeb servicesforcomposingprocesses.Thesemetricstake into accountfunctionalandoperationalfeaturessuchasthepurposeof theservices,quality of service(QoS)attributes,andthe resolutionof struc-tural andsemanticconflicts. McIlraith et al. [166] useagentprogrammingto definegenericproceduresinvolving theinteroperationof Webservices.Theseprocedures,expressedin termsof conceptsdefinedwith DAML-S, do not specifyconcreteservicesto performthetasksor theexactway to useavailableservices.Suchproceduresareinstantiatedby applyingdeductioninthecontext of a knowledgebase,which includespropertiesof theagent,its user, andtheWebservices.Finally, Bussleret al. [37] sketchanarchitecturefor WebservicesattainingSemanticWebaspirations.

The groundingof Web servicesinvolvesseveral abstractionlayersbetweenthe semanticspecificationandtheimplementation[221]. Currentlythereis a myriadof proposalsfor speci-

3.6. Relatedwork 68

fying Webservicescompositionin intermediatelayers,suchasWSFL (IBM), BPML (BPMI),XLANG (Microsoft), BPEL4WS(BEA, IBM, Microsoft), WSCI (BEL, Intalio, SAP, Sun),XPDL (WfMC), EDOC(OMG), andUML 2.0 (OMG). Theseproposalsconcernthesynchro-nizationof the executionof Web servicesin processesrunningacrossenterpriseboundaries[234, 20]. They build on top of standardslike XML, SOAP, WSDL, and UDDI, providingfacilities to interoperateandsynchronizethe executionof Web servicesthat canusedifferentdataformats(e.g.,heterogeneousXML schemas)andcommunicationprotocols(HTTP, XMTP,etc.).Somechallengesfor thesetechnologiesareto (i) reducetheamountof low-levelprogram-ming necessaryfor the interconnectionof Web services(e.g.,throughdeclarative languages),(ii) provideflexibility to establishinteractionsamonggrowing numbersof continuouslychang-ing Webservicesduringruntime,and(iii) devisemechanismsfor thedecentralizedandscalablecontrolof cooperativeprocessesrunningon theWeb.

To illustrate the differencesbetweenour approachand Web servicesynchronizinglan-guages,let us considertwo of them: WSFL andBPML. The Web ServicesFlow Language(WSFL) [255] is anXML languagefor thedescriptionof Webservicescompositions.WSFLconsiderstwo typesof Webservicescompositions.Flow modelsspecifytheappropriateusagepatternof a collectionof Web servicesandhow to choreographthe functionality providedbya collectionof Webservicesto achieve a particularbusinessneed.Global modelsspecifytheinteractionpatternof acollectionof Webservices,describinghow componentsof a setof Webservicesinteractwith eachother. POESIAcanbe seenasa value-addedmethodwith an em-phasisonusingdomain-specificontologiesto guideandfacilitatetheinteractionamongasetofWebservicesin termsof serviceutilizationscopes.

The BusinessProcessModeling Language(BPML) is specializedin supportingcontrolflows of businessprocesspatterns. BPML and POESIA sharethe sameobjectives of sup-porting Web servicecomposition.The main differences,however, lie in the mechanismsandmethodologyusedin theunderlyingframework. BPML promotestheuseof controlconstructssuchasmerge,split, multimerge,exclusivechoice,andsoforth to facilitatethecompositionofservices,whereasPOESIAcombinesthecontrollogic with domain-specificontologies,with anemphasison complex compositionsemanticsat boththedatalevel andworkflow activity level.

In summary, to thebestof our knowledge,currentproposalsfocusmainly on businesspro-cesses;thereis a lack of researchon supportingsemanticconsistency for Webservicesrefine-mentandreuse.ThePOESIAapproachcontemplatesthedemandsof somescientificapplica-tions. Furthermore,it addressesthe semanticconsistency issueby usingdomainontologies.POESIAcomplementsthe currenttechnologiesfor Web servicesdescription,discovery, andcomposition(includingapproachesbasedon ontologiesfor describingservices,likeDAML-S)in two ways.First, it providesmechanismsto selectWebservicesaccordingto their utilizationscopes(e.g.,servicesintendedfor particularregionsandclassesof products).Second,it enablesautomatedmeansto checkif compositionsof Webservicesaresemanticallycorrectwith respect

3.7. Conclusions 69

to thesescopes(e.g.,to determineif a Web servicefor estimatingthe waterbalanceof landscoveredwith bushescanbeproperlyincorporatedin a processto determinelandsuitability forcoffee).

3.7 Conclusions

Many scientificapplications,includingagroenvironmentalapplicationssuchasagriculturalzon-ing, arebuilt by composingheterogeneousdatasourcesandservices.Largedatasetsareorga-nizedaccordingto time andspacedimensions,e.g.,climatedatarely on time seriesof weatherdataandexpectedwatercontentin soil is measuredin spatialterms. Well-definedmetadatapreciselydescribingthemeaningof thesedatasetsarerequiredfor their correctcomposition.Agricultural zoningis anapplicationbuilt on scientificmodels(e.g.,thematchingof weatherdatawith theplantmodelof growth andwaterrequirementsover time) andhasvery high eco-nomic impact. For example,governmentagenciesandfinancial institutionsuseagriculturalzoningto make decisionson policiesandloanapprovalsfor farmersthatwantto plantspecificcrops.

In this paper, we introducedthePOESIAapproachto supportthe systematiccompositionof Web services.It is foundedon domainontologiesin which the propertiesof the semanticrelationshipsbetweentermsinducea partial orderamongthe termsfor eachdimensionof areality (e.g.,space,time, product). Currentontologyengineeringtools, suchasProtege andOntoEdit,canhelpto developsuchontologies.Usingtuplesof termsfrom theseontologiestoexpressandcorrelatethe utilization scopesof dataandservices,the POESIAactivity modeldefinesactivity patternsthatspecifytheWebservicecompositionandcommunicationchannelsthatlink theseservicestogether.

POESIAcomplementscurrentproposalsfor Webservicesdescription,selection,andcom-positionby usingdomainontologiesto (i) conceptuallyorganizevastcollectionsof services,(ii) uncover andselectdataandservicesaccordingto their utilization scopes,and(iii) checksemanticandstructuralconsistency propertiesof compositionsof Webservices.We illustratedthe POESIAapproachthrougha real applicationscenario:the agriculturalzoningof Coffeaarabica in theCenter-Southregionof Brazil.

On top of this foundation,we areinvestigatingfurtherextensionsof POESIA. Knowledgemanagementandkeepingtrackof dataprovenancein distributedprocessescanbemoreeasilysupportedwhenWebservicesarebuilt from well-definedontologiesandthroughwell-definedoperationsbasedon activity patterncomposition. Precisedocumentationof dataprovenancewill be useful in the evaluationof the quality andsuitability of resultsfor many applications.A richersetof semanticrelationshipscanalsobeconsideredto enhancePOESIAcapabilitiesfor expressingandmanagingthe utilization scopesof dataandservices.Anotherconcernisaspectsof the synchronizationof Web services.Theseissuesarebeingconsideredby several

3.7. Conclusions 70

Webservicessynchronizationlanguages(e.g.,WSFL,BPEL4WS,XPDL). POESIA’s strengthis in handlingsemanticaspectsof Webservicescompositionusingdomainontologies.We areinvestigatingextensionsto its activity modelto incorporatesynchronizationmechanismsusinganexisting proposal.On theonehand,our researchwill continueto beguidedby real-worldapplicationssuchasagriculturalzoning. On the otherhand,the generalityandabstractionofPOESIAmakesit usefulto many next-generationWebservice-basedapplications.

Acknowledgments

Thefirst authoris partiallysupportedby Embrapa,CAPES,andtheFinep/Pronex/IC/SAI95/97project.Theauthorsfrom Georgia Techarepartially supportedby two grantsfrom theOperat-ing SystemsandITR programs(CISE/CCRdivision) of NSF, by a contractfrom theSciDACprogramof DoE,andby a contractfrom thePCESprogram(IXO) of DARPA. All agriculturedatausedin thispaperwereprovidedby Brazilianexperts.Thanksto theanonymousreviewerswhocontributedto improvethis work.

Chapter 4

UsingDomain Ontologiesto Help TrackData Provenance

4.1 Intr oduction

Data provenance(alsocalleddatagenealogy or pedigree) is the descriptionof the origins ofa pieceof dataandthe processby which it wasproduced[33]. This problemhasbeenstud-ied in a varietyof settings,rangingfrom cooperative processeswith dataexchangein severalformats,to chainsof views over relationaldatabasesfor loadingdatawarehouses.The solu-tions proposedin the literatureusually involve somekind of annotationor the “inversion” ofthefunctions/queriesusedto transformdata.

The Internetposesnew challengesfor provenancetracking. The autonomyof the com-ponentsand the multi-institutional natureof Web applicationsresultsin a profusionof datacontents,demandingself-describingdatasets.Traditionalapproachesfor trackingdataprove-nance,relyingondetaileddescriptionsandtight controlof thedatatransformationflow, cannotbe easilyadaptedto the Web. Detailedinformationaboutdistributeddataprocessingon theWeb, suchas the queries/functionsusedto transformand move dataacrosssites,are oftenunavailable. A bettersolutionin this context is to build a generalframework for provenancetracking,including detailedanalysisof specificportionswhennecessaryandempathizingthesemanticsof dataandprocesses.

POESIA(Chapter3) (Processesfor Open-EndedSystemsfor InformationAnalysis)is anapproachfor multi-stepintegrationof semistructureddatain anopenanddistributedenviron-ment. Inspiredby the needsof scientificapplicationssuchasagriculturalplanning,POESIAcombinesontologies,workflowsandactivity modelsto providenovel facilitiesfor dataintegra-tion usingcooperativeservices.Thisapproachpursuesthevisionof theSemanticWeb[22, 215]andoffers someconcretesolutionsfor dataintegration,servicecompositionandprovenancetrackingon theWeb.

71

4.2. MotivatingExample 72

This paperfocuseson the POESIAontologicalapproachfor estimatingdataprovenance.Domainontologiesdepictthesemanticrelationshipsamongterms,groupedaccordingto differ-entdimensionsof onereality(e.g.,space,timeandproduct).Tuplesof terms,calledontologicalcoverages, expressthescopesof datasetsandgranularitiesof datavaluesin severaldimensions(e.g.,thespatialextents,periodsof timeandproductsthatadatasetor valuerefersto). These-manticrelationshipsbetweentermsinducesa partialorderamongontologicalcoverages.Thisorderis usedto correlatescopesandgranularitiesof data,enablinganestimationof dataprove-nance.Themajorcontributionof this paperis a framework for trackingdataprovenance,usingontologiesto expressdatacontentsandtheeffectof chainsof dataintegrationoperationsondatasets.This framework canachieve efficient andfine grainprovenancetrackingwith negligiblemaintenancecost.

Theremainderof thispaperis organizedasfollows. Section4.2presentsanagriculturalap-plicationusedasarunningexamplethroughoutthepaper. Section4.3outlinesthefundamentalsof POESIAontologiesneededfor provenancetracking. Section4.4 describesthe ontologicalmethodfor trackingtheprovenanceof aggregatedvalues.Section4.5 analyzestypical opera-torsfor dataintegrationandtheuseof ontologiesfor dataintegrationandprovenancetracking,from ageneralperspective. Section4.6discussesrelatedwork. Finally, section4.7summarizescontributionsandextensions.

4.2 Moti vating Example

Theprobleminvestigatedhereis thefollowing: givena dataitem, whatweretheoriginal dataitemsandthe chainof dataprocessingstepsthatproducedit? Let us examinea real life sce-narioconcerningdataintegrationin agriculturalapplications.Figure4.1(a)illustratesthecon-solidationof weatherdatathroughahierarchyof intraandinter-institutionalrepositories.Eachinstitutionhasasetof weatherstations(datacollectingdevices),scatteredacrossits operationalarea,to collectmeasurementssuchasmaximum,minimumandaveragetemperatureandtotalrainfall per hour. Thesedataaremaintainedin the repositoriesof the institutionsthat collectthem.Thespatialandtemporalscopesof theinstitutionaldatasets(i.e.,thelandparcelsandpe-riodsof time they cover) canoverlap.For example,institution I1 operatesin a limited region,while institutionI2 hasawiderspatialscope.InstitutionI3 encompassesunitsI3a andI3b .Thedatawarehouseof consortiumC1 consolidatesdatafrom I1 andI2 , C2 from I2 andI3 ,andC3 from C1 andC2. Thisprocessingschemeproducesdatasetswith successively broaderscopesanddensersampling.Thedatagranularityin theupperlevelscanbeeitherthesameorcoarserthanthegranularityof thesourcedata(e.g.,from anhourly to a daily basis).Thedataat the lower levels tendto be moredetailedandprecise(but not necessarilyaccurate),whilethe dataat the higher levelsusuallyconvey moreabstraction,sincethey refer to increasinglybroaderscopes.Typicaloperationsto producesuchaggregationsof thesourcedatacanbeseen

4.2. MotivatingExample 73

Figure4.1: Integratingdatasetsin many steps

asvariationsof thebasicdatacubeoperationssuchasslide,dice,roll-up, anddrill down.Figure4.1(b)givesa generalview of the step-wiseintegrationof weatherdata. First, the

raw datacollectedby theweatherstationsof eachinstitutionaregathered,reviewedandstoredastemporalseries.Then,aggregationof historicaldatafrom eachweatherstationgeneratestheclimateattributesfor that particularpoint on the earthsurface(e.g.,averagetemperatureandrainfall per month). Datawarehouses(suchasthosein C1, C2 andC3) offer unified accessto climateattributesoriginatedfrom severalsources,with aggregationandinterpolationfacili-ties for recoveringconsolidateddata– typically OLAP to selectandaggregatedataover timeandspace,andinterpolationsto producemapswith estimationsof the distribution of climatemeasurementsacrossthe lands. Finally, applicationssuchasagriculturalzoning(Chapter3)integrateandfusedatatakenfrom thesewarehouses,amongothersources,to derive otherrel-evantinformation.Most of theseapplicationsneedto understandnot only thesemanticsof thedataused,but alsotheirprovenance.

Figure4.2 shows the starschemaof the datawarehousesusedin casestudiesthroughoutthispaper. TheClimatedatawarehousehasadatatablewith thevaluesof maximum,minimumandaveragetemperatureandtotal rainfall, organizedby thedimensionsof territorial divisions,time,productsandorganizations.TheCropsproductionwarehousemaintainstheplantedarea,production,unit andmonetaryvalue,for eachcounty, monthandcrop produced.Notice thesimilarities betweenthe respective dimensionsof thesewarehouses.The following sectionsshow how to representthesedimensionsin anontologyandtheuseof suchanontologyto helptrackdataprovenance.

4.3. POESIAOntologiesandOntologicalCoverages 74

Figure4.2: Agricultural datawarehouses:(a)climateattributes;(b) cropsproduction

4.3 POESIA Ontologiesand Ontological Coverages

Figure4.3 shows thespacedimensiondescribedin a POESIAontology. Thedirectedacyclicgraph in the left, called an arrangementof concepts, formalizesthe semanticrelationshipsamongtheterritorialsubdivisionconcepts.TheedgesrepresentingPART OFrelationshipshaveablackcirclecloseto thespecificconcept,andtheedgesrepresentingIS A relationshipshaveadiamondcloseto thecomponentconcept.ThisgraphdenotesthataCountry is composedof asetof States or, alternatively, asetof Country Regions . A Country Region maybea Macro Region , anOfficial Region or anotherkind of region. Macro andOffi-

cial Regions arecomposedof States , but a region of typeMetro Area is composedof Counties . Eco Region andMacro Basin defineotherpartitionsof space,basedonecologicalandhydrologicalissues,respectively. Thearrangementof conceptsprovidesa gen-eral framework, being instantiatedby arrangementsof terms. The middle part of figure 4.3illustratesasubgraphof thearrangementof territorialsubdivisionconcepts.An arrangementoftermsinstantiatedfrom theseconceptsis representedby thedirectedacyclic graph(in this casea hierarchy)on theright side. TherearealsoSYNONYM relationshipsnot representedin thefiguredueto spacelimitations(e.g.,BRcanbeusedasasynonym to Brazil ). An instantiatedterm needto be qualifiedwith thecorrespondingconcept,in orderto avoid ambiguity. Thus,State(RJ) refersto thestateRio deJaneiro,while County(RJ) refersto thecountyof thesamename.

Similarstructuresdescribeconceptsandinstantiatedor instancializedtermsfor otherdimen-sions(suchastime andproducts).Thearrangementsof conceptsandtermsfor all therelevantdimensionsconstitutesa POESIAontology. A tupleof termsfrom a POESIAontology, calledanontological coverage, candescribethescopeof adatasetor thegranularityof anaggregatedvalue. For example,[State(RJ),Crop(orange),Year(20 02)] restrictsthescopetothe intersectionof the spatial,crop and temporalscopesdefinedby the termsState(RJ) ,

4.4. OntologicalEstimationof DataProvenance 75

Figure4.3: Thespacedimensionof aPOESIAontology:(left) arrangementof concepts;(mid-dleandright) a compatiblearrangementof terms

Crop(orange) andYear(2002) in a multidimensionalspace.The ontologicalcoverage[State(RJ),State(SP)] , on theotherhand,denotestheunionof thespatialscopesex-pressedby the two terms,becauseboth refer to the samedimension.To narrow the scopeina particulardimensiononehasto choosea morespecificterm (e.g.,go from State(SP) toCounty(Ubatuba) ).

Thesemanticrelationshipsrepresentedin POESIAontologiesinduceapartialorderamongontologicalcoveragesthat we call semanticencompassing: e.g.,countryBrazil encompassesstateRio deJaneiro,denotedby [Country(Brazil)] Ü Ý [State(RJ)] . Furthermore,[State(RJ)] Ü Ý [State(RJ),Year(2002)] and[State(RJ),State(SP)] Ü Ý[State(SP)] . Two ontologicalcoveragesareequivalentif they referto thesamescope(e.g.,[Country(BR)] ß [Country(Brazil)] . A datasetor itemcanbeassociatedwith anontologicalcoverageexpressingits scopeandanotheroneexpressingtheminimumamongthegranularitiesof its components.The scopeof a datasetor item mustencompassthe scopesof its componentsandits minimal granularity. The scopeof a datavalueis equivalentto itsgranularity. We canshow, for a limited setof semanticrelationshipsbetweenterms,that theencompassingrelationshipis reflexive and transitive. A more formal treatmentof POESIAontologies,with demonstrationsof their properties,canbefoundin Annex I.

4.4 Ontological Estimation of Data Provenance

Let usconsidertheunionof datasetsin datawarehouses.Theontologicalcoveragesdescribedin theprevioussectioncanexpressthescopeof thedatasourcesandof theresultingdatasets.Figure4.4(a)illustratesthedataflow for the consolidationof crop productiondata,involving

4.4. OntologicalEstimationof DataProvenance 76

Figure4.4: Theuseof POESIAontologies:(a) scopesof cooperatingservices;(b) thelevel ofgranularityof anaggregatedvalue

cooperatinginstitutionsandconsortia.Thescopesof thedatarepositoriesaredescribedby theontologicalcoveragesattachedto thenodes.For instance,institution I1 maintainsdataaboutthe productionof grainsin the center-southregion of Brazil during the year2002,while I2

is concernedwith the productionof fruits in the whole of Brazil during the sameyear. Theinformationflow, indicatedby thearrows,showsfor examplethatthedatasetof consortiumC1

consolidatesdatafrom I1 andI2 , in ascopeencompassingthoseof its sources:theproductionof food in Brazil during2002.

Theprovenanceof anaggregatedvaluein a nodecanbeestimatedby analyzingthescopesof thedatasourcesof thenode.Thepotentialsources,for eachdimension,arethosewhoseon-tologicalcoverageoverlaps(encompassesor is encompassedby) thecoverageof theaggregatedvaluein thatdimension.For example,considertheaverageproductionof orangein SaoPauloStateduring2002.Figure4.4(b)showshow theontologicalcoverageexpressesthegranularityof theaggregatedvalue,by indicatingspecifictermsin differentdimensionsof a POESIAon-tology. Eachtermwhosesemanticsoverlapstheontologicalcoverageof theaggregatedvalueis surroundedby a rectangle.

Figure4.5illustratestheidentificationof thepotentialdatasourcesfor differentdimensions.It showsthearrangementsof conceptsfor thespaceandproductdimensions,with pointersasso-ciatingthedatasourcesto thetermsusedto expresstheirscopes(e.g.,C3 is associatedwith BR

becauseits ontologicalcoveragerefersto Country(BR) ). Then,provenancetrackingin onedimensionreducesto collectingthe sourcesassociatedwith all theancestorsanddescendantsof the termsexpressingthecoverageof theaggregatedvaluein thatdimension.Figure4.5(a)highlightsthepotentialsourcesin thespacedimension.For instance,sourcesC3, C1, C2 andI2 are candidatesbecausetheir ontologicalcoveragesrefer to Country(BR) and Coun-

try(BR) Ü Ý State(SP) . I4b is alsoa potentialsourcebecauseits ontologicalcoverage

4.5. OntologicalNetsfor DataIntegration 77

Figure4.5: Potentialdatasourcesin differentdimensions:(a) space;(b) product

refersdirectly to State(SP) . If therewereothersourcesassociatedwith thedescendantsofState(SP) they shouldalsobetakeninto account.I3 andI4a arenot potentialsourcesbe-causethey refer to nodesoutsideof theclosureof ancestorsanddescendantsof State(SP) .Figure4.5(b)shows thesamemethodappliedto theproductdimension.A similar analysiscanbedonefor thetimedimension.

The potentialsourcesfor an aggregatedvalue are thosefiguring as candidatesin all di-mensionscontributing to its ontologicalcoverage. Figure4.6(a)illustratesthe conclusionofthe ontologicalestimationof the dataprovenance.The tableon the left sideshows that onlyC1, C3 and I2 figure as potential sourcesin all dimensions. Figure 4.6(b) highlights therelevant flow for the aggregatedvalue considered.The granularityof that value, expressedby [State(SP),Crop(orange),Year(20 02)] , canbeusedto selectthespecificdataitemswhich may have beenusedto calculatetheaggregation. This methodgivesonly an es-timationof thedataprovenancebecausetheoverlappingof thescopesof thedatasourcescanleadto alternativepathsfor supplyingaparticulardatavalue.

4.5 Ontological Netsfor Data Integration

An ontological net for data integration is an infra-structurefor consolidatingandfusing datathroughdistributedcooperativeprocesses,wherethedescription,discoveryandcompositionofdatasetsandservicesarebasedon domainontologies.In orderto betterexplain this concept,let usanalyzethebasicoperatorsfor dataintegrationin cooperative geographicalapplicationsandtherole of domainontologiesin this context, from ahigherlevel perspective.

4.5. OntologicalNetsfor DataIntegration 78

Figure4.6: Appraisingdataprovenance:(a)contrastingdimensions;(b) estimateddataflow

4.5.1 Data Integration Operators

The POESIAapproachclassifiesthe operatorstypically usedfor integratingdatain coopera-tive geographicalapplicationsin threecategories: combinationof datasets,filtering dataandtransformingdatavalues. Figure4.7 presentssomeexamplesof the operatorsfor combiningdatasets.Theunion operatorcollectsdataitemsfrom two datasourcesinto a compositedataset,whoseschemamatchesthoseof the sources.In figure 4.7(a),dataaboutthe productionof fruits in Brazil during2002is unitedwith anotherdatasetabouttheproductionof fruits inthe Center-Southregion of the countrybetween1997and2001,generatinga datasetwhichcoverstheproductionof fruits in Brazil from 1997to 2002. Themerge operatorrelaxesthesemanticsof theunionoperatorby allowing slightly differentsemi-structureddatasourcesanduserinterventionto solve conflicts. Figure4.7(b)shows anexampleof merging two heteroge-nousdatasets,into a semi-structureddataset,whoseschemais a compositionof the sourceschemas.Theunion andmerge operatorsproducedatasetswhosescopeencompassesthoseof thedatasources.Theresultmaycontaindatawith thegranularitiespresentin bothsources.Additionally, POESIAontologieshelpto identify conflictson merging datasetsin theabsenceof a commonkey. Data items from differentsources,but with equivalentutilization scopesarecalledsemanticallyidentifiablematches. Thesematchesareconvertedinto oneitem in thetarget,usingheuristicsand,if necessary, userinterventionto solveconflicts.For example,onecandetectdiscrepantvaluesbetweendataitems(from differentsources)referringto thesameproduct,at thesameplaceandtime, by looking for equivalenceof their ontologicalcoveragesin all thesedimensions.The heuristicsto choosethemostaccuratevalueamongthe matchescanbe,for example,usingthevaluecomingfrom thedatasourcewith betterreputationor thevaluethatfits betterin thetypicaldistribution for thatvalue.

The intersection operatoremploys heuristicsto producedataitemsin the target for

4.5. OntologicalNetsfor DataIntegration 79

Figure4.7: Combiningdatasets

4.5. OntologicalNetsfor DataIntegration 80

Figure4.8: DataFiltering

eachpair of matchingitems from the two datasources. The schemaof the target can bethe union or the intersectionof the sourceschemas,dependingon the matchingdataitems.Figure 4.7(c) shows the intersectionof two heterogeneousdatasetsaboutcrop production.Thematching operatoris similar to the intersection,but allows userinterventionto analyzematchesanddefinethetargetschema.For example,onecanidentify thatTotal rainfall

in Source 1 matchesPrecipitation in Source 2, definethecorrespondingtargetat-tribute andchoosethe datavaluesto put in the target. Figure4.7(d) shows the matchingoftwo heterogenoussourcesof weatherdata.For intersection andmatching , thescopeofthetargetdatasetis theintersectionof thoseof thedatasources,andtheminimumgranularityprovidedby thetargetis themaximumamongtheminimumgranularitiesof thesources.

Thedifference andthesubtraction operatorsreturnthedataitemsof thefirst datasourcewhich do not havea matchin thesecondsource.Theresultingschemaderivesfrom theschemaof thefirst datasource.Thedifferencebetweentheseoperatorsis thatsubtraction

allows heterogeneousschemasanduserintervention. Figures4.7(e)and4.7(f) illustrate theapplicationof theseoperatorsto climatedatasets.For bothoperatorsthescopeandthemini-mumgranularityof thetargetis givenby subtractingthescopeandminimumgranularityof thesecondsourcefrom thoseof thefirst one.

Figure4.8 illustratestheoperatorsfor filtering datasets:projection andselection .Theseoperatorskeepthe semanticsof the correspondingrelationaloperators,i.e., projectingattributesor selectingdataitemsaccordingto somepredicate,respectively. Projectionpreservesthe scopeof the sourcein the target (figure 4.8(a)),while selectionmay not. If the selectingpredicatestipulatesfiltering on a term definedin a POESIAontology, the restrictedscopeofthetargetcanbedeterminedby thatterm(figure4.8(b)).However, it is not straightforwardforfiltering on valuesof thedatatable(figure4.8(c)).

Figure4.9presentstheoperatorsthattransformdatavalues.Theaggregation calculates

4.5. OntologicalNetsfor DataIntegration 81

Figure4.9: Transformingvalues

coarsegrainmeasurementsfrom datain finnergranularities.Figure4.9(a)illustratestheaggre-gationof crop productiondatafor eachmonthandcountyinto the respective valuesfor eachyearandstate.The interpolation estimatesthecontinuousdistribution of measurementsfrom discretesamples.Figure4.9(b)illustratestheinterpolationof averagerainfall samplestoproducea mapexpressingthe distribution of this measurementacrossthe lands. The con-

version employs userdefinedfunctionsto convert data(e.g., from onemeasurementunitinto another).Figure4.9(c) illustratestheconversionof rainfall measurementsfrom inchestomillimetersandmeasurementsof averagetemperature,for thesamescope,from FahrenheittoCelsiusdegrees.Finally, the fusion operatorcombinesvaluesfrom differentdatasources,whoserespectivescopesmatcheachother, into anothermeaningfulmeasurement,accordingtouserdefinedfunctions.Figure4.9(d)illustratesthesynthesisof thefreezingrisk from themin-imumtemperatureandaltitude.All theseoperatorspreserve thescopesof thedatasets,thoughonly aggregation andinterpolation impactthedatagranularity.

4.5.2 Data Reconcilingthr oughArticulation of Ontologies

POESIAontologieshelptheintegrationprocesswith respectto datascopesandgranularitiesasdiscussedin section4.5.1. Generalandapplicationontologieshelp to investigatethesemanticcorrespondencesamongheterogeneousdataitemsandindex librariesof dataconversionfunc-tions. Somedecisionsmadewhenintegratingdatamustbe annotated,in orderto explain therelevantdetailsof dataprovenancethatcannotbecapturedby ontologicalcoveragesalone.

Let usconsidertheintegrationof two heterogenousdatasetsof weathermeasurementsfromdistinct institutions,in a particularportionof a cooperative process.Theschemafor thesemi-structureddataof eachdatasetcanberepresentedasadirectedgraph(e.g.XML). ThePOESIAapproachenrichesthesegraphswith metadatadescribingthedataelements,andusesontologies

4.5. OntologicalNetsfor DataIntegration 82

to expressthe propertiesof theseelementsandinterrelatethem. Theseenrichedschemasarethemselvesspecificontologies. Thus,ontologiesarticulation[183] canbe usedasa basistointegratedatasources.Figure4.10 illustratesthis approach.The two graphsat thebottomofthefiguredescribethe datasources,thegraphat the top representsthe target datasetandthedottedanddashedlinks betweennodesof thesegraphsrepresentthearticulationrules,i.e., thedataflows from thesourcesto thetarget.Thesearticulationsshow, for example,thatthevaluesof latitudeand longitudefrom the sourcein the left-bottomcornerof the figure, representedin degrees,minutesandsecondsmustbeconvertedinto degreesanddecimalsof degreesto beinsertedinto thetarget.

Figure4.10:Reconcilingheterogeneousdatasetsby ontologiesarticulation

4.5.3 SemanticWorkflows

Semanticworkflowsarecooperative processrunningon ontologicalnets.Theseprocessesem-ploy dataintegrationoperators,accordingto ontologiesarticulations.POESIAontologiescon-tributeto renderageneralview of whatis goingonin theseworkflows,by expressingthescopesandgranularitiesof thedatainvolved. Figure4.11(a)illustratestheintegrationof weatherdatafrom differentinstitutions.Eachserviceis characterizedby its scopeandtheminimumgranu-larity it supportsfor datarecovery. For example,theINMET (NationalInstituteof Meteorology)collectedweatherdatasamplesacrossBrazil in theperiodbetween1931and2002. Themini-mumtime granularityfor thedatasuppliedby INMET is month.Theultimaterecipientof datain this cooperative processis the RNA Warehouse (National AgrometeorologyNetwork),which canprovide weatherandclimatedataaboutvirtually any placein Brazil. Thetemporalscopeof theweatherdatasuppliedby theRNA Warehouse is 1892to 2002andtheminimum

4.6. RelatedWork 83

granularitysupportedis day. Thegranularityfor eachdataitem dependson thesourcesof thatitem. Figure4.11(b)illustratestherole of theRNA Warehouse on supplyingclimatedatatodeterminelandsuitability for differentcrops. Thescopeof thesub-processesfor determininglandsuitability for coffeeandricemustbecompatiblewith thecoveragesof therespectivesub-setsof climateattributesrecoveredfrom theRNA Warehouse (see(Chapter3) for details).

Figure4.11: Ontologiesasa framework for estimatingdataprovenance:(a) scopesandmini-mumgranularitiesof cooperatingservices;(b) theuseof the integrateddataby differentpro-cesses

4.6 RelatedWork

The traditional solutionsfor tracking dataprovenance,someof which considergeneraldataformatsandprocessing,employ metadatato annotatetheprocessinghistory[146, 31, 149, 23].However, thesesolutionsdo not scalewell to large datasets,long processingflows andfinegrainedprovenance.Many otherstudieson dataprovenanceare limited to views definedbyquery operationson databases,calling this restrictedproblem lineage tracing. Woodruff etal. [251] introducetheconceptof inversequery, which mapsanoutputto thedataitemsusedto producethatoutput. They definetheclassof functionsadmittinginversionandtheconceptof weakinversion to estimatethe lineagefor a wider classof functions.However, they do notshow how to determinetheinversequeries,but expectthedatatransformationdefinerto providethem.

Cui et al. [55, 57] definethe lineageof theresultof a relationaldatabasequeryasthemini-malsetof tuplesnecessaryto producethatresult.They presentanalgorithmfor tracinglineageover chainsof aggregate-select-project-joi n views. Their approachis basedontheinversionof theview definitionandrequiresmaterializationsof original relationsandinter-mediateviews. [56] generalizestheirpreviousresultsfor graphsof generaltransformationsused

4.7. Conclusions 84

for loadingdatawarehouses.Nevertheless,their methodsarebuilt uponsomeconstraintsandspecificinformationaboutthesourcesandtransformationsemployed,andrequireconsiderablestoragefor intermediateresults.

Buneman[33] distinguishesbetweenwhyprovenanceandwhere provenance. The formerrefersto the dataitems which have someinfluenceon the result (e.g., which determinethelogical value of a predicateusedto selecttuples). The latter refersto the items effectivelyusedto synthesizethe result (e.g.,multiple valuessummedup to obtainan aggregatedvaluelike average). He providesa framework to track both kinds of dataprovenancefor specificclassesof select-project-join-union queriesin a datamodelgeneralizingrelationalandhierarchicaldatarepresentationssuchasXML. Galhardaset al. [95] presentsomedatalineagefacilitiescoupledto a datacleansingschemebasedin a graphof transformationswithexceptionsmanagementto supportthe refinementof the cleaningcriteria. Fan [77] providesalgorithmsto tracedatalineagein automaticallyreversiblesequencesof schemaconversions,employing thehyper-graphbasedhigh level datamodelandthe functionalquerylanguageoftheAutomedsystem.

Therefore,currentapproacheseithersupportjustcoarsegrainprovenancetrackingor rely ondetaileddescriptionsof thedatasourcesandthedatatransformationsapplied(e.g.,schemasandqueryexpressions),makingthemunfit in many situationsfor cooperativesystemsovertheWeb.Furthermore,theseapproacheslack abstractionmechanismsto enablea generalunderstandingand exploration of the information flow. To the bestof our knowledge,domainontologies[233, 110,182] hasnot beenyet exploitedasa framework for trackingdataprovenance.Thispaperhasshown thatsuchasolutioncaneliminatesomeof theseshortcomings.

4.7 Conclusions

Dataprovenancetrackingis becomingincreasinglyimportantasmoreon-linedatasourcesbe-comeavailable.Thispaperhasshownhow domainontologiesareusedin POESIAasabasisfortrackingdataprovenancein cooperativeprocessesinvolving dataintegration.POESIAemploystuplesof domainspecifictermsdefinedin multidimensionalontologiesto correlatethe scopeandgranularitiesof thetargetdatawith thoseof thedatasources,enablingtheestimationof thedataprovenance.Additionally, POESIAontologieshelp to semanticallyidentify matchesonheterogeneousdatasources,i.e.,dataitemsfrom differentsourcesreferringto thesamescope.It helpsto detectandsolveconflictsamongheterogeneousdatasources,andallow trackingthedatatransformationflow acrosschainsof dataintegrationoperators.

Thebenefitsof this ontologicalmethodfor estimatingdataprovenanceare(1) a frameworkfor understandingdataprovenancebasedondomainspecificconcepts;(2) supportfor finegrainprovenancetracking;(3) precisionandconcisenessfor expressingthescopesandgranularities;(4) coupling with a generalapproachfor dataintegration and servicescomposition;(5) the

4.7. Conclusions 85

cost for maintainingthe infra-structurefor provenancetracking is sharedwith facilities forcataloging,discoveringandintegratingdataandservices.

This researchis focusedon the conceptualdefinition andformalizationof the ontologicalapproachfor multi-stepdataintegrationandprovenancetracking. Ongoingwork includestheimplementationof prototypesto validatethe POESIAapproachfor scientificapplicationsinagriculture,andconjugatingtheontologicalschemewith othermethodsfor provenancetrack-ing.

Acknowledgments

The authorsfrom CampinasUniversity arepartially supportedby Embrapa,CAPESand theFinep/Pronex/IC/SAI95/97project. Theauthorsfrom Georgia Techarepartially supportedbytwo grantsfrom the OperatingSystemsandITR programs(CISE/CCRdivision) of NSF, bya contractfrom the SciDAC programof DoE, anda contractfrom the PCESprogram(IXO)of DARPA. The applicationscenariosanddatausedin this work wereprovided by Brazilianexpertsin agriculture.Specialthanksto DanielAndradeandFlavio Silva from theIC-UnicampDB-groupfor somevaluablediscussionsonpreliminaryversionsof this material.

Chapter 5

Applying SemanticWebTechnologyinAgricultural Sciences

5.1 Intr oduction

TheSemanticWeb[22, 215, 80] foreseesanew generationof Webbasedsystems,takingadvan-tageof semanticdescriptionsof dataandservicesto enhancetheroleof computersonsupport-ing severalhumanactivities. Suchmachineprocessabledescriptions,conformingto metadatastandards,areexpectedto boostinteroperabilityandenableautomaticreasoningin cooperativeprocessesinsideandacrossorganizationalboundaries.Nevertheless,therearemany openques-tions relative to the applicability, adequacy andmaturity of the SemanticWeb technologyforrealworld applications.

In theInternetera,scientificcommunitieshavebeencreatingandaccessingamyriadof datasetsandcomputationalservices,in a diversityof fieldssuchasearthsciences,bio-informaticsandmedicine.Severalapplicationsrequiretheintegrationof theseheterogeneousdatasourcesandthe compositionof theseservices.Consequently, thereis a growing demandfor accurateand efficient meansto search,recover and interconnecttheseresources.The development,adaptationanduseof SemanticWebtechnologiesfor scientificpurposesis apromisingroutetofulfill theseneeds.

Much researcheffort hasbeendirectedto SemanticWeb issues[80, 124, 63], includingthoseinvolving scientificapplications[224, 160, 174, 102, 43]. However, very few domain-specificstudieshave beenreportedto describetheengineeringchallenges,thedomain-specificusages,andthe impactof ontologystructureandontologysizeon systemdesignandperfor-mance.

POESIA(Processesfor Open-EndedSystemsfor InformationAnalysis) (Chapter3) pur-suesthevision of theSemanticWebto bring aboutsolutionsfor resourcesdiscoveryandcom-position,interoperabilityof informationsystemsandtraceabilityof processes.Inspiredby the

86

5.2. Motivation:AgriculturalZoning 87

needsof scientificapplicationssuchasagriculturalplanning,POESIAcombinesdomainon-tologies,workflows andactivity modelsto provide novel facilities for multi-stepintegrationandprocessingof semi-structureddatain an openanddistributedenvironment. The founda-tionsof POESIAare(1) WebServicesto encapsulatedatasetsandprocesses;and(2) domainontologiesto organize,recover anddrive thecompositionof theseservices,accordingto theirutilization scopes(i.e., the situationsin which they canbe used). POESIA’s mechanismsfororganizingandcomposingWebservicesusingdomainontologies,includingrulesto assurethesemanticconsistency of theresultingprocesses,appearin (Chapter3). Theuseof thesedomainontologiesto track dataprovenanceand supportdataintegration in POESIA is describedin(Chapter4).

This paperfocuseson theengineeringchallengesof developingandusingdomainontolo-giesin POESIA.Thoughthecasestudyrefersto aparticularscientificapplication– agriculturalzoning– theapproachis extensibleto otherdomains,andusefulin awideclassof applications,that requiredataintegrationandcooperative work on the Web. In particular, the paperpointsout theobstaclesmet in loadingandutilizing domainontologiesin applicationprograms,anddescribesthesolutionsadopted,which wereimplementedin a prototype.Thesesolutionsin-volve theextractionof ontology views– i.e., applicationrelevantpartsof anontology. Ratherthan forcing applicationsto dealwith large, cumbersomeontologies,the notion of ontologyviews is adoptedto discoverandcomposeWebresources,andmanagingtheresultingcoopera-tive processes.Theexperimentsreportedin this papergive aninsighton the limitationsof thecurrentSemanticWeb technologyto dealwith ontologies,whenfacedwith real world appli-cationsusinglarge datasets.Theseexperimentsshow that thecombinationof SemanticWebstandardsandtoolswith conventionaldatamanagementtechniquesprovidesbetterscalabilitythanthesolutionsbasedonly on theSemanticWeb.

Theremainderof this paperis organizedasfollows. Section5.2describestheneedsof sci-entificapplicationsover theWeb,andparticularlyof agriculturalzoningprocesses.Section5.3describeshow the POESIAapproachaddressestheseneeds.Section5.4 presentsthe designandimplementationof the ontologyfor the agriculturerealm. Section5.5 outlinestheuseofthis domainontologyto supportservicesdiscoveryandotherfacilitiesin POESIA.Section5.6reportssomeimplementationexperiencesinvolving theconstructionof ontologyviewsandtheuseof theseviews to supportSemanticWebapplications.Finally, Section5.7discussesrelatedwork andSection5.8concludesthepaper.

5.2 Moti vation: Agricultural Zoning

This researchhasbeenmotivatedby the needsfor versatiletools to supportscientificappli-cationson the Web, and more specificallythe developmentof decisionsupportsystemsforagriculture.Oneexampleof anapplicationin this domainis agricultural zoning– a scientific

5.2. Motivation:AgriculturalZoning 88

processthat classifiesthe land in a given geographicregion into parcels,accordingto theirsuitability for a particularcrop, and the besttime of the year for key cultivation tasks(suchasplanting,harvesting,pruning,etc). Thegoalof agriculturalzoningis to determinethebestchoicesfor a productive andsustainableuseof the land,while minimizing therisksof failure.It requireslooking at many factorssuchasregional topography, soil properties,climate,croprequirements,socialandenvironmentalissues.

Typically, this kind of applicationinvolvesintricatedataprocessingactivities acrossdiffer-entorganizations.Agricultural zoningrelieson datafrom a varietyof heterogeneoussources,includingsensorsthatcollectdataonphysicalandbiologicalphenomena(e.g.,weatherstations,satellites,andlaboratoryautomationequipment).Thesedatamaybestoredin legacy databasesor files in severalformats.

An agriculturalzoningprocessis built by cooperationof expertsfrom many scientificandengineeringdisciplines.Agronomistscontributewith plantingtechniquesandcropmanagementmodels. Biologistsprovide crop growth andnutrientrequirements.Statisticiansprovide riskmanagementanalysisfor potentialcrop failures(e.g.,dueto severeweather). Thesepeople,working in inter-institutionalteamsfor particularenterprises,bring togethertheir expertiseinseveralfieldsto producecooperative processesusinga varietyof computationalplatformsanddataanalysistools.

Figure5.1presentsanexampleof outputof anagriculturalzoningprocess.It showsthesuit-ability mapfor plantingshortcyclevarietiesof soybeans,consideringaspecificclassof soils,intheBrazilianstateof Goias.Themapin Figure5.1(a)classifiesthelandsof thestateaccordingto theirsuitabilityfor sowingsoybeansin thebeginningof October, andthemapin Figure5.1(b)for sowing in thebeginningof November. Thesemapsresultfrom inter-institutionalcoopera-tivework asdescribedpreviously. In orderto producethem,expertshadto combinedataontheclimate,soilsandtopographyof thatstate,andtheenvironmentalneedsof thesoybeanplantsalongtheir developmentcycle.

Experiencesin somesectorsof theBrazilianagriculturein thelastfew yearscorroboratetheeconomicadvantagesof adoptinga scientificapproachto agriculturalzoning[58]. However,the currentagriculturalzoningprocessesarelabor-intensive, andconsequentlyexpensive andslow to developandrun. This is a seriousproblem,sinceit is anextremelyimportantissueforacountrywith avastterritory andmany commercialcropssuchasBrazil.

The problemsof sucha dataprocessingapparatusappliedto cooperative scientificappli-cationslike agriculturalzoningbecomemoreapparentfrom the perspective of the SemanticWeb:

1. Thereis a growing demandto publish,browseandinterconnectdatasetsandprocesseson theWeb.

2. Web-basedsystemslack semanticsupportfor discovering,selectingandinterconnecting

5.2. Motivation:AgriculturalZoning 89

Figure 5.1: Suitability maps for planting soybeansin Goias (Sources: Embrapa/CNPSo,DNAEE andINMET)

the available resources.In order to facilitate thesetasks,the resourcesshouldbe de-scribedaccordingto domainspecificknowledge.Suchsemanticdescriptionscouldalsocontribute to datacleansing,integrationandaggregation,which occurin multiple stepsacrossdistributedcooperativeprocesses.

3. Theprocessesthroughwhich datapassarerarelydocumented.Evenwhendocumented,the specificationsproducedareeithernot genericenoughto give a generalview of theprocessesor not formal enoughto allow theautomaticrepetitionof theseprocesseswithdifferentdata.

4. Thereshouldbesomemeansto trackdataprovenanceacrosstheseprocesses,i.e.,deter-minetheoriginal datasourcesandthewaydatawereobtainedandprocessed.

Thefollowing sectionoutlinesthePOESIAapproachfor copingwith theseproblems,whichis basedoncombiningontologieswith workflows. Wepointout thattheseissuesarenotpartic-ular to agriculturalzoning.Indeed,they arecommonto awide rangeof domains,asmentionedin the introductionof this paper. Our solutioncanbe generalizedto otherdomains,providedthattheappropriateontologiesareused.

5.3. SolutionContext 90

5.3 Solution Context

5.3.1 The POESIA Approach

Thefoundationof thePOESIAapproach(Chapter3) is theuseof a domainontologyfor mul-tiple purposesin inter-enterpriseprocessesthat gather, integrate,transformandanalyzedata.Figure5.2 illustratesthecentralrole of thedomainontologyin sucha cooperative process.Adomainontologydepictsthe semanticrelationshipsamongtermsof a knowledgedomain. InPOESIA,termsaregroupedaccordingto differentdimensionsof onereality; in theagriculturedomain,geographicspaceandcropsareexamplesof dimensions.Tuplesof terms,calledonto-logical coverages, expresstheutilizationscopesof WebServicesthatencapsulatedatasetsanddataprocessingactivities (e.g.,the spatialextent andthe cropsfor which a particularserviceis intended).Ontologicalcoveragesserve asconcisedescriptorsof resourcesbasedon domainspecificknowledge. Thesemanticrelationshipsamongthe termsof theontology, particularlyrelationshipsof the type IS A andPART OF, inducea partial orderamongontologicalcover-ages,therebyensuringthepossibilityof:

Figure5.2: Themultiple rolesof adomainontologyin thePOESIAapproach

à automationof meansto supportthediscovery andcompositionof WebServices(Chap-ter3);

à estimationof dataprovenanceacrossdistributedcooperativeprocesses(Chapter4);

à detectionof correspondencesamongheterogeneousdataitems,for dataintegrationpur-poses,basedonsemanticrelationshipsbetweentheir ontologicalcoverages(Chapter4);

5.3. SolutionContext 91

5.3.2 POESIA OntologiesasWebServices

POESIAontologiescanbepublishedandlookedupthroughWebServices.An ontologyserverencapsulatesontologiesfor differentdomains(e.g.,agriculture,biology, biotechnology),andprovidesaccessandadaptationmeansto allow severalapplicationsto usetheseontologies.Thesharingof ontologiesamongapplicationprogramsenableenactmentof cooperativeworkflowsthatuseresourcesdistributedacrosstheWeb.

Figure5.3 illustrateshow ontologiesmay be encapsulatedwithin an ontologyserver, andhow this server canbeusedto managedataandservicesin cooperative processesfor differentapplicationareas. The Supply Chain Ontology is a subsetof the Logistics On-

tology . Theseontologiesreferto theproductionanddistributionof goodsto satisfyany kindof need(e.g.,food,energy, water).TheAgriculture Ontology , in turn,hassomeinter-sectionwith thespecializationof theformerontologiesto theagriculturerealm.Eachof thesethreeontologiesis referredto by severalworkflows, for therespective applicationdomains.Agiven workflow, on the otherhand,canonly be associatedwith a givenontology, which willallow it to adequatelymanagetheresourcesnecessaryfor its execution.Theinteroperabilityofontologiesandworkflowsdesignedfor differentdomainsis beyondthescopeof thispaper, andleft to futurework.

Figure5.3: Usingdomainontologiesto handleworkflows in POESIA

The rest of this paperdescribesthe design,developmentanduseof an ontology for theagriculturerealm,providing aconcreteexampleof thebasicfacilitiesto build POESIAapplica-tions. It providesaninsightof someimplementationissues,with respectto theSemanticWebstandardsandtoolsavailablenowadays.

5.4. An Ontologyfor theAgricultureRealm 92

5.4 An Ontology for the Agricultur eRealm

5.4.1 The Ontology Design

As partof theeffort to implementandvalidatethePOESIAapproachin real life applications,we have beendevelopinganontologyto supportagriculturalzoning. This ontologyis dividedin severalfacets, congregating,interrelatingandproviding unifiedaccessto avarietyof themesrelevant to the agriculturerealm. Figure5.4 illustratesthe overall structureof this ontology,rootedat thing . The threetopmostfacetsareMeasurement Units , Agricultural

Topic andGeo-Entity . Datainstancesappearat thebottomlevel.

Figure5.4: Generalconceptionof theontologyfor theagriculturerealm

The Measurement Unit facetdescribesthe physical,chemical,biological and otherkinds of units appearingin agriculturaldata. It canbe adaptedfrom an existing ontologyofmeasurementunits. Oneparticularissuein this facetis the modelingof the relationshipsbe-tweencompatibleunits,to facilitatedataintegrationandconversionamongtheseunits.

The Agricultural Topic facet is divided in dimensionsfor particularagriculturalconcerns.Thesedimensionsareusedto specifyontologicalcoveragesdescribingtheutilizationscopesof datasetsandprocessesin theagriculturaldomain.Let usconsiderthesedimensionsin moredetail.Figure5.5depictstheAgricultural Product dimension.Therectanglesin this diagramrepresentclassesof objects. The edgesendingwith a diamondrepresentspe-cialization relationships(of type IS A) betweenclasses,i.e., theclassat the targetof suchanedge(indicatedby thediamond)is a subclassof theclassin thesourceof thatedge.Thedia-gramshows, for example,thatanAgricultural Product canbeRawor Processed .A Raw Product can be a Plant or an Animal , both of which have several subclasses.This hierarchyis in fact a directedgraph,becauseof multiple inheritance.The levelsarenotuniformfor eachkind of plantor animal.Thebottompartof Figure5.5detailsthehierarchyfor

5.4. An Ontologyfor theAgricultureRealm 93

commercialtypesof Coffee (Arabica andRobusta ) andcategoriesof Cattle (Dairy

or Meat cattle,i.e., for primarily producingmilk or meat,respectively).

Figure5.5: TheAgricultural Product dimension

Figure 5.6 depictsthe Organizations dimensionof the ontology for the agriculturerealm. Figure5.6(a)shows thatanOrganization canbea Consortium , an Institu-

tion (e.g.,company, association,governmentalbody)or a specificUnit of a Consortium

or Institution . A Consortium is composedof a numberof participatingInstitu-

tions andan Institution is composedof its Units . Theseaggregation relationships(of typePART OF)arerepresentedby edgeswith ablackcircleon thesideof theclassplayingtherole of component.Figure5.6(b)presentsa hierarchyof instancesof theclassespresentedin Figure5.6(a).This hierarchyshows, for example,thattheConsortium calledRNA(RedeNacionaldeAgrometeorologia – BrazilianAgro-meteorologicalNetwork) hasEmbrapa (Em-presaBrasileira dePesquisaAgropecuaria – BrazilianAgriculturalResearchCorporation)andUnicamp (Universityof Campinas)asits participants.CPAC, CNPTIA andCEPAGRIaretheacronymsof specificresearchcenterswithin theseinstitutions.

TheTerritory andTime dimensionsarealsorepresentedwith thebasicconstructspre-viously described. The Territory dimensionincludesseveral layersof geographicdata,suchaspolitical division (country, regions,states,etc.),ecologicalregions,hydrologicalbasinsandtypesof soil. TheGeo-Entity facet,basedon theGML standard[192], describeshowto representgeographicfeatures.

5.4.2 The Ontology on Protege

Theontologyfor theagriculturerealmhasbeendevelopedwith Protege [190], anopen-sourcegraphictool for ontologyediting andknowledgeacquisition. Figure5.7 presentsa snapshot

5.5. Exploiting OntologicalRelationships 94

Figure5.6: TheOrganization dimension

of this ontology on Protege, showing its overall structure(on the left) and somedetailsofthe interfacefor the Territory dimension,with the statesanddifferentkinds of regionalsubdivisionsin Brazil. Somedetailsof theSaoPauloStateappearin apop-upwindow centeredin thebottom.

Protegecanbeextendedwith plugins,enablingtheincorporationof new functionalitiesandthe developmentof ontologyspecificationsin a variety of formats. POESIA’s presentimple-mentationacceptsontologiesin theRDFformat[211]. Theadoptionof DAML+OIL [165] andOWL [196] is alsobeingconsidered.

5.5 Exploiting Ontological Relationships

5.5.1 Ontological Coveragesto Expressand Interr elateScopes

A POESIAontologycanbedefinedasa directedgraphwhosenodesrepresentconcepts(e.g.,Country ) or instancesof concepts(e.g., Country(Brazil) ) and whosedirectededgesrepresentsemanticrelationshipsbetweennodes(instantiation , specialization oraggregation ). Edgesgo from the generalto the instantiated,specializedor constituentconceptsor instances. Theserelationshipsinduce a partial order amongthe terms denot-ing ontologyconceptsand their instances(Chapter3). This order is determinedby the rel-ative positionsof the terms in the ontology graph. Let Á and Á º be two termsof an ontol-ogy á . We say that Á encompassesÁhº , denotedby Á�ÜÝâÁhº , if and only if thereis a path iná leadingfrom Á to Á º , i.e., a sequenceof instantiation,specializationandaggregation rela-tionshipsrelating Á to Áhº . The encompassrelationshipis transitive – if á hasa pathfrom Á to Áhº and anotherpath from Áhº to Áhº º then á hasa path from Á to Áhº º . In the ontologypresentedin the previous section,Plant Ü Ý Grain , Consortium(RNA) Ü Ý Institu-

tion(Embrapa) andPlant Ü Ý Coffee.Arabica.Variety(Tupi) . ThestringCof-

5.5. Exploiting OntologicalRelationships 95

Figure5.7: Theontologyfor theagriculturerealmonProtege

fee.Arabica.Variety(Tupi) representsthepathto reachthetermVariety(Tupi) .The pathto a term canbe omitted if thereis no possibility of ambiguity– e.g., thereis onlyoneCountry calledBrazil , but severalkindsof crops,suchasSoybeans , have a varietycalledTupi .

Considera POESIAontologywith a numberof facetsdescribingdifferentaspectsof onereality(suchasMeasurement Units andAgricultural Topics ). A facet is asub-graphof theontologygraphwhosenodeshavenoconnectionby instantiation,specializationoraggregationwith nodesof otherfacets.Thedimensions of a facetarethesub-graphswhoserootsarechildrenof thefacet’s root. An ontological coverage is a tupleof termstakenfrom thedimensionsof somefacetof a POESIAontology. For instance,theontologicalcoverage[Or-

ange, Country(Brazil)] of the Agricultural Topic facetis a tuple with termsfrom two dimensions– Agricultural Product andTerritory . Whenanontologicalcoverageis attachedto a Web Serviceit playsthe role of metadata,describingthe utilizationscopeof theservice.Theontologicalcoverage[Orange, Country(Brazil)] whenat-tachedto a Web Serviceof agriculturalproductiondata,indicatesthat datafrom that servicerefer to the productionof Oranges in Brazil . Figure 5.8 illustratesthe specificationofanontologicalcoveragein which theterm Institution(Embrapa) expressestheutiliza-tion scopein the Organization dimension,Orange in the Agricultural Product

5.5. Exploiting OntologicalRelationships 96

dimensionandCountry(Brazil).Region(SE) in theTerritory dimension.

Figure5.8: An ontologicalcoveragein theontologyfor theagriculturerealm

SemanticRelationshipsbetweenOntological Coverages

The encompassrelationshipbetweentermsgivesrise to correspondingrelationshipsbetweenontologicalcoverages.For simplicity, let usconsiderthatanontologicalcoveragehasexactlyoneterm for eachdimensionof a facet. Given two ontologicalcoverages,Ú�ÏãÝåäæÁ�ç E�è^è^è_E Áhé�êand Ú�Ï#º%Ý ä Áhº�ç E�è^è^è�E Áhºëé�ê ( ¿/ì5í ), where ÁhîX·�ÚIÏ and Áhºæïð·�Ú�Ï#º are termsfrom the sameontologyandfacet,Ú�Ï and ÚIÏ6º maybedisjoint or satisfyoneof thefollowing relationships.

Overlapping: ÚIÏñ¾�¶VÅ�ò�Ê¥É^ȳÀ&Ú�Ï º if andonly if thefollowing conditionsaresatisfied:

1. µuÁ§·UÚIÏó¸VÓ�Á º ·�ÚIÏ º suchthat Á�Ü ÝñÁ º Ç(Á º Ü ÝñÁ2. µuÁ º ·�Ú�Ï º ¸VÓ�Áw·�ÚIÏ suchthat Á�Ü ÝñÁ º Ç(Á º Ü ÝñÁ

Encompassing: ÚIÏ�ÜÝÞÚIÏ6º if andonly if thefollowing conditionsaresatisfied:

1. µuÁ§·UÚIÏó¸VÓ�Á º ·�ÚIÏ º suchthat Á�Ü ÝñÁ º2. µuÁhºÆ·�Ú�Ï#º³¸VÓ�Áw·�ÚIÏ suchthat Á�Ü ÝñÁhº

Equivalence: Ú�Ï�ß�ÚIÏ º if andonly if thefollowing conditionsaresatisfied:

1. µuÁ§·UÚIÏó¸VÓ�Áhº�·�ÚIÏ6º suchthat Á�Ü ÝñÁhº2. µuÁ º ·�Ú�Ï º ¸VÓ�Áw·�ÚIÏ suchthat Á º Ü ÝñÁ

5.5. Exploiting OntologicalRelationships 97

Overlapis bidirectionalandtheweakestof theserelationships.Theencompassrelationshipbetweenontologicalcoverages,on theotherhand,only acceptsencompassingrelationshipsbe-tweentermsin onedirection.Theequivalencerelationshiprequiresthateachpairof termstakenfrom thetwo ontologicalcoveragesreciprocallyencompasseachother. Finally, two ontologicalcoveragesaredisjoint if they do not overlapeachotherin at leastonedimension,i.e., thereisa term in oneof thecoveragesthatdoesnot encompassor is encompassedby any termof theotherontologicalcoverage.

The encompass, overlap andequivalencerelationshipsbetweenontologicalcoveragesarereflexive and transitive, andthe two latter arealsosymmetric. The transitivenessof thesere-lationshipsinducesa partial orderamongontologicalcoveragesreferring to the sameontol-ogy andsamefacet. Figure5.9 illustratesthis ordering. In the figure, ontologicalcoveragesareusedto describeservicesfor accessingagriculturalproductiondata. Thecoveragesin thefigure are definedwith respectto the Agricultural Topic dimensionof the ontology.TheOrganization dimensionwaseliminatedfor simplificationpurposes.Theontologicalcoverage[Plant, Country(Brazil)] encompassesthe coverage[Plant.Grain,

Country(Brazil)] . Only the former encompasses[Plant.Fruit.Orange,

Country(Brazil).State(RJ)] .Thecoverage[Plant.Grain, Country(Brazil)]

doesnotoverlap[Plant.Fruit, Country(Brazil).State(SP)] ,becausethesecov-eragesreferto differentkindsof crops.Thecoverages[Plant.Grain, Country(Brazil)]

and[Plant, Country(Brazil).Region(NE)] overlap,thoughneitherencompassestheother.

Figure5.9: Usingtherelationshipsamongontologicalcoveragesfor servicesdiscovering

5.5. Exploiting OntologicalRelationships 98

Ontologicalcoverageshelpto describescopeandgoalsof a service.Supposeonewantstofind theservicesproviding dataabouttheproductionof orangesin theBrazilianSouth-Eastre-gion. Suchservicesarethosewhoseontologicalcoverageoverlaps[Plant.Fruit.Orange,

Country(Brazil).Region(SE)] , whereSE is the acronym of the South-Eastregion.Thedashedlineslinking this ontologicalcoverageto thoseof theWebServicesnumbered1, 3and4 indicatesthatthosearetheservicesthatsatisfythesearchcriteria.For otherdetailsaboutthespecification,comparisonanduseof ontologicalcoveragesseeChapter3.

5.5.2 RepresentingOntological Relationships

Giventhesemanticrelationshipsdefinedin Section5.5.1,we now turn to analyzinghow theycanbeexpressedusingSemanticWebformalisms.Here,we useDAML to representsemanticrelationshipsbetweentermsof theontologyfor theagriculturaldomain.EventhoughPOESIApresentlyusesRDF, this paperusesDAML just to avoid cumbersomeRDF statementsthatcouldhinderunderstanding.

Figure5.10:Theontologyfor theagriculturerealmin DAML

5.5. Exploiting OntologicalRelationships 99

Figure 5.10(a)shows an extract of the Agricultural Product dimension. It cor-respondsto a hierarchyof classeswhereeachsubclassis linked to its parentclassby theIS A relationship.Figure5.10(b)shows the Organization dimension,with the hierarchyof classesappearingon the left side. The right sidepresentsthe propertiesusedto representthePART OFrelationshipsbetweeninstancesof theseclasses.For example,propertyInsti-

tutionsOfConsortium is usedto indicatethe institutionsthat participatein a particularconsortium.Figure5.10(c)presentssimilarconstructsfor theTerritory dimension.Finally,Figure5.10(d)definesthe encompassrelationshipin DAML – a transitive propertythat hasboth IS A andPART OF assub-properties.The IS A propertyis alsoa sub-propertyof thepredefinedpropertysubClassOfof DAML.

5.5.3 Defining Ontology Views

In reallife, domainontologiescanbecomevery large,andapplicationswill seldomneedto useanentireontology. Thus,weproposethenotionof view, which is asubsetof anontologythatisneededby anapplication.Dif ferentPOESIAapplicationscanrequiredistinctviewsof thesameontology, characterizedby distinctsubsetsof theontologyconceptsandsemanticrelationships,andrespectiveinstances.Suchviewscanfacilitateknowledgevisualizationandmanipulationinapplicationprograms.Ontologyviews canbespecifiedwith a templatebasedmethod.Classesandsemanticrelationshipsto be includedin theview aremarkedwith tags.Thepossibletagsare:

DIM CLASS is associatedwith an ontologyclassreferring to a dimensionof the ontology,to denotethat the dimensionmust be taken into accountin the view (e.g., the dimen-sionsAgricultural Product , Organization andTerritory of theagricul-tural topic facet).

ROOT CLASS marksthe root classesof a dimension(e.g., Country and Ecological

Region asrootsfor theTerritory dimension).

SHOW CLASS indicatesanintermediateclassto beshown in theview.

SHOW RELATIONSHIP labels a relationshipbetweeninstancesto be consideredin theview.

Wedevelopedanalgorithmto generateanontologyview, following thehierarchyof classesand the semanticrelationshipsamongtheir instances,and using thesetagsto decideon theclasses,instancesandrelationshipsto put in theview. Figure5.11presentsanontologyviewobtainedby thismethod.Thisview is displayedin auserinterfacewedevelopedwith theTree-bolic implementationof thehyperbolictree[27]. This interfaceallows oneto browsetheview

5.6. EngineeringConsiderationsandSystemsEvaluation 100

Figure 5.11: A view of the ontology for the agriculturerealm embeddedin an applicationprogram

andchooseontologicalcoverages.Theroot of thetreeshows theURI of thesite thatprovidesthe ontology. This snapshotdetailsthe Territory dimensionwith the Brazilian regions,states,andfiner territorial divisions. The Agricultural Product dimensionappearsattheright of theroot,while theOrganization dimensionis practicallyhiddenat theleft sideof theroot. Onecannavigatefrom theroot to theleavesof thetree,to explorethearrangementof conceptsandinstantiatedtermsin a view. The useof hyperbolictreesto browseontologyviewshasprovedto beuserfriendly, despitethehighnumberof nodesto representall thetermsin theontology(morethan15000in someexperiments).

5.6 EngineeringConsiderationsand SystemsEvaluation

5.6.1 Ar chitecture and DesignTradeoffs

Thefollowing issuesneedto besolvedin orderto implementtheontology-drivenfacilities inPOESIAapplications:

1. how to giveefficientsupportto computesemanticrelationshipsbetweenontologicalcov-erages;

5.6. EngineeringConsiderationsandSystemsEvaluation 101

2. how to constructontologyviews tailoredfor particularapplicationdomains.

Thesetwo problemsarerelated:structuralrestrictionsimposedon ontologyviews enablemoreefficient algorithmsfor comparingontologicalcoveragesthanusing,for example,infer-enceengineslike Jess[91] or Algernon[122] to processa full ontologyfor this purpose.In atree-likeview, determiningif aterm Á encompassesanotherterm Á º reducesto determiningif thestring representingthepathfrom the root ¾ to Á is theheadof thestring representingthepathfrom ¾ to Á º . In aDAG-likeview, onecanusegraphsearchalgorithmsto determineif thereis apathfrom Á to Áhº . Most of theontologyviews usedin agriculturalzoningapplicationshave thenumberof edges(semanticrelationships)proportionalto thenumberof nodes(terms).Thisen-ablescomputingsemanticrelationshipsbetweenontologicalcoverageswith linearcomplexity.

Givenour option for views, our engineeringsolutionto handleontologiesin POESIAin-volves threeaspects:(i) adopta proceduralapproachto ontology management,backed bydatabasesto attainpersistenceandscalability; (ii) projecttheontologyinto views tailoredforparticularapplications,therebyreducingthenumberof termsandrelationshipsto behandled;(iii) restricttheontologyviewsusedin applicationsto directedacyclic graphs(DAGs)or trees,in orderto facilitatethe implementationof visualizationandnavigation tools andenableeffi-cientalgorithmsto checkrelationshipsbetweenontologicalcoverages.

Figure5.12:Developingandusingontologiesin POESIAapplications

Figure5.12 illustratesthemajorcomponentsinvolved in thedevelopmentof POESIAon-tologiesandtheir usein applications.Prot ege supportsthedevelopmentof domainontolo-giesanduploadstheminto RDFandRDFSfiles. Instancesof ontologyconceptscanbestoredand loadedfrom databases , in order to speed-upthe loadingof large ontologydatasets.OntoCover is a Java library we have developedto load ontologies,build ontology viewsandhandletheseviews in application programs . OntoCover providesthefollowingfunctionality:

5.6. EngineeringConsiderationsandSystemsEvaluation 102

à loadontologiesfrom RDFandRDFSfiles into databasesandvice-versa;

à assembletree-like views of ontologiesin applicationprograms,taking instancesof con-ceptsfrom RDF filesor databases;

à graphicallybrowsetheseontologyviews in applicationprograms;

à selectontologicalcoverages(tuplesof ontology terms)andcheckoverlap,encompassandequivalencerelationshipsbetweentheseontologicalcoveragesin a tree-like view ofanontology.

5.6.2 Constructing Ontology Views

OntoCoverusestheJenatoolkit [132] version2.0to parseRDF/RDFSspecificationsof ontolo-giesdevelopedwith Protege andto handletheir statements(resource-property-valuetriples).An RDFS(RDF-Schema)file delineatesthe hierarchiesof classesof a domainontology. AnRDF file, on the other hand,specifiesinstancesof thoseclassesand semanticrelationshipsamongthoseinstances.JenaloadsRDF/RDFStext files in memoryor in a databasemanage-mentsystem(DBMS) andallowsnavigationin theRDFtriplesthroughanapplicationprograminterface(API) or the RDQL query language,an implementationof SquishQL[177]. TheDBMS providespersistenceandscalabilityfor largeontologyspecifications.

Weconstructanontologyview byusingJenain twosteps:(1) loadtheRDFSof theontologyin RAM; and(2) manipulateRDFSaccordingto thetagsdescribedin Section5.5.3,consideringthreealternativesfor gettinginstancesof theontologyconceptsto completetheview:

RAM: useJenato parseRDFspecificationsfrom files into anauxiliarydatastructurein RAM,which is manipulatedvia theJenaAPI to build thetree;

DB RDF: usetheJenaAPI to handleinstancedatastoredasRDFtriplesin PostgreSQL[206];

DB Conventional: take instancesdirectly from aconventionalPostgreSQLdatabase.

The databaseschemausedby Jenato storeRDF triples in the DBMS – for the DB RDF

strategy – is presentedin [248].Figure5.13illustratesthedatabaseschemausedby theDB Conventional strategy for

the instancesof territorial divisions in the DBMS. In this figure, rectanglesrepresenttablesand the links betweenrectanglesrepresent1:N relationships,with a black circle indicatingcardinalityN. This schemadenotes,for example,that a Country is politically divided intoOfficialRegions , eachOfficialRegion into its constituentStates , andsoon. Theterritory canalsobedividedaccordingto ecologicalissuesin MacroEcoRegions andtheir

5.6. EngineeringConsiderationsandSystemsEvaluation 103

Figure5.13:Thelegacy databasefor territorial divisions

specificSubEcoRegions . Alternatively, onecandivide theterritory accordingto hydrologi-cal,pedologicalandanumberof othercriteria.

The RAMstrategy to generatean ontologyview is expectedto give the bestperformance.However, it hasscalabilitylimitations,dueto theextensive useof memory. DB RDFandDB

Conventional , on theotherhand,combinetheflexibility of knowledgemanagementin on-tologieswith the capabilitiesof a DBMS for handlinglarge datavolumes. They avoid RAM

scalabilityproblems,without compromisingfunctionality. Our experimentsreportedon Sec-tion 5.6.3,show thatDB RDFtakestoo long,especiallyfor largedatasets,probablydueto theidiosyncrasiesof storinginstancesin RDFtriplesandhandlingthemwith Jena.DB Conven-

tional givesbetterperformancethanDB RDF. We never keepRDFSspecificationsof ourontologiesin theDBMS,becausethesespecificationsaretypically toosmallto beadvantageousto do so.

5.6.3 Experimental Evaluation

We have conductedseveralexperimentsfor implementingOntoCover andhandlingontologiesin POESIA.Thegoalof theseexperimentswasto compareimplementationalternativesin termsof ontologyview management,from anapplicationpointof view. Basically, weinvestigatedtheperformanceof differentalternativesin termsof responsetime,givena user’s requestconcern-ing relationshipsbetweenontologicalcoverages.Theresultsof preliminaryexperimentsclearlyshowedtheadvantagesof usingontologyviewsasopposedto inferenceengines.Therefore,wefocusedfurther experimentson comparingthe alternativesdescribedin Section5.6.2to buildontologyviews. In thefollowing, we reportall theexperiments,anddetailstheresultsrelativeto theconstructionof ontologyviews.

Ourexperimentsusedtheontologydescribedin Section5.4. Instancesfor theTerritory

dimensionof thisontologywereprovidedIBGE (InstitutoBrasileiro deGeografiaeEstatıstica– Brazilian Instituteof GeographyandStatistics). IBGE’s dataset includesinstancesfor allOfficial Regions , States , Meso-Regions andMicro-Regions (of the states),Counties andDistricts of Brazil. This datasethasaround5000countiesand10000districts, that we usedto generatean ontologygraphwith more than 15000nodes,to allow

5.6. EngineeringConsiderationsandSystemsEvaluation 104

experimentswith largevolumesof data.

ViewsversusInferenceEngines

A queryonthisontologyusingAlgernon[122] inferenceengineto determineif agivenState

encompassesa givenDistrict took severalminutes,on Windows98, in a2.0GHzPentiumIV machine,with 512 megabytesof RAM. This is just oneexampleof the scalabilityprob-lemsof thecurrentlyproposedSemanticWebtechnology, in particularof rule-basedinferenceengines.Theseproblemsarehardto circumventfor ontologieswith arbitrarysemanticrelation-shipsandcomplex structures.Thewholeontologyfor theagriculturaldomain,for example,hasmultipleapplicationsandincludesinverserelationshipsthatgiveriseto cyclesin theontology’sgraph-likestructure.

Whenthe ontologyis reducedto a view in the form of a DAG or tree,the algorithmsforcomparingontologicalcoveragesrun fast(linear time in the input size). Thus,all subsequentexperimentwerebasedonviews.

View Construction

Giventheengineeringoptionfor views,thebottleneckhasbeenthememoryandtimenecessaryfor loadingontologyspecificationsandextractingtheviews. Therefore,we focusedour exper-imentson this part of the solution. We conducteda seriesof experimentswith Jenaversion1.6.1andJena2. We foundout thatJena2.0outperformsversion1.6.1by 40%in averageandreducesthe memoryuseby almost2/3 for keepingRDF/RDFSin RAM. For this reason,weonly reportheretheresultsof theexperimentswith Jena2.

Figure 5.14 presentsthe resultsof someexperimentson constructingtree-like views ofchunksof theontologyfor theagriculturalrealm,with incrementsof 1000nodes.TheY-axisrepresentsthe time to build theview (Figure5.14(a))or thememoryuse(Figure5.14(b)),foreachontologychunkwhosenumberof nodesappearin theX-axis. We comparethestrategiesdescribedin Section5.6.2; namelyRAM, DB RDFandDB Conventional . For the RAMstrategy we considerthe time to parseRDFSandRDF, plus the time to build the treeby han-dling theseRDF specificationsloadedin memory. DB RDFandDB Conventional , on theotherhand,take advantageof the efficiency of a DBMS to managelarge datasetsin persis-tent memory. Thesestrategiesonly load RDFSasa whole in memory, andquery individualinstancesof theontologychunkin a PostgreSQLdatabasemodeledasRDF triples(DB RDF)or in aconventionalway(DB Conventional ). Thememoryuseis thepeakof memoryallo-cationfor loadingthenecessaryRDF/RDFStriplesandbuild theview. TheseexperimentsrunonLinux (RedHat 8), in a1.6GHzPentiumIV machine,with 512megabytesof RAM.

Therunningtimemeasurementspresentedin Figure5.14(a)show thatDB Conventional

is thefasteststrategy. RAMis slightly slower thanDB Conventional for largedatasets,be-

5.6. EngineeringConsiderationsandSystemsEvaluation 105

Figure5.14:Comparingalternativeschemesfor generatingontologyviews

causeof the burdenof parsingRDF files, as opposedto efficiently taking instancesfrom adatabasevia queriesthatuseits indexes. DB RDFis by far the slowestalternative. This badperformanceis probablydueto theway RDF breaksthedataabouteachinstance– oneRDFtriple for eachfield value– leadingto additionallevelsof indirection.Anotheradvantageof DBConventionalovertheothertwo strategiesis thatit ordersthesiblingnodesin theontologyviewby their labels,usinga secondaryindex for thefield supplyingthe labelvalues.This orderingfacilitatesbrowsingandlocationof specificitemsof theontologyview in theuserinterface.

Whencomparingthe memoryconsumptionof the threestrategies,Figure5.14(b)showsthat indeedtheRAM strategy consumesthe largestamountof memory. In contrast,both DBConventionalandDB RDF strategiesare moreeconomical,becausethey do not requiretheconstructionof intermediatedatastructuresin memoryandtakeadvantageof adatabaseto loadlarge setsof instances.DB Conventionalis slightly moreeconomicalthanDB RDF, perhapsdueto Jena’smemorymanagementstrategiesfor housekeeping.

Fromtheexperimentalresultsin Figure5.14,weobservetwo cleartrendsin thethreeimple-mentationsof ontologyview supportin OntoCoverusingJena2. First,DB RDFis expensivetobuild theontologyview (Figure5.14(a)).Second,RAM consumessignificantamountof mainmemory(Figure5.14(b)).Fortunately, theDB conventionalapproachis bothfastandeconomi-calwith memoryconsumption.Otherimplementationsmaysolvetheinstancesloadingproblemandthemainmemoryconsumptionproblemusingsomealternativetechnique,but theproblemswe foundseemto beinherentto RDF andmainmemorymanagement.For currentlyavailablesoftwaretools,usingconventionaldatabasesto storeandaccesslargeontologiesseemsto beagoodchoice.

5.7. RelatedWork 106

5.7 RelatedWork

Thereis plenty of researchnowadaysintendedto apply SemanticWeb ideasin a variety ofdomains. On the otherhand,few studiesusethe experienceacquiredin real world applica-tions to evaluatethe viability of SemanticWeb proposals,anddevise methodsto provide thescalabilityandefficiency requiredin practice.Usually, solutionsto handlelargerepositoriesofmetadataconjugateknowledgerepresentationandmanipulationtechniqueswith conventionalWebtechnologyanddatabasemanagementsystems[132, 248, 74,162, 133, 30].

RDF [147,80] is themajorformatfor machine-processablemetadatain theSemanticWeb.Thebasicconstructof theRDF modelis thestatement– a triple of theform subject-predicate-object, wheresubjectrefersto a resource(anything that canbe denotedby a URI), predicateis a propertyof that resource,andobject is the valueof that property. The object can be aliteral (e.g.,a string)or anotherresource.RDFS(RDF-Schema)extendsRDF with classesandpropertiesto specifydomainvocabulary andobjectstructures,i.e., definespecificclassesandpropertiesfor a particulardomainor application.OtherlanguagessuchasDAML+OIL [165]andOWL [196] extendtheRDF/RDFSvocabulary to enrichontologies’expressiveness(e.g.,expressdisjunctionof classesandotherconstraints).Thus,for metadataanalysispurposes,onecanconsiderjust RDF triples.

RDFcanbeexpressedusingXML syntax.However, XML querylanguagessuchasXQuery[258, 2] arenot suitablefor RDF, becausethey arebasedon theXML treestructureandignoretheRDF model.Hence,several languagesandtoolshave beendevelopedspecificallyto queryRDF. Jena[132, 248] andits improved versionJena2 is a populartoolkit for handlingRDFtriples. [74] and[162] proposeotherimprovementsto acceleratequerieson RDFtriples.

Nevertheless,procedurallanguagesfor handlingRDFtriplesandtheircomponentsarecum-bersome.For many applications,suchasbuilding ontologyviews in POESIA(Section5.6.2),a template-baseddeclarative languagewould be more appropriate. RQL (RDF Query Lan-guage)[133] is a declarative languagefor queryingRDF directedgraphs, in which resourcesand objectsare representedby labelednodesand propertiesby labelededges. RQL adaptsfunctionality of querylanguagesfor semi-structuredandXML data[2], to provide functionalconstructs,in thestyleof OQL [40], for uniformly queryingRDF andRDFS.Sesame[30] is aserver-basedarchitecturefor storingandqueryinglargequantitiesof metadatain RDF/RDFS,with supportfor RQL andconcurrency control. Sesamecanbedeployedon top of a varietyofstoragedevices,suchastriple stores,relationalandobject-orienteddatabases.

However, it is possibleto handleRDF/RDFSin anevenhigherabstractionlevel. Jess[91]andAlgernon[122] areexamplesof inferenceenginesableto handlemetadatain RDF/RDFSandrelatedformats.Thesetoolscanbepluggedto anontologyeditorsuchasProtege [190] orsimply processRDF/RDFSexportedby suchaneditor. They regardRDF/RDFSstatementsasrulesformalizingdeclarative knowledge,andapply inferenceto derive otherknowledge. Our

5.7. RelatedWork 107

experimentsshowedthattheperformanceof Algernonis insufficientfor ourapplications.Thus,asdiscussedin Section5.6, we decidedto combinetools for handlingRDF at the statementslevel. Thetheoreticalresultsenablingour implementationappearin Chapter3.

From the perspective of SemanticWeb applications,the key point is to take advantageofknowledge,representedin standardslike RDF, to leverageautomatedmeansto describe,orga-nize,discover, selectandcomposeWebresourcesfor thesolutionof avarietyof problems.Themostusualapproachis to definesemanticmarkupbasedon someontology, andusethemtointegrateandprovide unifiedaccessto dataandservices,typically via Webportals.Therearemany examplesof this approachin theliterature[121, 216,111,13].

Someexperimentalsystemspossessdistinctive features.Edutella[188] is a Peer-to-PeerinfrastructureusingRDF metadatato facilitateaccessto educationalresources.In Edutella,eachpeerholdsa setof resourcesandhasanRDF repositoryof resourcedescriptions,to allowqueryingits contentsat the storagelayer (e.g.,SQL) or userlayer (e.g.,RQL). Peerscanbeheterogeneousin their internalorganizationandthe query languagethey provide. The com-mondatamodelandtheexchangelanguageof Edutellaenablesa standardinterfacefor posingqueriesto specificpeersor communitiesandfind resourcesacrossthe network. Piazza[115]is an infrastructureto provide interoperabilityof datasourcesin the Web, by mappingtheircontentsat the domainlevel (RDF) andthe documentstructurelevel (XML), andaddressingtheinteroperationbetweentheselevels.Themappingsarespecifieddeclaratively for smallsetsof nodes.A queryansweringalgorithmchainsthesemappingstogetherto obtainrelevantdatafrom acrossthe network. Otherworks focuson the interoperabilityof scientificdatareposi-torieson the Web [160, 224]. Finally, the grid – a platform for coordinatedresourcesharingthroughthe Internet,increasinglyusedfor scientificdataprocessing– andthe SemanticWebhave mutualcharacteristicsandgoals[102]. Both operatein a global,distributedanddynamicenvironment,andbothneedcomputationallyaccessibleandsharablemetadatato supportauto-matedinformationdiscovery, integrationandaggregation.

POESIAis similar to someof theseinitiatives,in thesensethat they favor cooperationofpeers,usingSemanticWeb apparatusto boostinteroperability, insteadof trying to coercethepeersto auniqueintegrationschema.Yet,to thebestof ourknowledge,POESIAis theonly ap-proachthatemploysthepartialorderingof resourcedescriptors– namelyontologicalcoveragesandtheir semanticrelationships– to organize,discover, andreuseresourcesin a particulardo-main. POESIAalsoincludesmechanisms,basedon ontologicalcoverages,workflows andac-tivity models,to semanticallyorientthecompositionof WebServicesin cooperativedistributedprocesses(Chapter3) andhelpto tracetheinformationflow acrosstheseprocesses(Chapter4).

5.8. Conclusions 108

5.8 Conclusions

The SemanticWeb technologyhaspotentialto supportscientificapplicationsthat gatherandintegratedatafromseveralsourcesanduseavarietyof dataprocessingresources.It canimprovethe functionalitiesof currentsyntax-baseddataprocessing,andprovide enhancedfacilities insemanticawareopen-endedinformationsystems.

This paperhasoutlined the POESIAapproachfor dataintegration,cooperative datapro-cessingand information analysis. It consideredparticular implementationissuesfor a newgenerationof informationsystemsbasedon the SemanticWeb – the loading,adaptationanduseof domainontologiesin applicationsinvolving dataandservicesdiscoveryandcompositionon the Web. The main contributionsare (1) carryingout facilities adheringto the SemanticWebin ascientificapplicationfor theagriculturaldomain;(2) pointingout someshortcomingsof currently proposedstandardsand tools, when facedwith real life systemsand large datavolumes;(3) thedesignandimplementationof somesolutionsto overcometheselimitations.Thoughtheseresultswerepresentedin thecontext of a casestudyin agriculture,they applytoseveraldomainsandawideclassof ontology-basedsystems.In orderto applyPOESIAto otherdomains,two basicrequirementsmustbe met: the availability of domainontologies;andthecooperationof domainexpertsto specifytheir workflows anddefinetheappropriateontologyviews.

The OntoCover packagefor generatingontologyviews, browsing theseviews andcopingwith ontologicalcoverageshasbeencompletelyimplementedandincorporatedin WOODSS(Workflow-basedDecisionSupportSystem),a tool thatappliesscientificworkflows to processgeographicdatafor decisionmaking purposes[214]. The associationof ontologicalcover-ageswith workflow activities anddatain WOODSSprovidesa testbedfor theuseof POESIAsemanticdescriptionsto organizethe resourcesrequiredby cooperative processesinvolvinggeographicdata– e.g., in environmentalplanningor biodiversity studies. This approachhasbeendevelopedin conjunctionwith expertsin agriculture.Completeimplementationandvali-dationinvolve many otherissues(e.g.,Webservicesimplementation,choreographingservicesin cooperativeprocesseson theWeb),andareleft to futurework.

ThePOESIAapproachcouldbeappliedto theagriculturerealmbecausedomainexpertsinthisareawereableto establishtheontologicalagreementsnecessaryto describeandinterrelatedataandprocessingactivities of cooperative processes.In caseswherethis is not possible,it is necessaryto establishsemanticconnectionsbetweenthe ontologiesusedto describetheresourcesemployed in differentpartsof a cooperative process.This requiresfurther researchonontologiesintegration,andarticulationof processesframeworksusingdifferentontologies.

Anotherextensionfor theSemanticWebresearchis to developanalgebrafor handlingon-tologies,with facilities for declaratively expressingandgeneratingontologyviews, aswell asmerging andintegratingontologies.A richer setof semanticrelationshipscould alsobe con-

5.8. Conclusions 109

sideredto extendthe POESIAapproach.RQL andother languagesfor queryingRDF in thesemanticlevel mustalsobeexaminedto expresstheontologyviews and/ordeterminetermen-compassingin the POESIAapproach.Still, otherresearchthemesincludeevaluatingvariousstandardsandtoolsarisingfrom theSemanticWebresearch(e.g.,DAML+OIL, OWL) to im-plementPOESIA;developingcatalogsto supportservicesdiscovery andcompositionfoundedby domainontologies;andapplyingthePOESIAapproachin otherdomains,suchasecology,biotechnology, sociology, economyandbusiness.

Acknowledgments

Theauthorsfrom CampinasUniversityarepartiallysupportedby Embrapa,CAPES,CNPqandthe MCT/PRONEX-SAI andCNPqWebMapsprojects. The authorsfrom Georgia Techarepartially supportedby two grantsfrom theOperatingSystemsandITR programs(CISE/CCRdivision) of NSF, by a contractfrom the SciDAC programof DoE, a contractfrom thePCESprogram(IXO) of DARPA, a facultyawardanda SURgrantfrom IBM. Theapplicationsce-nariosusedin this work wereprovidedby Brazilianexpertsin agriculture.TheusualthankstoDanielAndrade,who is alwaysreadyto helpin severalissuesin our lab. Specialthanksto pro-fessorsAna CarolinaSalgado,CaetanoTrainaJunior, Celio CardosoGuimaraesandEdmundoMadeira,whoprovidedseveralsuggestionsfor theimprovementof this work.

Capıtulo 6

Conclusoes

“So let usnot beblind to our differences–but let usalsodirectattentionto our commoninterests

andto themeansbywhich thosedifferencescanberesolved.Andif wecannotnowendour differences,

at leastwecanmake this world safefor diversity.”

JohnF. Kennedy, 1963

A Websemanticavisaestendero papeldoscomputadoresno suportea diversasatividadeshumanas,atravesdedescritoressemanticosdosrecursosdisponıveisemrede.Estateseapresen-tou resultadosaderentesa Websemanticaparaauxiliar a localizacaoderecursos,a integracaodedadose a determinac¸aodesuaproveniencia,emprocessosobtidosmediantea composic¸ao,semanticamenteconsistente,de servicos Web. A abordagemPOESIA,centradaem umaon-tologiadedomınio, modelosdeatividadese workflows, fornecefacilidadescomplementaresaoutrosresultadosparaa integracaodedadose servicosemaplicacoesWeb,particularmentenocampocientıfico.

6.1 Contrib uicoes

A principaiscontribuicoesdestetrabalhosao:

1. descricao dos requisitosestruturaise funcionaisde uma aplicacao cientıfica – zonea-mentoagrıcola – em quegrandesvolumesde dadosheterogeneossao correlacionadossobcondicoesespaciaise temporais,emprocessoscomplexosnaWeb;

2. umarcabouc¸o teorico,baseadoemontologiasdedomınio,modelosdeatividadesework-flows,paraadescricao,organizac¸ao,recuperac¸aoecomposic¸aodedadoseservicos;

110

6.2. Extensoes 111

3. regrasparaverificara consistenciasemanticadecomposic¸oesderecursos,combaseemconceitosespecıficosdodomınio deaplicacao;

4. combinac¸aodeumaontologiadedomınio e descricoesdefluxosdedadosparaavaliar aprovenienciadedadosemprocessosdistribuıdosnaWeb;

5. criteriosparaauxiliar a integracao de dados,fundamentadosno usode coberturason-tologicasparaexpressaro escopoeagranularidadedosdados;

6. validacao parcial do arcabouc¸o teorico, atraves da implementac¸ao de alternativasparalidar com grandesvolumesde dadosem um domınio especıfico. Em particular, estesexperimentosdeimplementac¸ao,descritosnoCapıtulo 5, indicaramqueastecnicasefer-ramentasatualmentedisponıveisparaaWebSemanticanaoconseguemgerenciargrandesvolumesdedadosdemaneirasatisfatoria.

Estascontribuicoesforampublicadasou submetidasparapublicacaoresultandoemum ar-tigo emrevista internacionalindexada[83] e quatroartigosemconferencias[86, 82, 84, 236],alemdeum artigoparaconferenciainternacionaleumrelatorio tecnicorecentementesubmeti-dos.

6.2 Extensoes

OstrabalhosfuturosnaabordagemPOESIAincluem:

Generalizacaodasontologias: A abordagemPOESIAebaseadaempropriedadesderelacoessemanticasentreos termosdeumaontologiadedomınio, especificamenteequivalencia,agregacao e especializac¸ao. Tais propriedadesdefinemumaordemparcialentreos ter-mose a estruturac¸aodaontologiae dosframeworksdeprocessossoba formadegrafosacıclicosdirecionados.Essascaracterısticas,porsuavez,permitema implementac¸aoefi-cientedasfacilidadespropostasparaPOESIA.A inclusaodeoutrasrelacoessemanticaspodeenriquecera abordagem.Por exemplo,a relacao dedisjuncao podeexpressarqueduasregioesgeograficas(taiscomodoisestados)naosesobrepoem.Extensoesaoarca-bouco teoricodaabordagemPOESIAprecisamseranalisadascomcuidado,paragarantiramanutenc¸aodaconsistenciadaabordagem.

Implementacaocomdiferentestecnologias: Osexperimentosrealizadosnestedoutoradoli-mitaram-sea implementac¸ao dosmecanismosnecessariosa manipulac¸ao de ontologiase coberturasontologicas. Essasimplementac¸oesutilizam RDF pararepresentara on-tologia de domınio e ferramentasproceduraisparaa carga e utilizacao dasontologiasem aplicacoes. Linguagensmaisexpressivasparaa representac¸ao de ontologias(como

6.2. Extensoes 112

OWL), linguagensdeclarativasparamanipulac¸aodeconhecimento(comoRQL)epadroesdemetadadosparaareasespecıficas(comoGML) devemserconsideradosemimplemen-tacoesfuturas.Al emdisso,pacotesparao desenvolvimentodeservicosWebtemevoluıdorapidamente,eprecisamseravaliadosparaa implementac¸aocompletadePOESIA.

Validacao emdiversasareasde aplicacao: Estetrabalholimitou-seadefinicaodosrequisitosdeaplicacoesemagricultura,particularmentedo zoneamentoagrıcola. O proximo passoeavalidacaodaabordagemPOESIAjuntoa especialistasdeoutrosdomınios,utilizandoe aperfeicoandoosprototiposdesenvolvidosnessetrabalho.POESIAtempotencialparaaplicacaoemdomınioscomoecologia,bioinformatica,sociologia,economiaenegocios.

Outrasextensoestranscendema abordagemPOESIAeconstituemdesafiosparapesquisa:

Geracaode ontologias: A construc¸ao de ontologiase uma tarefa laboriosae sujeita a er-ros, omissoese imprecisoes. Destaforma, metodose ferramentasparaautomatizaraconstruc¸aodeontologiasapartirdetextos,dadossemi-estruturadoseestruturadospodemcontribuir parabaixaroscustoseelevaraqualidadedasontologias[94, 62,181, 182,184].

Interoperabilidadede ontologias: O desenvolvimentodeontologiasparadiferentesdomıniose aplicacoesleva a problemasdeinteroperabilidadeentreontologias.A abordagemPO-ESIA so pode ser aplicadaa agriculturaporqueos especialistasdessedomınio foramcapazesde estabeleceracordosparaa definicao de um referencialontologico comum.Noscasosemqueissonaofor possıvel, deve-sedefinir conexoesentreontologiasdistin-tas. Propostasde solucao paraesseproblemaincluemalgebrase modelosbaseadosemgrafosparaacomposic¸aoearticulacaodeontologias[79, 183, 131, 247,245,246].

Sincronizacao deprocessoscooperativosna Web: A tecnologiaatualdesincronizac¸aodepro-cessosbaseia-seprincipalmenteem“orquestrac¸ao” detarefas,i.e.,controlecentralizado,mesmoquea execucao sejadistribuıda. Processoscooperativosna Web requerem“co-eografia”de atividadesautonomas,baseadana integracao de protocolosparagarantiraexecucaoharmonicadeworkflows interorganizacionais.Algumaslinguagensdecompo-sicaodeservicosWeb[234,116, 20, 255, 87, 204, 250,175]e tecnicasdemodelagemdeprocessos[116,235,186, 93, 157] visamcontemplaressesrequisitos.

Referencias

[1] S. Abiteboul. Queryingsemi-structureddata. Em Proc. ICDT Conf., volume1186ofLNCS, pp.1–18.Springer-Verlag,1997.

[2] S.Abiteboul,P. BunemaneD. Suciu.DataontheWeb– fromrelationsto semistructureddataandXML. MorganKaufann,SanFrancisco,CA, 2000.

[3] T. Abrahame J. F. Roddick. Survey of spatio-temporaldatabases.GeoInformatica,3(1):61–995,1999.

[4] B. Adelberg. NoDoSE- a tool for semi-automaticallyextractingsemi-structureddatafrom text documents. Em Proc. Intl. Conf. on Managementof Data (SIGMOD), pp.283–294.ACM Press,1998.

[5] C. D. Aguiar. Heterogeneousdatabaseintegration into urban planningapplications.Dissertac¸aodeMestrado,Departmentof ComputerScience,StateUniversityof Camp-inas,Brazil, 1995.(in portuguese).

[6] L. Ahmedi, P. J. Marron e G. Lausen. Ontology-basedaccessto heterogeneousXMLdata,2001.

[7] L. Ahmedi,P. J. Marron e G. Lausen.Ontology-basedqueryingof linkedXML docu-ments,2002.

[8] A. Ailamaki, Y. E. Ioannidise M. Livny. Scientificworkflow managementby databasemanagement.Em Proc. Conf. on Statisticaland ScientificDatabaseManagement, pp.190–199.IEEE ComputerSociety, 1998.

[9] J. Albrecht. Geospatialinformationstandardsa comparative studyof approachesin thestandardisationof geospatialinformation.Computers & Geosciences, 25:9–24,1999.

[10] G. Alonso e A. E. Abbadi. Cooperative modelingin appliedgeographicresearch.EmCoopIS, pp.227–234,1994.

113

Referencias 114

[11] G. Alonso e C. Hagen. Geo-opera:Workflow conceptsfor spatialprocesses.Em Ad-vancesin SpatialDatabases- 5th Intl. Symp.onLargeSpatialDatabases(SSD), volume1262of LNCS, pp.238–258.Springer-Verlag,1997.

[12] I. Altintas,S.Bhagwanani,D. Buttler, S.Chandra,Z. Cheng,M. Coleman,T. Critchlow,A. Gupta,W. Han,L. Liu, B. Ludascher, C. Pu,R. Moore,A. ShoshanieM. A. Vouk. Amodelingandexecutionenvironmentfor distributedscientificworkflows. EmProc.Intl.Conf. on ScientificandStatisticalDatabaseManagement(SSDBM), pp.247–250.IEEEComputerSociety, 2003.

[13] B. Amann,C. Beeri, I. Fundulakie M. Scholl. Ontology-basedintegrationof xml webresources.Em Proc. Intl. SemanticWebConf. (ISWC), volume2342of LNCS, pp.117–131.Springer-Verlag,2002.

[14] A. Ankolekar, M. H. Burstein,J. R. Hobbs,O. Lassila,D. Martin, D. V. McDermott,S.A. McIlraith, S.Narayanan,M. Paolucci,T. R. Paynee K. P. Sycara.DAML-S: Webservicedescriptionfor the semanticweb. Em Proc. Intl. SemanticWeb Conf. (ISWC),volume2342of LNCS, pp.348–363.Springer-Verlag,2002.

[15] F. Baader, D. McGuinness,D. Nardi e P. Patel-Schneider. TheDescriptionLogic Hand-book: Theory, Implementation,andApplications. CambridgeUniversityPress,2003.

[16] T. Bandholtz.Sharingontologyby webservices:Implementationof asemanticnetworkservice(SNS) in the context of the germanenvironmentalinformationnetwork (gein).Em Proc. Intl. WorkshoponSemanticWebandDatabases(SWDB), pp.189–201,2003.

[17] T. Barclay, D. R. Slutze J.Gray. TerraServer: A spatialdatawarehouse.Em Proc. Intl.Conf. on Managementof Data (SIGMOD), pp.307–318.ACM Press,2000.

[18] C. Batini, M. L. e S.B. Navathe.A comparativeanalysisof methodologiesfor databaseschemaintegration.ACM ComputingSurveys, 18(4):323–364,1986.

[19] C. Behrense V. Kashyap.The”emergent” semanticweb: An approachfor derivationofsemanticagreementson theweb. EmProc.SemanticWebWorkingSymposium(SWWS),2001.

[20] B. Benatallah,Q. Z. Shenge M. Dumas. The self-servenvironmentfor web servicescomposition.IEEEInternetComputing, 7(1):40–48,2003.

[21] S. Bergamaschi,S. Castanoe M. Vincini. Semanticintegrationof semistructuredandstructureddatasources.SIGMODRecord, 28(1):54–59,1999.

Referencias 115

[22] T. Berners-Lee,J. Hendlere O. Lassila. The semanticweb. ScientificAmerican, May2001.

[23] P. A. Bernsteine T. Bergstraesser. Meta-datasupportfor datatransformationsusingmicrosoftrepository. IEEE DataEngineeringBulletin, 22(1):9–14,1999.

[24] E. Bertino. A view mechanismfor object-orienteddatabases.Em Proc. Intl. Conf. onExtendingDatabaseTechnology (EDBT), volume580of LNCS, pp.136–151.Springer-Verlag,1992.

[25] A. Bonifati e S. Ceri. Comparative analysisof five XML query languages.SIGMODRecord, 29(1):68–79,2000.

[26] K. A. V. Borges. Geographicaldatamodeling: Extensionof OMT for spatialapplica-tions. Dissertac¸aodeMestrado,Departmentof ComputerScience,FederalUniversityofMinasGerais,Brazil, 1997.(in portuguese).

[27] B. Bou. Treebolica java appletfor hyperbolichenderingof hierarchicaldata. http:

//treebolic.sortilege.net/en asof September2003.

[28] D. Box,D. Ehnebuske,G.Kakivaya,A. Layman,N. Mendelsohn,H. F. Nielsen,S.Thattee D. Winer. W3C’sSimpleObjectAccessProtocol(SOAP). http://www.w3.org/

TR/SOAP/ (asof October2003).

[29] D. Brickley eR.V. Guha.RDFvocabularydescriptionlanguage1.0: RDFschema,2003.http://www.w3.org/TR/rdf- schema/ (asof October2003).

[30] J.Broekstra,A. KampmaneF. vanHarmelen.Sesame:A genericarchitecturefor storingandqueryingRDFandRDFschema.EmProc.Intl. SemanticWebConf. (ISWC), volume2342of LNCS, pp.54–68.Springer-Verlag,2002.

[31] P. Brown eM. Stonebraker. BigSur:A systemfor themanagementof earthsciencedata.Em Proc.VLDBConf., pp.720–728.MorganKaufmann,1995.

[32] P. Buneman.Semistructureddata.Em 16thACM Symposiumon Principlesof DatabaseSystems(PODS’97), pp.117–121,1997.

[33] P. Buneman,S.Khannae W. C. Tan. Why andwhere:A characterizationof dataprove-nance.EmProc.Intl. Conf. onDataTheory(ICDT), volume1973of LNCS, pp.316–330.Springer-Verlag,2001.

[34] D. Buttler, M. Coleman,T. Critchlow, R. Fileto, W. Han,C. Pu,D. Roccoe L. Xiong.Queryingmultiplebioinformaticsinformationsources:Cansemanticwebresearchhelp?SIGMODRecord, 31(4):59–64,2002.

Referencias 116

[35] D. Buttler, L. Liu eC. Pu.A fully automatedobjectextractsystemfor theweb. EmProc.Intl. Conf. ondistributedComputingSystems(ICDCS). IEEE Press,2001.

[36] D. Buttler, L. Liu, C. Pu, H. Paques,W. Han e W. Tang. OminiSearch:A methodforsearchingdynamiccontenton the web. Em Proc. Intl. Conf. on Managementof Data(SIGMOD), pp.604.ACM Press,2001.

[37] A. M. C. Bussler, D. Fensel.A conceptualarchitecturefor semanticweb enabledwebservices.SIGMODRecord, 31(4):24–29,2003.

[38] J.CardosoeA. Sheth.Semantice-workflow composition.Report,LSDISLab,ComputerScienceDep.,Univ. of Georgia,2002.

[39] F. Casatie U. D. (editors). Specialissueon web services. IEEE Data EngineeringBulletin, 25(4),2002.

[40] R. G. G. Cattell,D. Barry, M. Berler, D. Jordan,C. Russel,O. Schadow, T. StaniendaeF. Velez.TheObjectData Standard - ODMG 3.0. MorganKaufmann,2000.

[41] M. C. Cavalcanti,F. A. Baiao,S. C. Rossle,P. M. Bisch,R. Targino, P. F. Pires,M. L.Campose M. Mattoso. Structuralgenomicworkflows supportedby webservices.EmProc.Intl. Conf. onDatabaseandExpertSystemsApplications(DEXA), pp.45–49.IEEEComputerSociety, 2003.

[42] M. C. Cavalcanti,M. Mattoso,M. L. Campos,F. Llirbat e E. Simon. Sharingscientificmodelsin environmentalapplications.EmProc.ACM SymposiumonAppliedcomputing(SAC), pp.453–457.ACM Press,2002.

[43] M. C. Cavalcanti,M. Mattoso,M. L. Campos,E. Simone F. Llirbat. An architecturefor managingdistributed scientific resources. Em Proc. Intl. Conf. on ScientificandStatisticalDatabaseManagement(SSDBM), pp.47–55.IEEEComputerSociety, 2002.

[44] S.Ceri,P. FraternalieS.Paraboschi.XML: Currentdevelopmentsandfuturechallengesfor the databasecommunity. Em Proc. Intl. Conf. on ExtendingDatabaseTechnology(EDBT), volume1777of LNCS, pp.3–17.Springer-Verlag,2000.

[45] D. Change D. Harkey. Client/ServerData Accesswith JavaandXML. JohnWiley &Sons,1998.

[46] S. Chaudhurie U. Dayal. An overview of datawarehousingandolaptechnology. SIG-MOD Record, 26(1):65–74,1997.

Referencias 117

[47] V. Christophides,S.ClueteJ.Simeon.OnwrappingquerylanguagesandefficientXMLintegration. Em Proc. Intl. Conf. on Managementof Data (SIGMOD), pp. 141–152.ACM Press,2000.

[48] S.Cluet,C. Delobel,J.Simeone K. Smaga.Your mediatorsneeddataconversion! EmProc. Intl. Conf. onManagementof Data (SIGMOD), pp.177–188.ACM Press,1998.

[49] G. Camara,M. A. Casanova,A. Hemerly, G. C. Magalhaese C. B. Medeiros.Anatomyof Geographical Informationsystems. StateUniversityof Campinas,Brazil, 1996. (inportuguese).

[50] G. Camara,A. M. V. Monteiro, J. A. Paiva, J. Gomese L. Velho. Towardsa unifiedframework for geographicaldatamodels. Journal of the Brazilian ComputingSociety,7(1),2000.

[51] G. Camara,R. Thome, U. Freitase A. Monteiro. Interoperabilityin practice:Problemsin semanticconversionfrom currenttechnologyto openGIS. Em Intl. Conf. on Interop-erableGIS, 1999.

[52] OMG’s CommonObjectRanguageesquestBroker Arquitecture(CORBA). http://

www.omg.org/CORBA (asof October2003).

[53] O. Corcho,A. Gomez-Perez,M. Fernandez-Lopeze M. Lama. ODE-SWS:A semanticwebservicedevelopmentenvironment. Em Proc. Intl. Workshopon SemanticWebandDatabases(SWDB), pp.203–216,2003.http://www.cs.uic.edu/˜ifc/SWDB/

(asof November2003).

[54] J.E. Corcoles,P. GonzA¡lez eV. L. Jaquero.Integrationof spatialXML documentswithRDF. Em Proc. Ibero AmericanConferenceon WebEngineering(ICWE), volume2722of LNCS, pp.407–410.Springer-Verlag,2003.

[55] Y. Cui eJ.Widom. Practicallineagetracingin datawarehouses.EmProc.Intl. Conf. onData Engineering(ICDE), pp.367–378.IEEE,2000.

[56] Y. Cui e J. Widom. Lineagetracing for generaldatawarehousetransformations.EmProc.VLDBConf., pp.471–480.MorganKaufmann,2001.

[57] Y. Cui, J. Widom e J. L. Wiener. Tracing the lineageof view datain a warehousingenvironment.ACM TODS, 25(2):179–227,2000.

[58] G. R. CunhaeE. D. A. (eds.).An overview of thespecialissueoncropzoningin Brazil.BrazilianJournalof Agrometeorology, 9(3),2001.(in Portuguese).

Referencias 118

[59] B. Curtis, M. Kellner e J. Over. ProcessModeling. Communicationsof the ACM,35(9):75–90,1992.

[60] A. S. da Silva, P. Calado,R. Vieira, A. H. F. Laendere B. A. Ribeiro-Neto. Keywork-BasedQueriesoverWebDatabases, pp.74–92.IRM Press,2003.

[61] The DARPA Agent Markup Language(DAML). http://www.daml.org/ (asofAugust2003).

[62] H. Davulcu, S. Vadrevu e S. Nagarajan.Ontominer:Bootstrappingandpopulatingon-tologiesfrom domainspecificwebsites.Em Proc. Intl. Workshopon SemanticWebandDatabases(SWDB), pp.259–276,2003.http://www.cs.uic.edu/˜ifc/SWDB/

(asof November2003).

[63] Y. Ding, D. Fensel,M. Klein e B. Omelayenko. The semanticweb: yet anotherhip?Data & KnowledgeEngineering, 41(2/3):205–227,junho2002.

[64] B. Dinter, C. Sapia,G. Hofling e M. Blaschka. The OLAP market: stateof the artand researchissues. Em ACM 1st Intl. Workshopon Data Warehousingand OLAP(DOLAP’98), pp.22–27,1998.

[65] Dublin CoreMetadataInitiative. http://www.dublincore.org/ (asof October2003).

[66] D. S. (ed.). Specialissueon managementof semistructureddata. SIGMOD Record,26(4),1997.

[67] A. Y. H. (editor). Specialissueon XML datamanagement.IEEE Data EngineeringBulletin, 24(2),2002.

[68] G.W. (editor).Specialissueonorganizinganddiscoveringthesemanticweb. IEEEDataEngineeringBulletin, 25(1),2002.

[69] R. M. (editor). Specialissueon integrationmanagement.IEEE Data EngineeringBul-letin, 25(3),2002.

[70] M. Egenhofer. Towardthesemanticgeospatialweb. EmProc.ACM GIS, 2002.

[71] G. Ehmayer, G. Kappele S. Reich. Connectingdatabasesto the web: A taxonomyofgateways. Em Proc. Intl. Conf. on DatabaseandExpertSystemsApplications(DEXA),pp.1–15.IEEE ComputerSociety, 1997.

[72] A. K. ElmagarmideC.Pu.Introduction:Specialissueonheterogeneousdatabases.ACMComputingSurveys, 22(3):175–178,1990.

Referencias 119

[73] R.ElmasrieS.B. Navathe.Fundamentalsof DatabaseSystems. Addison-Wesley, MenloPark,CA, 1994.

[74] F. Esposito,L. Iannone,I. Palmisanoe G. Semeraro.RDF core: A componentfor ef-fective managementof RDF models. Em Proc. Intl. Workshopon SemanticWeb andDatabases(SWDB), pp.169–187.VLDB endowment,2003.http://www.cs.uic.

edu/˜ifc/SWDB/ (asof November2003).

[75] P. H. C. et al. Climatic risk for coffeein Parana state.Brazilian Journal of Agrometeo-rology, 9(3):486–494,2001.(in Portuguese).

[76] J.Euzenat.Researchchallengesandperspectivesof thesemanticweb. IEEE IntelligentSystems, 17(5):86–88,2002.

[77] H. Fane A. Poulovassilis.Tracingdatalineageusingschematransformationpathways.Em Workshopon Knowledge Transformationfor the SemanticWeb KTSW/ECAI, vol-ume95,pp.64–79.IOSPress,2003.

[78] A. Farquhar, R. Fikese J.Rice. Theontolinguaserver: a tool for collaborativeontologyconstruction.Intl. Journalof HumanComputerStudies, 46(6):707–727,1997.

[79] D. Fensel. Ontology-basedKnowledgeManagement.IEEE Computer, 35(11):56–59,2002.

[80] D. Fensel,J.Hendler, H. LiebermaneW. W. (editors).SpinningtheSemanticWeb. MITPress,2003.

[81] C. Ferrise J. Farrel. What areweb services? Communicationsof the ACM, 46(6):31,2003.

[82] R. Fileto. Issueson interoperabilityof heterogeneousand geographicaldata. EmSimposioBrasileiro deGeoinformatica(GEOINFO), pp.133–140,2001.

[83] R.Fileto,L. Liu, C.Pu,E.D. AssadeC.B. Medeiros.POESIA:An ontologicalworkflowapproachfor composingwebservicesin agriculture.TheVLDBJournal, 12(4):352–367,2003.

[84] R. Fileto,L. Liu, C. Pu,E. D. AssadeC. B. Medeiros.Usingdomainontologiesto helptrackdataprovenance.Em Proc.BrazilianSymposiumonDatabases, pp.84–98,2003.

[85] R. Fileto e C. B. Medeiros.Thedesignof decisionsupportsystemsfor effective useofspatio-temporaldata.EmPh.D.StudentsWorkshopat EDBTConf., Konstanz,Germany,2000.

Referencias 120

[86] R. Fileto, C. A. A. Meira, A. S. Neto, J. Nakae C. B. Medeiros. An XML-centeredwarehouseto manageinformation of the fruit supply chain. Em TheWorld Conf. onComputers in AgricultureandNatural Resources(WCCA), 2001.

[87] D. Florescu,A. Grunhagene D. Kossmann.XL: An XML programminglanguageforweb wervicespecificationandcomposition. Em Proc. WWWConf., pp. 65–76.ACMPress,2002.

[88] D. Florescu,A. GrunhageneD. Kossmann.XL: A platformfor webservices.EmProc.Conf. on InnovativeData SystemsResearch (CIDR), 2003.

[89] D. Florescu,A. Y. Levy e A. O. Mendelzon. Databasetechniquesfor the world-wideweb: A survey. SIGMODRecord, 27(3):59–74,1998.

[90] F. T. Fonseca,M. Egenhofer, P. Agouris e G. Camara. Using ontologiesfor integratedgeographicinformationsystems.Trans.in GIS, 6(3):13–19,2002.

[91] E. Friedman-Hill. JESS– therule enginefor thejava platform. http://herzberg.

ca.sandia.gov/jess (asof August2003).

[92] A. L. Furtado,K. C. Sevcik eC. S.dosSantos.Permittingupdatesthroughviewsof databases.InformationSystems, 4(4):269–283,1979.

[93] A. Gal. Semanticinteroperabilityin information services: Experiencingwith Coop-WARE. SIGMODRecord, 28(1):68–75,1999.

[94] A. Gal,G.ModicaeH. Jamil.OntoBuilder:Fully automaticextractionandconsolidationof ontologiesfrom websources.Em Intl. Conf. onConceptualModeling, 2003.

[95] H. Galhardas,D. Florescu,D. Shasha,E. Simone C. Saita. Improving datacleaningquality usinga datalineagefacility. Em Proc. Conf. on Data Managementand DataWarehouses(DMDW), volume39 of CEUR-WS.org, 2001.

[96] E. Gamma,R. Helm, R. Johnsone J.Vlissides.DesignPatterns:Elementsof ReusableObject-OrientedSoftware. Addison-Wesley, Reading,Massachusets,1995.

[97] H. Garcia-Molina,Y. Papakonstantinou,D. Quass,A. Rajaraman,Y. Sagiv, J. D. Ull-man,V. Vassalose J. Widom. TheTSIMMIS approachto mediation:Datamodelsandlanguages.Journalof IntelligentInformationSystems, 8(2):117–132,1997.

[98] H. Garcia-Molina,J.D. UllmaneJ.Widom. DatabaseSystemImplementation. PrenticeHall, UpperSaddleRiver, NJ,2000.

Referencias 121

[99] F. Gingrase L. V. S.Lakshmanan.nd-sql: A multi-dimensionallanguagefor interoper-ability andolap. EmProc.VLDBConf., pp.134–145.MorganKaufmann,1998.

[100] F. Gingras,L. V. S.Lakshmanan,I. N. Subramanian,D. Papoulise N. Shiri. Languagesfor multi-databaseinteroperability. Em Proc. Intl. Conf. on Managementof Data (SIG-MOD), pp.536–538.ACM Press,1997.

[101] A. Gomez-Pereze O. Corcho. Ontologyspecificationlanguagesfor the semanticweb.IEEE IntelligentSystems, 17(1):54–60,2002.

[102] C.GobleeD. D. Roure.Thegrid: An applicationof thesemanticweb. SIGMODRecord,31(4):65–70,2002.

[103] C. A. Goble,D. L. McGuinness,R. Moller e P. F. Patel-Schneider. OilEd a reasonableontologyeditorfor thesemanticweb. Em Intl. DescriptionLogicsWorkshop, volume49of CEURWorkshopProceedings, 2001.

[104] C. A. Goble,R. Stevens,G. Ng, S. Bechhofer, N. W. Paton, P. G. Baker, M. Peim eA. Brass.Transparentaccessto multiple bioinformaticsinformationsources.IBM Sys-temsJournal, 40(2):532–551,2001.

[105] M. F. Goodchild,M. J.Egenhofer, R.FegeaseC. Kottman.InteroperatingGeographicalInformationSystems. Kluwer, 1997.

[106] L. Gravano,P. G. IpeirotiseM. Sahami.QProber:A systemfor automaticclassificationof hidden-webdatabases.ACM Transactionson InformationSystems(TOIS), 21(1):1–41,2003.

[107] J. Gray, A. Bosworth, A. Laymane H. Pirahesh.Datacube: A relationalaggregationoperatorgeneralizinggroup-by, cross-tab,andsub-total. Em Proc. Intl. Conf. on DataEngineering(ICDE), pp.152–159.IEEE,1996.

[108] M. Gruninguer. Applicationsof PSLto semanticwebservices.EmProc.Intl. Workshopon SemanticWeb and Databases(SWDB), pp. 217–230,2003. http://www.cs.

uic.edu/˜ifc/SWDB/ (asof November2003).

[109] M. Gruningere J. L. (eds.). Specialissueon ontologiesapplicationsanddesign.Com-municationsof theACM, 45(2):39–65,2002.

[110] N. Guarino.Formalontologyandinformationsystems.Em Proc. Intl. Conf. on FormalOntologiesin InformationSystems(FOIS), pp.3–15.IOSPress,1998.

Referencias 122

[111] R. Guha,R. McCool eE. Miller. Semanticsearch.EmProc.WWWConf., pp.700–709.ACM Press,2003.

[112] A. Gupta,H. V. JagadisheI. S.Mumick. Dataintegrationusingself-maintainableviews.EmProc.Intl. Conf. onExtendingDatabaseTechnology(EDBT), volume1057of LNCS,pp.140–144.Springer-Verlag,1996.

[113] A. GuptaeI. S.Mumick. Maintenanceof materializedviews: Problems,techniques,andapplications.IEEEDataEngineeringBulletin, 18(2):3–18,1995.

[114] L. M. Haas,P. M. Schwarz, P. Kodali, E. Kotlar, J. E. Rice e W. C. Swope. Discov-erylink: A systemfor integratedaccessto life sciencesdatasources.IBM SystemsJour-nal, 40(2):489–511,2001.

[115] A. Y. Halevy, Z. G. Ives,P. Mork e I. Tatarinov. Piazza:Datamanagementinfrastructurefor semanticwebapplications.EmProc.WWWConf., pp.556–567.ACM Press,2003.

[116] R. Hamadie B. Benatallah.A petri net-basedmodelfor webservicecomposition.EmProc.AustralasianDatabaseConf. (ADC), pp.191–200.AustralasianComputerSociety,2003.

[117] J.Hammer, H. Garcia-Molina,K. Ireland,Y. Papakonstantinou,J.D. UllmaneJ.Widom.Informationtranslation,mediation,andmosaic-basedbrowsingin theTSIMMIS system.Em Proc. Intl. Conf. onManagementof Data (SIGMOD), pp.483.ACM Press,1995.

[118] T. Harder, G. SautereJ.Thomas.Theintrinsicproblemsof structuralheterogeneityandanapproachto their solution.TheVLDBJournal, 8(1):25–43,1999.

[119] V. Harinarayan,A. Rajaramane J.D. Ullman. Implementingdatacubesefficiently. EmProc. Intl. Conf. onManagementof Data (SIGMOD), pp.205–216.ACM Press,1996.

[120] W. Hasselbring.Informationsystemintegration.Communicationsof theACM, 43(6):33–38,2000.

[121] S.HausteineJ.Pleumann.Is participationin thesemanticwebtoodifficulty? EmProc.Intl. SemanticWebConf. (ISWC), volume2342of LNCS, pp.448–453.Springer-Verlag,2002.

[122] M. Hewett. Algernonin java. http://smi.stanford.edu/peopl e/he wett/

research/ai/algernon/ (asof August2003).

[123] D. Hollingsworth. TheWorkflowReferenceModel. Workflow ManagementCoalition,January1995.

Referencias 123

[124] I. HorrockseJ.Hendler, editors.Intl. SemanticWebConf.(ISWC), volume2342of LNCS,Sardinia,Italy, June2002.Springer-Verlag.

[125] B. Husemann,J. Lechtenborger e G. Vossen. Conceptualdata warehousemod-eling. Em 2nd Intl. Workshop on Design and Managementof Data Warehouses,2000. http://sunsite.informatik.rwth- aache n.de /Publ icat ions/

CEUR-WS/Vol- 28/ (asof November2000).

[126] D. K. Hsiaoe M. N. Kamel. Heterogeneousdatabases:Proliferation,issues,andsolu-tions. IEEE Tran.on KnowledgeandDataEngineering, 1(1):45–62,1989.

[127] W. H. Inmon. Building theData Warehouse. JohnWiley andSons,New York, 1996.

[128] InternationalOrganizationfor Standardization.Standardgeneralizedmarkuplanguage(SGML), 1986. ISO 8879.

[129] Y. E. Ioannidis,M. Livny, A. Ailamaki, A. Narayanane A. Therber. ZOO: A desktopemperimentmanagementenvironment. Em Proc. Intl. Conf. on Managementof Data(SIGMOD), pp.580–583.ACM Press,1997.

[130] S.Jablonskie C. Bussler. WorkflowManagement.ModelingConcepts,ArchitectureandImplementation. InternationalThomsonComputerPress,1996.

[131] J. Jannink,P. Mitra, E. Neuhold,S. Pichai,R. Studere G. Wiederhold. An algebraforsemanticinteroperationof semistructureddata. Em IEEE Knowledge and Data Engi-neeringExchangeWorkshop(KDEX), pp.86–100,1999.

[132] Jenasemanticweb toolkit. http://jena.sourceforge.net/ (asof September2003).

[133] G. Karvounarakis,S. Alexaki, V. Christophides,D. Plexousakise M. Scholl. RQL: Adeclarativequerylanguagefor RDF. EmProc.Intl. World WideWebConf., pp.592–503.ACM Press,2002.

[134] V. Kashyape A. Sheth. Semanticheterogeneityin global information systems:Therole of metadata,context andontologies.Em M. P. Papazogloue G. Schlageter, editors,CooperativeInformationSystems, pp.139–178.AcademicPress,SanDiego,1998.

[135] V. Kashyape A. Sheth. Information Brokering Across HeterogenousDigital Data.Kluwer AcademicPublishers,2000.

[136] W. Kent. Themany formsof asinglefact. Em IEEE COMPCON, pp.438–443,1989.

Referencias 124

[137] W. Kim eJ.Seo.Classifyingschematicanddataheterogeneityin multidatabasesystems.IEEE Computer, 24(12):12–18,1991.

[138] S. Kleijnen e S. Raju. An openweb servicesarquitecture. ACM Queue, 1(1):39–46,2003.

[139] M. Klein, D. Fensel,F. van Harmelene I. Horrocks. The relationbetweenontologiesandxml schemas,2001. http://www.ep.liu.se/ea/cis/2 001/ 004/ (asofNovember2003).

[140] M. R. Koivunene E. Miller. W3C semanticweb activity, 2001. http://www.w3.

org/2001/12/semweb- fin/w3csw (asof November2003).

[141] H. Kreger. Fulfilling thewebservicespromise.Communicationsof theACM, 46(6):29–34,2003.

[142] R. Krishnamurthy, W. Litwin e W. Kent. Languagefeaturesfor interoperabilityofdatabaseswith schematicdiscrepancies.Em Proc. Intl. Conf. on Managementof Data(SIGMOD), pp.40–49.ACM Press,1991.

[143] A. Labrinidise N. Roussopoulos.Generatingdynamiccontentat database-backedwebservers:cgi-binvs.mod-perl.SIGMODRecord, 29(1):26–31,2000.

[144] L. V. S. Lakshmanan,F. Sadrie I. N. Subramanian.SchemaSQL- a languagefor inter-operability in relationalmulti-databasesystems.Em Proc. VLDB Conf., pp. 239–250.MorganKaufmann,1996.

[145] L. V. S. Lakshmanan,S. N. Subramanian,N. Goyal e R. Krishnamurthy. On queryspreadsheets.Em Proc. Intl. Conf. on Data Engineering(ICDE), pp. 134–141.IEEE,1998.

[146] D. P. Lanter. Designof a lineage-basedmetadatabasefor GIS. CartographyandGeog-raphyInformationSystems, 18(4):255–261,1991.

[147] O. Lassila e R. R. Swick. Resource Description Framework (RDF):Model and syntax specification, 1999. http://www.w3.org/TR/1999/

REC-rdf- syntax- 19990222 (asof November2003).

[148] D. Leee W. W. Chu. Comparative analysisof six XML schemalanguages.SIGMODRecord, 29(3):76–87,2000.

[149] T. Lee, S. Bressane S. E. Madnick. Sourceattribution for querying againstsemi-structureddocuments.Em Proc.Workshopon WebInformationandData Management,pp.33–39,1998.

Referencias 125

[150] A. Levy. More on data managementfor XML, 1999. http://www.cs.

washington.edu/homes/alon/widom - resp onse. html (as of November1999).

[151] W. Litwin e A. Abdellatif. Multidatabaseinteroperability. IEEE Computer, 19(12):10–18,1986.

[152] W. Litwin, L. Mark e N. Roussopoulos. Interoperability of multiple autonomousdatabases.ACM ComputingSurveys, 22(3):267–293,1990.

[153] L. Liu, D. Buttler, T. Critchlow, W. Han,H. Paques,C. PueD. Rocco.Bioseek:Exploit-ing source-capabilityinformationfor integratedaccessto multiple bioinformaticsdatasources.Em Intl. Symp.on BioInformaticsand BioEngineering(BIBE), pp. 263–274.IEEE ComputerSociety, 2003.

[154] L. Liu e R. Meersman.Thebuilding blocksfor specifyingcommunicationbehavior ofcomplex objects:An activity-drivenapproach.ACM TODS, 21(2):157–207,1996.

[155] L. Liu eC.Pu.Activityflow: Towardsincrementalspecificationandflexible coordinationof workflow activities. Em Intl. Conf. on ConceptualModeling(ER), volume1331ofLNCS, pp.169–182.Springer-Verlag,1997.

[156] L. Liu e C. Pu. Methodicalrestructuringof complex workflow activities. Em Proc. Intl.Conf. on DataEngineering(ICDE), pp.342–350.IEEE,1998.

[157] L. Liu e C. Pu. A transactionalactivity model for organizingopen-endedcooperativeactivities. Em Hawaii Intl. Conf. on SystenSciences(HICSS), 1998.

[158] L. Liu, C. PueW. Han.XWrap: An XML-enabledwrapperconstructionsystemfor webinformationsources.Em Proc. Intl. Conf. on Data Engineering(ICDE), pp. 611–621.IEEE Press,2000.

[159] D. B. Lomet e J. W. (eds.). Specialissueon materializedviews anddatawarehouses.IEEE DataEngineeringBulletin, 18(2),1995.

[160] K. S. M. Gertz. Integratingscientificdatathroughexternal,concept-basedannotations.EmProc.VLDBWorkshoponEfficiencyandEffectivenessof XML ToolsandTechniques(EEXTT), volume2590of LNCS, pp.220–240.Springer-Verlag,2002.

[161] D. S. Mackay. Semanticintegrationof environmentalmodelsfor applicationto globalinformationsystemsanddecision-making.SIGMODRecord, 28(1):13–19,1999.

Referencias 126

[162] A. Matono, T. Amagasa,M. Yoshikawa e S. Uemura. An indexing schemefor RDFandRDF schemabasedon suffix arrays.Em Proc. Intl. Workshopon SemanticWebandDatabases(SWDB), pp.169–187.VLDB endowment,2003.http://www.cs.uic.

edu/˜ifc/SWDB/ (asof November2003).

[163] R. Mattison. Data warehousing:strategies,technologiesand techniques. JohnWileyandSons,New York, 1996.

[164] E. M. Maximilien eM. P. Singh.Conceptualmodelof webservicereputation.SIGMODRecord, 31(4):36–41,2002.

[165] D. L. McGuinness,R. Fikes,J. Hendlere L. A. Stein. DAML+OIL: An ontologylan-guagefor thesemanticweb. IEEEIntelligentSystems, 17(5),Sep2002.

[166] S.A. McIlraith, T. C. Sone H. Zeng. Semanticwebservices.IEEE IntelligentSystems,16(2):46–53,2001.

[167] C. B. Medeirose F. Pires.Databasesfor GIS. SIGMODRecord, 23(1):107–115,marco1994.

[168] C. B. Medeiros,G. Vossene M. Weske. WASA - a workflow-basedarchitectureto sup-port scientificdatabaseapplications.Em Proc. Intl. Conf. on DatabaseandExpertSys-temsApplications(DEXA), volume978of LNCS, pp.574–583.Springer-Verlag,1995.

[169] C. B. Medeiros,M. Weske e G. Vossen.GEO-WASA - combiningGIS technologywithworkflow management.Em Israeli Conf. on Computer-BasedSystemsand SoftwareEngineering, pp.129–139,1996.

[170] J. Meidanis,G. Vossene M. Weske. Using workflow managementin dnasequencing.Em CoopIS, pp.114–123,1996.

[171] E. Mena,A. Illarramendi,V. Kashyape . P. Sheth.OBSERVER: An approachfor queryprocessingin globalinformationsystemsbasedon interoperationacrosspre-existingon-tologies.DistributedandParallel Databases, 8(2):223–271,2000.

[172] E. Mena,V. Kashyap,A. Illarramendie A. P. Sheth. Managingmultiple informationsourcesthroughontologies:Relationshipbetweenvocabulary heterogeneityandlossofinformation.EmKRDB, number4 inCEUR-WS.org, 1996.

[173] E. Mena,V. Kashyap,A. ShetheA. Illarramendi.Domainspecificontologiesfor seman-tic informationbrokeringon theglobalinformationinfrastructure,1998.

Referencias 127

[174] G. A. Mihaila, L. Raschide A. Tomasic.Locatingandaccessingdatarepositorieswithwebsemantics.VLDBJournal, 11(1):41–57,2002.

[175] T. Mikalsen,S. Tai e I. Rouvellou. Transactionalattitudes:Reliablecompositionof au-tonomouswebservices.EmProc.WorkshoponDependableMiddleware-basedSystems(WDMS), 2002.

[176] G. Miller. Thewebservicesdebate– .NET versusJ2EE.Communicationsof theACM,46(6):64–67,2003.

[177] L. Miller, A. SeaborneeA. Reggiori. Threeimplementationsof SquishQL,asimpleRDFquerylanguage.Em Proc. Intl. SemanticWebConf. (ISWC), volume2342of LNCS, pp.423–435.Springer-Verlag,2002.

[178] R. J. Miller. Using schematicallyheterogeneousstructures. Em Proc. Intl. Conf. onManagementof Data (SIGMOD), pp.189–200.ACM Press,1998.

[179] D. S.Milojicic, V. Kalogeraki,R. Lukose,K. Nagaraja,J.Pruyne,B. Richard,S.Rollinse Z. Xu. Peer-to-peer computing. Report HPL-2002-57, HP Labs, Palo Alto,CA, 2002. http://www.hpl.hp.com/techrepor ts/20 02/H PL- 2002- 57.

html (asof December2003).

[180] M. Minsky. A framework for representingknowledge.Em ThePsycology of ComputerVision, pp.211–277.McGraw-Hill, 1975.

[181] M. Missikoff, R. Navigli eP. Velardi. Integratedapproachto webontologylearningandengineering.IEEE Computer, 35(11):60–63,2002.

[182] M. Missikoff, R. Navigli eP. Velardi. TheUsableOntology:An Environmentfor Build-ing and Assessinga Domain Ontology. Em Proc. Intl. SemanticWeb Conf. (ISWC),volume2342of LNCS, pp.39–53.Springer-Verlag,2002.

[183] P. Mitra, G. Wiederholde M. L. Kersten. A graph-orientedmodel for articulationofontology interdependencies.Em Proc. Intl. Conf. on ExtendingDatabaseTechnology(EDBT), volume1777of LNCS, pp.86–100.Springer-Verlag,2000.

[184] G.A. Modica,A. GaleH. M. Jamil.Theuseof machine-generatedontologiesin dynamicinformationseeking.EmProc.Intl. Conf. onCooperativeInformationSystems(CoopIS),volume2172of LNCS, pp.433–448.Springer-Verlag,2001.

[185] Namespacesin XML 1.1.http://www.w3.org/TR/xml- names11/ (asof Octo-ber2003).

Referencias 128

[186] S. Narayanane S. A. McIlraith. Simulation,verificationandautomatedcompositionofwebservices.EmProc.WWWConf., pp.77–88.ACM Press,2002.

[187] M. Neiling, M. SchaaleM. Schumann.WrapIt: Automatedintegrationof webdatabaseswith extensionaloverlaps. Em Proc. Intl. Conf. on Web Databasesand Web Services,volume2593of LNCS, pp.184–198.Springer-Verlag,2002.

[188] W. Nejdl, B. Wolf, C. Qu, S. Decker, M. Sintek,A. Naeve, M. Nilsson,M. Palmer eT. Risch.EDUTELLA: aP2Pnetworking infrastructurebasedonRDF. EmProc.WWWConf., pp.604–615.ACM Press,2002.

[189] S. Nestorov, S. Abiteboul e R. Motwani. Extractingschemafrom semistructureddata.Em Proc. Intl. Conf. on Managementof Data (SIGMOD), pp. 295–306.ACM Press,1998.

[190] N. F. Noy, M. Sintek,S.Decker, M. Crubezy, R. W. Fergersone M. A. Musen.Creatingsemanticwebcontentswith protege-2000.IEEEIntelligentSystems, 16(2):60–71,2002.

[191] Ontologyinferencelayer(OIL). http://www.ontoknowledge.org/oil / (asofNovember2003).

[192] OpenGIS Consortium. Geographymarkup language(GML). http://www.

opengis.net/gml/02- 069/GML2- 12.h tml (asof October2003).

[193] R. Orfali e D. Harkey. Client/ServerProgrammingwith JavaandCORBA. JohnWiley& Sons,2 edition,1998.

[194] R. Orfali, D. Harkey e J. Edwards. TheEssentialDistributedObjectsSurvivalGuide.JohnWiley & Sons,1996.

[195] A. M. Ouksele A. P. Sheth.Semanticinteroperabilityin global informationsystems:Abrief introductionto theresearchareaandthespecialsection.SIGMODRecord, 28(1):5–12,1999.

[196] Web ontology language(OWL) version 1.0. http://www.w3.org/TR/2003/

WD-owl- ref- 20030221/ (asof November2003).

[197] M. T. Ozsue P. Valduriez. Principlesof DistributedDatabaseSystems. PrenticeHall,SanYsidro,CA, 1999.

[198] M. Paolucci,T. Kawamurae K. P. S. T. R. Payne. Semanticmatchingof webservicescapabilities. Em Proc. Intl. SemanticWeb Conf. (ISWC), volume 2342 of LNCS, pp.333–347.Springer-Verlag,2002.

Referencias 129

[199] Y. Papakonstantinou,H. Garcia-Molinae J. Widom. Objectexchangeacrossheteroge-neousinformationsources.EmProc. ICDT Conf., pp.251–260.IEEEPress,1995.

[200] C. Parente S. Spaccapietra.Issuesand approachesof databaseintegration. CACM,41(5):166–178,1998.

[201] P. Patel-Schneidere J. Simeon. Building the semanticweb on XML. Em Proc. Intl.SemanticWebConf. (ISWC), volume2342of LNCS, pp.147–161.Springer-Verlag,2002.

[202] P. Patel-Schneidere J. Simeon. Theyin/yangweb: Xml syntaxandrdf semantics.EmProc. Intl. Conf. onWorld WideWeb, pp.443–453.ACM Press,2002.

[203] G. R. B. Pinto,S. P. J. Medeiros,J. M. deSouza,J. C. M. Strauche C. R. F. Marques.Spatialdataintegration in a collaborative designframework. Communicationsof theACM, 46(3):86–90,2003.

[204] P. F. Pires,M. R. F. Benevidese M. Mattoso. Building reliableweb servicescomposi-tions. EmProc. Intl. Conf. on WebDatabasesandWebServices, volume2593of LNCS,pp.59–72.Springer-Verlag,2002.

[205] V. Poe,P. Klauere S.Brost. Building a Data warehousefor DecisionSupport. PrenticeHall, 1998.

[206] PostgreSQL.http://www.postgresql.org/ asof September2003.

[207] C. Pu,K. SchwaneJ.Walpole. Infosphereproject:Systemsupportfor informationflowapplications.SIGMODRecord, 30(1):25–34,2001.

[208] D. Quasse J. Widom. On-line warehouseview maintenance.Em Proc. Intl. Conf. onManagementof Data (SIGMOD), pp.393–404.ACM Press,1997.

[209] A. R, P. D. Smedt,W. Du, W. Kent,M. A. Ketabchi,W. Litwin, A. Rafii e M.-C. Shan.Thepegasusheterogeneousmultidatabasesystem.IEEEComputer, 24(12):19–27,1991.

[210] S.Raghavane H. Garcia-Molina.Integratingdiverseinformationmanagementsystems:A brief survey. IEEE DataEngineeringBulletin, 24(4):44–52,2001.

[211] W3C’s ResourceDescriptionFramework (RDF). http://www.w3.org/RDF/ (asof October2003).

[212] L. A. Rossetti. Agricultural zoning: Lesseningthe risks of agricultureandprovidingsustainableregionaldevelopment.Em Intl. Symp.on MakingSustainableRegional De-velopmentVisible, 2000.

Referencias 130

[213] M. Schlosser, M. Sintek, S. Decker e W. Nejdl. A scalableand ontology-basedp2pinfrastructurefor semanticwebservices.EmProc.Intl. Conf. onPeer-to-PeerComputing(P2P), pp.104–111.AustralasianComputerSociety, 2002.

[214] L. A. Seffino, C. B. Medeiros,J.V. RochaeB. Yi. WOODSS- aspatialdecisionsupportsystembasedonworkflows. DecisionSupportSystems, 27(1-2):105–123,1999.

[215] W3C’sSemanticwebActivity. http://www.w3.org/2001/sw/ (asof July2003).

[216] U. Shah,T. Finin eJ.Mayfield. Informationretrieval onthesemanticweb. EmProc.Intl.Conf. on Informationand Knowledge Management(CIKM), pp. 461–468.ACM Press,2002.

[217] S. Shekthar, S. Chawla, S. Ravada,A. Fetterer, X. Liu e C. tien Lu. Spatialdatabases-accomplishmentsandreseachneeds.IEEE Transactionson Knowledge andData Engi-neering, 11(1),1999.

[218] A. P. Sheth. Semanticissuesin multidatabasesystems- prefaceby the specialissueeditor. SIGMODRecord, 20(4):5–9,1991.

[219] A. P. ShetheJ.A. Larson.Federateddatabasesystemsfor managingdistributed,hetero-geneous,andautonomousdatabases.ACM ComputingSurveys, 22(3):183–236,1990.

[220] E. Sirin, J. A. Hendlere B. Parsia. Semi-automaticcompositionof webservicesusingsemanticdescriptions.EmProc.WorkshoponWebServices:Modeling, ArchitectureandInfrastructure(WSMAI), pp.17–24.ICEISPress,2003.

[221] T. Sollazzo,S. Handschuh,S. Staabe M. Frank. Semanticweb servicearchitecture–evolving web servicestandardstoward thesemanticweb. Em FLAIRSConf. – SpecialTrack onSemanticWeb, pp.425–429,2002.

[222] M. Stal. Web services:beyond component-basedcomputing. Communicationsof theACM, 45(10):71–76,2002.

[223] L. Stein.Creatingabioinformaticsnation.Nature, 417(6885):119–120,2002.

[224] E. Stolte,C. vonPraun,G. AlonsoeT. Gross.Scientificdatarepositories– designingforamoving target. EmProc.Intl. Conf. onManagementof Data (SIGMOD), pp.349–360.ACM Press,2003.

[225] M. Stonebraker. Implementationof integrity constraintsandviews by querymodifica-tion. Em Proc. Intl. Conf. on Managementof Data (SIGMOD), pp. 65–78.ACM Press,1975.

Referencias 131

[226] Y. Sure,J.AngeleeS.Staab. OntoEdit:Guidingontologydevelopmentby methodologyandinferencing.Em Intl. Conf. on CooperativeInformationSystems(CoopIS), volume2519of LNCS, pp.1205–1222.Springer-Verlag,2002.

[227] Y. Sure,M. Erdmann,J.Angele,S.Staab,R.StudereD. Wenke. Ontoedit:Collaborativeontologydevelopmentfor thesemanticweb. EmProc.Intl. SemanticWebConf. (ISWC),volume2342of LNCS, pp.348–363.Springer-Verlag,2002.

[228] N. Tryfona e C. S. Jensen.Conceptualdatamodelingfor spatiotemporalapplications.GeoInformatica, 3(3):245–268,1999.

[229] A. Tsalgatidoue T. Pilioura. An overview of standardsandrelatedtechnologiesin webservices.DistributedandParallel Databases, 12:135–162,2002.

[230] Universaldescription,discovery and integrationof web services(UDDI). http://

www.uddi.org/ asof October2003.

[231] J.D. Ullman. Informationintegrationusinglogicalviews.EmProc.ICDT Conf., volume1186of LNCS, pp.19–40.Springer-Verlag,1997.

[232] Namingandaddressing(URIs,URLs,... http://www.w3.org/Addressing/ (asof October2003).

[233] M. Uscholde M. Gruninger. Ontologies:principles,methods,andapplications.Knowl-edgeEngineeringReview, 11(2):93–155,1996.

[234] W. M. P. van der Aalst. Don’t go with the flow: Web servicescompositionstandardsexposed.IEEE IntelligentSystems, 18(1):72–85,2003.

[235] W. M. P. vanderAalst,A. HirnschalleH. M. W. Verbeek.An alternativewayto analyzeworkflow graphs.EmAdvancedInformationSystemsEngineering(CAiSE), volume2348of LNCS, pp.535–552.Springer-Verlag,2002.

[236] L. R. Venancio, R. Fileto e C. B. Medeiros. Aplicando ontologiasde objetosge-ograficosparafacilitaranavegacaoemGIS. EmSimposioBrasileiro deGeoinformatica(GEOINFO), 2003.

[237] V. VentroneeS.Heiler. Semanticheterogeneityasaresultof domainevolution.SIGMODRecord, 20(4):16–20,1991.

[238] A. Voisard,C. B. Medeirose G. Jomier. Databasesupportfor cooperative work docu-mentation.EmCOOP, 2000.

Referencias 132

[239] J.Wainer, G.Vossen,M. WeskeeC. B. Medeiros.Scientificworkflow systems.EmNSFWorkshopon WorkflowandProcessAutomation:Stateof theArt andFutureDirections,1995.

[240] J.WangeF. H. Lochovsky. Dataextractionandlabelassignmentfor webdatabases.EmProc. Intl. Conf. onWorld WideWeb(WWW), pp.187–196.ACM Press,2003.

[241] M. Weske,G.Vossen,C.B. MedeiroseF. Pires.Workflow managementin geoprocessingapplications.EmACM-GIS, pp.88–93,1998.

[242] J. Widom. Datamanagementfor XML: Researchdirections. IEEE Data EngineeringBulletin, 22(3):44–52,1999.

[243] G. Wiederhold.Views,objects,anddatabases.IEEE Computer, 19(12):37–44,1986.

[244] G. Wiederhold.Mediatorsin thearchitectureof futureinformationsystems.IEEECom-puter, 25(1):38–49,1992.

[245] G.Wiederhold.An algebrafor ontologycomposition.EmMonterey WorkshoponFormalMethods, pp.56–61,1994.

[246] G. Wiederhold. Interoperation,mediation,and ontologies. Em Intl. Symp.on FifthGeneration ComputerSystems(FGCS) – Workshop on HeterogeneousCooperativeKnowledge-Bases, pp.33–48,1994.

[247] G. Wiederholde J.Jannik.Composingdiverseontologies.Report,StanfordUniversity,1998.

[248] K. Wilkinson, C. Sayerse H. Kuno. Efficient RDF storageand retrieval in jena2.Em Proc. Intl. Workshopon SemanticWeb and Databases, pp. 131–150.Humboldt-Universitat,2003.

[249] J.Williams. Thewebservicesdebate– J2EEversus.NET. Communicationsof theACM,46(6):59–63,2003.

[250] P. Wohed,W. M. P. van der Aalst, M. Dumase A. H. M. ter Hofstede. Patternbasedanalysisof BPEL4WS.ReportFIT-TR-2002-04,QueenslandUniversityof Technology,Queensland,Australia,2002.

[251] A. G. Woodruff e M. Stonebraker. Supportingfine-graineddatalineagein a databasevisualizationenvironment. Em Proc. Intl. Conf. on Data Engineering(ICDE), pp. 91–102.IEEEComputerSociety, 1997.

Referencias 133

[252] M. F. WorboyseS.M. Deen.Semanticheterogeneityin distributedgeographicdatabases.SIGMODRecord, 20(4):30–34,1991.

[253] The W3C web servicesactivity. http://www.w3.org/2002/ws/ (asof October2003).

[254] Web servicesdescriptionlanguage(WSDL) 1.1. http://www.w3.org/TR/wsdl

(asof October2003).

[255] Web ServicesFlow Language(WSFL 1.0). http://www.ibm.com/software/

solutions/webservices/pdf/WSFL. pdf (asof October2003).

[256] W3C’s ExtensibleMarkupLanguage(XML) 1.o(secondedition). http://www.w3.

org/TR/REC- xml (asof October2003).

[257] XML Schemapart0: Primer. http://www.w3.org/XML (asof October2003).

[258] W3C’s XML Query Activity. http://www.w3.org/XML/Query (as of October2003).

[259] E. I. Y, M. Livny, A. Ailamaki, A. Ranganathan,A. Therber, M. Yuin, M. AndersoneJ.Norman.Managingsoil scienceexperimentsusingZOO.EmProc.Conf. onStatisticalandScientificDatabaseManagement, pp.121–124.IEEE,1997.

[260] N. Zhong,J. Liu e Y. Y. (eds.). SpecialIssue– In Searchof the WisdomWeb. IEEEComputer, 35(11):27–76,2002.

[261] Y. Zhuge,H. Garcia-Molina,J.HammereJ.Widom.View maintenancein awarehousingenvironment. Em Proc. Intl. Conf. on Managementof Data (SIGMOD), pp. 316–327.ACM Press,1995.

[262] Zoningcoffeein Brazil. http://orion.cpa.unicamp.br/caf e (in Portuguese– asof October2003).

Annex I

Formal Definitions and PropertiesforPOESIA

This annex presentsall the formal definitionsandproofsof theoremsthatenablethePOESIAontology-basedapproachfor resourcesdiscoveryandcomposition.It is organizedasfollows:

à SectionI.1 formally describesthe structureof a POESIA ontologyand the basiccon-ceptsrelatedwith suchanontology, suchaspathsandsemanticencompassingbetweenontologyterms.

à SectionI.2 definesontologicalcoveragesandthesemanticrelationshipsof encompassingandequivalencebetweencoverages.

à Finally, SectionI.3 presentstheconceptsof activity pattern(whichcanreferto simpleorcompositeservices),specializationandaggregationof activity patterns,processframe-work andspecificprocessesfor particularneeds.

Theconceptsof SectionI.3 arebasedon thenotionof ontologicalcoveragesandsemanticrelationshipsbetweenthesecoverages.Therulesusedto defineprocessframeworksandspecificprocessesensurethe semanticconsistency of the compositionsof services,accordingto theirassociatedontologicalcoverages.

134

I. FormalDefinitionsandPropertiesfor POESIA 135

I.1 POESIA Ontologies

Definition I.1 statesthat for a setof words ô to beconsideredconsistent,at mostoneseman-tic relationshipof theset õ (which includessynonym, IS A, andPARTOF, with their inverserelationships)occursbetweenany pair of words.

Thefollowing definitionsandtheoremsshow thatany setof semanticallyconsistentwordsô canberepresentedasagraphö'÷ , whosenodescorrespondto maximalsetsof wordsthataresynonym of eachother, andwhoseedgesaresemanticrelationshipsbetweensetsof synonyms.Thegraph ö'÷ is calledanarrangementof semanticallyconsistentwords(Definition I.3) if it isacyclic andconnected.

Definition I.1 (Set of Semantically ConsistentWords) Let ô be a finite setof words for auniverseof discourse ø . ô is a setof semanticallyconsistentwordswith respectto thesetofsemanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ E Õaú�ȳÅ�ò�¿�úVÙ E Õaúsȳ¾�¿�ú�Ù E Õ³¾�Ê¥¾�¿�ú�Ù E ÙûÅ�ò¨¾s¿�úVÙ�üiff:

µuýX·Uõ.¸¨þ ýþ º ×/Ò ÓXÿ ·UõñÀ�Äƽ�ÕIÁÖÕÑÉ�Á£ÿ ÒÝ0ý � þfÿûþ ºi.e., for anypair of words þ E þ�º · ô at mostonesemanticrelationshipýÌ·)õ leadsfrom þ toþ�º .

Definition I.2 (Maximal Setsof Synonymous)Givena setof semanticallyconsistentwords ôwith respectto thesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ E Õaú�ȳÅ�ò�¿�úVÙ E Õaúsȳ¾�¿�ú�Ù E Õ³¾�Ê¥¾�¿�ú�Ù E ÙûÅ�ò¨¾s¿�úVÙ�ü� É����;úV¿��ñô is a maximalsetof synonymousfrom ô iff thefollowingconditionsaresatisfied:

1.� É���;úV¿�ÒÝ�

2. µFþ ��þºa· � É����;ú�¿ð¸�þ À�úV¿q¾s¿�úVÙ/þº

3. µFþ ��þ º ·�ô0¸³°¥þ�· � É����WúV¿ � þ º Ò· � É����;ú�¿q´¢× �&°�þ�À^úV¿q¾s¿�úVÙ/þ º ´

I. FormalDefinitionsandPropertiesfor POESIA 136

Algorithm 1 – Generatethe collectionof maximal sets

Givena setof semanticallyconsistentwords ô for the universeof discourse ø , apply thefollowingsequenceof stepsto partition ô in a collectionof maximalsetsof synonymous.

1. Let ô Ý:ùsþ�ç�� è^è^è ��þ�éoü .Build thelist of unitary setsof words �'É�ò�ÁÖÀÝ�°Ãùsþ�ç�ü�� è^è^è ��ùsþ�éoü�´ .

2. If Ó������%º³·���É�òsÁÖÀ&À^Äƽ©Õ�ÁÖÕÑÉ�ÁqµFþ�·�����þ�º³·��&º³¸�þ À^úV¿q¾s¿�úVÙ�þ�ºthenremove � and � º from �'É�ò�ÁÖÀ andinsert °������ º ´ in �'ÉoòsÁÖÀ .

3. Repeatstep2 until

�&°�Ó������ º ·���É�òsÁÖÀ���þ�·�����þ º ·�� º À^Äƽ©ÕIÁÖÕ³É�Á*þ�À^úV¿q¾s¿�úVÙ/þ º ´Whentheexecutionof algorithm1 stops,thepartition of ô in thecollectionof maximalsets

of synonymousis availablein thelist ��É�òsÁÖÀ .

Theorem I.1 Let ô bean arbitrary setof semanticallyconsistentwordswith respectto thesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�ü

Theset ô canbepartitionedin a collectionof maximalsetsof synonymous.

I. FormalDefinitionsandPropertiesfor POESIA 137

Proof:

Algorithm1 generatesthepartition of ô in a list of maximalsetsof synonymousdenominated�'É�ò�ÁÖÀ . Thecorrectnessof algorithm1 canbeverifiedin two phases.

Phase1: Algorithm1 partitions ô in a collection ��É�òsÁÖÀ of setsof synonymous.

Weproveit by inductionon thenumberof repetitionsof step2.

Base: After executingstep1 (and before the first executionof step2 of algorithm 1,��É�òsÁÖÀ�Ý/°hùsþ�ç¤ü�� è^è^è ��ùsþ�éoü�´ . Therefore:

1. � �"!$#&%('*),+-�ñÝ0ô2. µ.��·��'É�ò�ÁÖÀ#¸��/ÒÝ/3. µ.��·��'É�ò�ÁÖÀ�0�þ ��þº�·���¸�þ À�úV¿q¾s¿�úVÙ/þº

Step: Each executionof step2 of algorithm1 preservesconditions1, 2 and3 of thebase,becausestep2 can only replacea pair of setsof synonyms� and � º from �'É�ò�ÁÖÀwith theunion �����%º whenµuþ�·�����þº³·��%ºÆ¸�þ À�úV¿q¾s¿�úVÙ/þº .

Phase2: Each setof synonymouspresentin ��É�òsÁÖÀ by theendof theexecutionof algorithm1is maximal.

It is guaranteedby theconditionto stoptheloop in step3:

�&°�Ó������ º ·���É�òsÁÖÀ���þ�·�����þ º ·�� º À^Äƽ©ÕIÁÖÕ³É�Á*þ�À^úV¿q¾s¿�úVÙ/þ º ´

I. FormalDefinitionsandPropertiesfor POESIA 138

Theorem I.2 Givena setof semanticallyconsistentwords ô with respectto thesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�ü

There existsa directedgraph ö'÷*°?±³÷1��2&÷Æ´ that organizesthe words of ô according to the se-manticrelationshipsin õ .

Proof: Let �'É�ò�ÁÖÀ bethecollectionof setsof synonymousobtainedby patitioning ô with algo-rithm 1. ö#÷*°�±Ñ÷1��2�÷Æ´ hasthefollowingconstitution:

à ±Ñ÷ is thesetof verticesof ö#÷à �Þ·U±³÷F¼ ��·���É�òsÁÖÀà 2&÷ is thesetof directededgesof ö'÷à °��3�4� º ´�·�2�÷F¼ °�µuþ�·��3�Mþ º ·�� º ¸�þ ÕaúsȳÅ�ò�¿�ú�Ù�þ º Ç1þ�ÕѾ�Ê¥¾s¿�úVÙ(þ º ´

Definition I.3 (Arrangement of SemanticallyConsistentWords) Let ô bea setof semanti-cally consistentwordswith respectto thesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�üThearrangementof semanticallyconsistentwords of ô (or simplythe arrangementof wordsfrom ô ) is a graph ö'÷*°?±³÷1��2&÷Æ´ , with thesetof vertices±³÷ andsetof edges 2�÷ , such that thefollowingconditionsareverified:

1. µFþ�·�ôÞ¸VÓ���·U±Ñ÷�À�Ä�½©Õ�ÁÖÕÑÉ�Á£þ�·��2. µ.��·�±³÷10�þ ��þ º ·���¸�þfÀ�úV¿q¾s¿�úVÙ�þ º3. µ.��·�±³÷1��þ0·��3�Mþ º ·�ôÞ¸�þ º65·�� × ��°¥þ À�úV¿q¾s¿�úVÙ/þ º ´4. °��3�4�&º ´�·�2�÷.¼ °�µuþ�·��3�Mþº³·��%ºÆ¸�þfÕaúsȳÅ�ò�¿�ú�Ù�þº�Ç1þ�ÕѾ�Ê¥¾s¿�úVÙ(þº )5. ö#÷ is acyclic

6. ö#÷ is connected

I. FormalDefinitionsandPropertiesfor POESIA 139

DefinitionsI.4 to I.6 describetheencompassrelationshipbetweentermsof a POESIAon-tologyandTheoremI.3 shows thatthis relationshipis transitive.

Definition I.4 (Path) Let ô bea setof semanticallyconsistentwordswith respectto thesetofsemanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�üLet ö'÷*°?±³÷1�42�÷Æ´ beanarrangementof semanticallyconsistentwordsfor thewordsof ô , where±Ñ÷ is thesetof verticesof ö#÷ and 2&÷ thesetof edgesof ö#÷ .A path fromvertex �6ç#·�±Ñ÷ to vertex �&éA·�±³÷ is a sequenceof directededgesof 2&÷ leadingfrom �6ç to �&é , with theform:

°��6ç7�4�98�´�� è^è^è �s°:�%é<;Ñç��4�&é�´where °��&î=���&î?>�çÖ´w·�2&÷j°Ãí�@fÂBA�¿ ´ .

Definition I.5 (Vertices Reachability) If there is a path from vertex �Cç to vertex �%é in thearrangementof semanticallyconsistentwords ö#÷ , thenwesaythat �%é is reachablefrom �Cç inö#÷ , denotedby

�6çDC �&é

Definition I.6 (SemanticEncompassing)Let ô bea setof semanticallyconsistentwordswithrespectto thesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�ü

Let ö'÷*°?±³÷1�42�÷Æ´ bethearrangementof semanticallyconsistentwords for thewordsof ô , andþ ��þºÑ·�ô betwo arbitrary wordssuch that:

��°�þ6´w·�±³÷ � þ�·���°�þC´ � ��°�þ º ´�·U±Ñ÷ � þ º ·���°�þ º ´

Theword þ encompassestheword þ�º , denotedby þFEfþ�º , iff:

�I°¥þ6´WÝ/��°�þ º ´ � �I°¥þ6´DC �I°¥þ º ´

I. FormalDefinitionsandPropertiesfor POESIA 140

Theorem I.3 Let ô bea setof semanticallyconsistentwordswith respectto thesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�üTheencompassrelationshipamongthewordsof ô is transitive.

Proof:

Let þG��þ�ºH��þ�º º³·�ô such that

þFEfþ º � þ º Efþ º º (1)

Let ö'÷*°?±³÷1��2&÷Æ´ bethearrangementof semanticallyconsistentwordsfor ô .

Let �I°¥þ6´7�4�I°¥þº ´7����°�þ�º º»´�·U±Ñ÷ such that:

þ�·���°�þ6´ � þ º ·���°�þ º ´ � þ º º ·���°�þ º º ´

Then,from(1) anddefinitionI.6:

þFEfþ º � þ º Eñþ º º ß

ß °��I°¥þ6´WÝI�I°¥þ º ´ÌÇJ�I°¥þ6´DC �I°¥þ º ´M´ �°��I°¥þ º ´WÝ/�I°¥þ º º ´ÌÇK��°�þ º ´DC �I°¥þ º º ´M´ãß

ß °:��°�þC´WÝ���°�þ º ´ � ��°�þ º ´SÝ���°�þ º º ´Ö´jÇ°:��°�þC´WÝ/�I°¥þ º ´ � ��°�þ º ´LC ��°�þ º º ´Ö´jÇ°:��°�þC´DC ��°�þ º ´ � ��°�þ º ´SÝ���°�þ º º ´Ö´jÇ°��I°¥þ6´DC �I°¥þ º ´ � �I°¥þ º ´DC ��°�þ º º ´Ö´ ×

× ��°�þ6´WÝI�I°¥þ º º ´ÌÇK��°�þC´DC ��°�þ º º ´ ß þMEfþ º º

I. FormalDefinitionsandPropertiesfor POESIA 141

Definition I.7 describesa termasa word that is anspecializationor an instanceof anotherword, calleda concept.Termsareorganizedin arrangementsof semanticallyconsistentwords(asdescribedin Definition I.3), calledarrangementsof semanticallyconsistentterms. Eacharrangementof termshasan associatedarrangementof semanticallyconsistentconcepts,toqualify its terms. Definition I.8 describesa domainspecificontology as a collection of ar-rangementsof semanticallyconsistentterms,with therespective arrangementsof semanticallyconsistentconcepts.Eachpairof arrangementsof wordsreferto adimensionof theontology.

Definition I.7 (Term) Let N and O betwo setsof semanticallyconsistentwordswith repecttothesetof semanticrelationships

õ0Ýóù�À�úV¿q¾s¿�úVÙ���Õaú�ȳÅ�ò�¿�úVÙ���Õaúsȳ¾�¿�ú�Ù��¤Õ³¾�Ê¥¾�¿�ú�Ù��MÙûÅ�ò¨¾s¿�úVÙ�ü

Considerthat N and O satisfythefollowingconditions:

1. µFÄA·�N ¸6Ä refers to a concept

2. µFþ�·�Of¸'ÓIÄ�·�N À�Äƽ�ÕIÁÖÕÑÉ�ÁAþ�Õaú�ÈѾ�¿�ú�Ù/Ä0Ç1þ�Â�¿qÀ�ÁØÉ�¿�½©Å%Ä

3. µFþ ��þ º ·�Oñ¸'ÓIÄP�MÄ º ·�N À^Äƽ©ÕIÁÖÕ³É�Áw¸(a) þfÕaú�ÈѾ�¿�ú�Ù.Ä�Ç�þ�Â�¿qÀ�ÁØÉ�¿�½©Å%Ä(b) þºoÕaúsȳ¾s¿�úVÙ/ÄѺ�Ç1þº¨Â�¿qÀ�ÁØÉ�¿�½©Å%ÄѺ(c) µûÿ ·Uõó¸CþfÿAþ�º#¼ Ä�ÿNijº

A termis a word Á�·�O . Term Á canbedenotedby Qa°{ÁÖ´ , whereq, calledthequalifierof Á , satisfythecondition:

QK·�N � °¥Á£Õaúsȳ¾s¿�úVÙRQ Ç ÁÐÂ?¿ À�ÁØÉ�¿q½_ÅSQ�´

Definition I.8 (Domain SpecificOntology) A domainspecificontology á is a collectionof ar-rangementsof semanticallyconsistentterms ù�ö çT � è^è^è �¤ö éT üX°�¿)ì(ís´ , where each arrangementö îT °Ãí3@fÂU@f¿q´ characterizesdimension of theapplicationdomain.

I. FormalDefinitionsandPropertiesfor POESIA 142

Definition I.9 (Specificationof a Path in an Arrangement of Terms) Let ö T °�± T �42 T ´ beanarrangementof semanticallyconsistenttermsrelatedto somearbitrary dimensionof a domainspecificontology á , where ± T is thesetof verticesof ö T and 2 T is thesetof edgesof ö T .Let öWV³°�±"V"�42XV�´ bethearrangementof thesemanticallyconsistentconceptsusedasthequali-fiersof thetermsorganizedin ö T , where ±&V is thesetof verticesof öWV and 2SV thesetof edgesof ö�V .Anspecificationof a pathin ö T is a sequenceof termsof theform:

Y ÝIQ�ç�°¥Á�çØ´7Z[Q$8�°¥Á\8_´�Z è^è^è Z]Q_éÑ°{Áhé�´satisfyingthefollowing conditions:

1. Ó��SV�·U±"V#À�Äƽ�Õ�ÁÖÕÑÉ�ÁPQ�ïC·��SV °¥¿Uì�í�0�í�@_^`@f¿ ´2. Ó�� T ·U± T À�Äƽ�Õ�ÁÖÕÑÉ�ÁPQ�ï�·�� T °¥¿Uì�í�0�í�@a^.A�¿q´3. °{Á�ïb�MÁ�ïc>�çÖ´�·U± T °�¿Uì�í�0�í�@a^.A ¿q´

Definition I.10 (UnambiguousReferenceto a Term) Let á bea domainspecificontologyand

�jÁÃò�° Y ´WÝedfQsç©°¥Á�çM´�Z]Qg8�°¥Á\8�´�Z è^è^è Z[Q©éÑ°¥Áhé�´(h

bethestring correspondentto thespecificationof path

Y ÝIQ�ç�°¥Á�çØ´7Z[Q$8�°¥Á\8_´�Z è^è^è Z]Q_éÑ°{Áhé�´leadingto theterm Q©éa°¥Áhé�´ in thearrangementof semanticallyconsistenttermsfor somedimen-sionof theontology á .

Thepath �jÁÃòa° Y ´ is anunambiguousreferenceto theterm Q_éÑ°{Áhé�´ iff �jÁÃò�° Y ´ is uniqueamongallthestrings �;ÁÃò�° Y º ´ producedfromanypath

Y º Ý�Q ºç °¥Á º ç ´�Z]Q º8 °¥Á º 8 ´�Z è^è^è Z[Q ºé °¥Á ºé ´

in any arrangementof semanticallyconsistenttermsin á , by usingthe samemethodas thatusedto produce�jÁÃòa° Y ´ from

Y.

I. FormalDefinitionsandPropertiesfor POESIA 143

I.2 Ontological coveragesand their relationships

The definitionsand theoremsin this sectioncan be summarizedas follows. Definition I.11saysthat an ontologicalcoverageis a tuple of termsfrom a POESIA ontology. DefinitionI.12 statesthat the universalcoverageis the emptytuple. Definition I.13 definesontologicalcoveragesencompassing.TheoremI.4 shows thattheuniversalcoverageencompassany othercoverage,andTheoremI.5 showsthattheencompassrelationshipamongontologicalcoveragesis transitive. Finally, equivalentontologicalcoverages,asstatedby Definition I.14, reciprocallyencompasseachother.

Definition I.11 (Ontological Coverage)Let

á�Ýóù�ö çT °:2 çT ��± çT ´7� è^è^è �¤ö éT °�± éT ��2 éT ´¤ü °�¿Uì�ís´

bea domainspecificontology, where ö ï T °�± ïT �42 ïT ´�°Øí�@i^j@0¿ ´ is thearrangementof semanti-cally consistenttermsfor dimension of á , ± ïT is thesetof verticesof ö ï T and 2 ïT is thesetofedgesof ö ï T .Anontological coverage takenfrom á is anm-tuple

ä Á�ç7� è^è^è �ÖÁlk¢ê/°�Ù9ìnm�´satisfyingthecondition:

µ�Áh)äæÁ�ç7� è^è^è �MÁlkSêиVÓKö ï T °�± ïT ��2 ïT ´�·Uáo04��·U± ïT À�Ä�½©Õ�ÁÖÕÑÉ�Á*Áhî*·��

Definition I.12 (Universal Coverage) The universal coverage (denotedby p ) is the emptytuple. p doesn’t restrict theapplicationscopein anydimensionof theontology for theappli-cationdomainit refers to.

Definition I.13 (Ontological CoveragesEncompassing)Let

Ï:Ý.äæÁ�ç7� è^è^è Áhé�êIÉ�¿LqñÏ º Ý.äæÁ º ç � è^è^è Á ºk êÌ°¥¿6��Ù9ì�ís´

I. FormalDefinitionsandPropertiesfor POESIA 144

betwo ontological coveragestakenfromthesamedomainspecificontology á .

Let O bethesetof semanticallyconsistenttermsorganizedin thearrangementof terms ö T foranarbitrary dimensionof á .

Wesaythat Ï encompassesÏ º (denotedby ÏrE Ï º ) iff:

µ�Áh�Ï/¸oÓIÁ ºï ·�Ï º À�Äƽ�ÕIÁÖÕÑÉ�Á Áhîj·�O � Á ºï ·�O � ÁhîPE Á ºï

Theorem I.4 Theuniversal coverageencompassesanyontological coverage.

Proof:

It followsdirectlyfromdefinitionsI.12 andI.13.

FromdefinitionI.12:

Ò Ó�Áw·�p

Then,for anyontological coverage Ï#º takenfromanarbitrary domainspecificontology á .

Ò Ó�Áh�pâÀ^Äƽ©Õ�ÁÖÕÑÉ�ÁjÓ�Á ºï ·�Ï º À�Äƽ©Õ�ÁÖÕ³É�ÁÐÁhîPE Á ºï (2)

Therefore, accordingto (2) anddefinitionI.13 theuniversalcoverage( p ) encompassesanyontological coverage, including p itself.

Theorem I.5 Theencompassrelationshipamongontological coveragesdefinedwith respecttoa givendomainspecificontology is transitive.

Proof:

Let Ïç , Ïs8 and Ïst bearbitrary coveragessuch that

ÏçsEñÏs8 � Ïs89EñÏst (3)

I. FormalDefinitionsandPropertiesfor POESIA 145

Let ÁhUÏwî ( í3@fÂU@nu ) bean arbitrary termof thecoverage Ïwî .

Fromequation(3) wehavethefollowing deductions:

ÏçsEñÏs8w× µuÁ�çw·�Ïçw¸VÓ�Á\8·�Ïs8¢À^Äƽ©Õ�ÁÖÕÑÉ�ÁÐÁ�çvE Á\8 (4)

Ïs8wEñÏstw× µuÁ\8�·�Ïs8�¸VÓ�Á\t·�Ïst¢À^Äƽ©Õ�ÁÖÕÑÉ�ÁÐÁ\89E Á\t (5)

Then,fromequations(4) and(5), andthetransitivenessof theencompassrelationshipbe-tweenwordsor terms(theoremI.3), wecandeducethat

µFÁ�çw·�Ïç�¸VÓ�Á\t�·�ÏstSÀ�Äƽ©ÕKÁÖÕÑÉ�ÁÐÁ�çsE Á\t (6)

Now, supposethat

��°�ÏçsEñÏst¤´ (7)

Wecandeducefrom(7) that

ÓIÁ�ç�·�Ïç*À^Äƽ©ÕIÁÖÕ³É�Á Ò Ó�Á\t�·�ÏstSÀ^É�ÁÃÂhÀyx�úVÂ�¿�z#Á�çvE Á\t (8)

Which is a contradictionwith (6) derivedfrom(3).

Definition I.14 (Ontological CoveragesEquivalence)Given two ontological coverages, Ïand Ï6º , wesaythat Ï is equivalentto Ï#º , denotedby Ï:ß�Ï#º , iff:

ÏREñÏ º � Ï º EñÏ

I. FormalDefinitionsandPropertiesfor POESIA 146

I.3 ServicesComposition in POESIA

This sectionpresentsthe formal definitionsrelatedwith thecompositionof services,thatalsoappearin Chapter3, for summarizationpurposes.Definition I.15 describesa simpleor com-positeserviceasanactivity pattern,with aname,anassociatedontologicalcoverage,inputandoutputports,anda taskdefinition. DefinitionsI.16 to I.19 describethe rulesfor thesemanti-cally consistentcompositionof activity patterns,in termsof the interconnectionof their inputandoutputports,andthesemanticrelationshipsamongtheir associatedontologicalcoverages.Aggregationandspecializationarethebasicoperationsfor composingactivity patternsin pro-cessesframeworksandto adapttheseframeworks(by takingappropriatespecializedversionsoftheir activity patterns),whenbuilding processesfor specificneeds.For extensive descriptionsof thesedefinitions,seeSection3.4.

Definition I.15 Anactivity pattern { is a five-tuple:

°:|~} � 2`��Ï�ÚI± 2 Û �c��|��¤ÚOø Y � Y }��U�U´where:

|~} � 2 is thestringusedasthenameof {Ï'ÚK± 2 Û is theontological coverageof {

i.e., expressesits utilizationscope��| is thelist of input parametersof {ÚOø Y is thelist of outputparametersof {Y }W��� describestheprocessingchoresthat { does

Definition I.16 Activitypattern { is an aggregationof theactivitypatterns��ç7� è^è^è �4�aéX°¥¿�ì:ís´if thefollowingconditionsareverified(let í�@fÂf�l^.@f¿60MÂÒÝ_^ for each condition):

1. ��� îL�y�.�v��� ���L�S�� �`�s�_�G� � î:�G�/�w�9�X�9�����L�S�� �w�9�S�w��� � î��2. ��� î*� � ï ���.�v��� � � î �s�� �`�s�_�G� � ï � �/�9�w�X�w��� � î �s�� �9�w�S�w��� � ï �3. ��� î �<�w�9�9�9�����L�U� � �9�w�X�w��� � î � ���w�9�S�w��� � î �U� � �w�9�9�9�����L�4. �v�`�   �����D�6��¡ � î-¢�£-¤4¥o¦l¥�§<¦ �`�   ��� � î �5. �v�`� �o¨B©����D�6�<¡ � î ¢�£-¤4¥o¦l¥�§<¦ �`� �o¨B©�� � î��6. ��� î*� � º �ª  ��� � î �«� � º �ª  �����L�G�r�¬¡ � ï�¢7£�¤�¥o¦=¥�§�¦ � º � �o¨B©�� � ï �l�7. ��� î*� � º � �o¨­©o� � î �6� � º � �o¨­©o���L� �r�¬¡ � ï�¢�£-¤�¥o¦=¥�§�¦ � º �ª  ��� � ï �l�

I. FormalDefinitionsandPropertiesfor POESIA 147

Definition I.17 Activitypattern � is a specializationof theactivitypattern { (conversely { is ageneralizationof � ) if thefollowingconditionsareverified:

1. �`�s�_�G���D�X�� �.�v��� � � �����9�w�S�w�����D�X�� �9�w�S�9��� � �2. �9�9�X�9�����L�B� � �w�9�X�9��� � �3. �v�`�   �����D�6��¡ � º �ª  ��� � � ¢7£�¤�¥o¦l¥�§<¦ ��®w� º4. �v�`� �o¨B©����D�6�<¡ � º � ��¨B©o� � � ¢7£�¤�¥o¦=¥�§�¦ ��®w� º

Definition I.18 A processframework is a directedgraph ¯6°?±Æ²L��2�²³´ satisfyingthe followingconditions:

1. ±³² is thesetof verticesof ¯2. 2�² is thesetof edgesof ¯3. µF¶F·U±³²A¸�¶ is anactivity pattern

4. °¹¶¯��¶�º»´�·�2�².¼ ¶�º�½_¾�¿qÀ�ÁÃÂ�ÁÃÄÆÅ�¿�Á*¶ Ç�¶oº�ÀMȳÅ^½_Â�ÉoÊ¥Â�Ë�É�Áæ¾s¿Ì¶

5. ¯ is acyclic

6. ¯ is connected

Definition I.19 A processspecification ÍK°?±³ÎP��2&Îq´ associatedwith a utilization scopeex-pressedby an ontological coverage Ï is a subgraph of a processframework satisfyingtheproperties:

1. µ�°¹¶¯��¶�º»´�·�2&Î�¸�¶�º�½_¾�¿qÀ�ÁÃÂ�ÁÃÄÆÅ�¿�Áж

2. µF¶F·U±ÑÎ�¸°_Ò ÓK¶�º³·U±ÑÎÔÀ�Äƽ©ÕKÁÖÕÑÉ�Á§°

¹¶¯��¶�º»´w·�2�Îq´W× ¶6¦À&É�ÁؾsÙ̦½

3. µF¶F·U±ÑÎ�¸oÏ�ÚI±32 Û °�¶�´�Ü ÝÞÏ

Annex II

POESIA Ar chitectureandImplementation Issues

An informationsystemsupportingthePOESIAapproachhasthreecategoriesof modules,com-municatingthroughtheInternet:

Ontology services encapsulateontologiesandallow several applicationsto usetheseontolo-gies. An ontologyserver providesaccessandadaptationmeansfor several ontologiesin differentdomains.Thesharingof ontologiesamongapplicationsenablescooperativeprocesses,usingresourcesdistributedacrosstheWeb.

Application services supportthe definition, compositionandexecutionof services,usingdo-main ontologiesprovided by ontology services. Compositeservices(i.e., cooperativeprocesses)arehandledasworkflows runningon theWeb. A workflow is associatedwitha uniqueontology. An ontology, on theotherhand,canbeassociatedwith severalwork-flows. A workflow andeachoneof its componentservicesanddataflows areassociatedwith ontologicalcoveragesthat refer to termsof thesameontology. Thecompositeser-vicesof agivenworkflow arealsohandledasworkflows,andareassociatedwith thesameontologyastheirparent.

Service brokers servicebrokersprovide facilities to searchfor servicesavailableon the Webto fulfill specificneeds,which are expressedby servicedescriptions(e.g., denotedinDAML-Services[14]) andontologicalcoverages.An ontologybroker hasthecapabilityto adaptaprocessframework to aparticularneed,by choosingtheversionsof thecompo-nentservicescompatiblewith theintendedontologicalcoverageof thedesiredprocess.

Figure1 illustratesthe role of ontologyservicesandapplicationserviceson supportingaPOESIAapplicationfor the agriculturedomain. Ontology Server 1 providesthreeon-tologiesfor differentbut overlappingdomains.TheSupply Chain Ontology is a subset

148

II. POESIAArchitectureandImplementationIssues 149

of theLogistics Ontology . Theseontologiesrefer to theproductionanddistribution ofgoodsto satisfyany kind of need(e.g.,food,energy, water).TheAgriculture Ontology ,in turn,hassomeintersectionwith thespecializationof theformerontologiesto theagriculturerealm. Eachof thesethreeontologiesis referredto by several workflows, for the respectiveapplicationdomains.A givenworkflow, on theotherhand,canonly beassociatedwith oneon-tology. Theassociationof theworkflow with theontologyis fundamentalto enablethePOESIAfacilitiesfor managingtheresourcesnecessaryto executetheworkflow.

Figure1: POESIASystemArchitecture

TheAgriculture Workflow 1 is associatedwith theAgriculture Ontology .Figure1 presentssomedetailsof this workflow on thetop left corner. This workflow involvesthe cooperationof 5 services. The dataconnectionsamongtheseservicesare indicatedbyarrows. Thus,accordingto thefigure, theoutputof Service 1 is inputedin Service 2,theoutputof Service 2 is inputedin Service5, andsoon. Theexecutionof theseservicesaresupportedby differentapplicationservers.Application Server 2 is responsibleforthedefinitionandexecutionof Service 1, Service 2, Service 5 andthecoordinationof the executionof all componentservicesof Agriculture Workflow 1. On the otherhand,Service 4 andService 5 areindividuallydefinedandexecutedin Application

Server 1.

II. POESIAArchitectureandImplementationIssues 150

The internalarchitectureof eachapplicationserver includes:(i) a ServiceDefinitionTool,for building the definition of the servicesthat are provided by that applicationserver; (ii) aServicesExecutionEngine, for executingthe local servicesandmanagingthe local datasets,accordingwith the definitionsof the services;and(iii) an ExternalConnectionsManager, tomanagetheconnectionsof local serviceswith externalservices(i.e., supportedby otherappli-cationservers)andontologyservers.

The OntoCover Java library, implementedin this thesis(seeChapter5), enablesthe con-structionandutilization of ontologyviews to managedataandservicesin POESIAapplica-tions. OntoCover will beusefulto implementontologyservers,applicationserversand/orser-vice brokers. The exact points were the ontology views will be built (the ontology servers,or theapplicationserversandservicebrokers),dependson moredetailedsystemsdesign,andfurtherexperimentsto determinethemostappropriatesolutionsin practice.OntoCover’sfacili-tiesfor browsingontologyviewsandmanagingontologicalcoveragesdefinedover theseviewsarecertainlyusefulfor implementingapplicationservers.In this thesis,we incorporatedOnto-Cover in WOODSS(Workflow-basedDecisionSupportSystem),a tool thatappliesscientificworkflows to processgeographicdatafor decisionmakingpurposes[214], in orderto evaluatethe facilitiesprovidedby OntoCover in a concreteworkflow system.Theassociationof onto-logical coverageswith workflow activities anddatain WOODSSprovidesa testbedfor theuseof POESIAsemanticdescriptionsto organizetheresourcesrequiredby cooperative processesinvolving geographicdata. However, WOODSSonly supportsthedefinitionandexecutionofworkflows in acentralizedenvironment,i.e.,auniqueapplicationserver.

The designand implementationof servicebrokers, the full implementationof ontologyserversandapplicationservers,andthecompletevalidationof thePOESIAapproach,in agri-cultureandotherdomains,areall left asfuturework. Otherchallengesincludethe interoper-ability of workflows associatedwith differentontologiesandthedevelopmentor incorporationof mechanismsfor synchronizingcooperativeWebservicesin aPOESIAsystem.