PhD thesis
My PhD thesis presentation
Organization is Sharing: From eScience to Personal Information Management
Rodrigo Dias Arruda Senra
Advisor: Prof. Dr. Claudia Bauzer Medeiros
PhD Thesis Defense in Computer Science
Universidade Estadual de Campinas, Instituto de Computação
Campinas 2012-12-10
Outline
• Motivation
• Objectives
• Contributions
  • SciFrame
  • Database Descriptors
  • Organographs
• Results
Motivation
Study the relation Heterogeneity ↔ Organization ↔ Sharing
NDVI Profile Generation (WebMAPS)
Sources: Geometries (IBGE), Spectral Images (NASA), Crops (Min.Agr)
Transfer: HTTP, FTP
Storage: PostGIS, Filesystem, Postgres
Output: HTML, Microformats, 2D Plots (over HTTP)
Objectives
• describe and compare eScience systems
• match application needs with DBMS capabilities
• manage digital content hierarchies
SciFrame
The Scientific Digital Data Processing Framework (SciFrame) is a conceptual framework that describes systems or processes involving digital data manipulation.
SciFrame layers
Interfacing
  Acquisition (discovery - extraction - transference)
    Discovery
    Extraction
    Transference
  Publication
Data Management
  Storage
  Manipulation: Create, Retrieve, Update, Delete, Index
Information Management
  Description
  Transformation
    Fusing
    Filtering
WebMAPS described with SciFrame
Interfacing
  Acquisition
    Discovery: Geometries (IBGE), Raster (NASA), Crops (Min.Agr)
    Extraction: ad hoc extractor scripts (Paparazzi)
    Transference: FTP and HTTP
  Publication: HTML, Microformats, 2D Plots
Data Management
  Storage: Geometries (PostGIS), Raster (Files), Crops (Postgres)
  Manipulation: Geometries (CRDI), Raster (CRD), Crops (CRUDI)
Information Management
  Description: Geometries (SHP, WKT), Raster (HDF, GeoTIFF)
  Transformation
    Fusing: NDVI Time Series
    Filtering: Cloud and noise removal (HANTS)
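A minimal sketch of the Fusing row, assuming the conventional per-pixel definition NDVI = (NIR - Red) / (NIR + Red) and a profile defined as the regional mean per acquisition date. The function names, the input layout and the sample values are illustrative assumptions and not WebMAPS code; cloud and noise filtering (HANTS) is not shown.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel (None if undefined)."""
    total = nir + red
    if total == 0:
        return None  # no signal in either band
    return (nir - red) / total


def ndvi_profile(scenes):
    """Mean NDVI per date. scenes: {date: [(nir, red), ...]} (hypothetical layout)."""
    profile = {}
    for date, pixels in sorted(scenes.items()):
        values = [v for v in (ndvi(n, r) for n, r in pixels) if v is not None]
        profile[date] = sum(values) / len(values) if values else None
    return profile


# Hypothetical reflectance pairs for a two-pixel region on one date.
sample = {"2012-01-01": [(0.5, 0.1), (0.0, 0.0)]}
profile = ndvi_profile(sample)
```

Pixels with no signal are skipped rather than counted as zero, so they do not pull the regional mean down.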
Research Problems
Interfacing
  Acquisition
    Discovery: data scattered, many providers, search engines?
    Extraction: feasibility, preserve provenance, lack of semantics
    Transference: availability, voluminous data, bandwidth, protocol
  Publication: lack of intention, access control, traceability
Data Management
  Storage: scalability, distribution, consistency, preservation
  Manipulation: multimedia, impedance mismatch
Information Management
  Description: implicit vs. explicit, semantic web, social, trust, privacy
  Transformation: information lost (conceptual > logical > physical); multi-modality; handle uncertain and incomplete data
Technologies
Interfacing
  Acquisition
    Discovery: DAS Registry, BIOCatalogue, SciScope
    Extraction: Scrapers, Wrappers, PiggyBank, Operator
    Transference: Streaming, P2P, OpenDAP
  Publication: SOA vs. ROA, Microformats vs. RDFa
Data Management
  Storage: Scientific Datasets, XML, Cloud Computing
  Manipulation: SQL extensions, ORMs, LINQ
Information Management
  Description: In Loco Semantics
  Transformation: Array Algebra (RASDAMAN); Topological Operators (GIS); Proximity Search and Report Language (ISIS)
Database Descriptors
SciFrame layer addressed: Data Management
✓ enforce loose coupling between Apps and DBMS
✓ DBMS product/vendor independence
✓ seamless cross-database migration
✓ capability verification, validation and negotiation
✓ support Apps and DBMS in the cloud!
Descriptors (between App and DBMS)
• Feature descriptor: specifies what a DBMS provides
• Desiderata descriptor: specifies what a client application needs
Architecture
Figure (architecture diagram). Elements shown: the Web; DMS X, DMS Y and DMS Z; descriptor X and descriptor Y; several Descriptor Registries; a Negotiator; the App; and a binding.
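One way the registry, negotiator and binding roles might interact is sketched below. It is an illustration under assumptions: the class names, the dictionary layout of a descriptor and the exact-equality matching rule are hypothetical and are not taken from the thesis.

```python
class DescriptorRegistry:
    """Stores feature descriptors published by data management systems."""

    def __init__(self):
        self._descriptors = {}

    def publish(self, dms_name, feature_descriptor):
        self._descriptors[dms_name] = dict(feature_descriptor)

    def items(self):
        return self._descriptors.items()


class Negotiator:
    """Looks for a DMS whose features cover an application's desiderata."""

    def __init__(self, registries):
        self.registries = list(registries)

    def bind(self, desiderata):
        for registry in self.registries:
            for dms_name, features in sorted(registry.items()):
                # Assumed rule: every requested dimension must match exactly.
                if all(features.get(k) == v for k, v in desiderata.items()):
                    return dms_name  # the binding: App -> chosen DMS
        return None


registry = DescriptorRegistry()
registry.publish("DMS X", {"storage": "relational", "isolation": "READ_COMMITTED"})
registry.publish("DMS Y", {"storage": "RDF Triples", "isolation": "READ_COMMITTED"})
binding = Negotiator([registry]).bind({"storage": "RDF Triples"})
```

Returning `None` when nothing matches leaves room for the application to relax its desiderata and negotiate again.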
DBD Structure
* Dublin Core Metadata Element Set: http://dublincore.org/documents/dces/
Feature Descriptor
@prefix : <http://www.lis.ic.unicamp.br/purl/DBD/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Cmbm a foaf:Person ;
    foaf:name "Claudia Bauzer Medeiros" .

:DBD1 dc:identifier "DBD1" ;
    dc:type "Feature DBD" ;
    dc:format "text/turtle" ;
    dc:title "Sample Feature Descriptor" ;
    dc:description "Hypothetical Feature DBD in RDF/Turtle" ;
    dc:creator :Cmbm ;
    dc:date "2009-12-18" ;
    dc:language "EN" ;
    :isolation :READ_COMMITED ;
    :versioning "unsupported" ;
    :storage "RDF Triples" ;
    :DML [ a rdf:Bag ;
           rdf:_1 "RDQL" ;
           rdf:_2 "SPARQL" ] .
Desiderata Descriptor
@prefix : <http://www.lis.ic.unicamp.br/purl/DBD/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

:Rodsenra a foaf:Person ;
    foaf:name "Rodrigo Dias Arruda Senra" .

:DBD2 dc:identifier "DBD2" ;
    dc:type "Desiderata DBD" ;
    dc:format "text/turtle" ;
    dc:title "Sample Desiderata Descriptor" ;
    dc:description "Desiderata DBD for hypothetical App" ;
    dc:creator :Rodsenra ;
    dc:date "2010-01-05" ;
    dc:language "EN" ;
    :isolation :READ_COMMITED ;
    :concurrency "Two phase lock" ;
    :storage "RDF Triples" ;
    :DML "SPARQL" .
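To illustrate how the two sample descriptors could be compared, the sketch below copies their dimensions into plain Python dictionaries and reports the dimensions the DBMS does not satisfy. The matching rule (every required value must appear among the offered values) and the function name `unmet_dimensions` are assumptions made for illustration; the slides do not define the matching semantics.

```python
# Dimensions copied from the sample Feature DBD (DBD1).
feature_dbd1 = {
    "isolation": {"READ_COMMITED"},
    "versioning": {"unsupported"},
    "storage": {"RDF Triples"},
    "DML": {"RDQL", "SPARQL"},
}

# Dimensions copied from the sample Desiderata DBD (DBD2).
desiderata_dbd2 = {
    "isolation": {"READ_COMMITED"},
    "concurrency": {"Two phase lock"},
    "storage": {"RDF Triples"},
    "DML": {"SPARQL"},
}


def unmet_dimensions(desiderata, features):
    """Dimensions whose required values are not all offered by the DBMS."""
    return sorted(
        dim
        for dim, required in desiderata.items()
        if not required <= features.get(dim, set())
    )


unmet = unmet_dimensions(desiderata_dbd2, feature_dbd1)
```

Under this rule the only unmet dimension is `concurrency`, because the feature descriptor does not declare it at all.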
Understanding Hierarchies...
SciFrame DBDs
Organographs
Which of the following sets best accommodates the object above?
Red? Triangles? Metric Related?
Problems
1. Single Category versus Multi-faceted Content
2. Manually-defined categories
3. Criteria are not explicit
4. Static Membership Relation
5. Organization is not reusable
Organograph
An organograph is an artifact that makes explicit how to organize information in the context of a particular task.
Hout = forg(Hin), where Hin and Hout are hierarchies, each addressed by a URL.
A hierarchy H(V,E) has aggregator vertices (vagg), content vertices (vcnt), aggregation edges (eagg, between aggregators) and content edges (ecnt, from an aggregator to content).
forg comprises:
• navigation (crawler/iterator)
• feature extraction
• FHil(vagg,vagg): hierarchical structuring
• FCat(vagg,vcnt): categorization
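A small sketch of H(V,E) and of one possible forg is given below. The `Hierarchy` class, the single-level output and the file-extension feature are illustrative assumptions; they only show how FHil and FCat each contribute one kind of edge.

```python
from dataclasses import dataclass, field


@dataclass
class Hierarchy:
    """H(V, E) with separate aggregator/content vertices and edges."""
    vagg: set = field(default_factory=set)
    vcnt: set = field(default_factory=set)
    eagg: set = field(default_factory=set)  # (parent aggregator, child aggregator)
    ecnt: set = field(default_factory=set)  # (aggregator, content)


def forg_by_feature(h_in, feature, root="root"):
    """Hypothetical forg: one aggregator per feature value, under a single root."""
    h_out = Hierarchy(vagg={root}, vcnt=set(h_in.vcnt))
    for item in sorted(h_in.vcnt):
        category = feature(item)            # feature extraction
        h_out.vagg.add(category)
        h_out.eagg.add((root, category))    # FHil: hierarchical structuring
        h_out.ecnt.add((category, item))    # FCat: categorization
    return h_out


h_in = Hierarchy(vcnt={"a.pdf", "b.txt", "c.pdf"})
h_out = forg_by_feature(h_in, lambda name: name.rsplit(".", 1)[-1])
```

The input hierarchy is left untouched, so the same Hin can be fed to several organographs.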
Organograph Composition
The composition of forg depends on the task!
Expert roles: NLP, ML, Domain, Content Author, UX
Components:
• Information Extraction: patterns, dictionaries, rules, probabilities, templates/wrappers
• Similarity Algorithms: matching, Dice, Jaccard, overlap, cosine
• Ontologies: FOAF, DBpedia, Schema.org, Freebase, MusicBrainz, Geonames
• Classifiers: Naive Bayes, SVM, Nearest Neighbors, LDA, LSI
• Data Container: Filesystem, Gmail, Evernote, Delicious, Dropbox (DBDs!)
• Visualization Strategies: Fuse, Dokan, Infoviz, D3
• Iterators
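The five similarity coefficients named in the component palette can be sketched in their usual set-based form, as below. Treating documents as sets of terms and returning 0.0 for empty inputs are assumptions of this sketch, and the sample term sets are hypothetical.

```python
import math


def matching(a, b):
    """Simple matching: number of shared terms."""
    return len(a & b)


def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0


def jaccard(a, b):
    return len(a & b) / len(a | b) if a or b else 0.0


def overlap(a, b):
    return len(a & b) / min(len(a), len(b)) if a and b else 0.0


def cosine(a, b):
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


# Hypothetical term sets extracted from two document titles.
doc1 = {"data", "sharing", "escience"}
doc2 = {"sharing", "escience", "hierarchy"}
scores = {f.__name__: f(doc1, doc2) for f in (matching, dice, jaccard, overlap, cosine)}
```

All five share the same numerator and differ only in how they normalize it, which is why swapping one for another is a cheap change inside forg.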
Methodology
collection → organize → evaluate → reorganize → share
Evaluating Hierarchies
• too much content
• too many aggregators
• duplicated or misplaced
• too deep
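The symptoms listed above can be turned into simple measurements, sketched below on a nested-dictionary hierarchy. The dictionary layout (`children`, `content`), the function names and the sample tree are assumptions for illustration; detecting misplaced content needs task knowledge and is not attempted here.

```python
from collections import Counter


def evaluate(node, depth=1, stats=None):
    """Collect depth, aggregator count, largest aggregator and content placements."""
    if stats is None:
        stats = {"depth": 0, "aggregators": 0, "max_content": 0, "placements": Counter()}
    stats["aggregators"] += 1
    stats["depth"] = max(stats["depth"], depth)
    content = node.get("content", [])
    stats["max_content"] = max(stats["max_content"], len(content))
    stats["placements"].update(content)
    for child in node.get("children", {}).values():
        evaluate(child, depth + 1, stats)
    return stats


def duplicated(stats):
    """Content items placed under more than one aggregator."""
    return sorted(item for item, n in stats["placements"].items() if n > 1)


# Hypothetical hierarchy: "p1" is filed twice.
tree = {
    "children": {
        "a": {"content": ["p1", "p2"], "children": {"a1": {"content": ["p1"]}}},
        "b": {"content": ["p3"]},
    }
}
stats = evaluate(tree)
```

Each measurement maps to one symptom: `max_content` to too much content, `aggregators` to too many aggregators, `depth` to too deep, and `duplicated(stats)` to duplicated content.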
Reorganizing Hierarchies
Figure: three papers (paper 1, paper 2, paper 3) by Alice and Bob, published in 2008 and 2011, organized first as Author > Publication Date and then as Publication Date > Author.
Task is important!
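Reorganizing by a different facet order can be sketched as a regrouping of the same content. The assignment of authors and years to the three papers below is hypothetical (the figure does not survive in this transcript), and `organize` is an illustrative helper.

```python
def organize(papers, facets):
    """Build a nested hierarchy whose levels follow the given facet order."""
    root = {}
    for paper in papers:
        node = root
        for facet in facets:
            node = node.setdefault(paper[facet], {})
        node.setdefault("_content", []).append(paper["title"])
    return root


# Hypothetical metadata for the three papers.
papers = [
    {"title": "paper 1", "author": "Alice", "year": 2011},
    {"title": "paper 2", "author": "Bob", "year": 2008},
    {"title": "paper 3", "author": "Bob", "year": 2011},
]

by_author = organize(papers, ["author", "year"])
by_date = organize(papers, ["year", "author"])
```

The content is identical in both results; only the facet order changes, and the better order depends on the task at hand.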
Reuse Organization
Reuse the ACM hierarchy (Hacm) to organize my own content vertices (Vcnt, mine).
Organograph Execution
Hin → Pre-processing → Feature Extraction → Internal Indexes → Transformation Workflow (FCat(), FHil()) → Hout → Visualization
@organograph
def forg_ccs98(self, input):
    self.id = new_uuid()  # 'ff7d8e21-4226-11e2-b2f1-109add6b426c'
    self.description = 'docs by ACM CCS98'
    ccs98 = acm_extract('http://www.acm.org/about/class/1998/ccs98.xml')
    trainset = []
    for category, words in nlp_clean_titles(ccs98.Vcnt.paths):
        for w in words:
            trainset.append((make_feature(w), category))
    classifier = NaiveBayes(trainset)
    self.Ecnt = classifier.classify(input)  # FCat
    self.Eagg = ccs98.Eagg.Level[:1]  # FHil

input = collection('file:///some/local/dir/docs')
output = forg_ccs98(input)
publish(output, 'rodsenra@dropbox:/output')
organicer.render(output, organicer.views.HYPERBOLIC_TREE)
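The classification step (FCat) in the listing above relies on helpers that are not shown. As a self-contained stand-in, the sketch below trains a small multinomial Naive Bayes model with add-one smoothing on (word, category) pairs and assigns a title to the most likely category. The functions `train` and `classify`, the two categories and their keywords are assumptions for illustration, not Organicer code or real ACM CCS98 data.

```python
import math
from collections import Counter, defaultdict


def train(trainset):
    """trainset: iterable of (word, category) pairs."""
    word_counts = defaultdict(Counter)
    cat_counts = Counter()
    vocab = set()
    for word, cat in trainset:
        word_counts[cat][word] += 1
        cat_counts[cat] += 1  # word tokens seen per category
        vocab.add(word)
    return word_counts, cat_counts, vocab


def classify(model, words):
    """Return the category with the highest log-posterior for the given words."""
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())
    best_cat, best_score = None, -math.inf
    for cat in sorted(cat_counts):
        score = math.log(cat_counts[cat] / total)  # prior from token counts
        denom = cat_counts[cat] + len(vocab)       # add-one smoothing
        for w in words:
            if w in vocab:                         # unknown words are ignored
                score += math.log((word_counts[cat][w] + 1) / denom)
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat


# Hypothetical category keywords.
categories = {
    "Software": ["compiler", "debugging", "programming"],
    "Hardware": ["circuits", "memory", "processor"],
}
trainset = [(w, cat) for cat, words in categories.items() for w in words]
model = train(trainset)
label = classify(model, "memory processor design".split())
```

The resulting label plays the role of a content edge: it attaches a document to the aggregator chosen by the classifier.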
forg_ccs98 described with SciFrame
Interfacing
  Acquisition
    Discovery: ACM CCS98, Hin
    Extraction: pdf2txt, pdfbox, pypdf; NLTK (tokenizer)
    Transference: HTTP, WebDAV, NFS, SMB
  Publication: Hout: HTML+CSS, JS (Infoviz, D3); Dropbox
Data Management
  Storage: NoSQL DB (Mongo, Neo4J)
  Manipulation: Indexes (CRDI)
Information Management
  Description: SKOS, GraphML, JSON
  Transformation
    Mining: NaiveBayes
    Filtering: Vcnt (unconverted pdfs); Vagg (empty or ambiguous)
Related Work
Related Work (SciFrame)
• CLRC scientific metadata model: B. Matthews and S. Sufi. The CLRC Scientific Metadata Model, version 1. DL TR 02001, CLRC, 2001.
• myGrid Information Model: Sharman, Nick, et al. "The myGrid information model." UK e-Science programme All Hands Conference. 2004.
Related Work (DBDs)
Madnick and Wang. Evolution Towards Strategic Applications of Databases Through Composite Information Systems. Journal of Management Information Systems 5(2):5-22, 1988.
“In order to separate data from the application processing, it is necessary to employ a process descriptor and a database descriptor.
The process descriptor describes the name, the input/output data requirement, and other resource requirements of the processing components.
The database descriptor contains information about the data (e.g., data model, schema, access rights) in the database, similar to data dictionaries.
These two descriptors can be used by the execution environment to coordinate the interaction between the processing component and the database.”
Related Work (Organographs)
• Topic Modeling: LSA, LDA, Hierarchical Bayesian
Blei 201; Blei, Ng, & Jordan, 2003; Griffiths & Steyvers, 2002; 2003; 2004; Hofmann, 1999; 2001
• Personal Information Management: CALO, UMEA, X-COSIM, Haystack, UpLib, Iris
Zimmermann 2005; Arndt 2007; Lansdale 1988; Kaptelinin 2003; Janssen & Popat 2003; Karger et al 2003
• Semantic Desktop: Nepomuk, SEMSOC
Giannakidou et al 2008; Groza et al 2007
• Personal Digital Libraries: Zotero, Mendeley, Papers
Results
Contributions
• SciFrame
• Database Descriptors (DBDs)
• Organographs
• Software tools & algorithms: WebMAPS, Paparazzi & Organicer
Publications
• Submitted to JODS: Evaluating, Reorganizing and Sharing Digital Information Hierarchies. Rodrigo D. A. Senra, Claudia B. Medeiros. Journal on Data Semantics (submitted on 2012-10-25)
• 2011: Organographs - Multi-faceted Hierarchical Categorization of Web Documents. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the 7th International Conference on Web Information Systems and Technologies - WEBIST: 583-588
• 2010: Database Descriptors: Laying the Path to Commodity Web Data Services. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of Engineering of Computer-Based Systems (ECBS): 386-392
• 2009: SciFrame: a conceptual framework to describe data sharing in eScience. Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the III Brazilian eScience workshop (XXIV SBBD)
• 2009: A standards-based framework to foster geospatial data and process interoperability. Gilberto Z. Pastorello Jr., Rodrigo D. A. Senra, Claudia B. Medeiros. Journal of the Brazilian Computer Society 15(1): 13-25
• 2008: Bridging the gap between geospatial resource providers and model developers. Gilberto Z. Pastorello Jr., Rodrigo D. A. Senra, Claudia B. Medeiros. Proceedings of the 16th International Conference on Advances in Geographic Information Systems - ACM SIGSPATIAL
• 2007: O projeto WebMAPS: desafios e resultados. Carla G. N. Macário, Claudia B. Medeiros, Rodrigo D. A. Senra. Proceedings of 9th Brazilian Symposium on Geoinformatics - GeoInfo: 239-250
SciFrame
WebMaps
DBDs
Organographs
Extensions
SciFrame
  Theoretical: formalize design pattern; enhance the operations vocabulary
  Practical: online catalog of eScience systems; describe as ontology (RDF)
Database Descriptors
  Theoretical: analyse negotiation frameworks; expand DBDs expressivity; explore ranking algorithms
  Practical: catalog of concrete DBDs; adapt Organicer to use DBDs; experiment with dynamic negotiation
Organographs
  Theoretical: model with Category Theory; explore DSLs to describe forg
  Practical: support non-textual media (e.g. images); expand component palette
Acknowledgments
• Laboratório de Sistemas de Informação (IC-Unicamp): http://www.lis.ic.unicamp.br
• Brazilian Institute for Web Science Research: http://webscience.org.br
• Fapesp - CNPQ - CAPES
Rodrigo Dias Arruda Senra
http://rodrigo.senra.nom.br
Thank you for your attention.
Support Material
• Source Hierarchy
• Hierarchy Navigation (Iterator): os.walk, pydelicious, evernote
• Pre-processing: BeautifulSoup, pyPdf
• Extraction: NLTK
• Facet Index: pymongo
• Transformation Workflow: networkx, gensim, numpy, scikit-learn
• Resulting Hierarchy
• Visualization: matplotlib, ObsPy, InfoViz.js, D3.js
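For the filesystem case, the navigation step can be sketched with the standard-library `os.walk`, which the slide lists as one of the iterators. The generator name `walk_hierarchy` and the choice of relative directory paths as aggregator names are assumptions of this sketch.

```python
import os


def walk_hierarchy(root):
    """Yield (aggregator, content) pairs: each directory with its sorted file names."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        aggregator = os.path.relpath(dirpath, root)
        yield aggregator, sorted(filenames)
```

Directories play the role of aggregator vertices and files the role of content vertices; other containers (for example a bookmark service) would need their own iterator with the same output shape.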
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dbd="http://www.lis.ic.unicamp.br/purl/DBD">
  <rdf:Description rdf:about="http://www.lis.ic.unicamp.br/purl/DBD/DBD1">
    <!-- metadata -->
    <dc:creator>Claudia Bauzer Medeiros</dc:creator>
    <dc:description>Hypothetical DBD for an RDF DBMS</dc:description>
    <dc:identifier>DBD1</dc:identifier>
    <dc:format>application/rdf+xml</dc:format>
    <dc:type>
      <rdf:Description>
        <dbd:Type>Feature DBD</dbd:Type>
      </rdf:Description>
    </dc:type>
    <dc:title>Descriptor of an RDF DBMS</dc:title>
    <dc:date>2009-12-18</dc:date>
    <dc:language>EN</dc:language>
    <!-- dimensions and values -->
    <dbd:concurrency>Two phase lock</dbd:concurrency>
    <dbd:versioning>unsupported</dbd:versioning>
    <dbd:storage>RDF triples</dbd:storage>
    <dbd:DML>
      <rdf:Bag>
        <rdf:li>RDQL</rdf:li>
        <rdf:li>SPARQL</rdf:li>
      </rdf:Bag>
    </dbd:DML>
  </rdf:Description>
</rdf:RDF>
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:dbd="http://www.lis.ic.unicamp.br/purl/DBD">
  <rdf:Description rdf:about="http://www.lis.ic.unicamp.br/purl/DBD/DBD2">
    <!-- metadata -->
    <dc:creator>Rodrigo Dias Arruda Senra</dc:creator>
    <dc:description>Desiderata DBD for a hypothetical application</dc:description>
    <dc:identifier>DBD2</dc:identifier>
    <dc:format>application/rdf+xml</dc:format>
    <dc:type>
      <rdf:Description>
        <dbd:Type>Desiderata DBD</dbd:Type>
      </rdf:Description>
    </dc:type>
    <dc:title>Desiderata descriptor of a hypothetical application</dc:title>
    <dc:date>2010-01-05</dc:date>
    <dc:language>EN</dc:language>
    <!-- dimensions and values -->
    <dbd:concurrency>Two phase lock</dbd:concurrency>
    <dbd:storage>RDF triple store</dbd:storage>
    <dbd:DML>RDQL</dbd:DML>
  </rdf:Description>
</rdf:RDF>
NDVI Profiles
Data Management
• Manipulation: Create, Retrieve, Update, Delete, Index
• Storage
Information Management
• Transformations: Browsing, Iterating, Searching, Augmenting, Mining, Description, Annotation, Schematization, Summarizing, Structuring, Sorting, Merging, Decreasing, Filtering, Fusing
Example
Input Collection
Task: info extraction
Task: transformation
Task: visualization
WebMAPS: DataFlow
Mail
FTP
MODIS Reprojection Tool
Images
Region clipping
Geometry (IBGE)
NDVI
Related Work
Approaches to App-to-DMS binding
• embedded
• n-tier client/server (including web services)
• mediators
Descriptors are orthogonal to all of these!
Information Integration [1]
• Process: Understanding, Standardization, Specification, Execution
• Mechanism: Materialization, Federation, Indexing
[1] Laura Haas. Beauty and the Beast: The Theory and Practice of Information Integration.
66
Extração dos Dados Sensoriasdataset = gdal.Open(raster_file, GA_ReadOnly )# Obtenção dos coeficientes para funções afins de mapeamento de coordenadasgt = dataset.GetGeoTransform()
# Obtenção da banda de dados de interesseband = dataset.GetRasterBand(1)
# Identificação do padrão de codificação dos dados.# No caso do arquivo TIF os dados são bytes sem sinal ('Byte')data_type = gdal.GetDataTypeName(band.DataType)
# Obtenção das dimensões da imagemwidth, height = band.XSize, band.YSize
# Conversão do MBR do sistema de coordenadas lat/long para linha/coluna# Xgeo = GT(0) + Xpixel*GT(1) + Yline*GT(2)# Ygeo = GT(3) + Xpixel*GT(4) + Yline*GT(5)
ul_pixel, lr_pixel = g2p(gt,*ul_geo), g2p(gt,*lr_geo)
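The helper `g2p` is used above but not shown. One plausible implementation, sketched here as an assumption, inverts the two affine equations from the comments to map geographic coordinates back to raster indices. The (row, column) return order is inferred from how the pixel tuples are indexed elsewhere in the slides, and the sample geotransform is hypothetical.

```python
import math


def g2p(gt, x_geo, y_geo):
    """Invert the affine geotransform: geographic (x, y) -> (row, column)."""
    dx, dy = x_geo - gt[0], y_geo - gt[3]
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    x_pixel = (dx * gt[5] - dy * gt[2]) / det
    y_line = (dy * gt[1] - dx * gt[4]) / det
    return math.floor(y_line), math.floor(x_pixel)


def p2g(gt, x_pixel, y_line):
    """Forward mapping, exactly as written in the comments above."""
    x_geo = gt[0] + x_pixel * gt[1] + y_line * gt[2]
    y_geo = gt[3] + x_pixel * gt[4] + y_line * gt[5]
    return x_geo, y_geo


# Hypothetical north-up geotransform: origin (-50, -20), 0.25-degree pixels.
gt = (-50.0, 0.25, 0.0, -20.0, 0.0, -0.25)
row, col = g2p(gt, -47.5, -21.0)
```

For north-up images the rotation terms GT(2) and GT(4) are zero, and the inversion reduces to two independent divisions.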
Case Study: WebMaps
Data Extraction
import struct
import numpy

def raster2array(ul_pixel, lr_pixel, dtype='B'):
    """Using ul_pixel and lr_pixel, generate a numpy array with the
    region of interest extracted from the raster file.
    """
    col_size = lr_pixel[1] - ul_pixel[1] + 1
    row_size = lr_pixel[0] - ul_pixel[0] + 1
    # band is the raster band opened in the sensor data extraction step
    scanline = band.ReadRaster(ul_pixel[1], ul_pixel[0], col_size, row_size)
    num_pixels = col_size * row_size
    roi = numpy.array(struct.unpack(dtype * num_pixels, scanline))
    roi.shape = (row_size, col_size)
    return roi

# Read data from raster file into a numpy array
# defining a region of interest matrix
roi = raster2array(ul_pixel, lr_pixel)
Geometry Extraction
from osgeo import ogr

shp = ogr.Open(filepath)

# Layer corresponding to the State of São Paulo
layer = shp.GetLayerByName('35mu500gc')

# Feature corresponding to the municipality of Campinas
feature = layer.GetFeature(501)

# Extract the control points of the perimeter
geometry = feature.GetGeometryRef()
poly = geometry.GetGeometryRef(0)
centroid = geometry.Centroid()
centroid_geo = centroid.GetX(), centroid.GetY()

# Define the Minimum Bounding Rectangle (MBR)
lg_left, lg_right, lt_bot, lt_up = poly.GetEnvelope()
ul_geo, lr_geo = (lg_left, lt_up), (lg_right, lt_bot)
Spatial Operations
Organicer