Academic Article

Addressing source to delivery for accessible and sustainable open data: government agency realities for Linked Data; policies, tools and technologies.
Document Type
Article
Source
Geophysical Research Abstracts. 2019, Vol. 21, p1-1. 1p.
Subject
*RDF (Document markup language)
*TRANSPARENCY in government
*GOVERNMENT agencies
*SEMANTIC Web
*RELATIONAL databases
*WEB services
Language
ISSN
1029-7006
Abstract
To deliver open data, government agencies must deal with legacy processes, both social and technical, that contain barriers to openness. These barriers limit the true usability of open data, that is, how it can be used over time and in multiple contexts, and are critical to address as governments seek to expose open data.

Linked Data (LD) has always been, at its core, about ensuring the FAIR Data Principles (Findable, Accessible, Interoperable, Reusable) by focusing on the identity and relationship of entities and exposing their context to consumers of data, even if these principles have only recently been named FAIR. A fundamental component of LD is that entities are identified by sustainable URI references called Persistent Identifiers (PIDs), which retain their utility over time despite system and organisation change.

This poster will show how Geoscience Australia (GA) is applying LD and PIDs in a real-world, production IT setting. Long-running operational processes have been incrementally advanced to deliver data from relational databases as LD. Policies, practices and tools have been developed and applied to support this LD delivery. The key components are:

Data transformation tools: reliant on a robust internal data schema, the Corporate Data Model, these tools publicly export views of it as XML or CSV, which are then converted to RDF in a separate step.
Overarching data model: a Semantic Web ontology that outlines the types of entities delivered publicly by GA and their macro relations. To date, public entities are Datasets, Web Services, vocabulary terms and geological Samples, Sites, Surveys and Stratigraphic Units. New objects will include images with multiple formats and resolutions.
PID service: an application that manages a series of PID redirection rules.
PID governance policy: the defined process that supports the agency, with its multiple teams and their different data sources, in applying entity identification rules consistently and ensuring uniqueness across multiple systems in the same registers.
pyLDAPI data service tools: a Web API tool that can present LD endpoints for entities according to given ontologies.
Cloud infrastructure as code (infracode): provisioning of RDF triple stores holding LD data on the public cloud, following agency best practice in delivering scalable solutions. The tools used are Apache’s Jena/Fuseki triplestore and API deployed on Amazon Web Services (AWS), with scalability through AWS Elastic Load Balancer and Elastic File Store components.

Further work will explore the suitability of the new triple store on AWS Neptune. [ABSTRACT FROM AUTHOR]
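
Illustrative note (not part of the author's abstract): the abstract describes exporting views as CSV and converting them to RDF in a later step, with entities identified by PID URIs. The following is a minimal sketch of what such a conversion could look like, using only the Python standard library and emitting N-Triples. It is not GA's tooling: the PID base URI, the Sample class URI, and the CSV column names `id` and `label` are all hypothetical placeholders chosen for the example.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical URIs for illustration only; they are not GA's real PIDs or ontology.
PID_BASE = "http://example.org/id/sample/"
SAMPLE_CLASS = "http://example.org/def/Sample"

# Standard RDF and RDFS property URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Convert CSV rows with assumed 'id' and 'label' columns to N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Build the entity's PID URI from its identifier, percent-encoding it.
        subject = "<" + PID_BASE + quote(row["id"], safe="") + ">"
        label = escape_literal(row["label"])
        triples.append(f"{subject} <{RDF_TYPE}> <{SAMPLE_CLASS}> .")
        triples.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return triples


if __name__ == "__main__":
    example = "id,label\nAU100,Basalt core\n"
    for line in csv_to_ntriples(example):
        print(line)
```

Each CSV row yields one type triple and one label triple whose subject is the entity's PID URI. The resulting N-Triples text could in principle be loaded into a triple store such as the Jena/Fuseki deployment mentioned in the abstract, although the loading step is outside this sketch.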
