Science Inventory

Consistency Checking the Experimental Data Available from the USEPA NCCT CompTox Database

Citation:

Chalk, S., C. Grulke, and A. Williams. Consistency Checking the Experimental Data Available from the USEPA NCCT CompTox Database. Presented at American Chemical Society Spring Meeting, Orlando, FL, March 31 - April 04, 2019.

Impact/Purpose:

Abstract to be presented at the 2019 American Chemical Society Spring Meeting. To deliver predictive models, NCCT has measured, assembled, and delivered an enormous quantity and diversity of data. This presentation evaluates the consistency of the experimental data by converting the raw data into JavaScript Object Notation for Linked Data (JSON-LD) and loading it into a graph database. The graph database is then searched using SPARQL queries to identify inconsistencies that can be reviewed and curated.

Description:

The US EPA’s National Center for Computational Toxicology (NCCT) is focused on developing computational estimates of the toxicology of chemicals found in commerce and in the environment. To deliver predictive models, NCCT has measured, assembled, and delivered an enormous quantity and diversity of data. This includes high-throughput in vitro screening data, in vivo and functional use data, as well as data delivered via the CompTox Chemicals Dashboard, a web application providing access to data associated with ~770,000 chemical substances. A subset of these data has been extracted and curated from sources including public and agency databases and scientific publications. This presentation evaluates the consistency of the experimental data by converting the raw data into the JavaScript Object Notation for Linked Data (JSON-LD) SciData format and ingesting the JSON-LD files into a graph database as Resource Description Framework (RDF) triples. The graph database is then searched using SPARQL queries to identify inconsistencies that can be reviewed and curated. The data were converted semi-automatically: PHP and Python scripts were used to crosswalk the data into a MySQL database, from which the data were subsequently exported as JSON-LD. As part of the process, the annotation of compounds in the dataset was augmented with classifications from the ChemOnt ontology. The results of this analysis, the pain points encountered, and progress toward automating the workflow using KNIME will be presented. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
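
The workflow described above (raw records to JSON-LD, JSON-LD to triples, then a query that flags conflicting values) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `ex:` vocabulary, the record fields, the sample values, and the 5-unit tolerance are all hypothetical, and the grouping step stands in for what a SPARQL `GROUP BY` query would do against a real graph database. It also assumes every value in a group shares the same unit.

```python
import json
from collections import defaultdict

# Hypothetical vocabulary and records, for illustration only; these are not
# the SciData context or real CompTox data.
EX = "https://example.org/vocab#"

records = [
    {"chemical": "chem-001", "property": "meltingPoint", "value": 80.1, "unit": "degC", "source": "source-A"},
    {"chemical": "chem-001", "property": "meltingPoint", "value": 80.4, "unit": "degC", "source": "source-B"},
    {"chemical": "chem-002", "property": "meltingPoint", "value": 12.0, "unit": "degC", "source": "source-A"},
    {"chemical": "chem-002", "property": "meltingPoint", "value": 121.0, "unit": "degC", "source": "source-C"},
]


def to_jsonld(index, rec):
    """Wrap one raw record as a small JSON-LD node."""
    return {
        "@context": {"ex": EX},
        "@id": f"ex:measurement/{index}",
        "@type": "ex:Measurement",
        "ex:chemical": {"@id": f"ex:{rec['chemical']}"},
        "ex:property": {"@id": f"ex:{rec['property']}"},
        "ex:value": rec["value"],
        "ex:unit": rec["unit"],
        "ex:source": rec["source"],
    }


def to_triples(node):
    """Flatten a JSON-LD node of the shape above into (subject, predicate, object) triples."""
    subject = node["@id"]
    triples = [(subject, "rdf:type", node["@type"])]
    for key, val in node.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        obj = val["@id"] if isinstance(val, dict) else val
        triples.append((subject, key, obj))
    return triples


def find_inconsistencies(triples, tolerance):
    """Group values by (chemical, property) and flag groups whose spread exceeds tolerance."""
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    groups = defaultdict(list)
    for props in by_subject.values():
        groups[(props["ex:chemical"], props["ex:property"])].append(props["ex:value"])
    return {
        key: (min(vals), max(vals))
        for key, vals in groups.items()
        if max(vals) - min(vals) > tolerance
    }


nodes = [to_jsonld(i, r) for i, r in enumerate(records)]
triples = [t for n in nodes for t in to_triples(n)]
flagged = find_inconsistencies(triples, tolerance=5.0)

print(json.dumps(nodes[0], indent=2))
print(flagged)
```

In this sketch only the second chemical is flagged, because its two reported values differ by far more than the tolerance. Flagged groups are the candidates a curator would then review against the original sources.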

Record Details:

Record Type: DOCUMENT (PRESENTATION/ABSTRACT)
Product Published Date: 04/04/2019
Record Last Revised: 08/14/2019
OMB Category: Other
Record ID: 345958