Science Inventory

Expansion of DSSTox: Leveraging public data to create a semantic cheminformatics resource with quality annotations for support of U.S. EPA applications. (American Chemical Society)

Citation:

Richard, A., Chris Grulke, I. Thillainadarajah, A. Williams, D. Lyons, AND J. Edwards. Expansion of DSSTox: Leveraging public data to create a semantic cheminformatics resource with quality annotations for support of U.S. EPA applications. (American Chemical Society). Presented at ACS Spring Meeting, San Diego, CA, March 13 - 17, 2016. https://doi.org/10.23645/epacomptox.5058472

Impact/Purpose:

Presentation at the ACS conference to disseminate information about our efforts in DSSTox chemical database development.

Description:

The expansion of chemical-bioassay data in the public domain is a boon to science; however, the difficulty of establishing accurate linkages from CAS Registry Number (CASRN) to structure, and of properly annotating names and synonyms for a particular structure, is well known. DSSTox has long been considered a trusted source of highly curated CASRN-to-name-to-structure relationships within the environmental toxicology community. DSSTox recently expanded to include accurate annotation of the more than 8000 chemical substances being tested in the ToxCast and Tox21 programs. To extend cheminformatics integrity beyond DSSTox’s initial 25K substances, we collected data from various public sources and performed a series of checks to evaluate the consistency of chemical information within and across these public repositories. Incoming data were constrained by strictly enforcing a 1:1 mapping of CASRN to structure, and each substance was assigned to one of six “QCLevels” to capture the level of confidence in its CASRN-to-name-to-structure associations. The number of chemicals now supported in DSSTox has expanded to over 750K, with over 150K curated to a higher quality than public resources. This expanded version of DSSTox is available to the public in legacy DSSTox flat file and SDF formats, through web interfaces supporting EPA’s Chemical Safety and Sustainability (CSS) projects (including ToxCast and Tox21), and in RDF graph format to facilitate semantic data efforts. Our efforts have quantified a high degree of inconsistency in publicly available chemical annotations and have highlighted the challenges caused by the limited adoption of semantic data in chemistry to date. This abstract does not reflect U.S. EPA policy.
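
The 1:1 CASRN-to-structure constraint mentioned in the description can be illustrated with a minimal sketch. This is not the DSSTox implementation, which the abstract does not describe; the function name `find_mapping_conflicts`, the use of SMILES strings as structure identifiers, and the sample records are all assumptions made for illustration. The sketch also assumes structure strings have already been canonicalized, so that two different strings always denote two different structures.

```python
from collections import defaultdict


def find_mapping_conflicts(records):
    """Flag violations of a 1:1 CASRN-to-structure mapping.

    `records` is an iterable of (casrn, structure) pairs. Hypothetical helper:
    returns two dicts, one for CASRNs linked to several structures and one for
    structures linked to several CASRNs.
    """
    structures_by_casrn = defaultdict(set)
    casrns_by_structure = defaultdict(set)
    for casrn, structure in records:
        structures_by_casrn[casrn].add(structure)
        casrns_by_structure[structure].add(casrn)

    # Keep only the entries that break the 1:1 rule in either direction.
    casrn_conflicts = {
        casrn: structs
        for casrn, structs in structures_by_casrn.items()
        if len(structs) > 1
    }
    structure_conflicts = {
        struct: casrns
        for struct, casrns in casrns_by_structure.items()
        if len(casrns) > 1
    }
    return casrn_conflicts, structure_conflicts


# Illustrative records only (assumed, not taken from DSSTox).
records = [
    ("50-00-0", "C=O"),   # formaldehyde
    ("64-17-5", "CCO"),   # ethanol
    ("64-17-5", "CC=O"),  # ethanol CASRN wrongly paired with acetaldehyde
    ("75-07-0", "CC=O"),  # acetaldehyde
]

casrn_conflicts, structure_conflicts = find_mapping_conflicts(records)
```

In this sketch, a record set passes the constraint only when both returned dictionaries are empty; anything flagged would need curation before a confidence level could be assigned.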

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 03/16/2016
Record Last Revised: 04/04/2016
OMB Category: Other
Record ID: 311656