Science Inventory

Mapping of chemical identifiers to DSSTox to enable data integration in the US-EPA CompTox Chemicals Dashboard

Citation:

Grulke, C., I. Thillainadarajah, P. Brown, A. Williams, AND A. Richard. Mapping of chemical identifiers to DSSTox to enable data integration in the US-EPA CompTox Chemicals Dashboard. Presented at American Chemical Society Spring Meeting, Orlando, FL, March 31 - April 04, 2019. https://doi.org/10.23645/epacomptox.8089133

Impact/Purpose:

The Computational Toxicology Program within the U.S. Environmental Protection Agency (EPA) integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human and environmental health risks. A key component of this integration effort is the mapping of chemical identifiers from a broad range of data sources to DSSTox substances using our chemical list curation protocol. Presented at the American Chemical Society Spring Meeting, March 2019.

Description:

The Computational Toxicology Program within the U.S. Environmental Protection Agency (EPA) integrates advances in biology, chemistry, and computer science to help prioritize chemicals for further research based on potential human and environmental health risks. A key component of this integration effort is the mapping of chemical identifiers from a broad range of data sources to DSSTox substances using our chemical list curation protocol (CLCP). Source identifiers typically consist of chemical names, sometimes accompanied by CASRNs and less often by SMILES strings or mol files. Structure-centric databases such as PubChem and ChemSpider use purely automated approaches to resolve source list identifiers (SIDs), with the goal of mapping them to unique chemical structures (CIDs). In contrast, the goal of the DSSTox CLCP is to accurately map source list identifiers to a unique DSSTox substance (DTXSID) and, where possible, to a unique structure (DTXCID). The CLCP combines automated mapping to DTXSIDs with expert manual curation review to resolve conflicts among list identifiers (e.g., the source name maps to one DTXSID while the CASRN maps to another). The CLCP has been used to map nearly 350 data sources to DSSTox substances and, in the process, has identified conflicting information in the sources and attempted to resolve the chemistry associated with each list member. Three specific applications of the CLCP will be described: (1) mapping of chemicals identified as “active” under the Toxic Substances Control Act (TSCA), (2) collection and mapping of a set of endocrine disruption reference chemicals, and (3) mapping of chemical substances to animal toxicity values stored in our Toxicity Value database. Lists processed through the CLCP that are approved for public release are published on the List page of the CompTox Chemicals Dashboard and support modeling efforts within the National Center for Computational Toxicology. This abstract does not reflect U.S. EPA policy.
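The automated conflict check described above (a source name mapping to one DTXSID while its CASRN maps to another) can be illustrated with a short sketch. This is a minimal, hypothetical example and not the CLCP implementation: the lookup tables, the whitespace/case normalization rule, the `map_source_record` function, and the `MappingResult` fields are all assumptions made for illustration, and the DTXSID values are placeholders rather than real DSSTox identifiers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical lookup tables standing in for DSSTox name and CASRN indexes.
# The DTXSID values are placeholders, NOT real DSSTox identifiers.
NAME_INDEX = {
    "bisphenol a": "DTXSID_HYPOTHETICAL_1",
    "benzene": "DTXSID_HYPOTHETICAL_2",
}
CASRN_INDEX = {
    "80-05-7": "DTXSID_HYPOTHETICAL_1",
    "71-43-2": "DTXSID_HYPOTHETICAL_2",
}


@dataclass
class MappingResult:
    status: str                       # "mapped", "conflict", or "unmapped"
    dtxsid: Optional[str] = None      # set only when status == "mapped"
    candidates: Tuple[str, ...] = ()  # disagreeing DTXSIDs, for manual review


def normalize_name(name: str) -> str:
    # Assumed minimal normalization: collapse whitespace and lowercase.
    return " ".join(name.split()).lower()


def map_source_record(name: Optional[str], casrn: Optional[str]) -> MappingResult:
    """Map one source-list record to a DTXSID, flagging identifier conflicts."""
    hits = set()
    if name:
        hit = NAME_INDEX.get(normalize_name(name))
        if hit:
            hits.add(hit)
    if casrn:
        hit = CASRN_INDEX.get(casrn.strip())
        if hit:
            hits.add(hit)
    if len(hits) == 1:
        # All identifiers that matched agree (an unmatched one is not a conflict here).
        return MappingResult("mapped", dtxsid=next(iter(hits)))
    if len(hits) > 1:
        # Name and CASRN point at different substances: route to expert curation.
        return MappingResult("conflict", candidates=tuple(sorted(hits)))
    return MappingResult("unmapped")
```

For example, `map_source_record("Bisphenol A", "71-43-2")` returns a `"conflict"` result listing both placeholder DTXSIDs, which in a protocol like the one described would be queued for expert review rather than resolved automatically.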

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 04/04/2019
Record Last Revised: 05/28/2019
OMB Category: Other
Record ID: 344993