Science Inventory

EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research

Citation:

Grulke, C., A. Williams, I. Thillainadarajah, AND A. Richard. EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Computational Toxicology. Elsevier B.V., Amsterdam, Netherlands, 12:100096, (2019). https://doi.org/10.1016/j.comtox.2019.100096

Impact/Purpose:

The DSSTox database, both within EPA’s larger database environment and as surfaced through the public Dashboard, has come to play an increasingly critical role in supporting a wide range of EPA programs, as well as the broader environmental research and regulatory community, as list coverage, data linkages, and advanced capabilities (such as support for QSAR and NTA research) have expanded.
● DSSTox database strictly controls quality of chemical ID-structure associations
● New data model enabled quality-controlled expansion from 24K to 740K substances
● DSSTox underpins EPA Dashboard and is fueling advances in computational toxicology

Description:

The US Environmental Protection Agency’s (EPA) Distributed Structure-Searchable Toxicity (DSSTox) database, launched in 2004, currently exceeds 875K substances spanning hundreds of lists of interest to EPA and environmental toxicology researchers. The DSSTox project, from its inception, has focused on providing accurate chemical identifier associations for data and lists of importance to the environmental research and regulatory community. DSSTox is the only publicly available chemical database today that is uniquely keyed to both CAS RN (Chemical Abstracts Service Registry Numbers) and structure, and is supported by both automated and expert manual list and substance curation. In 2014, the legacy, manually curated DSSTox_V1 content was migrated to a new DSSTox_V2 MySQL data model supported by modern cheminformatics tools. This was followed by sequential auto-loads of portions of three public datasets: EPA’s Substance Registry Services (SRS), the National Library of Medicine’s ChemID, and the subset of PubChem with CAS RN-type synonyms and structures. This process was constrained by a requirement of uniquely mapped CAS RN, name and structure identifiers (IDs) for each substance, rejecting content where IDs were conflicted either within or across datasets. This rejected content provided quantitative estimates of the degree of conflicting, inaccurate substance-structure ID mappings in the public domain, ranging from 12% internal conflicts within EPA SRS to 49% conflicts (>100K records) between ChemID and PubChem. Content successfully added to DSSTox from each auto-load was assigned to one of five qc_levels reflecting curator confidence in each of the datasets. This process enabled a significant expansion of DSSTox content, while retaining focus on data quality. 
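The unique-mapping constraint described above can be illustrated with a minimal sketch. It is not the DSSTox loading code: the function name, the tuple layout, and the strict one-to-one rule across all three identifiers are assumptions made for illustration, and the identifier values are hypothetical placeholders rather than real substances. Structures are assumed to be represented as plain strings.

```python
from collections import defaultdict

def split_unique_and_conflicted(records):
    """Split (casrn, name, structure) tuples into accepted and rejected lists.

    Assumed rule (illustrative only): a record is rejected if any one of its
    three identifiers also appears in a different record, so that every
    accepted CAS RN, name, and structure maps to exactly one substance.
    """
    unique_records = set(records)  # exact duplicates count once
    # One index per identifier position: value -> set of records using it.
    seen = [defaultdict(set) for _ in range(3)]
    for rec in unique_records:
        for i, value in enumerate(rec):
            seen[i][value].add(rec)

    accepted, rejected = [], []
    for rec in sorted(unique_records):
        conflicted = any(len(seen[i][rec[i]]) > 1 for i in range(3))
        (rejected if conflicted else accepted).append(rec)
    return accepted, rejected

# Hypothetical placeholder data: "CAS-B" is mapped to two different
# name/structure pairs, so both of those records are rejected.
records = [
    ("CAS-A", "name-a", "struct-1"),
    ("CAS-B", "name-b", "struct-2"),
    ("CAS-B", "name-c", "struct-3"),
    ("CAS-A", "name-a", "struct-1"),  # exact duplicate of the first record
]
accepted, rejected = split_unique_and_conflicted(records)
conflict_rate = len(rejected) / (len(accepted) + len(rejected))
```

Under these assumptions, the share of rejected records plays the role of the conflict percentages quoted above; a real loader would also need rules for synonyms and for comparing structures in a normalized form, which this sketch leaves out.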
DSSTox content serves as the core foundation of EPA’s CompTox Chemicals Dashboard [https://comptox.epa.gov/dashboard], which provides public access to, and has greatly expanded the reach of, DSSTox content to support a broad range of modeling and research activities across the field of computational toxicology. The views expressed are those of the authors and do not necessarily reflect the view or policy of the US Environmental Protection Agency.

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 11/01/2019
Record Last Revised: 11/16/2020
OMB Category: Other
Record ID: 350156