Science Inventory

Open-source QSAR-ready chemical structure standardization workflow

Citation:

Mansouri, K., C. Grulke, R. Judson, A. Richard, A. Williams, AND N. Kleinstreuer. Open-source QSAR-ready chemical structure standardization workflow. QSAR 2021 International Workshop on QSAR in Environmental and Health Sciences, Virtual, NC, June 07 - 10, 2021. https://doi.org/10.23645/epacomptox.15070041

Impact/Purpose:

Presentation to the QSAR 2021 International Workshop on QSAR in Environmental and Health Sciences, June 2021. Chemical safety decisions and management can be hindered by the lack of ready access to the ever-expanding array of data, tools, and models that are relevant to the analyses. Even though many chemical safety resources are available, it may not be clear to users how the various sources of information can be combined in targeted, efficient workflows to address their specific questions. The current product showcases QSAR models providing predictions on toxicity endpoints and physicochemical, environmental fate, and ADME properties. This product gives regulatory scientists, students, and researchers the ability to effectively access and exploit the many in silico data streams to support different regulatory purposes. It also supports current Agency efforts to reduce mammal study requests by 30% by 2025 and to completely eliminate all mammal study requests and funding by 2035.

Description:

The rapid increase of publicly available chemical structures and associated experimental data presents a valuable opportunity to build robust QSAR models for applications in different fields. However, a common concern is the quality of both the chemical structure information and the associated experimental data. Especially when collected from multiple sources, chemical structure records usually contain many duplicates and molecular inconsistencies. Such issues can alter the molecular descriptor calculation and, subsequently, the accuracy and repeatability of the derived QSAR models. Here we describe the development of an automated workflow that standardizes chemical structures according to a set of standard rules to generate “QSAR-ready” forms prior to calculating molecular descriptors. The workflow was designed in the KNIME data-mining environment. It performs a series of operations on the 2D and 3D structures, including desalting, standardizing tautomers and nitro groups, correcting valence, neutralizing when possible, and removing duplicates. This workflow has been used in different QSAR-related projects and international collaborations, as well as for all models included in the OPERA application (https://github.com/NIEHS/OPERA). The workflow was also used to standardize over 750k structures available on the EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) and NTP’s Integrated Chemical Environment (https://ice.ntp.niehs.nih.gov/). The QSAR-ready workflow can be downloaded and used separately in the KNIME environment or from the command line within a Docker container (https://github.com/NIEHS/QSAR-ready). Recently, it was also embedded in OPERA to standardize structures prior to running the models for prediction. This abstract does not necessarily reflect NIEHS and EPA policy.
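As an illustration only, the general shape of such a standardization pipeline (an ordered series of structure operations followed by duplicate removal) can be sketched in a few lines of Python. This is not the KNIME workflow itself: it works on SMILES text rather than parsed structures, covers only simplified versions of the desalting, nitro-group, and duplicate-removal steps, and every function name is hypothetical. The direction of the nitro-group rewrite is an arbitrary choice for the example, not a statement of the rule the actual workflow applies.

```python
import re

# Matches one atom token in a SMILES string: a bracket atom such as [Na+],
# a two-letter halogen, or a one-letter organic-subset atom.
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def largest_fragment(smiles: str) -> str:
    """Simplified desalting: keep the dot-separated fragment with the most atoms."""
    fragments = smiles.split(".")
    return max(fragments, key=lambda frag: len(ATOM_TOKEN.findall(frag)))


def normalize_nitro(smiles: str) -> str:
    """Rewrite one textual spelling of a nitro group to a single form.

    Assumption for illustration: the charge-separated spelling is mapped to
    the pentavalent one. Other spellings of the same group are not handled.
    """
    return smiles.replace("[N+](=O)[O-]", "N(=O)=O")


# Hypothetical step order; a real workflow would also handle tautomers,
# valence correction, and neutralization with a cheminformatics toolkit.
STEPS = (largest_fragment, normalize_nitro)


def standardize(smiles: str) -> str:
    """Apply every standardization step in order."""
    for step in STEPS:
        smiles = step(smiles)
    return smiles


def qsar_ready(records: list[str]) -> list[str]:
    """Standardize each record and drop exact duplicates, keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for smiles in records:
        std = standardize(smiles)
        if std not in seen:
            seen.add(std)
            unique.append(std)
    return unique


if __name__ == "__main__":
    records = ["CCN.Cl", "CCN", "C[N+](=O)[O-]", "CN(=O)=O"]
    print(qsar_ready(records))
```

Because the sketch compares plain strings, two records are recognized as duplicates only when their standardized SMILES match character for character. Structure-aware duplicate detection would require canonicalization by a cheminformatics toolkit, which is outside the scope of this example.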

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 06/10/2021
Record Last Revised: 07/28/2021
OMB Category: Other
Record ID: 352423