Science Inventory

Development of a Water Solubility Dataset to Establish Best Practices for Curating New Datasets for QSAR Modeling (ACS Fall 2020)

Citation:

Lowe, C., C. Grulke, AND A. Williams. Development of a Water Solubility Dataset to Establish Best Practices for Curating New Datasets for QSAR Modeling (ACS Fall 2020). American Chemical Society Fall 2020 meeting, RTP, NC, August 17 - 20, 2020. https://doi.org/10.23645/epacomptox.12824438

Impact/Purpose:

EPA needs to develop datasets suitable for QSAR modeling for a variety of toxicity endpoints and physical properties. It is vital to develop best practices for curating these datasets, both in terms of the molecular structure and the experimental metadata associated with each data point. This activity builds on the standard process of DSSTox chemical curation but adds a further layer of curation covering the annotation and appropriate processing, reconciliation, and condensation of experimental data to make it suitable for QSAR modeling. This product will establish a systematic methodology for producing a final curated dataset (with both property values and molecular structures) from a series of data sources. This research area is also linked to larger scientific efforts within the cheminformatics community, both internal and external to EPA.
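To make the "reconciliation and condensation" step concrete, the following is a minimal sketch, not the authors' procedure. It assumes each record is a (chemical identifier, log S) pair with log S already in log10(mol/L), uses placeholder identifiers rather than real DSSTox substance IDs, takes the median of replicate values, and flags chemicals whose values disagree by more than a hypothetical threshold of 1.0 log unit instead of condensing them.

```python
from collections import defaultdict
from statistics import median


def condense(records, max_range=1.0):
    """Collapse replicate log S values per chemical into a single value.

    records: iterable of (chemical_id, log_s) pairs, log_s in log10(mol/L).
    max_range: hypothetical tolerance (log units); chemicals whose values
    span more than this are flagged for manual review rather than condensed.
    Returns (kept, flagged): kept maps id -> median log S,
    flagged maps id -> sorted list of the conflicting values.
    """
    grouped = defaultdict(list)
    for chem_id, log_s in records:
        grouped[chem_id].append(log_s)

    kept, flagged = {}, {}
    for chem_id, values in grouped.items():
        if max(values) - min(values) > max_range:
            # Sources disagree too much to trust a single summary value.
            flagged[chem_id] = sorted(values)
        else:
            # The median is less sensitive than the mean to one outlying source.
            kept[chem_id] = median(values)
    return kept, flagged


records = [
    ("chem-A", -2.1), ("chem-A", -2.3), ("chem-A", -2.2),
    ("chem-B", -1.0), ("chem-B", -3.5),
]
kept, flagged = condense(records)
print(kept)     # {'chem-A': -2.2}
print(flagged)  # {'chem-B': [-3.5, -1.0]}
```

Flagging rather than silently averaging keeps conflicting measurements visible, which matters when the goal is a dataset whose individual values can be trusted for model training.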

Description:

The U.S. Environmental Protection Agency’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) hosts a wide range of environmentally relevant chemical information, including physicochemical property data suitable for QSAR/QSPR modeling. The development of these physical property datasets has generally involved the curation of publicly available experimental data. Both the ease of accessing these data and the overall quality of each dataset (e.g., machine-readable formatting, inclusion of experimental conditions) vary widely. The purpose of this work is to identify the challenges associated with assembling physicochemical property datasets, with a focus on obtaining high-quality water solubility values for organic compounds. Common issues discovered during the assembly, integration, and review of these data will be presented, along with solutions that can be easily implemented in a high-throughput manner. Our intention is to develop standard workflows and provide guidance that researchers can use to curate physicochemical property datasets, ideally extended to environmental fate and transport data and other relevant chemical datasets. The culmination of this work is a curated water solubility dataset for organic compounds drawn from numerous sources, including both online databases and journal articles. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
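The abstract does not list the specific issues encountered, but one that commonly arises when pooling solubility values from different sources is inconsistent units. The sketch below is illustrative only and is not taken from this work. It converts a reported value to log10(mol/L) using log S = log10(S[g/L] / MW[g/mol]). The unit table is a hypothetical subset, and the molecular weight is assumed to come from the curated structure.

```python
import math

# Factors converting mass concentrations to g/L (hypothetical subset of units).
TO_G_PER_L = {"g/L": 1.0, "mg/L": 1e-3, "mg/mL": 1.0, "ug/L": 1e-6}


def to_log_molar(value, unit, mol_weight=None):
    """Convert a reported water solubility to log10(mol/L).

    value: reported solubility (must be positive).
    unit: "mol/L" or one of the mass-concentration units in TO_G_PER_L.
    mol_weight: molecular weight in g/mol, required for mass units.
    """
    if value <= 0:
        raise ValueError("solubility must be positive to take a logarithm")
    if unit == "mol/L":
        molar = value
    elif unit in TO_G_PER_L:
        if mol_weight is None:
            raise ValueError(f"molecular weight needed to convert {unit}")
        # g/L divided by g/mol gives mol/L.
        molar = value * TO_G_PER_L[unit] / mol_weight
    else:
        # Reject unknown units rather than guessing; route the record to review.
        raise ValueError(f"unrecognized unit: {unit!r}")
    return math.log10(molar)


# 1000 mg/L of a compound with MW 100 g/mol is 0.01 mol/L, i.e. log S = -2.
print(to_log_molar(1000, "mg/L", mol_weight=100.0))
```

Raising an error on unrecognized units, instead of passing values through, is one way to keep a high-throughput pipeline from silently mixing incompatible scales.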

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 08/20/2020
Record Last Revised: 10/27/2020
OMB Category: Other
Record ID: 350009