Science Inventory

Establishing Best Practices for Water Solubility Dataset Curation (QSAR 2021)

Citation:

Lowe, C., G. Sinclair, C. Ramsland, T. Martin, C. Grulke, AND A. Williams. Establishing Best Practices for Water Solubility Dataset Curation (QSAR 2021). QSAR 2021 International Workshop on QSAR in Environmental and Health Sciences, Virtual, NC, June 07 - 10, 2021. https://doi.org/10.23645/epacomptox.16539939

Impact/Purpose:

Presentation to the QSAR 2021 International Workshop on QSAR in Environmental and Health Sciences in June 2021. The US EPA needs to develop datasets suitable for QSAR modeling for a variety of toxicity endpoints and physical properties. It is therefore vital to establish best practices for curating these datasets, covering both the molecular structure and the experimental metadata associated with each data point. This activity builds on the standard DSSTox chemical curation process but adds a further layer of curation: the annotation and appropriate processing, reconciliation, and condensation of experimental data to make it suitable for QSAR modeling. This product will establish a systematic methodology for producing a final curated dataset (with both property values and molecular structures) from a series of data sources. This research area is also linked to broader scientific efforts within the cheminformatics community, both internal and external to EPA.
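As an illustration only, the "processing, reconciliation, and condensation" step can be pictured as converting every reported water solubility value to a common unit (log10 mol/L, often written log S) and collapsing replicate measurements of the same structure into one value. The sketch below is hypothetical and is not the method presented in this work: the function names `to_log_molar` and `condense`, the unit table, the generic structure key, and the 1-log-unit spread threshold used to flag disagreeing sources are all assumptions chosen for the example.

```python
import math
from collections import defaultdict
from statistics import median

# Hypothetical lookup: factor that converts each reported unit to g/L.
TO_G_PER_L = {"g/L": 1.0, "mg/L": 1e-3, "ug/L": 1e-6, "mg/mL": 1.0}


def to_log_molar(value, unit, mol_weight):
    """Convert a solubility value to log10(mol/L); mol_weight is in g/mol."""
    grams_per_litre = value * TO_G_PER_L[unit]
    return math.log10(grams_per_litre / mol_weight)


def condense(records, max_spread=1.0):
    """Collapse replicate measurements to one median log S per structure.

    records: iterable of (structure_key, value, unit, mol_weight) tuples,
    where structure_key is any identifier for the curated structure.
    max_spread: hypothetical tolerance in log units; structures whose
    measurements span more than this are flagged for manual review
    instead of being averaged.
    """
    grouped = defaultdict(list)
    for key, value, unit, mol_weight in records:
        grouped[key].append(to_log_molar(value, unit, mol_weight))

    condensed, flagged = {}, []
    for key, log_values in grouped.items():
        if max(log_values) - min(log_values) > max_spread:
            flagged.append(key)
        else:
            condensed[key] = median(log_values)
    return condensed, flagged
```

For example, reports of 100 mg/L and 0.1 g/L for a compound of 100 g/mol both reduce to log S = -3 (10^-3 mol/L) and are merged, whereas 1 mg/L and 1000 mg/L for the same structure differ by three log units and are routed to review. Using the median rather than the mean keeps a single outlying source from shifting the condensed value.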

Description:

The U.S. Environmental Protection Agency’s CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard) hosts a wealth of environmentally relevant chemical information, including physicochemical property data suitable for QSAR/QSPR modeling. Developing these property datasets has generally involved curating publicly available experimental data. Both the ease of accessing these data and their overall quality (e.g., machine-readable formatting, inclusion of experimental conditions) are highly variable. The purpose of this work is to identify the challenges associated with assembling physicochemical property datasets, with a focus on obtaining water solubility values for organic compounds. Common issues discovered during the assembly, integration, and review of these data will be presented, along with solutions that can be easily implemented in a high-throughput manner. Our intention is to develop standard workflows and provide guidance that researchers can use to curate physicochemical property datasets and that could ideally be extended to environmental fate and transport data and other relevant chemical datasets. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
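To make the point about machine-readable formatting and experimental conditions concrete, one high-throughput tactic is to split free-text solubility entries into structured fields (qualifier, value, unit, temperature) and send anything that does not parse to manual review. The sketch below is a hypothetical illustration, not the workflow from this presentation: the `parse_solubility` function, the accepted units, and the "at <T> °C" phrasing it recognizes are assumptions, and real source data will contain many more formats.

```python
import re

# Hypothetical pattern: optional qualifier, a number (scientific notation
# allowed), one of a few units, and an optional "at <T> °C" clause.
PATTERN = re.compile(
    r"^\s*(?P<qualifier><=|>=|<|>|~)?\s*"
    r"(?P<value>\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
    r"(?P<unit>g/L|mg/L|ug/L|mg/mL)"
    r"(?:\s*at\s*(?P<temp>-?\d+(?:\.\d+)?)\s*°?C)?\s*$"
)


def parse_solubility(text):
    """Split a free-text solubility entry into structured fields.

    Returns None when the string does not match, so unparsed entries can
    be routed to manual review instead of being silently dropped.
    """
    match = PATTERN.match(text)
    if match is None:
        return None
    temp = match.group("temp")
    return {
        # An absent qualifier is recorded as "=" (an exact value).
        "qualifier": match.group("qualifier") or "=",
        "value": float(match.group("value")),
        "unit": match.group("unit"),
        # Temperature stays None when the source did not report it.
        "temperature_c": float(temp) if temp is not None else None,
    }
```

Keeping the qualifier and temperature as separate fields, rather than discarding them, preserves the information needed later to decide whether a censored value (such as "<0.5 mg/L") or a measurement far from 25 °C should enter a modeling set.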

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 06/10/2021
Record Last Revised: 08/30/2021
OMB Category: Other
Record ID: 352659