Science Inventory

In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning

Citation:

Zang, Q., K. Mansouri, A. Williams, R. Judson, D. Allen, W. Casey, AND N. Kleinstreuer. In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning. Journal of Chemical Information and Modeling. American Chemical Society, Washington, DC, 57(1):36-49, (2017).

Impact/Purpose:

• Agency Research Drivers – EPA regulates the use, production, processing, and importing of chemicals used in agriculture, industry, and commerce. In addition, EPA evaluates alternative chemical designs to support its Green Chemistry Program and Design for the Environment Program.
• Science Challenge – Physicochemical properties of chemicals are needed to model environmental fate and transport, as well as exposure potential.
• Research Approach – In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models of physicochemical properties.
• Results – Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol-water partition coefficient, water solubility, boiling point, melting point, vapor pressure, and bioconcentration factor. Predictive model performance for five of the six properties exceeded that of the original EPI Suite models.
• Anticipated Impact/Expected Use – This study generated an open-source workflow to predict a variety of physicochemical properties. The newly derived models can be employed for rapid estimation of physicochemical properties to inform fate and toxicity prediction models of environmental chemicals.

Description:

Few toxicity data are available for the vast majority of chemicals in commerce. High-throughput screening (HTS) studies, such as those being carried out by the U.S. Environmental Protection Agency (EPA) ToxCast program in partnership with the federal Tox21 research program, can generate biological data to inform models for predicting potential toxicity. However, physicochemical properties are also needed to model environmental fate and transport, as well as exposure potential. The purpose of the present study was to generate an open-source quantitative structure-property relationship (QSPR) workflow that predicts a variety of physicochemical properties and has the cross-platform compatibility needed to integrate into existing cheminformatics workflows. In this effort, decades-old experimental property data sets available within the EPA EPI Suite were reanalyzed using modern cheminformatics workflows to develop updated QSPR models capable of supplying computationally efficient, open, and transparent HTS property predictions in support of environmental modeling efforts. Models were built using updated EPI Suite data sets for the prediction of six physicochemical properties: octanol-water partition coefficient (logP), water solubility (logS), boiling point (BP), melting point (MP), vapor pressure (logVP), and bioconcentration factor (logBCF). The coefficient of determination (R²) between the estimated values and experimental data for the six predicted properties ranged from 0.826 (MP) to 0.965 (BP), with model performance for five of the six properties exceeding that of the original EPI Suite models. The newly derived models can be employed for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models of environmental chemicals.
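As an illustration of the performance metric reported above, the following minimal Python sketch computes a coefficient of determination between experimental and estimated property values. It assumes the common definition R² = 1 − SS_res/SS_tot; this record does not state which definition the authors used. The function name `r_squared` and the numeric values are hypothetical and are not data from the study.

```python
def r_squared(experimental, predicted):
    """Coefficient of determination, assuming R^2 = 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not taken from the study's workflow.
    """
    if len(experimental) != len(predicted) or len(experimental) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_exp = sum(experimental) / len(experimental)

    # Residual sum of squares: squared error between measurement and estimate.
    ss_res = sum((y - p) ** 2 for y, p in zip(experimental, predicted))

    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((y - mean_exp) ** 2 for y in experimental)
    if ss_tot == 0:
        raise ValueError("experimental values have zero variance")

    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only (e.g. logP-like numbers).
experimental_values = [1.0, 2.0, 3.0, 4.0]
estimated_values = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(experimental_values, estimated_values), 3))
```

Under this definition, a value of 1 means the estimates reproduce the measurements exactly, and a value of 0 means they do no better than predicting the mean of the measurements for every chemical.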

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 01/09/2017
Record Last Revised: 05/11/2018
OMB Category: Other
Record ID: 337719