Science Inventory

OPERA: A QSAR tool for physicochemical properties and environmental fate predictions (ACS Spring meeting)

Citation:

Mansouri, K., C. Grulke, R. Judson, AND A. Williams. OPERA: A QSAR tool for physicochemical properties and environmental fate predictions (ACS Spring meeting). Presented at ACS Spring meeting, San Francisco, California, April 02 - 06, 2017.

Impact/Purpose:

Presentation at the ACS spring meeting. The aim of this study was to develop robust QSAR models that can be used for regulatory purposes for endpoints of environmental interest.

Description:

The increasing number and size of public databases facilitates the collection of chemical structures and associated experimental data for QSAR modeling. However, the performance of QSAR models depends heavily on the quality of the data used and on the modeling methodology. The aim of this study was to develop robust QSAR models that can be used for regulatory purposes for endpoints of environmental interest. For that purpose we mainly used the publicly available PHYSPROP database, which includes a set of thirteen common physicochemical and environmental fate properties, including logP, Henry’s coefficient, melting point, and biodegradability, among others. These datasets underwent extensive curation using an automated workflow designed to select only good-quality data. The chemical structures were standardized before molecular descriptor calculation. The modeling procedure was developed based on the five OECD principles for QSARs in order to produce reliable yet simple models with a minimum number of descriptors using the weighted kNN approach. The sizes of the modeled datasets varied from 150 chemicals for biodegradability half-lives to 14041 chemicals for logP, with an average of 3222 chemicals. The optimal models were built on randomly selected training sets (75%) and validated by 5-fold cross-validation (CV) and on test sets (25%). The CV Q² of the models varied from 0.72 to 0.95 with an average of 0.86, and the test-set R² varied from 0.71 to 0.96 with an average of 0.82. Genetic algorithms were used to select only the most pertinent and mechanistically interpretable descriptors; the number of descriptors per model varied from 2 to 15 with an average of 11. All models were implemented in a free, open source, and open data application called OPERA (OPEn saR App) and were applied to ~700k chemicals to produce predictions for display on the EPA CompTox Chemistry Dashboard. This abstract does not reflect U.S. EPA policy.
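The validation scheme described above (a random 75%/25% train/test split, 5-fold cross-validation on the training set, and a distance-weighted kNN predictor) can be illustrated with a minimal sketch. This is not the OPERA implementation: the data are synthetic, the two "descriptors" are random numbers, all function names are hypothetical, the inverse-distance weighting is only one possible weighting choice, and the genetic-algorithm descriptor selection step is not shown. Q² is computed here with the same formula as R², applied to the cross-validated predictions.

```python
import math
import random


def weighted_knn_predict(train_X, train_y, query, k=5):
    """Predict a value as the inverse-distance-weighted mean of the k nearest neighbors."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    # Inverse-distance weights; the small epsilon avoids division by zero.
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validated_q2(X, y, k=5, n_folds=5):
    """Q2: every sample is predicted by a model that never saw its own fold."""
    n = len(X)
    preds = [0.0] * n
    for fold in range(n_folds):
        held_out = set(range(fold, n, n_folds))
        fit_X = [X[i] for i in range(n) if i not in held_out]
        fit_y = [y[i] for i in range(n) if i not in held_out]
        for i in held_out:
            preds[i] = weighted_knn_predict(fit_X, fit_y, X[i], k)
    return r_squared(y, preds)


def make_synthetic_data(n=200, seed=0):
    """Synthetic stand-in for a curated dataset: two descriptors, one noisy property."""
    rng = random.Random(seed)
    X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(n)]
    y = [2.0 * a - b + rng.gauss(0, 0.05) for a, b in X]
    return X, y


def split_train_test(X, y, train_fraction=0.75, seed=0):
    """Random split into training and test sets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(train_fraction * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


X, y = make_synthetic_data()
train_X, train_y, test_X, test_y = split_train_test(X, y)

q2 = cross_validated_q2(train_X, train_y)
test_pred = [weighted_knn_predict(train_X, train_y, x) for x in test_X]
r2_test = r_squared(test_y, test_pred)
print(f"CV Q2 = {q2:.2f}, test R2 = {r2_test:.2f}")
```

Keeping the test set entirely out of the cross-validation loop is what makes the test-set R² an independent check on the CV Q².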

URLs/Downloads:

ACS_OPERA.PDF (PDF, NA pp, 1660.791 KB)

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 04/06/2017
Record Last Revised: 09/08/2017
OMB Category: Other
Record ID: 337488