Science Inventory

Ensemble QSAR Modeling to Predict Multispecies Fish Toxicity Points of Departure (SOT)

Citation:

Sheffield, T. AND R. Judson. Ensemble QSAR Modeling to Predict Multispecies Fish Toxicity Points of Departure (SOT). Presented at Society of Toxicology annual meeting, Baltimore, MD, March 10 - 14, 2019. https://doi.org/10.23645/epacomptox.7844816

Impact/Purpose:

Abstract and poster for submission to the Society of Toxicology annual meeting in March 2019. The goal was to prioritize chemicals for further evaluation by estimating acute and chronic points of departure in fish using QSAR (quantitative structure-activity relationship) models.

Description:

QSAR modeling is used to prioritize the thousands of chemical substances for which no ecological toxicity data are available. We pulled experimental results from the U.S. Environmental Protection Agency’s ECOTOX database and the European Chemicals Agency’s database to build a large data set containing in vivo test data on thousands of chemical substances and hundreds of species of fish. This data set was used to create QSAR models to predict two types of potential points of departure (POD): acute LC50 (median lethal concentration) and endpoints comparable to the NOEC (no observed effect concentration) for any duration and measured effect. In addition to molecular descriptors and physicochemical property predictions, the QSAR models used study covariates, such as species and exposure route, as features to maximize accuracy when combining multiple data types. A novel method of substituting taxonomy groups for species dummy variables was introduced to allow the model to generalize to other species. A stacked ensemble of three machine learning methods (random forest, gradient boosted trees, and support vector regression) was implemented to increase accuracy with minimal feature selection. The models predicted LC50s and NOECs within one order of magnitude 81% and 76% of the time, respectively, and had root-mean-square errors (RMSEs) of roughly 0.83 and 0.98 log10(mg/L), respectively. Benchmarks indicated that the prediction accuracy was improved beyond the 95% confidence intervals of existing models. This abstract does not necessarily represent EPA policy.
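
Two ideas in the description can be sketched in a few lines of Python: replacing a per-species dummy variable with indicators for taxonomic groups, and computing the two reported accuracy measures (RMSE in log10 units and the share of predictions within one order of magnitude). This is a minimal sketch under stated assumptions, not the authors' implementation. The lookup table, the choice of ranks (genus and order), the function names, and the numeric values are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical lookup table: species -> taxonomic ranks. A real pipeline would
# draw these from a curated taxonomy source; the three entries are illustrative.
TAXONOMY = {
    "Pimephales promelas": {"genus": "Pimephales", "order": "Cypriniformes"},
    "Danio rerio": {"genus": "Danio", "order": "Cypriniformes"},
    "Oncorhynchus mykiss": {"genus": "Oncorhynchus", "order": "Salmoniformes"},
}

# Indicator vocabulary: every (rank, group) pair in the table, in sorted order.
VOCABULARY = sorted(
    {(rank, group) for ranks in TAXONOMY.values() for rank, group in ranks.items()}
)


def taxonomy_features(species, vocabulary):
    """Return 0/1 indicators for each (rank, group) pair in `vocabulary`.

    Species in the same order share that order's indicator, so information
    learned from one species can carry over to a related one.
    """
    ranks = TAXONOMY[species]
    return [1 if ranks.get(rank) == group else 0 for rank, group in vocabulary]


def rmse(observed, predicted):
    """Root-mean-square error; inputs are assumed to be in log10(mg/L)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def fraction_within(observed, predicted, tolerance=1.0):
    """Share of predictions within `tolerance` log10 units of the observation.

    tolerance=1.0 corresponds to "within one order of magnitude".
    """
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


# Made-up log10(mg/L) values, for illustration only (not data from the study).
observed = [0.0, 1.0, -1.0, 2.0]
predicted = [0.5, 1.0, 0.5, 1.5]

print(taxonomy_features("Danio rerio", VOCABULARY))
print(round(rmse(observed, predicted), 3))
print(fraction_within(observed, predicted))
```

In this sketch, Danio rerio and Pimephales promelas differ in their genus indicators but share the Cypriniformes indicator, which is the kind of shared signal that could let a model extend to species absent from the training data.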

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/14/2019
Record Last Revised: 04/11/2019
OMB Category: Other
Record ID: 344454