Science Inventory

Development of QSAR Models to Predict Systemic Toxicity Points of Departure

Citation:

Pradeep, P. AND R. Judson. Development of QSAR Models to Predict Systemic Toxicity Points of Departure. Presented at NC SOT, Durham, North Carolina, October 30, 2017. https://doi.org/10.23645/epacomptox.6860270

Impact/Purpose:

Poster presented at the NC SOT annual meeting on the development of QSAR models to predict systemic toxicity points of departure.

Description:

Human health risk assessment of environmental chemical exposure is limited because tens of thousands of chemicals have little or no experimental in vivo toxicity data. Data gap filling techniques, such as quantitative structure-activity relationship (QSAR) models based on chemical structure information, are commonly used to predict hazard in the absence of experimental data. This study presents a set of QSAR models, developed from chemical structural and physicochemical properties, that predict chronic or sub-chronic in vivo points of departure (POD, the point on the dose-response curve that marks the beginning of a low-dose extrapolation). The in vivo data are taken from EPA’s ToxValDB, a compilation of information on ~3000 unique chemicals from a variety of public data sources. Using these PODs as the response and PubChem fingerprints and Chemistry Development Kit (CDK) descriptors as the feature sets, two sets of models were developed: (1) rat (756 training chemicals) and (2) mouse (526 training chemicals).
For each model, a POD distribution was constructed for each chemical, using the smallest log-transformed POD value as the mean and a standard deviation of 0.5 log units to account for data variability. Bootstrap models were then developed to derive a confidence interval for each prediction: the POD value for each training chemical in each bootstrap model was drawn randomly from that chemical's POD distribution. For each bootstrap model, unsupervised feature selection removed fingerprints with less than 80% variance, and supervised recursive feature elimination with linear regression selected the five most relevant CDK descriptors. Regression models for both rat and mouse were developed using linear regression, random forest (RF), and k-nearest neighbor algorithms, with hyperparameter tuning within a 5-fold cross-validation scheme.
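The per-bootstrap data preparation described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, the toy data are random rather than ToxValDB values, the "80% variance" rule is interpreted as a Bernoulli variance threshold of 0.8 × (1 − 0.8) = 0.16 for binary fingerprint bits, and descriptors are standardized before recursive feature elimination so that coefficient magnitudes are comparable (the abstract does not say whether descriptors were scaled).

```python
import numpy as np

POD_SD = 0.5  # standard deviation of each POD distribution, in log10 units


def pod_distribution_means(pods_by_chemical):
    """Mean of each chemical's POD distribution: its smallest log10 POD (mg/kg/day)."""
    return np.array([np.log10(np.min(v)) for v in pods_by_chemical])


def sample_pods(means, rng):
    """Draw one training POD per chemical from Normal(mean, POD_SD)."""
    return rng.normal(loc=means, scale=POD_SD)


def variance_filter(bits, on_fraction=0.8):
    """Assumed reading of the "80% variance" rule: drop binary fingerprint bits
    whose variance p*(1-p) does not exceed 0.8*(1-0.8), i.e. bits that take
    the same value in at least 80% of the chemicals."""
    bits = np.asarray(bits, dtype=float)
    keep = bits.var(axis=0) > on_fraction * (1.0 - on_fraction)
    return bits[:, keep], keep


def rfe_linear(X, y, n_keep=5):
    """Recursive feature elimination with ordinary least squares: refit, then
    drop the descriptor with the smallest |coefficient|, until n_keep remain."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing constant descriptors by zero
    Xs = (X - X.mean(axis=0)) / sd  # assumption: standardize so |coef| is comparable
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        A = np.column_stack([np.ones(len(y)), Xs[:, active]])  # intercept + features
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        active.pop(int(np.argmin(np.abs(coef[1:]))))  # ignore the intercept term
    return active


# Toy usage with random, hypothetical data.
rng = np.random.default_rng(0)
n_chem = 40
pods = [rng.lognormal(mean=2.0, sigma=1.0, size=rng.integers(1, 4)) for _ in range(n_chem)]
fingerprints = rng.integers(0, 2, size=(n_chem, 20))
descriptors = rng.normal(size=(n_chem, 12))

means = pod_distribution_means(pods)
for b in range(3):  # each bootstrap model gets its own sampled PODs
    y = sample_pods(means, rng)
    fp_kept, _ = variance_filter(fingerprints)  # unsupervised: does not use y
    desc_idx = rfe_linear(descriptors, y, n_keep=5)  # supervised: uses sampled y
    X_b = np.hstack([fp_kept, descriptors[:, desc_idx]])
    print(f"bootstrap model {b}: {X_b.shape[1]} features, descriptors {desc_idx}")
```

Because the PODs are resampled for every bootstrap model, the supervised descriptor selection can differ between models, while the unsupervised fingerprint filter depends only on the fingerprints. The resulting feature matrix would then be passed to whichever regressor (linear regression, RF, or k-nearest neighbor) is being tuned.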
The best rat RF model had an average bootstrapped root mean squared error (RMSE) of 1.02 log10 mg/kg/day and an average bootstrapped R² of 0.36; the best mouse RF model had an average bootstrapped RMSE of 0.97 log10 mg/kg/day and an average bootstrapped R² of 0.27. These models will inform chemical screening and prioritization efforts. This abstract does not necessarily represent U.S. EPA policy.
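The reported figures are averages over bootstrap models. The sketch below shows one way to compute such averages, plus a per-chemical confidence interval, with NumPy. It is illustrative only: the abstract does not state which predictions (for example, cross-validation folds or held-out chemicals) RMSE and R² were computed on, nor how the confidence interval was derived, so the percentile method and all function names here are assumptions.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of y (here log10 mg/kg/day)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def bootstrap_averages(results):
    """Average RMSE and R² over bootstrap models; `results` holds one
    (observed, predicted) pair of arrays per bootstrap model."""
    rmses = [rmse(t, p) for t, p in results]
    r2s = [r2(t, p) for t, p in results]
    return float(np.mean(rmses)), float(np.mean(r2s))


def percentile_interval(boot_preds, level=0.95):
    """Assumed percentile confidence interval for one chemical, computed from
    its predicted log10 POD across all bootstrap models."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(np.asarray(boot_preds, float), [alpha, 1.0 - alpha])
    return float(lo), float(hi)


# Hypothetical numbers for illustration only (not the study's results).
print(bootstrap_averages([([1, 2, 3], [1, 2, 3]), ([1, 2, 3], [2, 2, 2])]))
print(percentile_interval(np.arange(101)))
```

An R² of 0 corresponds to predicting the mean observed POD for every chemical, so the reported values of 0.36 (rat) and 0.27 (mouse) indicate that the models explain part, but well under half, of the variance in log10 POD.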

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 10/30/2017
Record Last Revised: 08/06/2018
OMB Category: Other
Record ID: 341752