Science Inventory

Quantitative Prediction of Systemic Toxicity Points of Departure (OpenTox USA 2017)

Citation:

Pradeep, P. AND R. Judson. Quantitative Prediction of Systemic Toxicity Points of Departure (OpenTox USA 2017). Presented at OPENTOX USA 2017, Durham, NC, July 12 - 13, 2017.

Impact/Purpose:

Poster presentation at the OpenTox USA 2017 meeting.

Description:

Human health risk assessment associated with environmental chemical exposure is limited by the tens of thousands of chemicals with little or no experimental in vivo toxicity data. Data gap filling techniques, such as quantitative models based on chemical structure information, are commonly used to predict hazard in the absence of experimental data. This study presents a set of predictive models developed using chemical structural and physicochemical properties for chronic or sub-chronic in vivo points of departure (POD, the point on the dose-response curve that marks the beginning of a low-dose extrapolation). The in vivo data are taken from the EPA’s ToxValDB, a compilation of information on ~3000 unique chemicals from a variety of public data sources. Using these data, with PubChem fingerprints and Chemistry Development Kit (CDK) descriptors as the feature sets, two types of models were developed: (1) rat (756 training chemicals), and (2) mouse (526 training chemicals). Unsupervised feature selection was used to remove the fingerprints with less than 80% variance, and supervised recursive feature elimination with linear regression was used to select the 5 most relevant descriptors. Regression models, for both rat and mouse, were developed using linear regression, random forests (RF), and K-nearest neighbor algorithms implemented with hyperparameter tuning within a 5-fold cross validation scheme. The best rat model (RF) had an RMSE of 1.02 log10 mg/kg/day and R² of 0.36, and the best mouse model (RF) had an RMSE of 0.98 log10 mg/kg/day and R² of 0.25. Because the training data for both types of models were imbalanced, the models were rebuilt: 5 bootstrap sample datasets were created, each with 10% duplicate data (randomly selected from the long tail), and the models were re-developed on these bootstrapped datasets. The best resultant rat model (RF) had an average RMSE of 0.92 log10 mg/kg/day and R² of 0.48, and the best resultant mouse model (RF) had an average RMSE of 0.90 log10 mg/kg/day and R² of 0.37. Future directions will include adding uncertainty estimates to the predicted POD values. These models will be used in the context of chemical screening and prioritization efforts. This
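The reported metrics and the tail-duplication step can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `tail_quantile` cutoff, and the choice of the lowest log10 POD values as the "long tail" are all assumptions made for illustration, since the abstract does not define them.

```python
import math
import random


def rmse(observed, predicted):
    """Root-mean-square error; in log10 mg/kg/day when inputs are log10 PODs."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def augment_with_tail_duplicates(log_pods, fraction=0.10, tail_quantile=0.10, seed=0):
    """Return the data plus `fraction` extra values duplicated from the tail.

    ASSUMPTION: the "tail" is the lowest `tail_quantile` share of values.
    The abstract does not say how the long tail was defined.
    A real pipeline would duplicate whole records (features and POD).
    """
    rng = random.Random(seed)
    ordered = sorted(log_pods)
    k = max(1, int(len(ordered) * tail_quantile))
    tail = ordered[:k]
    n_extra = int(round(len(log_pods) * fraction))
    extras = [rng.choice(tail) for _ in range(n_extra)]
    return list(log_pods) + extras


# Hypothetical log10 POD values, for illustration only (not ToxValDB data).
observed = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 1.5, 3.0]
example_rmse = rmse(observed, predicted)
example_r2 = r_squared(observed, predicted)

# Five augmented datasets, mirroring the "5 bootstrap sample datasets" step.
toy_pods = [0.1 * i for i in range(20)]
datasets = [augment_with_tail_duplicates(toy_pods, seed=s) for s in range(5)]
```

Under this reading, each of the five datasets would be used to refit the models, and the RMSE and R² values would then be averaged across the five fits, which is consistent with the "average" figures quoted in the abstract.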

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 07/13/2017
Record Last Revised: 03/20/2018
OMB Category: Other
Record ID: 339934