Science Inventory

Using Chemical and Biological Descriptors to Develop Predictive Models for Rat Acute Oral Toxicity

Citation:

Fitzpatrick, J., P. Pradeep, A. Karmaus, AND G. Patlewicz. Using Chemical and Biological Descriptors to Develop Predictive Models for Rat Acute Oral Toxicity. Presented at Society of Toxicology, San Antonio, TX, March 11 - 15, 2018. https://doi.org/10.23645/epacomptox.7029254

Impact/Purpose:

Assessing the acute toxic potential of a substance is necessary to determine the potential effects of accidental or deliberate short-term exposure. There are no accepted in vitro approaches available, and few in silico models, to predict acute oral toxicity.

Description:

Assessing the acute toxic potential of a substance is necessary to determine the potential effects of accidental or deliberate short-term exposure. There are no accepted in vitro approaches available, and few in silico models, to predict acute oral toxicity. Until recently, a paucity of experimental in vivo acute toxicity data was available for model development and evaluation. Here, a large acute oral toxicity dataset totaling 15,698 unique chemicals was compiled from different sources, including the Organisation for Economic Co-operation and Development's eChemPortal, the National Library of Medicine's (NLM) Hazardous Substances Data Bank, NLM's ChemIDplus via the Toxicity Estimation Software Tool, the European Union Joint Research Centre's AcutoxBase, and the National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods' Pesticide Active Ingredients Database. Many of the LD50 values originated from limit tests, which estimate an LD50 value as being above or below a specific threshold, typically 2000 mg/kg or 5000 mg/kg. These limit tests present challenges for model development since they provide less information than an explicitly quantified LD50 value. To overcome this limitation, three approaches were used to model acute oral toxicity using ToxCast/Tox21 activities as biological descriptors and ToxPrints and physicochemical properties as chemical descriptors. All models were developed and evaluated using 80% of the data as a training set and 20% of the data as an external test set. The first approach was a global random forest classification model built to predict which substances would be above or below an LD50 of 5000 mg/kg. The balanced accuracy of the model on the test set was 76%. The second was a global ridge regression model built using biological descriptors. The RMSE and R2 for the test set were 0.76 and 0.35, respectively. The third was a set of 10 cluster-based local random forest models, with the clusters derived using the k-means algorithm. The RMSEs and R2 values for the test sets ranged from 0.52 to 0.83 and from 0.20 to 0.48, respectively. Overall, the local cluster-based models performed better than the global models. This abstract does not necessarily reflect US EPA policy and was funded in part with federal funds from the NIEHS, NIH under Contract No. HHSN273201500010C.
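For readers unfamiliar with the evaluation metrics quoted in the abstract (balanced accuracy, RMSE and R2), the following is a minimal sketch of how such quantities are commonly computed. It is not the authors' code. The function names, the boolean label convention, and the helper that labels a substance against the 5000 mg/kg threshold are all assumptions made for illustration, and the abstract does not state the units or transformation applied to LD50 before regression.

```python
import math


def is_above_threshold(ld50_mg_per_kg, threshold=5000.0):
    """Hypothetical helper: True if an LD50 value exceeds the threshold (mg/kg)."""
    return ld50_mg_per_kg > threshold


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for boolean class labels."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    n_pos = sum(1 for t in y_true if t)
    n_neg = len(y_true) - n_pos
    # Assumes both classes are present in y_true.
    return 0.5 * (true_pos / n_pos + true_neg / n_neg)


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Balanced accuracy is often preferred over plain accuracy when the two classes are of unequal size, since it weights performance on each class equally. Whether that was the authors' reason for reporting it is not stated in the abstract.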

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/15/2018
Record Last Revised: 08/31/2018
OMB Category: Other
Record ID: 342154