Science Inventory

An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset

Citation:

Pradeep, P., R. Judson, D. DeMarini, N. Keshava, T. Martin, J. Dean, C. Gibbons, A. Simha, S. Warren, M. Gwinn, AND G. Patlewicz. An Evaluation of Existing QSAR Models and Structural Alerts and Development of New Ensemble Models for Genotoxicity Using a Newly Compiled Experimental Dataset. Computational Toxicology. Elsevier B.V., Amsterdam, Netherlands, 18:100167, (2021). https://doi.org/10.1016/j.comtox.2021.100167

Impact/Purpose:

Risk-based prioritization of thousands of chemicals based on genotoxicity is a requirement for the US EPA under the amended TSCA. Here, we compiled a large dataset of experimental data from a range of public sources to evaluate a categorization scheme that relied on the presence of at least one Ames or Clastogen assay outcome to make an overall call for genotoxicity. The dataset comprised almost 7500 substances with results from a range of guideline and non-guideline studies. The dataset was curated to reassign misclassified study types, and each study was tagged with one of three broad study types: Ames, Clastogen, or Other. We have provided a simple scheme that uses data from public structure-based tools to categorize chemicals as genotoxic or not. In silico models and alerts of various types were used to generate predictions, and their performance was compared to the categorization scheme outcomes, either as individual models or as part of a Naïve Bayes ensemble model approach. The balanced accuracies ranged from 60–80%, with the best performing combinations comprising outcomes based on two QSAR models and one or two alert schemes.
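As an illustration of how a Naïve Bayes ensemble can combine binary predictions from several tools, the sketch below fits a Bernoulli Naïve Bayes model with Laplace smoothing in plain Python. It is not the authors' implementation: the function names, the smoothing parameter `alpha`, the three feature columns (two hypothetical QSAR models and one hypothetical alert scheme), and the toy training rows are all assumptions made up for the example.

```python
import math


def fit_naive_bayes(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary tool predictions.

    X: list of 0/1 vectors, one entry per tool (hypothetical columns).
    y: list of 0/1 labels (1 = genotoxic call from experimental data).
    alpha: Laplace smoothing constant (an assumption, not from the source).
    """
    n_features = len(X[0])
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = (len(rows) + alpha) / (len(y) + 2 * alpha)
        # Smoothed estimate of P(tool j predicts positive | class c)
        p_pos = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[c] = (prior, p_pos)
    return model


def predict_proba(model, x):
    """Return the posterior probability that x belongs to class 1."""
    log_scores = {}
    for c, (prior, p_pos) in model.items():
        score = math.log(prior)
        for xj, pj in zip(x, p_pos):
            score += math.log(pj if xj == 1 else 1.0 - pj)
        log_scores[c] = score
    # Normalize in log space for numerical stability
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])


# Toy, made-up data: columns are [qsar_a, qsar_b, alert_a] (hypothetical tools)
X_toy = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1],
         [0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

toy_model = fit_naive_bayes(X_toy, y_toy)
p_all_positive = predict_proba(toy_model, [1, 1, 1])
p_all_negative = predict_proba(toy_model, [0, 0, 0])
```

In this toy setup, a chemical flagged by all three tools receives a posterior above 0.5 and one flagged by none receives a posterior below 0.5; real inputs would be the tool predictions and experimental calls described in the record.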

Description:

Regulatory agencies worldwide face the challenge of performing risk-based prioritization of thousands of substances in commerce. In this study, a major effort was undertaken to compile a large genotoxicity dataset (54,805 records for 9299 substances) from several public sources (e.g., TOXNET, COSMOS, eChemPortal). The names and outcomes of the different assays were harmonized, and assays were annotated by type: gene mutation in Salmonella bacteria (Ames assay) and chromosome mutation (clastogenicity) in vitro or in vivo (chromosome aberration, micronucleus, and mouse lymphoma Tk+/− assays). This dataset was then evaluated to assess genotoxic potential using a categorization scheme, whereby a substance was considered genotoxic if it was positive in at least one Ames or clastogen study. The categorization dataset comprised 8442 chemicals, of which 2728 were genotoxic, 5585 were not, and 129 were inconclusive. QSAR models (TEST and VEGA) and selected OECD Toolbox structural alerts/profilers (e.g., OASIS DNA alerts for Ames and chromosomal aberrations) were used to make in silico predictions of genotoxicity potential. Individual QSAR tools and structural alerts achieved balanced accuracies of 57–73%. A Naïve Bayes consensus model was developed using combinations of QSAR models and structural alert predictions. The ‘best’ consensus model selected had a balanced accuracy of 81.2%, a sensitivity of 87.24%, and a specificity of 75.20%. This in silico scheme offers promise as a first step in ranking thousands of substances as part of a prioritization approach for genotoxicity.
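The categorization rule and the reported metrics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The positive rule (at least one positive Ames or clastogen study) comes from the description above; how non-genotoxic and inconclusive calls are separated is not stated there, so the handling below is an assumption, as are the function names and the tuple-based input format.

```python
def categorize(study_results):
    """Make an overall genotoxicity call for one substance.

    study_results: list of (study_type, outcome) tuples, where study_type is
    "Ames", "Clastogen", or "Other" and outcome is "positive" or "negative"
    (a hypothetical input format chosen for this sketch).
    """
    relevant = [outcome for study_type, outcome in study_results
                if study_type in ("Ames", "Clastogen")]
    # Rule stated in the description: one positive Ames or clastogen study suffices
    if any(outcome == "positive" for outcome in relevant):
        return "genotoxic"
    # ASSUMPTION: only-negative Ames/clastogen results give a non-genotoxic call
    if relevant and all(outcome == "negative" for outcome in relevant):
        return "non-genotoxic"
    # ASSUMPTION: anything else (e.g., no Ames/clastogen data) is inconclusive
    return "inconclusive"


def balanced_accuracy(sensitivity, specificity):
    """Balanced accuracy is the mean of sensitivity and specificity."""
    return (sensitivity + specificity) / 2.0


# Reported figures for the selected consensus model (percentages)
reported_ba = balanced_accuracy(87.24, 75.20)

# Reported category counts: genotoxic, not genotoxic, inconclusive
total_chemicals = 2728 + 5585 + 129
```

With the reported sensitivity and specificity, the mean rounds to the stated balanced accuracy of 81.2%, and the three category counts sum to the 8442 chemicals in the categorization dataset.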

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 05/01/2021
Record Last Revised: 06/28/2021
OMB Category: Other
Record ID: 352034