Science Inventory

Filling Gaps in Exposure Data from Chemical Descriptors with Machine Learning

Citation:

Isaacs, K. Filling Gaps in Exposure Data from Chemical Descriptors with Machine Learning. Society of Toxicology 2021 Virtual Annual Meeting, Virtual, N/A, March 12 - 26, 2021. https://doi.org/10.23645/epacomptox.17846399

Impact/Purpose:

This talk is an invited presentation in the Society of Toxicology 2021 annual meeting session "New Approach Methodologies for Exposure: Advancing Chemical Risk Assessment."

Description:

One constant across exposure science is a poor data landscape for many chemicals in commerce. Fortunately, machine learning (ML) has become an increasingly common approach to filling such gaps in scientific knowledge. A common application of ML in drug discovery and toxicology is quantitative structure-activity/property relationship (QSAR/QSPR) modeling, which uses measured or reported biological activities or physicochemical properties of known chemicals to predict information for data-poor chemicals, based on chemical structure descriptors. The methods used in these traditional QSAR applications are now also being used to address similar data gaps in exposure science. ML QSAR approaches include both classification and regression models; selection of an appropriate algorithm is based on training set characteristics (e.g., size) and the specific question being addressed. This presentation will discuss recent efforts to develop robust training sets and predictive random forest classification and support vector machine regression models for chemical parameters critical for estimating exposures in a high-throughput (HT) manner. Parameters that have been predicted include the functional role of a chemical in products or processes, weight fraction ranges in consumer products, probability of occurrence in environmental media, and potential pathway of human exposure (e.g., consumer, industrial, dietary). In addition, the development of ML models for toxicokinetic parameters that allow for in vitro to in vivo extrapolation of cell-based hazard data for comparison with HT exposure estimates will be covered. The presentation will conclude with a discussion of strategies for facilitating acceptance of these ML-based new approach methodologies in the regulatory arena. These strategies include adoption of transparent and open methods for communicating training sets, model performance metrics, and chemical domain of applicability, and the development of frameworks that encourage iterative improvement and expansion of models as new data become available.
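
As an illustration only, and not the presenter's code or models, the idea of predicting a property from chemical structure descriptors while reporting a domain of applicability can be sketched in a few lines of Python. This sketch swaps in a simple nearest-neighbor vote over Tanimoto similarity in place of the random forest and support vector machine models named above, so that it runs with the standard library alone. The fingerprints, labels, function names, and the 0.3 similarity threshold are all hypothetical.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_functional_role(query, training, k=3, min_similarity=0.3):
    """Predict a label by majority vote among the k most similar training chemicals.

    Returns (label, in_domain). in_domain is False when even the closest
    training chemical is less similar than min_similarity, a crude stand-in
    for a domain-of-applicability check.
    """
    scored = sorted(
        ((tanimoto(query, fp), label) for fp, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    neighbors = scored[:k]
    in_domain = neighbors[0][0] >= min_similarity
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0], in_domain


# Hypothetical toy training set: (fingerprint bits, functional role).
TRAINING = [
    (frozenset({1, 2, 3, 4}), "surfactant"),
    (frozenset({1, 2, 3, 5}), "surfactant"),
    (frozenset({2, 3, 4, 6}), "surfactant"),
    (frozenset({10, 11, 12}), "colorant"),
    (frozenset({10, 11, 13}), "colorant"),
]

# A query that shares bits with the surfactant examples.
label, in_domain = predict_functional_role(frozenset({1, 2, 3, 6}), TRAINING)
print(label, in_domain)

# A query with no bits in common with any training chemical.
_, far_in_domain = predict_functional_role(frozenset({20, 21}), TRAINING)
print(far_in_domain)
```

Returning the in-domain flag alongside the prediction mirrors the transparency strategy described above: a prediction for a chemical unlike anything in the training set is still produced, but it is marked so that a reader can discount it.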

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 03/26/2021
Record Last Revised: 01/04/2022
OMB Category: Other
Record ID: 353843