Science Inventory

Machine Learning Approaches to Predicting Chemical Occurrence in Environmental and Biological Media

Citation:

Isaacs, K. Machine Learning Approaches to Predicting Chemical Occurrence in Environmental and Biological Media. SOT, Salt Lake City, UT, March 10-14, 2024. https://doi.org/10.23645/epacomptox.25400143

Impact/Purpose:

This is an abstract for an invited SOT presentation in a session titled "Practical applications of machine learning for gaining mechanistic insights in toxicology".

Description:

Abstract for the invited SOT 2024 session "Practical applications of machine learning for gaining mechanistic insights in toxicology".

Monitoring of chemical occurrence in various environmental media is gold-standard information for characterizing the mechanisms by which human and ecological receptors are exposed to exogenous chemicals. Monitoring data support the parameterization of quantitative exposure algorithms and the validation of predictive models of chemical fate and transport, and they play a role in decision-support workflows that prioritize chemicals for further study or regulation. However, because monitoring studies are expensive, technically challenging, and time-consuming, data have been collected for only a fraction of the tens of thousands of chemicals in commerce (or their environmental breakdown products). To fill these gaps in monitoring data, predictive machine-learning (ML) models can be used to develop either qualitative or quantitative estimates of chemical occurrence in environmental or biological media. This talk will discuss two current examples from the EPA Office of Research and Development’s ExpoCast project that use chemical structure predictors (e.g., structural fingerprints) and chemical use predictors (e.g., use in various industries or products) to parameterize ML models for predicting chemical occurrence. In the first example, random forest classification models, which integrated data from dozens of public monitoring sources, were built to predict the probability of chemical occurrence in 30 different environmental and biological media. In the second example, a dataset of quantitative measurements of 76 chemicals in a study of 120 households was used to develop support vector regression models for estimating quantitative air and dust concentrations. The talk will also cover key considerations in ML modeling in the context of these examples, including data curation, external validation, and evaluation of applicability domain.
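The abstract names applicability domain as a key consideration but does not say how it is evaluated in the ExpoCast models. One common approach for fingerprint-based models, assumed here rather than taken from the talk, is to flag a query chemical as outside the domain when its highest Tanimoto similarity to any training chemical falls below a cutoff. The minimal sketch below represents each structural fingerprint as a set of "on" bit indices; the function names, the toy fingerprints, and the 0.5 threshold are all hypothetical.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as sets of on-bits."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    # The ratio is undefined for two all-zero fingerprints; treat it as 0.0 (a conservative choice).
    return shared / union if union else 0.0


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Similarity of the query to its nearest neighbour in the training set (0.0 if the set is empty)."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]], threshold: float = 0.5) -> bool:
    """Hypothetical rule: inside the domain if the nearest training chemical is similar enough."""
    return max_similarity(query, training) >= threshold


# Toy training-set fingerprints (hypothetical, for illustration only).
training_fps = [{1, 4, 7, 9}, {2, 4, 8}, {1, 3, 5, 7}]

near_query = {1, 4, 7}     # shares 3 of 4 bits with the first training chemical
far_query = {10, 11, 12}   # shares no bits with any training chemical

print(max_similarity(near_query, training_fps), in_domain(near_query, training_fps))
print(max_similarity(far_query, training_fps), in_domain(far_query, training_fps))
```

Running the sketch prints `0.75 True` for the near query and `0.0 False` for the far one. A prediction for a chemical flagged as out of domain would typically be reported with lower confidence or withheld. In practice the threshold, the fingerprint type, and whether to use the single nearest neighbour or an average over several neighbours are modeling choices that would need to be justified against external validation data.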

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 03/14/2024
Record Last Revised: 03/13/2024
OMB Category: Other
Record ID: 360713