Science Inventory

Predictive Models for Chemical Occurrence in Environmental and Biological Media

Citation:

Eddy, L., K. Phillips, C. Ring, J. Sobus, E. Ulrich, AND K. Isaacs. Predictive Models for Chemical Occurrence in Environmental and Biological Media. Society of Toxicology, San Diego, CA, March 27 - 31, 2022. https://doi.org/10.23645/epacomptox.19404179

Impact/Purpose:

Because monitoring studies are expensive, data have not been exhaustively collected for the tens of thousands of chemicals in commerce. To fill this gap, this work builds predictive models that can be used to anticipate chemical presence and inform prioritization for further study.

Description:

Monitoring of chemical occurrence in various media is critical for understanding the mechanisms by which human and ecological receptors are exposed to exogenous chemicals. Because monitoring studies are expensive, data have not been exhaustively collected for the tens of thousands of chemicals in commerce. To fill this gap, predictive models can be used to anticipate chemical presence and inform prioritization for further study. Here we present a suite of random forest models that integrate data from dozens of public monitoring sources to predict chemical occurrence in 30 different environmental and biological media. For each medium, a classifier model was built to predict the probability that any given chemical is detected in that medium. Training data for a robust classifier model must include examples of both chemicals that are present in the medium and chemicals that are not. However, the available training dataset disproportionately contains chemical detections; of the 30 media, 14 had fewer than 5 true-negative chemicals. To address this dearth of negative data, augmented models were built that use positive-unlabeled learning to identify likely negative chemicals from an unlabeled data set (here the Toxic Substances Control Act active inventory). Likely negatives identified using the augmented models were then used to train the final media models. Final 5-fold cross-validated models with a balanced accuracy of 75% could be built for 14 media. An initial validation of the blood model with limited external data demonstrated an accuracy of 73%. Final versions of the models for all media will be tested on identified external data sets to assess their ability to predict emerging environmental exposures. These models have the potential to inform the development of (1) workflows for environmental decision-making and (2) methods for assessing unknown structures in non-targeted analyses of environmental and biological media.
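
The abstract does not say how balanced accuracy was computed or which positive-unlabeled scheme was used, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It illustrates two ideas from the description: why balanced accuracy is more informative than plain accuracy when a medium has very few non-detected chemicals, and one common way to pick "likely negatives" from an unlabeled set using classifier scores. All function names, chemical identifiers, scores, and the 10% cutoff are hypothetical.

```python
from statistics import mean


def plain_accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = detected, 0 = not detected)."""
    recalls = []
    for cls in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == cls]
        if not idx:
            raise ValueError(f"no examples of class {cls}")
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return mean(recalls)


def select_likely_negatives(unlabeled_scores, positive_scores, fraction=0.1):
    """Illustrative first step of a two-step positive-unlabeled scheme.

    unlabeled_scores: dict mapping chemical id -> predicted detection probability
    positive_scores:  predicted probabilities for chemicals known to be detected
    The cutoff is the score below which only `fraction` of known positives fall;
    unlabeled chemicals scoring below it are treated as likely negatives.
    """
    ranked = sorted(positive_scores)
    cutoff = ranked[min(int(fraction * len(ranked)), len(ranked) - 1)]
    return sorted(c for c, s in unlabeled_scores.items() if s < cutoff)


# Hypothetical medium: 95 detected chemicals and only 5 non-detects.
y_true = [1] * 95 + [0] * 5
always_detected = [1] * 100  # a useless model that predicts "detected" for everything

print(plain_accuracy(y_true, always_detected))     # 0.95
print(balanced_accuracy(y_true, always_detected))  # 0.5

# Hypothetical scores from an augmented model.
known_positive_scores = [0.9, 0.8, 0.7, 0.95, 0.6, 0.85, 0.75, 0.65, 0.99, 0.4]
unlabeled = {"chem_a": 0.05, "chem_b": 0.55, "chem_c": 0.7, "chem_d": 0.6}
print(select_likely_negatives(unlabeled, known_positive_scores))
```

In this toy case a model that always predicts "detected" scores 95% plain accuracy but only 50% balanced accuracy, which is why a balanced metric is a reasonable choice when true negatives are scarce.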

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/31/2022
Record Last Revised: 07/14/2022
OMB Category: Other
Record ID: 355244