Science Inventory

Predicting Chemical Occurrence in Environmental and Biological Media [ISES abstract]

Citation:

Eddy, L., K. Phillips, J. Sobus, E. Ulrich, AND K. Isaacs. Predicting Chemical Occurrence in Environmental and Biological Media [ISES abstract]. International Society for Exposure Science 2021 Virtual Meeting, August 30 - September 02, 2021. https://doi.org/10.23645/epacomptox.15151947

Impact/Purpose:

This is an abstract for a general ISES submission. The talk will describe recent efforts to predict occurrence in environmental media for data-poor chemicals. These models inform decision support and non-targeted analyses.

Description:

Monitoring of chemical occurrence in various media is critical for understanding the mechanisms by which human and ecological receptors are exposed to exogenous chemicals. Because monitoring studies are expensive, data have not been exhaustively collected for the tens of thousands of chemicals in commerce. To fill this gap, predictive models can be used to anticipate chemical presence and inform prioritization for further study. Here we present a suite of random forest models that integrate data from dozens of public monitoring sources to predict chemical occurrence in 25 different environmental and biological media. For each medium, classifier models were built to predict the probability of any given chemical being detected in that medium. Regression models were further developed to predict quantitative global detection rates for novel compounds based on their similarity to chemicals previously characterized in the media of interest. These models use descriptors for physicochemical properties, structural characteristics, and uses. All models were evaluated using out-of-bag error, 5-fold cross-validation, and y-randomization. Classification models with an average out-of-bag error rate of <15% could be built for 22 of the 25 media. We compared the performance of two distinct regression modeling approaches: one that places more emphasis on training set chemicals having more information, and another that weights the information for all chemicals equally. Preliminary results suggest better performance from the regression model that weights all chemicals equally. In the equal-weights model, the out-of-bag error is <10% for 20 out of 25 media. In the model that places higher weight on data-rich chemicals, the out-of-bag error is <10% for only 4 out of 25 media. The regression models can be used in tandem with the classification models to provide more holistic predictions of occurrence for chemicals having no monitoring data.
Final versions of our models will be tested on external data sets to assess their ability to predict emerging environmental exposures. These models have the potential to inform the development of 1) workflows for environmental decision-making, and 2) methods for assessing unknown structures in non-targeted analyses of environmental and biological media. The views expressed here are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA.
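
Illustrative sketch (not the authors' code): the snippet below shows, under loudly stated assumptions, how out-of-bag (OOB) error and a y-randomization check fit together for a bagged tree classifier. Everything in it is hypothetical: the data are synthetic stand-ins for chemical descriptors, the "forest" is a toy ensemble of depth-1 trees (decision stumps) rather than a full random forest, and the names make_data, fit_stump, predict_stump, and oob_error are invented for this example.

```python
import random

def make_data(n=150, seed=0):
    """Synthetic stand-in for a descriptor matrix: 3 features in [0, 1).
    Label 1 ("detected") when feature 0 exceeds 0.5; about 5% of labels are flipped."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [int(row[0] > 0.5) for row in X]
    y = [1 - label if rng.random() < 0.05 else label for label in y]
    return X, y

def fit_stump(X, y, rows, features):
    """Fit a depth-1 tree: choose the feature, threshold, and polarity that
    minimise training errors on the bootstrap rows."""
    best = None  # (errors, feature, threshold, polarity)
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            # Errors when predicting class 1 for values above the threshold.
            errors = sum((X[i][f] > t) != bool(y[i]) for i in rows)
            for err, polarity in ((errors, 1), (len(rows) - errors, 0)):
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    feature, threshold, polarity = stump
    above = row[feature] > threshold
    return int(above) if polarity == 1 else int(not above)

def oob_error(X, y, n_trees=50, seed=0):
    """Out-of-bag error: each sample is scored only by trees whose
    bootstrap sample did not contain it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [[0, 0] for _ in range(n)]  # OOB votes for class 0 and class 1
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        features = rng.sample(range(len(X[0])), 2)    # random feature subset
        stump = fit_stump(X, y, rows, features)
        in_bag = set(rows)
        for i in range(n):
            if i not in in_bag:
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    wrong = sum((votes[i][1] > votes[i][0]) != bool(y[i]) for i in scored)
    return wrong / len(scored)

X, y = make_data()
real_error = oob_error(X, y)

# y-randomization: shuffle the labels so any descriptor-label link is destroyed.
y_shuffled = y[:]
random.Random(1).shuffle(y_shuffled)
shuffled_error = oob_error(X, y_shuffled)

print(f"OOB error, real labels:     {real_error:.2f}")
print(f"OOB error, shuffled labels: {shuffled_error:.2f}")
```

If a model has learned real structure, its OOB error on the true labels should be well below the error obtained after shuffling, which should sit near chance. In practice a library implementation of random forests, with 5-fold cross-validation added as a third check, would replace this toy ensemble.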

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 09/02/2021
Record Last Revised: 08/12/2021
OMB Category: Other
Record ID: 352556