Prediction of Chemical Function: Model Development and Application
Isaacs, K., J. Wambaugh, K. Dionisio, Chris Grulke, AND K. Phillips. Prediction of Chemical Function: Model Development and Application. Society of Toxicology 2017 Annual Meeting, Baltimore, MD, March 12 - 16, 2017.
New structure-based models of functional use were developed and applied in several contexts, including alternatives assessment and high-throughput prioritization of chemicals.
The United States Environmental Protection Agency’s Exposure Forecaster (ExpoCast) project is developing both statistical and mechanism-based computational models for predicting exposures to thousands of chemicals, including those in consumer products. The high-throughput (HT) screening-level exposure estimates developed under ExpoCast can be combined with HT screening (HTS) bioactivity data for the risk-based prioritization of chemicals for further evaluation. The functional role (e.g., solvent, plasticizer, fragrance) that a chemical performs can drive both the types of products in which it is found and the concentration at which it is present, and therefore affects exposure potential. However, critical chemical use information (including functional role) is lacking for the majority of commercial chemicals for which exposure estimates are needed. A suite of machine-learning models was developed for classifying chemicals by their likely functional roles in products based on structure. This effort required collection, curation, and harmonization of publicly available sources of chemical functional use information from government and industry bodies. Physicochemical and structure descriptor data were generated for chemicals with function data. Machine-learning classifier models for function were then built in a cross-validated manner from the descriptor/function data using the random forest method.
The models were applied to: 1) predict chemical functions for 10,196 chemicals (including the Tox21 library) with limited or non-existent exposure data, allowing for the parameterization of EPA’s HT Stochastic Human Exposure and Dose Simulation (SHEDS-HT) model for a subset of 500 chemicals known to be present in consumer products, 2) screen a library of nearly 6,400 chemicals with available structure information and HTS data for potential functional substitutes, and 3) characterize the function of over 1,400 chemicals tentatively identified via HT non-targeted analyses of consumer product formulations and other articles of commerce. These models improve the HT characterization of the function, use, and exposure potential of reported or measured chemicals in consumer products.
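
As an illustration of the modeling step described above (a classifier for one functional role, built from descriptor data with random forests and evaluated by cross-validation), the following is a minimal sketch and not the authors' implementation. It assumes Python with NumPy and scikit-learn; the descriptor matrix, the binary "has this function" label, the fold count, and all hyperparameters are hypothetical stand-ins, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data: NOT real descriptor or functional use data.
rng = np.random.default_rng(0)
n_chemicals, n_descriptors = 200, 10

# Hypothetical descriptor matrix: one row per chemical, one column per descriptor.
X = rng.normal(size=(n_chemicals, n_descriptors))

# Hypothetical binary label: 1 = chemical performs the function (e.g., "solvent").
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# One random forest per functional role; the settings here are assumptions.
clf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",  # functional classes are often imbalanced
    random_state=0,
)

# Stratified 5-fold cross-validation keeps class proportions similar per fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="balanced_accuracy")
print("Balanced accuracy per fold:", np.round(scores, 3))

# Fit on all labeled chemicals, then score chemicals lacking function data.
clf.fit(X, y)
new_chemicals = rng.normal(size=(3, n_descriptors))
prob_function = clf.predict_proba(new_chemicals)[:, 1]
print("Predicted probability of function:", np.round(prob_function, 3))
```

In a sketch like this, the predicted probabilities could be thresholded to flag candidate functional substitutes or to assign likely functions to chemicals with no reported use; the threshold would need to be chosen from the cross-validated performance of each function-specific model.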