Science Inventory

Development of PFAS bioconcentration models using online data sources and python-based machine learning

Citation:

Ramsland, C., G. Sinclair, T. Martin, AND A. Williams. Development of PFAS bioconcentration models using online data sources and python-based machine learning. Fall ACS, Chicago, IL, August 21 - 25, 2022. https://doi.org/10.23645/epacomptox.20493114

Impact/Purpose:

N/A

Description:

Per- and polyfluoroalkyl substances (PFAS) are a widely manufactured class of chemicals with unique physicochemical properties and degradation profiles. The bioconcentration factor (BCF) measures the uptake of a chemical from water by aquatic organisms via the skin or respiratory surfaces; it is an important metric of bioaccumulation for regulators. The purpose of this study was to assemble experimental data from online data sources and the literature to develop quantitative structure-activity relationship (QSAR) models that predict BCF for PFAS compounds. Using Java code, the data were filtered on relevant BCF metadata and stored in a PostgreSQL database in a consistent data format. Each record was mapped to a unique substance ID in EPA’s Distributed Structure-Searchable Toxicology Database. The substance ID associates each record with a “QSAR-ready” SMILES string, which is then used to generate molecular descriptors. Each data set record consists of an ID value, a log BCF value, and the molecular descriptor values. Discordant records were omitted, and the data sets were randomly split into training and prediction sets. Models were built using machine learning methods available in Python libraries, including random forest and support vector machines (SVM). The study compares external prediction statistics between models trained on global chemical space and models trained on local, PFAS-only data. A consensus model averaging the results of the other approaches was also evaluated. The views expressed here are those of the authors and do not necessarily represent the views or the policies of the U.S. Environmental Protection Agency.
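
The modeling steps described above (random split, random forest and SVM models, consensus averaging, external prediction statistics) can be illustrated with a minimal sketch. This is not the authors' code: the use of scikit-learn, the 80/20 split, the hyperparameters, the choice of R2 and RMSE as statistics, and the synthetic descriptor matrix are all assumptions made for illustration, and no real BCF data are used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT real BCF measurements): 200 hypothetical
# chemicals, 10 hypothetical descriptors, and a made-up "log BCF" response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.3, size=200)

# Random split into training and external prediction sets
# (the 80/20 ratio is an assumption, not taken from the abstract).
X_train, X_pred, y_train, y_true = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two of the model types named in the abstract; hyperparameters are arbitrary.
models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}

# Fit each model on the training set and predict the held-out set.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_pred)

# Consensus model: unweighted mean of the individual model predictions.
predictions["consensus"] = np.mean(
    [predictions["random_forest"], predictions["svm"]], axis=0
)


def external_stats(y_obs, y_hat):
    """Return R2 and RMSE computed on the external prediction set."""
    rmse = float(np.sqrt(mean_squared_error(y_obs, y_hat)))
    return {"r2": float(r2_score(y_obs, y_hat)), "rmse": rmse}


stats = {name: external_stats(y_true, y_hat) for name, y_hat in predictions.items()}
for name, s in stats.items():
    print(f"{name}: R2={s['r2']:.3f} RMSE={s['rmse']:.3f}")
```

To mimic the global versus local comparison, the same fit-and-score loop could be run twice: once on all chemicals and once on a PFAS-only subset, with both evaluated against the same PFAS prediction set.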

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 08/25/2022
Record Last Revised: 08/31/2022
OMB Category: Other
Record ID: 355594