Science Inventory

Predicting Molecular Initiating Events from High Throughput Transcriptomic Screening using Machine Learning (EGMS 2021)

Citation:

Bundy, J., R. Judson, A. Williams, C. Grulke, I. Shah, AND L. Everett. Predicting Molecular Initiating Events from High Throughput Transcriptomic Screening using Machine Learning (EGMS 2021). 2021 Environmental Mutagenesis and Genomics Society Annual Meeting (Virtual), Virtual, NC, September 22 - 25, 2021. https://doi.org/10.23645/epacomptox.17131187

Impact/Purpose:

Presentation for the Bioinformatics Challenge at the Environmental Mutagenesis and Genomics Society (EMGS) annual meeting in September 2021. The project focuses on predicting molecular initiating events (MIEs) by integrating high-throughput transcriptomic data with chemical-MIE labels using a machine learning approach.

Description:

The advent of high-throughput transcriptomic screening technologies has produced a wealth of gene expression signatures associated with chemical perturbagens. One such resource is the Library of Integrated Network-Based Cellular Signatures (LINCS), which spans ~20,000 chemical perturbagens. From a chemical safety perspective, such data sets can be used to predict molecular initiating events (MIEs) induced by exposure to environmental perturbagens. Developing methods to use these data sets is relevant to the U.S. EPA's focus on increasing the efficiency of chemical screening with new approach methodologies.

To assess the utility of high-throughput transcriptomic screening for predicting MIEs in response to chemical exposures, we 1) created target-specific training sets by pairing LINCS gene expression profiles with chemical-target associations from RefChemDB, and 2) trained a binary classifier for each MIE. Classifiers were trained using three training feature types, six classification algorithms, and two cell types (MCF7 and PC3 cells).

Classifiers trained on chemically induced perturbations in gene expression successfully predicted MIEs for holdout gene expression profiles excluded from model training, and cross-validation accuracies were highly concordant with holdout accuracies. MIEs modeled with dissimilar accuracies between the two cell lines corresponded to targets with different baseline expression in MCF7 and PC3 cells, such as estrogen receptors. Linking MIE labels with gene expression compendia can produce models with predictive value and can also indicate which cell types offer the most predictive power for screening chemical perturbagens for a particular MIE. This abstract does not necessarily reflect U.S. EPA policy.
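The two-step workflow described above (pair expression profiles with chemical-target labels, then train and evaluate one binary classifier per MIE) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, `target_A` is a hypothetical MIE label, the input structures only mimic LINCS signatures and RefChemDB associations, and a nearest-centroid classifier stands in for the six algorithms used in the study, which this abstract does not name. The holdout here is a random profile-level split, which is also an assumption.

```python
import random
from statistics import mean

def build_training_set(profiles, chem_targets, target):
    """Label each profile 1 if its chemical is linked to `target`, else 0.
    profiles: list of (chemical_id, feature_vector) pairs (hypothetical LINCS-like signatures)
    chem_targets: dict chemical_id -> set of targets (hypothetical RefChemDB-like associations)"""
    X, y = [], []
    for chem, vec in profiles:
        X.append(vec)
        y.append(1 if target in chem_targets.get(chem, set()) else 0)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid classifier: an illustrative stand-in, not one of the study's algorithms."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        if rows:  # skip a class that is absent from this training split
            centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, X, y):
    return mean(1.0 if predict(centroids, x) == t else 0.0 for x, t in zip(X, y))

def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean accuracy over k folds; each fold is scored by a model trained on the other folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        test = set(fold)
        train = [i for i in idx if i not in test]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return mean(scores)

def synthetic_profiles(seed=0):
    """Synthetic data only: 30 chemicals, 10 linked to target_A, 3 replicate profiles each."""
    rng = random.Random(seed)
    chem_targets = {f"chem{i}": ({"target_A"} if i < 10 else set()) for i in range(30)}
    profiles = []
    for chem, targets in chem_targets.items():
        shift = 2.0 if "target_A" in targets else 0.0  # positives shifted in expression space
        for _ in range(3):
            profiles.append((chem, [rng.gauss(shift, 1.0) for _ in range(5)]))
    return profiles, chem_targets

def run_demo(target="target_A", seed=1):
    """Build a target-specific training set, then compare CV accuracy with holdout accuracy."""
    profiles, chem_targets = synthetic_profiles()
    X, y = build_training_set(profiles, chem_targets, target)
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    cut = int(0.8 * len(order))  # 80% for training and CV, 20% held out
    tr, ho = order[:cut], order[cut:]
    X_tr, y_tr = [X[i] for i in tr], [y[i] for i in tr]
    cv_acc = cross_val_accuracy(X_tr, y_tr, k=5)
    model = fit_centroids(X_tr, y_tr)
    ho_acc = accuracy(model, [X[i] for i in ho], [y[i] for i in ho])
    return cv_acc, ho_acc
```

Calling `cv_acc, ho_acc = run_demo()` returns the two accuracies whose agreement the abstract reports as highly concordant for the real models. Splitting holdout data by chemical rather than by profile would be a stricter test, since replicate profiles of one chemical could otherwise appear on both sides of the split.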

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 09/25/2021
Record Last Revised: 12/06/2021
OMB Category: Other
Record ID: 353525