Science Inventory

Predicting Molecular Initiating Events From Gene Expression Profiles

Citation:

Bundy, J., R. Judson, A. Williams, C. Grulke, I. Shah, and L. Everett. Predicting Molecular Initiating Events From Gene Expression Profiles. Society of Toxicology 2021 Annual Meeting (Virtual Event), Virtual, NC, March 12 - 26, 2021. https://doi.org/10.23645/epacomptox.25833280

Impact/Purpose:

Poster presented at the Society of Toxicology 2021 meeting. The growing use of high-throughput transcriptomic screening has led to the aggregation of large, publicly available gene expression data sets associated with chemical exposure. Combining these public data sets with machine learning methods permits the prediction of Molecular Initiating Events (MIEs) induced by chemical exposure. In this work, we present a methodology for predicting MIEs by integrating LINCS L1000 gene expression data with RefChemDB chemical-target labels in a machine learning paradigm.

Description:

The advent of high-throughput transcriptomic screening technologies has resulted in a wealth of publicly available gene expression data associated with chemical treatment. From a regulatory perspective, data sets covering a large chemical space are useful for predicting molecular initiating events (MIEs) associated with chemical exposure. Here, we integrate data from a large compendium of gene expression profiles with a catalog of chemical-target associations to train binary classifiers that predict MIEs from gene expression. To achieve this, we used RefChemDB, a database of chemical-protein interactions, and LINCS CMAP data, a collection of gene expression profiles spanning multiple cell lines and chemical treatments. First, we linked perturbagens present in the LINCS gene expression data to DTXSIDs in RefChemDB. Next, we trained binary classifiers on MCF7-derived gene expression profiles and chemical-target labels using six classification algorithms to identify optimal analysis parameters. To validate classifier accuracy, we used a variety of approaches, including multiple hold-out data sets and permutation testing of “null” models. We identified 23 MIEs for which our training approach produced high-performance classifiers that outperformed more than 95% of permuted models. High-performance classifiers were shown to corroborate RefChemDB chemical-target linkages withheld from model training, demonstrating that predictive accuracy extends beyond the set of chemicals used in classifier training. To explore differences in MIE prediction as a function of cellular context, the accuracies of MCF7-trained classifiers were compared with those of corresponding classifiers trained on PC3 gene expression data, identifying classifiers whose performance depends on the cellular context of the training data.
This methodology can offer insight into prioritizing candidate perturbagens of interest for targeted screens, as well as selecting relevant cellular contexts for screening classes of candidate perturbagens. This abstract does not necessarily reflect US EPA policy.
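
The permutation testing of "null" models mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method used in the poster: the profiles are synthetic five-gene vectors, a simple nearest-centroid rule stands in for the six classification algorithms, labels are shuffled separately within the training and hold-out sets, and the function names (holdout_accuracy, fraction_of_nulls_outperformed, make_toy_profiles) are hypothetical.

```python
import random


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def holdout_accuracy(X_train, y_train, X_test, y_test):
    """Fit a nearest-centroid binary classifier, then score it on held-out data."""
    pos = centroid([x for x, y in zip(X_train, y_train) if y == 1])
    neg = centroid([x for x, y in zip(X_train, y_train) if y == 0])
    preds = [1 if sq_dist(x, pos) < sq_dist(x, neg) else 0 for x in X_test]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)


def fraction_of_nulls_outperformed(X_train, y_train, X_test, y_test,
                                   n_perm=200, seed=0):
    """Compare the real classifier against classifiers built on shuffled labels.

    Shuffling breaks any link between profile and label while keeping the
    class balance of each set, so each shuffled run is one "null" model.
    """
    rng = random.Random(seed)
    observed = holdout_accuracy(X_train, y_train, X_test, y_test)
    beaten = 0
    for _ in range(n_perm):
        null_train, null_test = y_train[:], y_test[:]
        rng.shuffle(null_train)
        rng.shuffle(null_test)
        if observed > holdout_accuracy(X_train, null_train, X_test, null_test):
            beaten += 1
    return observed, beaten / n_perm


def make_toy_profiles(n_per_class, rng, n_genes=5, shift=3.0):
    """Synthetic stand-in for expression profiles (not LINCS data).

    Label 1 mimics chemicals that hit a target: two genes are shifted upward.
    """
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            profile = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                profile[0] += shift
                profile[1] += shift
            X.append(profile)
            y.append(label)
    return X, y


rng = random.Random(42)
X_train, y_train = make_toy_profiles(20, rng)
X_test, y_test = make_toy_profiles(20, rng)
observed, fraction = fraction_of_nulls_outperformed(X_train, y_train, X_test, y_test)
print(f"hold-out accuracy: {observed:.2f}")
print(f"fraction of permuted models outperformed: {fraction:.2f}")
```

Under this reading, a classifier would pass a threshold like the one quoted above when the reported fraction exceeds 0.95. The actual permutation scheme, performance metric, and hold-out design used in the work are not specified in this record.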

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/26/2021
Record Last Revised: 05/15/2024
OMB Category: Other
Record ID: 361440