Science Inventory

A Machine Learning Approach to Predicting Molecular Initiating Events by Integrating Chemical Target Annotations with Gene Expression

Citation:

Bundy, J., R. Judson, A. Williams, C. Grulke, I. Shah, AND L. Everett. A Machine Learning Approach to Predicting Molecular Initiating Events by Integrating Chemical Target Annotations with Gene Expression. Cosmetics Europe Toxicogenomics Workgroup, Virtual, NC, March 04, 2022. https://doi.org/10.23645/epacomptox.24470509

Impact/Purpose:

PowerPoint presentation to be given at the Cosmetics Europe Toxicogenomics Workgroup. The growing use of high-throughput transcriptomic screening methodologies has resulted in the aggregation of large, publicly available gene expression data sets associated with chemical exposure. Integrating these public gene expression data sets with machine learning methodologies permits the prediction of Molecular Initiating Events (MIEs) induced by chemical exposure. In this work, we present a methodology for predicting MIEs by integrating LINCS L1000 gene expression data with RefChemDB chemical-target labels in a machine learning paradigm.

Description:

The advent of high-throughput transcriptomic screening technologies has resulted in a wealth of publicly available gene expression data associated with chemical treatment. From a regulatory perspective, data sets covering a large chemical space offer utility for the prediction of molecular initiating events (MIEs) associated with chemical exposure. Here, we integrate data from a large compendium of gene expression profiles with a catalog of chemical-target associations to train binary classifiers for predicting MIEs from gene expression. To achieve this, we used RefChemDB, a database of chemical-protein interactions, and LINCS CMAP data, a collection of gene expression profiles spanning multiple cell lines and chemical treatments. First, we linked perturbagens present in the LINCS gene expression data to DTXSIDs in RefChemDB. Next, we trained binary classifiers on MCF7-derived gene expression profiles and chemical-target labels using six classification algorithms to identify optimal analysis parameters. To validate classifier accuracy, we used a variety of approaches, including multiple hold-out data sets and permutation testing of “null” models. To explore differences in MIE prediction as a function of cellular context, the accuracies of MCF7-trained classifiers were compared to those of orthologous classifiers trained on PC3 gene expression data. This methodology can offer insight into prioritizing candidate perturbagens of interest for targeted screens, as well as selecting relevant cellular contexts for screening classes of candidate perturbagens. This abstract does not necessarily reflect the views or policies of the US EPA.
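The workflow described above (deriving binary chemical-target labels, training a classifier on expression profiles, scoring it on a hold-out set, and comparing against permuted-label "null" models) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic, the chemical identifiers (`CHEM_000`, ...) and the target name `TARGET_A` are hypothetical placeholders rather than real DTXSIDs or RefChemDB entries, and a simple nearest-centroid classifier stands in for the six algorithms used in the work, which the abstract does not name.

```python
import random
from statistics import mean

def make_labels(chem_ids, chem_to_targets, target):
    """Binary label per chemical: 1 if it is annotated with `target`, else 0."""
    return [1 if target in chem_to_targets.get(cid, set()) else 0 for cid in chem_ids]

def fit_centroids(X, y):
    """Mean expression profile for each class (stand-in classifier, not the authors')."""
    return {
        lab: [mean(col) for col in zip(*(x for x, l in zip(X, y) if l == lab))]
        for lab in set(y)
    }

def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

def holdout_accuracy(X_train, y_train, X_test, y_test):
    """Train on one split and report accuracy on the held-out split."""
    centroids = fit_centroids(X_train, y_train)
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X_test, y_test))
    return hits / len(y_test)

def permutation_p_value(X_train, y_train, X_test, y_test, n_perm=200, seed=0):
    """Compare the observed accuracy with 'null' models trained on shuffled labels."""
    rng = random.Random(seed)
    observed = holdout_accuracy(X_train, y_train, X_test, y_test)
    null_scores = []
    for _ in range(n_perm):
        shuffled = list(y_train)
        rng.shuffle(shuffled)  # breaks the link between profile and label
        null_scores.append(holdout_accuracy(X_train, shuffled, X_test, y_test))
    # Add-one correction keeps the empirical p-value strictly above zero.
    p = (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
    return observed, p

# Synthetic stand-in data: 60 hypothetical chemicals, every other one annotated
# with a hypothetical target; annotated chemicals shift the first two "genes".
rng = random.Random(42)
chem_ids = [f"CHEM_{i:03d}" for i in range(60)]
chem_to_targets = {cid: {"TARGET_A"} for cid in chem_ids[::2]}
labels = make_labels(chem_ids, chem_to_targets, "TARGET_A")
profiles = [
    [rng.gauss(4.0 if lab and g < 2 else 0.0, 1.0) for g in range(5)]
    for lab in labels
]

# Simple split: first 40 chemicals for training, last 20 held out.
X_train, y_train = profiles[:40], labels[:40]
X_test, y_test = profiles[40:], labels[40:]

observed, p_value = permutation_p_value(X_train, y_train, X_test, y_test)
print(f"hold-out accuracy: {observed:.2f}, permutation p-value: {p_value:.3f}")
```

In this sketch the permutation step shuffles only the training labels, so each null model sees the same expression profiles but no real profile-to-target relationship. A classifier whose hold-out accuracy is not clearly above the null distribution would give little evidence that the expression data carry information about the target.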

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 03/04/2022
Record Last Revised: 10/31/2023
OMB Category: Other
Record ID: 359386