Science Inventory

Predicting Molecular Initiating Events from High Throughput Transcriptomic Screening using Machine Learning

Citation:

Bundy, J., R. Judson, A. Williams, Chris Grulke, I. Shah, AND L. Everett. Predicting Molecular Initiating Events from High Throughput Transcriptomic Screening using Machine Learning. Midsouth Computational Biology & Bioinformatics Society (MCBIOS) and MAQC 2021 Joint Conference, Virtual, North Carolina, April 26 - 30, 2021. https://doi.org/10.23645/epacomptox.16764067

Impact/Purpose:

Poster presented at the Midsouth Computational Biology & Bioinformatics Society (MCBIOS) and MAQC 2021 Joint Conference in April 2021. The work supports in vitro risk assessment.

Description:

Background: The advent of high-throughput transcriptomic screening technologies has produced a wealth of gene expression signatures associated with chemical perturbagens. One such resource is the Library of Integrated Network-Based Cellular Signatures (LINCS), which spans ~20k chemical perturbagens across multiple cell types. From a chemical safety perspective, datasets covering a large chemical space are useful for predicting molecular initiating events (MIEs) induced by exposure to environmental perturbagens. Developing methods to interrogate large transcriptomic datasets is relevant to U.S. EPA’s focus on increasing efficiency in chemical screening using new approach methodologies. To assess the utility of high-throughput transcriptomic screening for predicting MIEs in response to chemical exposures, we 1) created target-specific training sets by pairing LINCS gene expression profiles with chemical-target associations from RefChemDB, and 2) trained binary classifiers using multiple classification algorithms. To explore differences in the capacity to predict MIEs across cell types, classifiers were trained separately on gene expression profiles from the breast cancer-derived MCF7 cell line and the prostate cancer-derived PC3 cell line.

Results: Classifiers were trained to predict distinct MIEs using three training feature types and six classification algorithms. Cross-validation accuracies and holdout dataset accuracies were highly concordant. MIEs modeled with dissimilar accuracies between the MCF7 and PC3 cell lines corresponded to targets that are differentially expressed between the two cell lines, such as estrogen receptors.

Conclusions: Classifiers trained on chemically induced perturbations in gene expression successfully predicted MIEs for holdout gene expression profiles excluded from model training. Linking MIE labels with gene expression compendia can produce models with predictive value. This methodology may help determine which in vitro cell types offer the most predictive power for screening chemical perturbagens of a particular MIE. This abstract does not necessarily reflect U.S. EPA policy.
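
The workflow summarized in the abstract (pair each expression profile with a chemical-target label, train a binary classifier, then compare cross-validation accuracy with accuracy on held-out profiles) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic, the target name `ESR1` and the chemical IDs are placeholders, and a simple nearest-centroid classifier stands in for the six algorithms used in the poster, which the abstract does not name. It is not the authors' pipeline and uses neither LINCS nor RefChemDB data.

```python
import random
from statistics import mean

def build_training_set(profiles, chem_targets, target):
    """Label each profile 1 if its chemical is associated with `target`, else 0."""
    X, y = [], []
    for chem_id, expression in profiles:
        X.append(expression)
        y.append(1 if target in chem_targets.get(chem_id, set()) else 0)
    return X, y

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean expression vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))

def accuracy(centroids, X, y):
    """Fraction of profiles whose predicted label matches the true label."""
    return mean([1.0 if predict(centroids, x) == lab else 0.0
                 for x, lab in zip(X, y)])

def cross_val_accuracy(X, y, k=5):
    """Mean accuracy over k folds; fold i holds out indices i, i+k, i+2k, ..."""
    scores = []
    for i in range(k):
        held = set(range(i, len(X), k))
        X_tr = [x for j, x in enumerate(X) if j not in held]
        y_tr = [lab for j, lab in enumerate(y) if j not in held]
        X_te = [X[j] for j in sorted(held)]
        y_te = [y[j] for j in sorted(held)]
        scores.append(accuracy(fit_centroids(X_tr, y_tr), X_te, y_te))
    return mean(scores)

# Synthetic stand-in data (hypothetical): 60 chemicals, 5 "genes" each.
# Even-numbered chemicals are annotated as hitting the placeholder target
# and have gene 0 shifted upward; all other values are unit Gaussian noise.
rng = random.Random(0)
TARGET = "ESR1"  # placeholder target name, not taken from RefChemDB
chem_targets = {f"chem{i}": {TARGET} for i in range(0, 60, 2)}
profiles = []
for i in range(60):
    shift = 5.0 if i % 2 == 0 else 0.0
    expression = [rng.gauss(shift if g == 0 else 0.0, 1.0) for g in range(5)]
    profiles.append((f"chem{i}", expression))
rng.shuffle(profiles)

X, y = build_training_set(profiles, chem_targets, TARGET)

# Reserve the last 20% of the shuffled profiles as a holdout set.
n_train = int(0.8 * len(X))
X_train, y_train = X[:n_train], y[:n_train]
X_hold, y_hold = X[n_train:], y[n_train:]

cv_acc = cross_val_accuracy(X_train, y_train, k=5)
holdout_acc = accuracy(fit_centroids(X_train, y_train), X_hold, y_hold)
print(f"cross-validation accuracy: {cv_acc:.2f}")
print(f"holdout accuracy: {holdout_acc:.2f}")
```

Running the same script once per cell line, with profiles restricted to that cell line, would give the kind of per-target accuracy comparison between MCF7 and PC3 that the Results describe; in practice an established library implementation with stratified folds would be preferable to this hand-rolled version.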

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 04/30/2021
Record Last Revised: 10/07/2021
OMB Category: Other
Record ID: 352993