Science Inventory

httrpl: A Targeted RNA-seq High-throughput Transcriptomics Analytical Pipeline for Environmental Chemical Screening

Citation:

Haggard, D., J. Bundy, J. Harrill, I. Shah, R. Judson, AND L. Everett. httrpl: A Targeted RNA-seq High-throughput Transcriptomics Analytical Pipeline for Environmental Chemical Screening. SOT, Nashville, TN, March 19 - 23, 2023. https://doi.org/10.23645/epacomptox.22263868

Impact/Purpose:

This abstract describes ongoing development of a data analysis pipeline for CCTE high-throughput transcriptomics screens of environmental chemicals using the TempO-Seq platform. The pipeline allows scientists to efficiently and reproducibly perform a complete analysis of targeted RNA-seq experiments and to store output from multiple levels of analysis within a standardized database management system. We give an overview of the pipeline, compare it with alternative methods, and explain the rationale for the tools and design decisions used throughout. The pipeline is planned for public release so that other agency and program partners that use targeted RNA-seq assays can download it and run it on their own datasets.

Description:

Advancing the pace of chemical risk assessment necessitates the development of new approach methodologies (NAMs) that provide meaningful information on chemical risk without the need for whole animal testing. One proposed NAM, based on the TempO-Seq targeted RNA-seq platform, uses high-throughput transcriptomic (HTTr) profiling to rapidly screen and prioritize large numbers of environmental chemicals in vitro. As part of this NAM, we developed the high-throughput transcriptomics pipeline (httrpl) software package, an analytical pipeline that enables researchers to efficiently and reproducibly perform a complete analysis of targeted RNA-seq experiments and store output from multiple levels of analysis within a standardized database management system. httrpl uses well-established open-source analysis tools, provides a stable, version-controlled containerized environment via Docker to ensure reproducibility across varying compute platforms, and stores all outputs in a NoSQL MongoDB database. The general workflow for httrpl is as follows: 1) rapidly align and count raw sequencing data using the open-source HISAT2 and SAMtools software, 2) estimate fold-change values from the read counts using the DESeq2 differential gene expression R package, and 3) derive gene signature-level data from the fold-change data and perform benchmark dose-response modeling with the US EPA tcplfit2 R package to define statistics relevant to chemical risk assessment, such as benchmark doses and transcriptional points of departure. In this work, we provide an overview of the workflow and MongoDB schema used in httrpl and explain the rationale for the tools and methods it employs, using data from previously published HTTr chemical screens in various cell lines. First, we compare the HISAT2 sequence aligner in httrpl to several alternative alignment tools.
We then describe special considerations and challenges for alignment that the TempO-Seq platform presents. We also evaluate several quality control (QC) metrics used in httrpl. Finally, we compare different shrinkage methods used by DESeq2 during fold-change estimation, which control for variability in lowly expressed probes. Preliminary results indicate that the choice of sequence aligner has little effect on alignment accuracy, but that specific alignment parameters should be chosen with care because of the sequence similarity between some TempO-Seq probes. We show that httrpl QC metrics, in particular percent mapped reads, appear to be a good indicator of replicate reproducibility and overall sample quality. Importantly, these values vary by cell line and should be re-evaluated whenever new cell lines or reference samples are used. Additionally, we demonstrate that the choice of DESeq2 shrinkage method when estimating fold-changes can have a large impact on the ability to detect differentially expressed genes, and that the selection may depend on how conservative an approach the user wants for a given study. New technologies, declining costs, and increased efficiency in HTTr and other profiling methods have greatly benefited the development of NAMs for high-throughput chemical hazard evaluation. Furthermore, the release of stable, scalable, and reproducible data analysis pipelines is paramount to their successful adoption. httrpl provides a self-contained workflow and database for high-throughput chemical screening studies that use TempO-Seq or similar targeted RNA-seq platforms. This abstract does not necessarily reflect US EPA policy. Company or product names do not constitute endorsement by US EPA.
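
The percent mapped reads QC metric mentioned above can be illustrated with a minimal Python sketch. This is not httrpl code: the SampleCounts structure, the function names, the cell line label, and the cutoff values are all hypothetical and chosen only for illustration. The sketch assumes the metric is the share of a sample's total reads that were assigned to any probe, and it takes the cutoff per cell line because the abstract notes that these values vary by cell line.

```python
from dataclasses import dataclass, field


@dataclass
class SampleCounts:
    """Read counts for one sample (hypothetical structure, not httrpl's schema)."""
    sample_id: str
    cell_line: str
    total_reads: int                                   # all reads sequenced for the sample
    probe_counts: dict = field(default_factory=dict)   # probe name -> reads assigned to it


def percent_mapped(sample: SampleCounts) -> float:
    """Percentage of a sample's reads that were assigned to any probe."""
    if sample.total_reads <= 0:
        return 0.0
    return 100.0 * sum(sample.probe_counts.values()) / sample.total_reads


def flag_low_quality(samples, min_percent_by_cell_line, default_min=50.0):
    """Return IDs of samples whose percent mapped reads is below the cutoff
    for their cell line. All cutoffs are placeholders, not httrpl defaults."""
    flagged = []
    for s in samples:
        cutoff = min_percent_by_cell_line.get(s.cell_line, default_min)
        if percent_mapped(s) < cutoff:
            flagged.append(s.sample_id)
    return flagged


# Example with made-up counts and a made-up cutoff for one cell line.
samples = [
    SampleCounts("s1", "cell_line_A", 1000, {"probe_1": 600, "probe_2": 300}),
    SampleCounts("s2", "cell_line_A", 1000, {"probe_1": 200}),
]
low_quality = flag_low_quality(samples, {"cell_line_A": 50.0})
```

Keeping the cutoff in a per-cell-line mapping, rather than a single constant, makes it straightforward to re-evaluate the value when a new cell line or reference sample is introduced.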

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/23/2023
Record Last Revised: 04/13/2023
OMB Category: Other
Record ID: 357585