Science Inventory

Comparing microarrays to RNA-seq for transcriptomic analysis of whole fathead minnow larvae

Citation:

Kostich, M., D. Bencic, R. Flick, J. Martinson, W. Huang, G. Toth, AND A. Biales. Comparing microarrays to RNA-seq for transcriptomic analysis of whole fathead minnow larvae. SETAC North America, Toronto, ON, CANADA, November 03 - 07, 2019.

Impact/Purpose:

Demonstrating RNA-seq technology as a viable and cost-effective alternative to microarrays for gene expression-based studies in the fathead minnow.

Description:

RNA-seq is displacing microarrays for mRNA expression analysis because of the declining cost of sequencing and published studies reporting better performance for RNA-seq. However, most empirical comparisons lack a gold standard, and those that have one often evaluate synthetic samples whose representativeness of real-world experiments is unclear. Furthermore, RNA-seq performance is expected to depend on sample complexity and sequencing depth: for complex samples, greater sequencing depth is expected to be required to quantitate the expression of rare transcripts. We are interested in using mRNA measurements in whole fathead minnow (Pimephales promelas) larvae to detect aquatic toxicant exposures. Whole larvae comprise a wide variety of cell types, some of which constitute a very small percentage of the total sample. The applicability of other RNA-seq performance studies to this system is unclear. To compare RNA-seq to microarrays, we exposed 2-day post-hatch fathead minnow larvae to bifenthrin or negative control water for 48 h, then split the RNA from each larva, evaluating one half using a microarray and the other half using RNA-seq with a targeted sequencing depth of 30 million 100 base pair (bp) reads per sample. About 35 larvae were exposed to each condition. Reads were resampled at depths ranging from 2 million to 30 million reads per sample and at lengths ranging from 25 bp to 100 bp. Larvae were resampled at sample sizes ranging from 10 to 30 larvae per condition. Resampled reads were mapped to fathead minnow (FHM) gene models. Sets of normalized read counts per gene and microarray spot intensities were reduced to the most informative features using a linear model, and the reduced feature set was used to develop classifiers using Random Forests. Ten-fold cross-validation was used to estimate performance (Brier scores).
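The read resampling step described above can be illustrated with a minimal sketch. It assumes reads are drawn at random without replacement and shortened by keeping their leading bases; the abstract does not state how resampling or shortening was actually done, and the function name `subsample_reads` and the toy reads are hypothetical.

```python
import random

def subsample_reads(reads, depth, length, seed=0):
    """Draw `depth` reads without replacement and keep the first `length` bases of each.

    Assumption: reads are shortened from the 3' end (leading bases kept);
    the abstract does not specify the trimming direction.
    """
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of available reads")
    rng = random.Random(seed)          # fixed seed for a reproducible draw
    chosen = rng.sample(reads, depth)  # sampling without replacement
    return [read[:length] for read in chosen]

# Toy stand-in for one sample: 1000 hypothetical 100 bp reads.
reads = ["ACGT" * 25 for _ in range(1000)]

# Emulate a shallower, shorter-read run: 200 reads of 35 bp.
sub = subsample_reads(reads, depth=200, length=35)
```

Repeating the draw over a grid of depths and lengths gives one resampled data set per combination, each of which can then be mapped and scored separately.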
Our results showed excellent performance for both microarrays and RNA-seq across a range of read lengths, read depths, and feature selection stringencies, all of which had much smaller effects on performance than the number of samples per treatment. Near-optimal RNA-seq results, comparable to microarray performance, were achieved with 8 million mapped 35-mer reads per sample. Quality trimming of reads and the choice of read-mapping software had only minor effects. Our results suggest that the primary advantage of RNA-seq stems from cost per sample rather than from the quality of the measurement data.
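For readers unfamiliar with the performance metric, the Brier score for a two-class problem is the mean squared difference between the predicted probability of the positive class and the observed 0/1 label, so lower is better and 0 is a perfect score. The sketch below shows that calculation together with a simple ten-fold partition of sample indices. The function names are hypothetical, the classifier itself is omitted, and nothing here reproduces the authors' actual pipeline.

```python
import random

def brier_score(probs, labels):
    """Mean squared error between predicted P(class 1) and the 0/1 labels."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle sample indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# A perfect prediction scores 0; an uninformative 0.5 prediction scores 0.25.
perfect = brier_score([1.0, 0.0], [1, 0])
uninformative = brier_score([0.5, 0.5], [1, 0])

# Roughly 35 larvae per condition gives about 70 samples in total.
folds = k_fold_indices(70, k=10)
```

In a cross-validation run, each fold is held out in turn, the classifier is trained on the remaining nine, and the Brier score is computed on the held-out predictions.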

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 11/07/2019
Record Last Revised: 11/08/2019
OMB Category: Other
Record ID: 347367