EPA Science Inventory

A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data

Citation:

Bradshaw, K., T. Purucker, K. Wong, AND M. Molina. A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data. 2015 Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens, Washington, DC, September 24 - 27, 2015.

Description:

Next-Generation Sequencing (NGS) produces large data sets that include tens of thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow containing best-of-breed bioinformatics packages that may leverage multiple programming languages (e.g., Python, R, Java). The process of transforming raw NGS data into usable operational taxonomic units (OTUs) can be tedious due to the number of quality control (QC) steps used in QIIME and other software packages for sample processing. Therefore, the purpose of this work was to simplify the analysis of 16S NGS data from a large number of samples by integrating QC, demultiplexing, and QIIME (Quantitative Insights Into Microbial Ecology) analysis in an accessible R project. User command line operations for each of the pipeline steps were automated into a workflow. In addition, the R server allows multi-user access to the automated pipeline via separate user accounts while providing access to the same large set of underlying data. We demonstrate the applicability of this pipeline automation using 16S NGS data from approximately 100 stormwater runoff samples collected in a mixed-land-use watershed in northeast Georgia. OTU tables were generated for each sample, and the relative taxonomic abundances were compared for different periods over storm hydrographs to determine how the microbial ecology of a stream changes with the rise and fall of stream stage. Our approach simplifies the pipeline analysis of multiple 16S NGS samples by automating multiple preprocessing, QC, analysis, and post-processing command line steps that are called by a sequence of R scripts.

Purpose/Objective:

Presented at ASM 2015 Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens

URLs/Downloads:

http://conferences.asm.org/index.php/2012-02-09-21-04-52/past-conferences/2-uncategorised/341-conference-scope-2015-ngs

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Completion Date: 09/27/2015
Record Last Revised: 10/22/2015
Record Created: 10/22/2015
Record Released: 10/22/2015
OMB Category: Other
Record ID: 309890

Organization:

U.S. ENVIRONMENTAL PROTECTION AGENCY

OFFICE OF RESEARCH AND DEVELOPMENT

NATIONAL EXPOSURE RESEARCH LAB

ECOSYSTEMS RESEARCH DIVISION