Science Inventory

Development and Annotation of Protein Coding Gene Models for the Fathead Minnow Genome

Citation:

Martinson, J., W. Huang, G. Toth, D. Bencic, R. Flick, D. Lattier, A. Biales, AND M. Kostich. Development and Annotation of Protein Coding Gene Models for the Fathead Minnow Genome. 2019 SETAC North America Annual Meeting, Toronto, CANADA, November 03 - 07, 2019.

Impact/Purpose:

Poster presented at the 2019 SETAC North America Annual Meeting

Description:

The importance of the fathead minnow in aquatic toxicology is demonstrated by the thousands of publications on the subject over the last four decades. To make the new assembly of the fathead minnow genome truly useful, the genome’s features must be identified and annotated. Probably the most important features to annotate are the protein-coding genes. Exploration of the protein-coding gene space is fundamental to understanding the biological activity that occurs at the molecular level within an organism in response to the environments it inhabits. A hybrid approach was employed to develop gene models for the new assembly. Two popular genome annotation pipelines, Maker2 and the Program to Assemble Spliced Alignments/Evidence Modeler (PASA/EVM), were each used to produce protein-coding gene models. The ~37.2K PASA/EVM models were fed back into Maker2, primarily to update the untranslated regions (UTRs) of their transcript models and thereby improve the mapping rates of RNA-seq reads to the transcript models. The Maker2 “adjusted” output comprised ~25.6K gene models. PASA/EVM models that were not involved in the construction of the 25.6K Maker models and that showed no overlap with the Maker models were returned to the model set, resulting in ~36.9K gene models. The 36.9K models were then filtered on several criteria: apparent presence of 4584 single-copy orthologous genes (BUSCOs) common to ray-finned fish, homology to reference proteins, and RNA-seq mapping rates. After filtering, a final set of ~26.5K models remained, which exhibited complete BUSCO protein coverage of ~93% (~86% single copy) and RNA-seq-to-transcript mapping rates approaching 80%. The results compare favorably with the reference genome of the well-studied and closely related zebrafish, which is currently believed to have ~26K protein-coding genes. Gene names and additional annotations were assigned to the models based on subsequent phylogenetic analysis (presented separately).
The current set of fathead minnow gene models will allow detailed exploration of the effects of toxic substances on fathead minnows and usher in the long-promised era of a deeper, ‘omics-based’ understanding of the changes that occur at the molecular level due to exposure to different environments. This presentation will provide the details of the process used to develop the gene models and provide updates on the final number of gene models and their relevant statistics.
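
The step in which unused, non-overlapping PASA/EVM models are returned to the Maker2 model set can be illustrated with a minimal sketch. This is not the authors' code: the GeneModel record, the rescue_models function, the used_ids set, and the toy coordinates are all hypothetical, and overlap is assumed here to mean sharing at least one base on the same scaffold, ignoring strand. The abstract does not state the exact overlap rule that was applied.

```python
from collections import defaultdict
from typing import NamedTuple


class GeneModel(NamedTuple):
    # Hypothetical record type; coordinates are 1-based and inclusive.
    model_id: str
    scaffold: str
    start: int
    end: int


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True when two models share at least one base on the same scaffold."""
    return a.scaffold == b.scaffold and a.start <= b.end and b.start <= a.end


def rescue_models(maker, pasa_evm, used_ids):
    """Return the Maker models plus every PASA/EVM model that was neither
    used to build a Maker model nor overlaps one (assumed criteria)."""
    # Group Maker models by scaffold so each PASA/EVM model is compared
    # only against models on its own scaffold.
    by_scaffold = defaultdict(list)
    for m in maker:
        by_scaffold[m.scaffold].append(m)

    rescued = [
        p for p in pasa_evm
        if p.model_id not in used_ids
        and not any(overlaps(p, m) for m in by_scaffold.get(p.scaffold, []))
    ]
    return list(maker) + rescued


# Toy data, invented for illustration only.
maker = [GeneModel("mk1", "s1", 100, 500)]
pasa_evm = [
    GeneModel("pe1", "s1", 400, 900),  # overlaps mk1, so it is dropped
    GeneModel("pe2", "s1", 600, 900),  # no overlap, so it is kept
    GeneModel("pe3", "s2", 100, 500),  # different scaffold, so it is kept
    GeneModel("pe4", "s2", 700, 900),  # already used by Maker, so it is dropped
]
used_ids = {"pe4"}

merged = rescue_models(maker, pasa_evm, used_ids)
print([m.model_id for m in merged])  # ['mk1', 'pe2', 'pe3']
```

A linear scan per scaffold is adequate for a toy example; for tens of thousands of models, sorting by start coordinate or using an interval tree would be the more usual choice.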

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 11/07/2019
Record Last Revised: 12/06/2019
OMB Category: Other
Record ID: 347657