Science Inventory

Implementation of a Flexible Tool for Automated Literature-Mining and Knowledgebase Development (DevToxMine)

Citation:

KNUDSEN, T. B. AND A. V. SINGH. Implementation of a Flexible Tool for Automated Literature-Mining and Knowledgebase Development (DevToxMine). Presented at 2009 Annual Teratology Society Meeting, Rio Grande, PUERTO RICO, June 27 - July 01, 2009.

Impact/Purpose:

This flexible text-mining tool (DevToxMine™), combined with an ontology for embryogenesis, is being used to build a knowledgebase for EPA’s Virtual Embryo project.

Description:

Deriving novel relationships from the scientific literature is an important adjunct to data mining of complex datasets in genomics and high-throughput screening. Automated text-mining algorithms can be used to extract relevant content from the literature and build a thesaurus to convert word relations into concepts. Concept mining has become an essential knowledge discovery tool to address causal links, associations, relationships, and patterns among vast collections of information. EPA’s ToxRefDB database has been built from source data derived from 30 years’ worth of in vivo animal toxicity studies, mostly rat and rabbit studies. This database includes 751 prenatal developmental toxicity studies on 387 chemicals. For example, large-scale profiling of environmental chemicals for developmental effects with ToxRefDB revealed a species dimorphism of renal-ureteric defects, expressed in the rat over the rabbit, and a strong correlation between fetal weight reduction and defects of the axial skeleton. In this study, we applied custom text-mining tools to extract the underlying concepts from PubMed. Automated queries were built as keyword strings joined with 'and' using a Perl script, and the facts and information they fetched were stored in a MySQL database. Keywords included the ToxCast_320 chemicals and 988 features from an enhanced thesaurus of developmental effects (www.DevTox.org). The raw search returned 186K PubMed abstracts for 82% of the chemicals. Further filtering narrowed this to 9K abstracts covering 48% of the chemicals. A computational filter applied to find co-occurrences of chemicals, developmental endpoints, and chemical-endpoint linkages returned 4 distinct chemicals and 14 effects at a cutoff value of 10 abstracts. Although linkages found with the exploratory text-mining tool were conceptualized from relationships mined from ToxRefDB, they included new relationships beyond the ToxRefDB database.
This flexible text-mining tool (DevToxMine™), combined with an ontology for embryogenesis, is being used to build a knowledgebase for EPA’s Virtual Embryo project. [This work has been reviewed by EPA and approved for publication but does not necessarily reflect official Agency policy.]
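
The query-building and co-occurrence steps described above can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the study, and it is not the DevToxMine implementation: the function names, the plain substring matching, the in-memory dictionary standing in for the MySQL store, and the placeholder terms (chemical_a, effect_x, and so on) are all assumptions made for illustration. Only the general idea comes from the description: pair chemical and effect keywords into 'and' queries, then keep chemical-endpoint pairs that co-occur in at least a cutoff number of abstracts.

```python
from collections import defaultdict
from itertools import product


def build_queries(chemicals, effects):
    """Pair every chemical with every effect term as one Boolean AND query string."""
    return [f'"{chem}" AND "{eff}"' for chem, eff in product(chemicals, effects)]


def count_cooccurrences(abstracts, chemicals, effects):
    """Map each (chemical, effect) pair to the set of abstract IDs mentioning both.

    Matching here is a case-insensitive substring test (an assumption);
    the source does not say how terms were matched.
    """
    hits = defaultdict(set)
    for abstract_id, text in abstracts.items():
        lowered = text.lower()
        found_chems = [c for c in chemicals if c.lower() in lowered]
        found_effects = [e for e in effects if e.lower() in lowered]
        for pair in product(found_chems, found_effects):
            hits[pair].add(abstract_id)
    return hits


def apply_cutoff(hits, min_abstracts):
    """Keep only pairs supported by at least min_abstracts distinct abstracts."""
    return {
        pair: len(ids) for pair, ids in hits.items() if len(ids) >= min_abstracts
    }


# Placeholder inputs (hypothetical, not data from the study).
chemicals = ["chemical_a", "chemical_b"]
effects = ["effect_x", "effect_y"]
abstracts = {
    1: "Prenatal exposure to Chemical_A was associated with effect_x in the rat.",
    2: "Chemical_A produced effect_x at the highest dose tested.",
    3: "Chemical_B was linked to effect_y in a single rabbit study.",
}

queries = build_queries(chemicals, effects)
linkages = apply_cutoff(count_cooccurrences(abstracts, chemicals, effects), 2)
print(queries[0])
print(linkages)
```

With a cutoff of 2, only the chemical_a and effect_x pair survives in this toy input. The description reports a cutoff of 10 abstracts; a real pipeline would also need synonym handling from the thesaurus, which this sketch omits.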

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 06/29/2009
Record Last Revised: 08/19/2010
OMB Category: Other
Record ID: 210448