Science Inventory

The ECOTOX Knowledgebase pipeline: From literature search to data extraction

Citation:

LaLone, C. The ECOTOX Knowledgebase pipeline: From literature search to data extraction. Chemicals Committee - Working Party on Chemicals, Pesticides and Biotechnology, Paris, France, October 25 - 26, 2018.

Impact/Purpose:

The ECOTOX Knowledgebase is a publicly accessible web-based tool that houses curated ecotoxicology data. The tool has been significantly updated in terms of the user interface and the curation pipeline. This presentation will be used to describe the advances made in collecting chemical toxicity data from available sources and those made in exploring the resulting data. Additionally, the presentation will discuss future advances that are being made to allow the ECOTOX Knowledgebase to interact with other EPA tools.

Description:

The US Environmental Protection Agency’s Ecotoxicology (ECOTOX) Knowledgebase contains more than 30 years of reported single-chemical toxicity effects data on aquatic and terrestrial organisms. Approximately 900,000 test results covering more than 11,000 chemicals and 12,000 species are available in ECOTOX. A significantly enhanced interface (v5.0) of the ECOTOX Knowledgebase was released in 2018. Advances include the integration of improved and computationally automated literature search strategies for data curation, and inclusion of more mechanistic (e.g., genetic, enzymatic) and pathway-based (e.g., hormonal, cellular) data to better align with the evolution of toxicity testing. This newly released ECOTOX user interface has enhanced functionality for searching and exploring data, with interactive data visualization capabilities. Further, links to other EPA tools (e.g., SeqAPASS) and databases are being integrated, laying the foundation for future interoperability. While the database is currently used by many sectors for a variety of purposes, a future goal is to allow for computational modeling of the data to identify novel adverse outcome pathways and networks, and assist in predicting chemical hazard and species sensitivity. One obstacle is that ECOTOX captures study information using author-reported descriptions, resulting in more than 4,000 codes. Relationships among these codes are often not apparent in the current design (e.g., unique codes exist for both aryl hydrocarbon hydroxylase and cytochrome P450 1A), and some codes are specific to the study from which they were derived (e.g., 3rd generation male). To enhance the query capability of the data within and external to the ECOTOX Knowledgebase, and to prepare for future computational functionality, the ECOTOX codes were mapped to existing biological ontology classes.
To facilitate this mapping, a Java-based Lookup tool was developed using the REST API of the ontology browser BioPortal (https://bioportal.bioontology.org/). This tool was designed to allow for batch processing and to make use of BioPortal’s Annotator and Recommender features. A training set composed of every code in the knowledgebase applicable to more than 0.5% (2,368) of ECOTOX references (as of March 2018) was used initially with the BioPortal Lookup tool. The majority (58%) of these codes were successfully mapped, with a higher rate of success for the effect measurement codes. Manual review of the mappings indicated that a proportion of the unmapped codes could be described using multiple ontology identifiers in combination, while some codes mapped using the BioPortal Lookup tool resulted in ontology classes with improper context.
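
The batch lookup described above might be approached along the following lines. This is only an illustrative sketch, not the Lookup tool presented here: the class name `BioPortalLookupSketch`, its methods, the `BIOPORTAL_API_KEY` environment variable, and the choice of example descriptions are all assumptions made for illustration. It assumes the public BioPortal REST endpoint at `https://data.bioontology.org` and its `/annotator` resource with `text` and `apikey` query parameters, and it requires Java 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BioPortalLookupSketch {
    // Assumption: base URL of the public BioPortal REST API.
    static final String BASE_URL = "https://data.bioontology.org";

    // URL-encode one query parameter value as UTF-8.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Build an Annotator request URI for one author-reported description.
    static URI annotatorUri(String description, String apiKey) {
        return URI.create(BASE_URL + "/annotator?text=" + encode(description)
                + "&apikey=" + encode(apiKey));
    }

    // Batch step: one request URI per description, in input order.
    static List<URI> batchUris(List<String> descriptions, String apiKey) {
        List<URI> uris = new ArrayList<>();
        for (String description : descriptions) {
            uris.add(annotatorUri(description, apiKey));
        }
        return uris;
    }

    // Send a GET request and return the raw JSON response body.
    static String fetch(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Descriptions taken from the examples in the text above,
        // not actual ECOTOX code values.
        List<String> descriptions = List.of("cytochrome P450 1A", "3rd generation male");
        String apiKey = System.getenv("BIOPORTAL_API_KEY"); // hypothetical variable name
        if (apiKey == null) {
            // Without a key, only show the requests that would be sent.
            batchUris(descriptions, "YOUR_API_KEY").forEach(System.out::println);
            return;
        }
        for (URI uri : batchUris(descriptions, apiKey)) {
            System.out.println(fetch(uri));
        }
    }
}
```

Candidate ontology classes returned this way would still need manual review, since, as noted above, some automatically mapped codes resolved to classes with improper context.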

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 10/26/2018
Record Last Revised: 10/26/2018
OMB Category: Other
Record ID: 342976