Science Inventory

A Web-based Literature Identification Platform for the ECOTOXicology Knowledgebase, Powered by Deep Learning

Citation:

Howard, B., C. Norman, A. Tandon, R. Shaw, J. Olker, C. Elonen, AND D. Hoff. A Web-based Literature Identification Platform for the ECOTOXicology Knowledgebase, Powered by Deep Learning. SETAC North America, Fort Worth, TX, November 15 - 19, 2020. https://doi.org/10.23645/epacomptox.13235174

Impact/Purpose:

The ECOTOX Knowledgebase is a comprehensive, publicly available application providing chemical environmental toxicity data on aquatic life, terrestrial plants, and wildlife. ECOTOX data have been compiled over more than 30 years and currently include over 50,000 references covering over 12,000 chemicals and over 13,000 species. Data from ECOTOX are used for all ecological risk assessments supporting pesticide registrations and re-registrations, all ambient water quality criteria for chemicals published since 1985, site-specific water quality criteria (by EPA Regions, States, and Tribes), and assessments used in emergency response. ECOTOX has established standard operating procedures that meet the requirements for Agency systematic reviews of available information for use in Agency decision making. Presently, the literature review and data extraction processes are completed manually; however, development of more efficient data mining tools will ultimately lead to more informed predictive tools. This presentation describes an effort to use machine learning methods to automatically identify relevant documents and to develop a customized software application for screening literature. By increasing efficiency in identifying, obtaining, reviewing, and encoding data for the ECOTOX user interface, we will be able to quickly identify and curate ecotoxicological data to meet Program offices’ needs, as well as for use by States and Tribes to determine thresholds and conduct risk assessments.

Description:

The ECOTOXicology Knowledgebase (ECOTOX) is a comprehensive, publicly available resource providing single-chemical environmental toxicity data on aquatic life, terrestrial plants, and wildlife. The database is updated quarterly. To identify relevant references and extract pertinent data, the ECOTOX data curation pipeline employs a methodical process similar to the initial stages of a systematic review. This labor-intensive workflow requires curators to regularly evaluate tens of thousands of candidate references, the majority of which are then rejected as not relevant. After the careful review of hundreds of thousands of potentially relevant articles, the ECOTOX database currently (as of June 2020) contains data for 12,089 chemicals and 13,138 species, manually extracted from 50,092 references. The availability of this extensive dataset of historical screening decisions gave us the opportunity to develop high-performance, state-of-the-art neural network classifiers to partially automate title and abstract screening and to categorize rejected references (e.g., human health, fate, chemical methods). First, we prepared a database containing more than 88,000 previously screened references spanning nearly 100 different chemical-centric datasets. We used these data to develop two deep learning models, which were then integrated into a modified version of the SWIFT-Active Screener software, a collaborative web-based reference screening platform. The first model is a neural language-model classifier that predicts the relevance of candidate references. When used to augment the standard SWIFT-Active Screener document prioritization model, this method provides a mean improvement of 6.5% in Work Saved over random Sampling (WSS) relative to the standard Active Screener approach. The second model uses a separate deep learning network to conduct multi-class classification of excluded documents, predicting the reason for exclusion. This model achieves F-scores in the 65-75% range for the most frequent classes and has been integrated into Active Screener to provide intelligent “default choices” for capturing the exclusion reason. Using extensive simulations, we demonstrate that this modified version of Active Screener results in more than a 50% reduction, on average, in time spent screening ECOTOX references, with larger savings for the datasets having the most articles.
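For readers unfamiliar with the WSS metric cited above, the following is a minimal Python sketch of how it is commonly computed for a ranked screening order. It is an illustration only and not the authors' evaluation code. The function name wss_at_recall, the toy label list, and the 95% default recall level are assumptions; the abstract does not state the recall level used. The sketch uses the common definition: the fraction of references left unscreened once the target recall is reached, minus the fraction (1 - recall) that random-order screening would be expected to leave unscreened at that recall.

```python
import math


def wss_at_recall(ranked_labels, recall=0.95):
    """Work Saved over Sampling for one ranked screening order.

    ranked_labels: 1 (relevant) or 0 (not relevant) for each reference,
    in the order the prioritization model presents them (hypothetical input).
    recall: target fraction of relevant references to find (assumed 0.95).
    """
    total = len(ranked_labels)
    n_relevant = sum(ranked_labels)
    if total == 0 or n_relevant == 0:
        raise ValueError("need at least one reference and one relevant label")

    # Number of relevant references that must be found to hit the target.
    # The small epsilon guards against floating-point overshoot in ceil().
    target = math.ceil(recall * n_relevant - 1e-9)

    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= target:
            # Fraction left unscreened, minus the saving expected by chance.
            return (total - screened) / total - (1.0 - recall)

    raise RuntimeError("target recall was never reached")


# Toy example: 10 references, 3 relevant, all found after screening 4.
toy_order = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(wss_at_recall(toy_order, recall=1.0))  # 0.6
```

Under this definition, a screening order no better than random scores about 0, and a higher value means fewer references had to be read to reach the same recall.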

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 11/19/2020
Record Last Revised: 11/16/2020
OMB Category: Other
Record ID: 350151