Science Inventory

Text-mining strategies to support computational research in chemical toxicity (ACS 2017 Spring meeting)


Baker, N., T. Knudsen, A. Williams, AND K. Crofton. Text-mining strategies to support computational research in chemical toxicity (ACS 2017 Spring meeting). Presented at ACS National Meeting, San Francisco, California, April 03 - 06, 2017.


Platform presentation at the 2017 Spring ACS National Meeting.


With 26 million citations, PubMed is one of the largest sources of information about the activity of chemicals in biological systems. Because this information is expressed in natural language and not stored as data, using the biomedical literature directly in computational research is not straightforward. At the EPA’s National Center for Computational Toxicology, we address the challenges of navigating the biomedical literature in several novel ways. For the article retrieval text-mining task, we have integrated a custom search tool called Abstract Sifter in our CompTox Chemistry Dashboard. This tool allows our end-users to send a query to PubMed, retrieve the citations, and then iteratively “sift” the returned results, which are ranked by relevancy. The Sifter software determines relevancy by finding and counting user-specified terms in the body of the abstracts. To address information retrieval, the other major task in text-mining, we have constructed a database populated by extracting and processing the MeSH indexing keywords from each article. We characterize these text-mining methods as high-throughput because, as in high-throughput in vitro testing, one article may yield only a few bits of data, but the accumulation of that data over millions of articles produces a very large, rich source of computable information. A third area is the extraction of structured information from unstructured documents. For example, we are extracting chemical properties such as LogP from patents. The current and ongoing challenge is to develop methods to use this literature-derived data in research. The construction of adverse outcome pathways (AOPs), for example, presents the text-mining challenge of finding literature evidence for perturbation of molecular initiating events and downstream adverse outcomes, and then tracing evidence through the intermediary key events and their cellular, tissue, and organ relationships.
Another example of an emerging research area in computational toxicology is chemical read-across. Read-across is a computational risk assessment methodology that seeks to infer the safety profile of a low-information chemical from structural neighbors that have more information, including information mined from the literature. Each of these research directions in computational toxicology challenges us to employ our text-mining approaches in novel and effective ways.
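
The relevancy ranking described for Abstract Sifter (finding and counting user-specified terms in each abstract, then ordering the citations by those counts) can be illustrated with a minimal Python sketch. This is not the Sifter's actual implementation: the function names `count_terms` and `sift`, the case-insensitive whole-word matching, the use of a simple summed count as the score, and the toy citations are all assumptions made for illustration.

```python
import re

def count_terms(abstract, terms):
    """Count case-insensitive whole-word occurrences of each term (assumed matching rule)."""
    counts = {}
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        counts[term] = len(re.findall(pattern, abstract, flags=re.IGNORECASE))
    return counts

def sift(citations, terms):
    """Return citations ordered by total term count, highest first (assumed scoring rule)."""
    scored = []
    for citation in citations:
        counts = count_terms(citation["abstract"], terms)
        scored.append({**citation, "counts": counts, "score": sum(counts.values())})
    # sorted() is stable, so citations with equal scores keep their input order.
    return sorted(scored, key=lambda c: c["score"], reverse=True)

# Toy citations with hypothetical identifiers; these are not real PubMed records.
citations = [
    {"id": "A", "abstract": "Bisphenol A altered thyroid hormone levels; thyroid histology changed."},
    {"id": "B", "abstract": "A survey of analytical methods."},
    {"id": "C", "abstract": "Thyroid effects were not observed."},
]
terms = ["thyroid", "hormone"]

ranked = sift(citations, terms)
for c in ranked:
    print(c["id"], c["score"], c["counts"])
```

Calling `sift` again with a revised term list re-ranks the same set of citations, which is one way to picture the iterative sifting the abstract describes.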

Record Details:

Product Published Date: 04/06/2017
Record Last Revised: 03/14/2018
OMB Category: Other
Record ID: 339884