Science Inventory

Searching for LINCS to Stress: using text-mining to automate reference chemical curation

Citation:

Chambers, B., D. Basili, L. Taylor, N. Baker, A. Middleton, R. Judson, AND I. Shah. Searching for LINCS to Stress: using text-mining to automate reference chemical curation. Society of Toxicology 62nd Annual Meeting and ToxExpo 2023, Nashville, TN, March 19 - 23, 2023. https://doi.org/10.23645/epacomptox.22731980

Impact/Purpose:

This poster, presented at the Society of Toxicology 62nd Annual Meeting and ToxExpo (March 2023), disseminates a large database of chemicals annotated for stress response pathway activity.

Description:

Adaptive stress response pathways (SRPs) restore cellular homeostasis after chemical exposure. SRPs also activate terminal programs if cellular disruption exceeds adaptive thresholds, and they are implicated in various diseases (e.g., type II diabetes and neurodegeneration). To investigate SRP activity from high-throughput data, we developed a text-mining pipeline for annotating hundreds of chemicals associated with canonical SRPs. First, we identified 129 candidate reference chemicals used in published biomarker studies and assigned them by consensus to six SRP classes: DNA damage response (DDR), heat shock response (HSR), hypoxia (HPX), metal stress response (MSR), oxidative stress response (OSR), and unfolded protein response (UPR). Second, we used information retrieval from PubMed to find 123,696 abstracts containing co-occurrences of terms for the 129 chemicals and six SRPs. Because term co-occurrence frequency is not a reliable proxy for biological relationships, we calculated pairwise mutual information (PMI), an information-theoretic measure, to identify relevant chemical-SRP relationships. The area under the receiver operating characteristic curve (AUROC) for PMI in classifying the 129 chemicals into the expert-assigned SRP classes was 0.91 (DDR), 0.83 (HPX), 0.96 (HSR), 0.93 (MSR), 0.86 (OSR), and 0.82 (UPR), with a mean AUROC of 0.88. The SRP with the highest PMI score matched the expert-curated annotation for 82% of chemicals, and the expert annotation was among the two highest-scoring SRPs for 96%. Third, we identified PMI score thresholds for classifying co-occurrences found in PubMed abstracts as relevant chemical-SRP relationships. Fourth, we applied this text-mining pipeline to 4,671 chemicals from the Library of Integrated Network-based Cellular Signatures (LINCS) and found 1,206 chemicals with putative SRP relationships (PMI scores exceeding the thresholds derived from the candidate reference chemicals).
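The scoring and evaluation steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not give the PMI formula, log base, smoothing, or the rule used to choose thresholds, so the code assumes the common pointwise form PMI(c, s) = log2[P(c, s) / (P(c) P(s))] estimated from abstract-level counts, computes AUROC with the rank (Mann-Whitney) formulation, and uses Youden's J only as a stand-in threshold rule. The toy corpus, chemical names, and labels are invented, and all function names (`pmi_scores`, `auroc`, `top_k_accuracy`, `youden_threshold`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus: each abstract is reduced to the set of chemical
# terms and the set of SRP terms it mentions. Names and counts are invented
# for illustration; they are not data from the study.
CHEMICALS = ["arsenite", "tunicamycin"]
SRPS = ["MSR", "UPR"]
TOY_DOCS = [
    ({"arsenite"}, {"MSR"}),
    ({"arsenite"}, {"MSR"}),
    ({"tunicamycin"}, {"UPR"}),
    ({"tunicamycin"}, {"UPR"}),
    ({"arsenite", "tunicamycin"}, {"UPR"}),
]


def pmi_scores(docs, chemicals, srps):
    """PMI(c, s) = log2[P(c, s) / (P(c) P(s))] from abstract-level counts.

    Assumes the pointwise definition with no smoothing; pairs that never
    co-occur get -inf.
    """
    n = len(docs)
    chem_n, srp_n, pair_n = Counter(), Counter(), Counter()
    for chems, stresses in docs:
        chem_n.update(chems)
        srp_n.update(stresses)
        pair_n.update(product(chems, stresses))
    scores = {}
    for c, s in product(chemicals, srps):
        joint = pair_n[(c, s)]
        # n * joint / (n_c * n_s) equals P(c, s) / (P(c) P(s))
        scores[(c, s)] = (
            math.log2(n * joint / (chem_n[c] * srp_n[s])) if joint else float("-inf")
        )
    return scores


def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative (ties = 0.5)."""
    pos = [x for x, y in zip(scores, labels) if y]
    neg = [x for x, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def top_k_accuracy(scores, chemicals, srps, expert, k):
    """Fraction of chemicals whose expert SRP is among their k highest PMI scores."""
    hits = 0
    for c in chemicals:
        ranked = sorted(srps, key=lambda s: scores[(c, s)], reverse=True)
        hits += expert[c] in ranked[:k]
    return hits / len(chemicals)


def youden_threshold(scores, labels):
    """Hypothetical threshold rule: maximize sensitivity + specificity - 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for x, y in zip(scores, labels) if y and x >= t)
        fp = sum(1 for x, y in zip(scores, labels) if not y and x >= t)
        j = tp / n_pos - fp / n_neg
        if j > best_j:
            best_t, best_j = t, j
    return best_t


if __name__ == "__main__":
    expert = {"arsenite": "MSR", "tunicamycin": "UPR"}  # hypothetical labels
    scores = pmi_scores(TOY_DOCS, CHEMICALS, SRPS)
    for s in SRPS:
        per_chem = [scores[(c, s)] for c in CHEMICALS]
        labels = [expert[c] == s for c in CHEMICALS]
        print(s, "AUROC", auroc(per_chem, labels))
    print("top-1 accuracy", top_k_accuracy(scores, CHEMICALS, SRPS, expert, 1))
```

Computing one AUROC per SRP (ranking every chemical by its PMI for that SRP against a binary "assigned to this SRP" label) mirrors how the abstract reports one value per class, and `top_k_accuracy` with k=1 and k=2 is the analogue of the 82% and 96% figures.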
An independent analysis of the L1000 transcriptomic profiles showed that the 1,206 LINCS chemicals cluster by putative SRP class. Our findings suggest that text mining based on PMI scores for chemical and SRP term co-occurrences in PubMed abstracts is a powerful approach for building high-quality annotated data sets to solve computational toxicology problems.
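The abstract does not say how clustering by SRP class was assessed. One common way to quantify whether profiles group by a given labeling is the mean silhouette width computed against those labels. The sketch below is a hypothetical illustration, not the authors' analysis: it assumes each chemical is summarized by a single expression vector, uses 1 - Pearson correlation as the distance, and the function names `pearson` and `silhouette` are illustrative only.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def silhouette(profiles, labels):
    """Mean silhouette width of the class labels, with 1 - Pearson r as distance.

    Values near 1 mean profiles sit much closer to their own class than to any
    other class; values near 0 or below mean the labels do not separate them.
    Requires at least two distinct classes.
    """
    n = len(profiles)
    dist = [[1.0 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    classes = set(labels)
    widths = []
    for i in range(n):
        same = [dist[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            widths.append(0.0)  # singleton class: silhouette is defined as 0
            continue
        a = mean(same)  # mean distance to the chemical's own putative SRP class
        b = min(  # mean distance to the nearest other class
            mean([dist[i][j] for j in range(n) if labels[j] == c])
            for c in classes
            if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return mean(widths)


if __name__ == "__main__":
    # Invented 4-gene profiles for two hypothetical classes.
    profiles = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
    print(silhouette(profiles, ["MSR", "MSR", "UPR", "UPR"]))
```

Comparing the score for the putative SRP labels against scores for randomly permuted labels is one way to judge whether the observed grouping exceeds chance.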

Record Details:

Record Type: DOCUMENT (PRESENTATION/POSTER)
Product Published Date: 03/23/2023
Record Last Revised: 05/15/2023
OMB Category: Other
Record ID: 357839