Grantee Research Project Results
Machine Learning Toolkit for Academic and Grey Literature Screening
EPA Contract Number: 68HERC24C0008
Investigators: Mintas, Constantine
Small Business: VISIMO, LLC
EPA Contact: Richards, April
Phase: II
Project Period: October 17, 2023 through October 16, 2025
Project Amount: $397,332
RFA: Small Business Innovation Research (SBIR) Phase II (2024) Recipients Lists
Research Category: Small Business Innovation Research (SBIR)
Description:
ChemGrey, VISIMO’s machine-learning (ML) toolkit for academic and grey literature, will support efficient and transparent collection, tagging, and screening of relevant sources during the systematic review (SR) process. ChemGrey will enable researchers to filter full documents of all types and formats, identify relationships between documents, determine source relevance, improve meta-tagging, remove duplicates accurately, process new inputs in real time, and tailor the document relevancy algorithm to each researcher’s needs. By building a tool that locates and filters both academic and grey literature, VISIMO can reduce the time required to conduct SRs and increase their accuracy and reproducibility. These improvements will strengthen the quality of SRs, increase the number of SRs conducted, and, in turn, benefit human and environmental health. Although ChemGrey can filter academic literature, its focus is to reduce the burden of grey literature collection and screening, which is often an arduous process for researchers. Existing commercial tools address academic literature in a generalized manner; ChemGrey, by contrast, is document-agnostic and can process grey literature as accurately and as easily as academic literature.
Identifying and incorporating additional relevant grey sources will foster more comprehensive reviews, reduce publication bias, and strengthen scientific integrity in support of science-based decision-making. Furthermore, ChemGrey is designed to optimize itself continually using end-user feedback, so the tool can be tailored to each researcher’s needs. During Phase I, VISIMO successfully developed a proof of concept (PoC) of ChemGrey that produced statistically significant results. During Phase II, VISIMO will further refine the ML pipeline, enhance the user interface, increase the toolkit’s transparency and interpretability, and add data-gathering capabilities. VISIMO will conduct iterative alpha and beta testing, supported by researchers at multiple organizations and universities, to continue refining the tool. Although ChemGrey is designed to meet EPA needs for chemical risk assessments, the toolkit can be adapted to other disciplines, including medical science, food and veterinary science, agricultural science, and pharmacology.
SBIR Phase I:
Machine Learning Toolkit for Grey Literature Screening | Final Report

The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.