Science Inventory

Data Annotation and Migration Across Systematic Review Tools


Angrish, M. Data Annotation and Migration Across Systematic Review Tools. 10th Annual Meeting of the ASCCT, Virtual, October 12-14, 2021.




Automatically updating the information that supports an assessment is limited by the manual work of finding, uploading, and migrating information between systematic review tools. This is a particular challenge in the development of chemical assessments, where new information must be compiled rapidly, and/or existing information updated, in a pre-decisional, regulatory context. Artificial intelligence was previously used to rapidly screen >40K studies for ~150 perfluoroalkyl substances, and manually extracted data were summarized in systematic evidence maps (SEMs). Keeping these influential SEMs updated with the latest relevant research is time-consuming and labor-intensive. Our goal is to use machine learning models to reduce this ongoing effort.

However, creating such models first requires detailed, machine-readable annotation of datasets that labels the entities of interest. The structured data extraction templates supporting DistillerSR for the PFAS SEM were used to define entities (labels for text), which were then manually annotated in the titles, abstracts, methods, and results sections of 67 animal toxicology PDFs using the FIDDLE Extraction Workbench. In total, >24K annotations were generated for 12 entities across the corpus.

Pilots targeting the migration of annotated data outputs revealed the needs that are the focus of current work. These include mapping annotated terms that do not follow conventions (e.g., by linking them to controlled vocabularies and ontologies), piloting annotation-tool features that group and relate annotations according to a predefined schema, and input/output formats that support interoperability between tools.

These views do not necessarily reflect those of the US EPA.
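To make the three needs above more concrete, the following is a minimal sketch of how span-level annotations might be normalized, grouped, and exported. It is an illustration under stated assumptions, not the workflow used with DistillerSR or the FIDDLE Extraction Workbench, whose data formats are not described here. The entity labels, the `SCHEMA` and `VOCABULARY` tables, the `annotate`/`normalize`/`group_by_schema`/`export_json` helpers, and the example sentence are all hypothetical. A real pipeline would link terms to established controlled vocabularies or ontologies rather than a hand-written lookup table, and plain JSON stands in for whatever interchange format the tools actually support.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical schema: which entity labels make up one grouped record.
# The actual 12 entities from the PFAS SEM templates are not listed here.
SCHEMA = {"experiment": ["chemical", "species", "dose", "endpoint"]}

# Hypothetical controlled vocabulary: free-text variants -> preferred term.
VOCABULARY = {
    "species": {
        "rat": "rat",
        "rats": "rat",
        "sprague-dawley rats": "rat",
        "mouse": "mouse",
        "mice": "mouse",
    },
}


@dataclass
class Annotation:
    doc_id: str
    entity: str                       # label assigned to the text span
    start: int                        # character offset where the span begins
    end: int                          # character offset (exclusive) where it ends
    text: str                         # annotated text exactly as it appears
    normalized: Optional[str] = None  # preferred vocabulary term, if mapped


def annotate(doc_id: str, text: str, entity: str, phrase: str) -> Annotation:
    """Create an annotation for the first occurrence of phrase in text."""
    start = text.find(phrase)
    if start < 0:
        raise ValueError(f"{phrase!r} not found in document {doc_id}")
    return Annotation(doc_id, entity, start, start + len(phrase), phrase)


def normalize(ann: Annotation) -> Annotation:
    """Map non-conventional annotated text to a preferred term when one is known."""
    terms = VOCABULARY.get(ann.entity, {})
    ann.normalized = terms.get(ann.text.strip().lower())
    return ann


def group_by_schema(annotations, record_type: str = "experiment") -> dict:
    """Group one document's annotations into a record shaped by SCHEMA."""
    record = {field: [] for field in SCHEMA[record_type]}
    for ann in annotations:
        if ann.entity in record:
            # Prefer the normalized term; fall back to the raw annotated text.
            record[ann.entity].append(ann.normalized or ann.text)
    return record


def export_json(annotations) -> str:
    """Serialize annotations as tool-neutral JSON (standoff spans plus labels)."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


# Usage with a made-up sentence (not taken from the annotated corpus).
text = "Male Sprague-Dawley rats received PFOA at 10 mg/kg/day; liver weight increased."
anns = [
    annotate("doc1", text, "species", "Sprague-Dawley rats"),
    annotate("doc1", text, "chemical", "PFOA"),
    annotate("doc1", text, "dose", "10 mg/kg/day"),
    annotate("doc1", text, "endpoint", "liver weight"),
]
anns = [normalize(a) for a in anns]
record = group_by_schema(anns)
print(record)
# {'chemical': ['PFOA'], 'species': ['rat'], 'dose': ['10 mg/kg/day'], 'endpoint': ['liver weight']}
```

Storing character offsets alongside the label (standoff annotation) keeps each annotation tied to its source text, so another tool can re-anchor it without copying the document. The vocabulary mapping is kept separate from the raw text, so a term that does not follow convention is preserved rather than overwritten.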

Record Details:

Product Published Date: 10/12/2021
Record Last Revised: 05/27/2022
OMB Category: Other
Record ID: 354836