Science Inventory

Assessing author willingness to enter study information into structured data templates as part of the manuscript submission process: A pilot study


Wilkins, A., P. Whaley, A. Persad, I. Druwe, J. Lee, Michele M. Taylor, A. Shapiro, N. Blanton, C. Lemeris, AND K. Thayer. Assessing author willingness to enter study information into structured data templates as part of the manuscript submission process: A pilot study. Heliyon. Elsevier B.V., Amsterdam, Netherlands, 8(3):1-9, (2022).


The process of summarizing study methods and results (referred to as data extraction) is one of the most time- and resource-intensive aspects of conducting a systematic review. Thus, there is keen interest in assessing the extent to which this process could be automated or semi-automated using natural language processing (NLP) methods. Realistically, full automation of data extraction seems unlikely in the near term given non-uniform formatting across most papers published to date. It may, therefore, be of value to encourage investigators to submit structured summaries of their methods and results in a standardized format with metadata tagging of content (“semantic authoring”) during the publication process. Once a critical mass of published journal articles presenting similarly structured data (structured data repositories) is reached, it provides the base necessary for implementing automated data extraction for a scientific discipline. Structured data repositories also increase the ability to access, share and integrate data (for example, within or across different studies, and with other tools such as visualization software). Authors play a key role in preparing structured data and contributing to structured data repositories; thus, a small pilot study was conducted to ascertain authors’ attitudes toward and experience with entering extracted data and information into a structured format. Evaluating authors’ attitudes, experience and barriers to providing data in a structured template brings the state of the art a step closer to achieving automated data extraction. The work to conduct this pilot included recruiting participants, preparing extraction instructions, administering short pre- and post-extraction surveys and summarizing results. This pilot study would be of interest to human health risk assessors, data and social scientists and publishing companies.


Background: Environmental health and other researchers can benefit from automated or semi-automated summaries of data within published studies, as summarizing study methods and results is time- and resource-intensive. Automated summaries can be designed to identify and extract details of interest pertaining to the study design, population, testing agent/intervention, or outcome, among others. Much of the data reported across existing publications lacks unified structure, standardization and machine-readable formats, or is presented in complex tables; these are barriers to the development of automated data extraction methodologies. As full automation of data extraction seems unlikely soon, encouraging investigators to submit structured summaries of methods and results in standardized formats with metadata tagging of content may be of value during the publication process. This would produce machine-readable content to facilitate automated data extraction, establish sharable data repositories, help make research data FAIR (findable, accessible, interoperable and reusable), and could improve reporting quality.

Objectives: A pilot study was conducted to assess the feasibility of asking participants to summarize study methods and results using a structured, web-based data extraction model as a potential workflow that could be implemented during the manuscript submission process.

Methods: Eight participants entered study details and data into the Health Assessment Workspace Collaborative (HAWC). Participants were surveyed after the extraction exercise to ascertain 1) whether the exercise would influence how they conduct and report future research, 2) the ease of data extraction, including which fields were easiest and which were relatively more problematic to extract, and 3) the amount of time taken to perform data extractions and other related tasks. Investigators then presented to participants the potential benefits of providing structured data in the format they were extracting. After this, participants were surveyed about 1) their willingness to provide structured data during the publication process and 2) whether they felt that structured data entry approaches, and their implementation during the journal submission process, should continue to be explored.

Conclusions: Routine provision of structured data that summarizes key information from research studies could reduce the amount of effort required for reusing that data in the future, such as in systematic reviews or agency scientific assessments. Our pilot study suggests that directly asking authors to provide that data, via structured templates, may be a viable approach to achieving this: participants were willing to do so, and the overall process was not prohibitively arduous. We also found some support for the hypothesis that use of study templates may have halo benefits in improving the conduct and completeness of reporting of future research. While limitations in the generalizability of our findings mean that the conditions of success of templates cannot be assumed, further research into how such templates might be designed and implemented does seem to have enough chance of success that it ought to be undertaken.

Record Details:

Product Published Date: 03/01/2022
Record Last Revised: 06/01/2022
OMB Category: Other
Record ID: 354882