Science Inventory

Scientific Data Management in the Age of Big Data: An Approach Supporting a Resilience Index Development Effort

Citation:

Harwell, L., D. Vivian, M. McLaughlin, and S. Hafner. Scientific Data Management in the Age of Big Data: An Approach Supporting a Resilience Index Development Effort. Frontiers in Environmental Science. Frontiers, Lausanne, Switzerland, 7(Article 72):13, (2019). https://doi.org/10.3389/fenvs.2019.00072

Impact/Purpose:

Big data are a valuable commodity in indicator research and development. However, researchers can expend as much as 50-80% of their efforts in collecting, wrangling, and formatting secondary data, often without the benefit of sufficient knowledge to manage and curate research data assets effectively. As part of the research strategy, the Climate Resilience Screening Index (CRSI) team implemented tangible steps to accommodate the different levels of data management skills, balance the data workload, and increase data management and curation capacity within the CRSI research team. This manuscript describes our approach for managing and curating secondary data to support transparent and reproducible CRSI research.

Description:

The increased availability of publicly available data is, in many ways, changing our approach to conducting research. Not only are cloud-based information resources providing supplementary data to bolster traditional scientific activities (e.g., field studies, laboratory experiments), they also serve as the foundation for secondary data research projects such as indicator development. Indicators and indices are a convenient way to synthesize disparate information to address complex scientific questions that are difficult to measure directly (e.g., resilience, sustainability, well-being). In the current literature, there is no shortage of indicator or index examples derived from secondary data, with a growing number that are scientifically focused. However, little information is provided describing the management approaches and best practices used to govern the data underpinnings supporting these efforts. From acquisition to storage and maintenance, secondary data research products rely on the availability of relevant, high-quality data, repeatable data handling methods, and a multi-faceted data flow process to promote and sustain research transparency and integrity. The U.S. Environmental Protection Agency recently published a report describing the development of a climate resilience screening index, which used over one million data points to calculate the final index. The pool of data was derived exclusively from secondary sources such as the U.S. Census Bureau, Bureau of Labor Statistics, Postal Service, Housing and Urban Development, Forestry Services, and others. Available data were presented in various forms, including portable document format (PDF), delimited ASCII, and proprietary formats (e.g., Microsoft Excel, ESRI ArcGIS). The strategy employed for managing these data in an indicator research and development effort represented a blend of business practices, information science, and the scientific method.
This paper describes the approach, highlighting key points unique to managing the data assets of a small-scale research project in an era of “big data.”

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Product Published Date: 06/04/2019
Record Last Revised: 07/13/2020
OMB Category: Other
Record ID: 349313