Science Inventory

harmonize-wq: Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats


Bousquin, J., L. Smith, N. Ilias, J. Harvey, AND L. Harwell. harmonize-wq: Standardize, clean, and wrangle Water Quality Portal data into more analytic-ready formats. U.S. Environmental Protection Agency, Washington, DC, 2022.


The Python package 'harmonize-wq' provides a robust but flexible framework with functions to harmonize, clean, and wrangle Water Quality Portal data in a less time-intensive, more standardized, and more reproducible way. Results leverage existing data to help inform resource management priorities for improving designated uses in coastal waters and reducing beneficial use impairments at site and estuary scales. When integrated with other tools in the water quality data pipeline, the package allows the user to combine datasets and make more informed inferences about what the water quality may be outside of where and when National Coastal Condition Assessment (NCCA) samples were collected.


The National Aquatic Resource Surveys (NARS) were designed to provide a statistically robust evaluation of the condition of the nation's waters. Research is needed to adapt these national datasets to better address state, regional, and more local resource management needs, particularly the prioritization of restoration and monitoring efforts. The Water Quality Portal (WQP) is a data warehouse that facilitates access, in a common format, to water quality, biological, and physical data provided by state environmental agencies, the EPA, other federal agencies, universities, private citizens, and other organizations. Those data include results from the NARS National Coastal Condition Assessment (NCCA). However, given the variety of data and of data originators, using the data in local analyses often requires data cleaning to ensure they meet the required quality standards and data wrangling to get them into more analytic-ready formats. The Python package 'harmonize-wq' provides a robust but flexible framework with functions to harmonize, clean, and wrangle the data in a less time-intensive, more standardized, and more reproducible way. Results can leverage existing data to help inform resource management priorities for improving designated uses in coastal waters and reducing beneficial use impairments at site and estuary scales. When integrated with other tools in the water quality data pipeline, the package allows the user to combine datasets and make more informed inferences about what the water quality may be outside of where and when NCCA samples were collected. Methods to extrapolate estuarine condition from NCCA data to unsurveyed areas in GOM. Old CEMM ID 1.2.2.B

Record Details:

Product Published Date: 08/11/2022
Record Last Revised: 07/11/2024
OMB Category: Other
Record ID: 362113