Grantee Research Project Results
2020 Progress Report: Drinking water vulnerability and neonatal health outcomes in relation to oil and gas production in the Appalachian Basin
EPA Grant Number: CR839249
Title: Drinking water vulnerability and neonatal health outcomes in relation to oil and gas production in the Appalachian Basin
Investigators: Deziel, Nicole Cardello; Saiers, James E.; Bell, Michelle L.; Ma, Xiaomei; Plata, Desiree; Warren, Joshua
Institution: Yale University
EPA Project Officer: Hahn, Intaek
Project Period: September 1, 2017 through August 31, 2020 (Extended to August 31, 2021)
Project Period Covered by this Report: September 1, 2019 through August 31, 2020
Project Amount: $1,998,515
RFA: Oil and Gas Development in the Appalachian Basin (2016)
Research Category: Water, Human Health
Objective:
This project, which we refer to as the WATer and Energy Resources Study (WATER Study), has three objectives: (1) advance a modeling framework to estimate drinking-water vulnerability to contamination by unconventional oil and gas (UOG) activities; (2) evaluate the vulnerability framework by comparing its predictions with water-quality measurements collected from households in the Appalachian Basin; and (3) investigate the associations between exposure to UOG-related water contaminants and adverse neonatal outcomes in Pennsylvania (PA) and Ohio (OH) using our vulnerability index as well as traditional exposure surrogates, while also accounting for other UOG stressors and factors related to social disadvantage (e.g., income, education).
Progress Summary:
The WATER Study team has made considerable progress toward fulfillment of Objectives 1 through 3. With respect to Objective 2, we not only met the current goals but also expanded their scope to include a third state in the Appalachian Basin, West Virginia (WV). Notably, this expansion required adjusting our protocols so that we could fulfill this objective under the extraordinary circumstances posed by the COVID-19 pandemic. These activities and related accomplishments, organized by project objective, are summarized in the following paragraphs.
2.1 Objective 1: Drinking-Water Vulnerability Assessment
2.1.1 Physically Based Vulnerability Modeling
We are implementing a capture-zone approach to evaluate the vulnerability of residential drinking-water supplies to contamination by UOG development. A capture zone represents the contributing area of a groundwater well; in other words, it is the portion of an aquifer from which the well draws its water. We are using hydrologic models to simulate the capture zones of residential drinking-water wells and estimate vulnerability based on their proximity to UOG infrastructure (e.g., well pads). The development and an application of this framework are described in Soriano et al. (2020), and a variation of the approach is implemented in a recently submitted manuscript (Xiong et al. in review). Work towards advancing and testing this vulnerability framework has involved four major tasks that began at the outset of the project and have continued through this past year:
Database creation. A geographic information system (GIS) database was assembled for northeastern PA. The data were drawn from various agencies, including the PA Department of Environmental Protection (PADEP), US Geological Survey (USGS), and Susquehanna River Basin Commission. The database comprises thematic layers of local hydrogeology, topography, climatology, domestic and monitoring well locations, and UOG well pads and other infrastructure. Satellite data and aerial photographs were used to characterize the spatial distribution of surface lineaments, which, in turn, was used to infer anisotropy in aquifer permeability and potential zones of naturally occurring fractures. Data on the yields and specific capacity of 2,500 water wells within northeastern PA were retrieved and screened for reporting errors. Regression models were used to estimate groundwater discharge to streams and rivers of the study region. The data on well yields and groundwater discharge to streams were used to inform and constrain calibration of the groundwater flow model (see below).
Model development. Vulnerability assessments are based on computations of hydrologic models that simulate coupled subsurface flow and solute transport. Two types of flow and transport models were developed: an equivalent porous medium (EPM) model and a discrete-fracture network (DFN) model. Both of these models are fully three-dimensional and were constructed using a robust finite element hydrologic simulator. Over the past year, these models were subjected to extensive testing and refinement to ensure numerical accuracy and reasonable representation of the groundwater system. The EPM and DFN models described in Soriano et al. (2020) were vertically discretized into 21 sublayers with the resulting computational mesh consisting of over 21 million triangular elements. The models account for groundwater and solute exchange with streams, rivers, and lakes within the study domain; the effects of local and regional topography on flow patterns; and aquifer stresses imposed by water-supply wells. Both the EPM and DFN models were parallelized to run efficiently using Yale’s high-performance computing cluster.
Model calibration. The hydrologic models were calibrated using the pilot points approach, wherein the aquifer properties are assigned at a set of points in the domain and subsequently mapped to the model elements through geostatistical interpolation. This scheme was previously shown to be particularly well-suited for characterizing well capture zones under varying degrees of subsurface heterogeneity. For both the EPM and HYB models, aquifer properties were estimated for 136 pilot points that were distributed in the model domain following published guidelines. A Tikhonov regularization scheme with singular value decomposition was implemented to ensure hydrogeological reasonableness and stability of the inversion process. Model calibration was accomplished with the Gauss-Marquardt-Levenberg optimization scheme implemented in the PEST++ software suite. Calibration targets were historical observations of hydraulic head from the USGS National Water Information System (NWIS), heads determined from driller’s water well logs in PaGWIS, and groundwater discharge along stream reaches as estimated from streamflow regression equations. We examined the relative importance of the different model parameters on capture-zone geometry and well vulnerability by conducting a global sensitivity analysis.
Estimates of drinking-water vulnerability. The calibrated models have been used to quantify drinking-water vulnerability in northeastern PA. In 2020, we expanded our initial assessment from 220 to 316 drinking-water wells. Using the two alternative models and a precautionary paradigm to integrate their results, we find that most, but not all, of the domestic wells within the model domain have low vulnerability because the extent of their probabilistic capture zones is smaller than the distance to the nearest existing UOG well pad. Nevertheless, capture zones generally exceed the mandatory setback distance between UOG and groundwater wells in PA, suggesting that existing regulations may not be adequate to safeguard water quality over time. Our framework offers a physically based alternative to existing risk assessment approaches that utilize simple proximity metrics between UOG sources and groundwater receptors.
2.1.2 Vulnerability Assessment through Physics-Informed Machine Learning
In 2020, we began exploring the application of a physics-informed machine learning (ML) approach to further elucidate the factors governing well vulnerability to contamination and to alleviate the computational demands of our physically based (PB) modeling framework. Specifically, we are training ML models to learn functional input-output relationships from the PB model; this approach is also known as metamodeling. We recast the vulnerability question as a classification problem: a domestic well is designated as vulnerable (class: V) if V ≥ 0.001 and as non-vulnerable (class: NV) otherwise. The objective is to predict the vulnerability class of a domestic well given a set of input variables that are readily available or easy to quantify by using geographic information systems. These input variables include metrics characterizing proximity of a domestic well to UOG sites and various indices describing topography, hydrology, and landscape position. We have adopted an ensemble tree ML algorithm called Conditional Inference Forest (CIF), a variant of the popular Random Forest (RF) algorithm, that is well-suited for applications involving highly correlated predictors. CIF conducts tests of association in its tree-building process, making it useful for understanding underlying relationships and guarding against spurious correlations.
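The classification step described above can be illustrated with a minimal Python sketch. It assumes that a vulnerability metric from the PB model is already available for each well; the function names, the well identifiers, and the simple accuracy measure are hypothetical illustrations and are not taken from the study's code. The 0.001 threshold is the one stated in the text.

```python
from typing import Dict, List

# Threshold stated in the report: a well is class V if its metric is >= 0.001.
VULNERABILITY_THRESHOLD = 0.001


def classify_well(vulnerability: float,
                  threshold: float = VULNERABILITY_THRESHOLD) -> str:
    """Return 'V' (vulnerable) or 'NV' (non-vulnerable) for one well."""
    return "V" if vulnerability >= threshold else "NV"


def label_wells(pb_estimates: Dict[str, float]) -> Dict[str, str]:
    """Map hypothetical well IDs to class labels using PB model estimates."""
    return {well_id: classify_well(v) for well_id, v in pb_estimates.items()}


def accuracy(true_labels: List[str], predicted_labels: List[str]) -> float:
    """Fraction of wells for which the metamodel label matches the PB label."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)
```

Labels produced this way would serve as the training target for a classifier such as the CIF model; the classifier itself is not sketched here.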
The ML training/testing dataset used in our analysis consists, in part, of vulnerability estimates for 316 domestic wells in Bradford County that were computed with the PB models reported by Soriano et al. (2020). In order to better encompass field conditions, we augmented the training/testing dataset with “seed wells” that were generated by dividing the PB model domain into a uniform 2 km × 2 km grid and randomly distributing 20 seed wells in each grid cell. We have thus far completed 100 CIF realizations, with each run employing 3,000 trees. The average metamodel accuracy from the training/testing dataset is 97.1% (min: 94.9%, max: 98.7%). The high accuracy indicates that the metamodels successfully emulate the PB model behavior and are on par with other ML methods. Analysis of the conditional importance shows that the most important variables for predicting vulnerability class are metrics quantifying proximity to UOG sources. Metrics such as topographic position index and distance from stream to divide are also highly important predictors.
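The seed-well augmentation can be sketched as follows. This is an illustration under stated assumptions: the model domain is treated as a rectangle in projected coordinates (meters), wells are placed uniformly at random within each cell, and partial cells at the domain edge are clipped to the boundary. The function name and the domain bounds are hypothetical; only the 2 km cell size and 20 wells per cell come from the text.

```python
import math
import random
from typing import List, Tuple


def generate_seed_wells(x_min: float, y_min: float,
                        x_max: float, y_max: float,
                        cell_size_m: float = 2000.0,
                        wells_per_cell: int = 20,
                        seed: int = 0) -> List[Tuple[float, float]]:
    """Place `wells_per_cell` random points in every cell of a uniform grid."""
    rng = random.Random(seed)  # fixed seed so the placement is reproducible
    n_cols = math.ceil((x_max - x_min) / cell_size_m)
    n_rows = math.ceil((y_max - y_min) / cell_size_m)
    wells: List[Tuple[float, float]] = []
    for col in range(n_cols):
        # Cell bounds in x, clipped to the domain edge (assumption).
        x0 = x_min + col * cell_size_m
        x1 = min(x0 + cell_size_m, x_max)
        for row in range(n_rows):
            y0 = y_min + row * cell_size_m
            y1 = min(y0 + cell_size_m, y_max)
            for _ in range(wells_per_cell):
                wells.append((rng.uniform(x0, x1), rng.uniform(y0, y1)))
    return wells
```

For a hypothetical 4 km by 6 km domain, the grid has 2 × 3 = 6 cells, so the function returns 120 seed-well coordinates.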
We have used the trained model to make predictions of vulnerability class for the 94 WATER Study wells in Bradford County that were sampled in the summer of 2018. We have found statistically significant differences between the V and NV groundwater classes for constituents associated with UOG wastewaters, specifically, Ba, Br, Li, Cl, and gasoline range organics. We have also compared geochemical frameworks that utilize elemental mass ratios to identify groundwater contaminated by UOG, and have found that the majority of the WATER Study samples exhibiting geochemical signatures similar to shale gas wastewaters are predicted to be in class V.
Our findings to date suggest that the physics-informed ML approach for vulnerability complements traditional geochemical fingerprinting techniques, and thus, may serve as a useful additional line of evidence in retrospective studies investigating suspected cases of contamination. The approach may also provide insights for designing long-term prospective studies (e.g., by identifying optimal locations for monitoring wells) or for regulating UOG development to protect groundwater (e.g., by minimizing new development in areas where there are already clusters of vulnerable water wells).
2.2 Objective 2: Drinking-Water Sampling and Analysis
The WATER Study team successfully completed all water collection, chemical analysis, and participant result reports for two sampling campaigns: (1) Bradford County, PA (n=94 homes in 2018) and (2) Belmont and Monroe Counties, OH (n=161 in 2019). Home visits involved administration of a detailed questionnaire to a head-of-household, followed by collection of untreated well water and treated water for those homes with water softening or other treatment systems, and collection of GPS coordinates of the homes and drinking-water wells. Analysis of the 2018 and 2019 water samples for nearly 100 inorganic and organic analytes is in progress.
Informed by the data from the PA and OH sampling campaigns, our team determined that we were able to meet study objectives in those locations and sought to extend the study objectives to a broader geographic scope. We incorporated the state of West Virginia into our data and sample collection activities to provide a more comprehensive picture of the potential impacts to water quality of UOG development in the Appalachian region.
Protocols, Data Collection Instruments, Training, and Institutional Approvals. To prepare for limited personal contact with participants in our third sampling wave, we modified our protocols to ensure consistency and standardization of data collection with prior waves while reducing participant contact. We conducted more data collection activities remotely or via phone with study subjects, practiced social distancing during home visits, and required face coverings for study staff and participants. Our sample collection activities in this wave were conducted outside the home (i.e., at the outdoor spigot or well) and therefore were accomplished at safe distances from residents.
Stakeholder Engagement. The study website (http://waterstudy.yale.edu) was updated to disseminate information about the study to federal and state agencies, as well as to the general public. A sample copy of the participant drinking water result reports was provided to these stakeholders as well. A new website section for study publications has been added. To further our outreach, a member of our team, Helen Siegel, MS, gave a presentation about our study to the Ohio State University Extension Energy Outreach Program entitled “Investigating Groundwater Quality in Regions of Energy Development within the Appalachian Basin.”
Participant Recruitment. To recruit participants, informational postcards were created and mailed to households in several West Virginia zip codes. Our call center at Yale was staffed to respond to inquiries from potential participants, screen individuals for study eligibility, and schedule household visits with eligible participants.
Household Water Collection and Surveys. Home visits commenced in September 2020 and were conducted by one sampling team of 2-3 in-person data collectors and 1 remote data collector. Informed consent and a 50-question survey were conducted via phone. The survey purpose was to gain demographic information, as well as information on home and drinking-water characteristics. The home visit consisted solely of outdoor water collection, led by one water sampler with at least one other person to assist. A total of 56 home visits were completed in September-October 2020.
Laboratory Analyses. Water samples collected in 2018 and 2019 were analyzed for major cations, major anions, trace metals, dissolved organic carbon (DOC), and dissolved inorganic carbon (DIC) at Yale University. Samples were analyzed at MIT to quantify levels of more than 60 different volatile and semi-volatile organic compounds (VOCs and SVOCs), integrated gasoline range organic (GRO) compounds, methane, ethane, and propane, and to conduct a preliminary screen of diesel range organic compounds in order to prioritize further compound-specific analysis. Additional samples collected during the 2019 sampling campaign were analyzed for dissolved noble gases. All chemical analyses of 2018 and 2019 water samples have been completed. Analyses of 2020 samples are projected for completion by March 2021. Both the Yale and MIT laboratories have also developed new procedures and policies to protect the health and safety of the study team while minimizing disruptions to the research activities.
Data Cleaning, Management, and Quality Assurance. The data management team has implemented procedures to ensure complete, accurate, and high-quality datasets. Questionnaires and protocols were reviewed for clarity, consistency, and utility of response data. Codebooks were created for each data collection instrument. Survey data collected in the field on paper forms were manually checked and hand-coded to identify errors or missing responses. Data from paper forms were entered into electronic databases, which were designed with variable restrictions to minimize errors. After data entry, algorithms were run to check the data for inconsistencies, missing values, and response errors to ensure data validity. Errant data were corrected wherever possible with documentation. All data cleaning and processing for the 2018 and 2019 sampling campaigns has been completed. Cleaning and processing of the 2020 questionnaire, GPS, and collection form data is nearly complete. The final datasets are stored on secure servers at Yale University.
Report-Back to Participants. In March 2019, all PA participants in the first sampling campaign were mailed a detailed report of the results for the water samples collected at their home. All OH participants in the second field sampling campaign were mailed their reports in March 2020. Reports were designed to be accessible to a lay audience. They included both text and color-coded tables with a legend and instructions on how to interpret the report, as well as Frequently Asked Questions, links to additional resources, and study contact information. All WV participants in the third wave of sampling will receive their water reports in spring 2021.
UOG Spatial Metrics Calculation. Using a geographic information systems analysis, we constructed several spatial metrics that capture proximity or density of UOG wells in relation to participant residences: number of UOG wells within a buffer around the home, distance to nearest UOG well, inverse distance weighted well count, and inverse distance-squared weighted well count. Locations of homes were obtained by interviewers who collected latitude and longitude at each home. Locations of UOG wells were obtained from PA Department of Environmental Protection Office of Oil and Gas Management, Ohio Department of Natural Resources Division of Oil and Gas, and West Virginia Department of Environmental Protection Office of Oil and Gas. Additional metrics that incorporate other UOG well attributes and hydrogeology are being developed.
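The four proximity and density metrics named above can be sketched in Python. This is a minimal illustration, not the study's GIS workflow: it assumes great-circle (haversine) distances between latitude/longitude pairs, sums the inverse-distance weights over wells inside the buffer, and clips very small distances to avoid division by zero. The function names, the 2 km default buffer, and the 0.01 km clipping floor are all assumptions made for the example.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def uog_proximity_metrics(home: Tuple[float, float],
                          wells: List[Tuple[float, float]],
                          buffer_km: float = 2.0,
                          min_distance_km: float = 0.01) -> Dict[str, float]:
    """Compute buffer count, nearest distance, and IDW / ID2W well counts."""
    distances = [haversine_km(home[0], home[1], w[0], w[1]) for w in wells]
    in_buffer = [d for d in distances if d <= buffer_km]
    # Clip tiny distances so a well at the home location cannot divide by zero
    # (the floor value is an assumption for this sketch).
    clipped = [max(d, min_distance_km) for d in in_buffer]
    return {
        "count_in_buffer": len(in_buffer),
        "nearest_km": min(distances) if distances else math.inf,
        "idw_count": sum(1.0 / d for d in clipped),
        "id2w_count": sum(1.0 / d ** 2 for d in clipped),
    }
```

Compared with a simple count, the inverse-distance forms give more weight to wells closer to the home, and the squared form does so more steeply.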
Data Analysis. Multi-disciplinary analysis of the water sampling data from 2018 and 2019 is underway, and several manuscripts are in progress. Some of the data are included in Soriano et al. (2020) and Xiong et al. (in review).
2.3 Objective 3: Epidemiologic Analysis of Neonatal Health Outcomes
Overview. We are evaluating whether potential exposure to UOG-related water contaminants (as captured by the vulnerability index and traditional surrogate metrics) is associated with incidence of adverse birth outcomes in the PA and OH study area, while accounting for the potential influence of socioeconomic and other chemical and non-chemical stressors on adverse birth outcomes. To carry out this aim, we have built a birth cohort study database by assembling data from numerous sources. This effort required addressing spatial and temporal misalignments as data have different spatial and temporal resolutions and spatial units. It also required harmonizing variables over time and location, as information recorded on birth certificates differed by year, state, and hospital.
Health Outcome Data. All health data have been cleaned, processed, and standardized to harmonize key variables for which collection or coding changed across years. Birth certificate data were obtained from PA and OH from the years 2002-2017.
UOG Exposure Data. Data pertaining to unconventional oil and gas well locations, permit, spud, and completion dates have been obtained from relevant state agencies and cleaned and assembled. Well permit and production datasets were obtained from PA DEP (2000-2020), OH DNR (1924-2019), and WV DEP (1985-2019). These datasets included information such as location, type (e.g., oil or gas), target formation (i.e., conventional or unconventional), spud date, production dates, and production volumes per reporting period. Maternal residential address at birth was obtained and geocoded from birth records. Birth address will be used to assign UOG exposures using a series of spatial surrogates at several buffer sizes (1 km, 2 km, 5 km, and 10 km). The exposure time window of interest was January 1, 2008 to December 31, 2017.
Sociodemographic Data. Sociodemographic data from the US Census (decennial Census and American Community Survey) have been obtained, cleaned, and assembled. Community-level variables related to income, education, employment, racial and ethnic distributions, age distributions, and others will be assigned to each birth residence.
Potential Covariates and Confounders. We are obtaining data on other possible environmental sources related to agricultural land use and air pollution.
Statistical Analysis: Sociodemographic and UOG-Related Data
While the health outcome data were being processed, we conducted a study utilizing the sociodemographic and UOG data, which complements and provides context for the exposure and health studies. We assessed associations between county-level socio-economic and demographic factors, oil and gas drilling, and three outcomes: number of oil and gas complaints filed by Pennsylvania (PA) residents and both the number and proportion of PA Department of Environmental Protection water supply investigations resulting in positive water supply determinations (i.e., water quality impairments). UOG development may be differentially distributed based on socio-economic or demographic factors. Filing complaints with state agencies is one mechanism by which citizens can register concerns and seek investigations. Understanding inequalities in reporting or response can inform surveillance, resource distribution, and identification of impacted communities. We used hierarchical Bayesian Poisson regression to calculate rate ratios (RR) and 95% credible intervals (CI) for count data and binomial regression to calculate odds ratios (OR) and 95% CI for proportions of positive determinations. Relationships between several socio-economic and demographic factors and complaints and determinations suggested potential environmental and procedural inequities for future investigation. These analyses and results are included in Clark et al. (in review).
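As an illustration of how rate ratios and credible intervals are typically summarized from a Bayesian Poisson model, the sketch below exponentiates posterior draws of a log-scale regression coefficient and reports the median with a 95% equal-tailed interval. It does not fit the hierarchical model; it assumes the posterior draws are already available from a sampler, and the function names and the percentile interpolation rule are choices made for this example. The same transformation applied to a log-odds coefficient from the binomial model would yield an odds ratio.

```python
import math
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, q in [0, 1]."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize_ratio(log_coef_draws: List[float]) -> Tuple[float, float, float]:
    """Return (median, lower, upper) of exp(draws) with a 95% interval."""
    # Exponentiating a log-rate (or log-odds) coefficient gives an RR (or OR).
    ratios = sorted(math.exp(b) for b in log_coef_draws)
    return (percentile(ratios, 0.5),
            percentile(ratios, 0.025),
            percentile(ratios, 0.975))
```

An interval whose lower and upper bounds both lie above 1 (or both below 1) would indicate that the posterior places most of its mass on a positive (or negative) association.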
Statistical Analysis: Etiological analyses
Exploratory and preliminary analyses with the current dataset are in progress.
Future Activities:
4.1 Objective 1: Vulnerability Assessment
We will continue development of physically based models suitable for describing groundwater flow and chemical transport through aquifers that lie beneath our study areas in OH and WV. Once tested and calibrated, these models will form the basis of vulnerability predictions for the 160 drinking water wells sampled during our OH field campaign and the 56 wells sampled during our WV field campaign. Although the general modeling framework used for PA (see above) will be transferable to OH and WV, we expect that differences in hydrologic setting will lead to notable differences in the characteristics of drinking-water vulnerability between the three regions. Using results from these model simulations together with our drinking-water quality measurements, we will refine our machine-learning approach for vulnerability assessment and extend its geographic coverage within the Appalachian Basin.
4.2 Objective 2: Drinking-Water Sampling and Analysis
We will complete laboratory analysis for the 2020 field study and mail participants their individual reports of their drinking water test results. We will continue data analysis to understand the distributions of chemicals in drinking water and how they compare to traditional proximity metrics and the newly developed vulnerability model. We will continue to conduct source apportionment evaluations. We also plan to expand our chemical analyses of WV drinking-water samples to include quantification of concentrations of per- and polyfluoroalkyl substances (PFAS).
4.3 Objective 3: Epidemiologic Analysis of Neonatal Health Outcomes
We will complete construction of our data architecture by including final covariates and confounders. We will identify our cases with respect to the relevant birth outcomes. We will select our controls. We will assign all subjects a UOG exposure estimate first by using traditional, simple proximity metrics while the vulnerability model is being calibrated and further developed. We will conduct our etiologic analyses.
Journal Articles on this Report:
Soriano Jr MA, Siegel HG, Gutchess KM, Clark CJ, Li Y, Xiong B, Plata DL, Deziel NC, Saiers JE. Evaluating Domestic Well Vulnerability to Contamination From Unconventional Oil and Gas Development Sites. Water Resources Research 2020;56(10):e2020WR028005.
Deziel NC, Brokovich E, Grotto I, Clark CJ, Barnett-Itzhaki Z, Broday D, Agay-Shay K. Unconventional oil and gas development and health outcomes: A scoping review of the epidemiological research. Environmental Research 2020;182:109124.
Supplemental Keywords:
drinking water, vulnerability index, epidemiology, environmental exposures, fate and transport, children’s health, spatial surrogates
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.