Grantee Research Project Results
Final Report: Drinking water vulnerability and neonatal health outcomes in relation to oil and gas production in the Appalachian Basin
EPA Grant Number: CR839249
Title: Drinking water vulnerability and neonatal health outcomes in relation to oil and gas production in the Appalachian Basin
Investigators: Deziel, Nicole Cardello; Saiers, James E.; Bell, Michelle L.; Ma, Xiaomei; Plata, Desiree; Warren, Joshua
Institution: Yale University
EPA Project Officer: Hahn, Intaek
Project Period: September 1, 2017 through August 31, 2020 (Extended to August 31, 2021)
Project Amount: $1,998,515
RFA: Oil and Gas Development in the Appalachian Basin (2016)
Research Category: Water, Human Health
Objective:
This project, which we refer to as the WATer and Energy Resources Study (WATer Study), has three objectives: (1) advance a modeling framework to estimate drinking-water vulnerability to contamination by unconventional oil and gas (UOG) activities; (2) characterize household drinking-water quality within portions of the Appalachian Basin that host UOG development and interpret these measurements in the context of proximate UOG activity and potential sources of contamination unrelated to UOG; and (3) investigate the associations between exposure to UOG-related water contaminants and adverse neonatal outcomes in Pennsylvania (PA) and Ohio (OH), while also accounting for other UOG stressors and factors related to social disadvantage (e.g., income, education).
Summary/Accomplishments (Outputs/Outcomes):
2.1 Objective 1: Drinking-Water Vulnerability Assessment
2.1.1 Physically Based Vulnerability Modeling
We implemented a capture-zone approach to evaluate the vulnerability of household drinking-water supplies to contamination by UOG development. A capture zone represents the contributing area of a groundwater well; in other words, it is the portion of an aquifer from which the well draws its water. We used hydrologic models to simulate the capture zones of residential drinking-water wells and estimate vulnerability based on their proximity to UOG infrastructure (i.e., well pads). The development and an application of this framework are described in detail in Soriano et al. (2020), and a variation of the approach is implemented in a submitted manuscript (Xiong et al., in review). Work towards advancing and testing this vulnerability framework involved four major tasks that began at the outset of the project.
Database creation. A geographic information system (GIS) database was assembled for PA. The data were drawn from various agencies, including the PA Department of Environmental Protection (PADEP), US Geological Survey (USGS), and Susquehanna River Basin Commission. The database comprises thematic layers of local hydrogeology, topography, climatology, domestic well locations, UOG well pads, and other infrastructure.
Model development. Estimates of well-water vulnerability were based on computations of hydrologic models that simulate coupled subsurface flow and solute transport. Two types of flow and transport models were developed: an equivalent porous medium (EPM) model and a discrete-fracture network (DFN) model. Both models are fully three-dimensional and were constructed using a robust finite element hydrologic simulator. These models were subjected to extensive testing and refinement to ensure numerical accuracy and reasonable representation of the groundwater system (Soriano et al. 2020).
Model calibration. The hydrologic models were calibrated using the pilot points approach, wherein the aquifer properties are assigned at a set of points in the domain and subsequently mapped to the model elements through geostatistical interpolation. For both the EPM and HYB models, aquifer properties were estimated for 136 pilot points that were distributed in the model domain following published guidelines. A Tikhonov regularization scheme with singular value decomposition was implemented to ensure hydrogeological reasonableness and stability of the inversion process. Model calibration was accomplished with the Gauss-Marquardt-Levenberg optimization scheme implemented in the PEST++ software suite. Calibration targets were historical observations of hydraulic head from the USGS National Water Information System (NWIS), heads determined from driller’s water well logs in PaGWIS, and groundwater discharge along stream reaches as estimated from streamflow regression equations. We examined the relative importance of the different model parameters on capture-zone geometry and well vulnerability by conducting a global sensitivity analysis.
Estimates of drinking-water vulnerability. The calibrated models were used to quantify drinking-water vulnerability of 316 drinking-water wells in northeastern PA. Using the two alternative models and a precautionary paradigm to integrate their results, we found that most, but not all, of the domestic wells within the model domain have low vulnerability because the extents of their probabilistic capture zones are smaller than the distances to the nearest existing UOG well pad. Nevertheless, capture zones generally exceed the mandatory setback distance between UOG and groundwater wells in PA, suggesting that existing regulations may not be adequate to safeguard water quality over time. Our framework offers a physically based alternative to existing risk assessment approaches that utilize simple proximity metrics between UOG sources and groundwater receptors.
2.1.2 Vulnerability Assessment through Physics-Informed Machine Learning
We explored the application of a physics-informed machine learning (ML) approach to further elucidate the factors governing water-well vulnerability to contamination and to alleviate the computational demands of our physically based (PB) modeling framework. Specifically, we trained ML models to learn functional input-output relationships from the PB model according to an approach known as metamodeling. We recast the vulnerability question as a classification problem: a domestic well is designated as vulnerable (class: V) if V ≥ 0.001 and as non-vulnerable (class: NV) otherwise. The objective was to predict the vulnerability class of a domestic well given a set of input variables that are readily available or easy to quantify by using geographic information systems. These input variables included metrics characterizing proximity of a domestic well to UOG sites and various indices describing topography, hydrology, and landscape position. We adopted an ensemble tree ML algorithm called Conditional Inference Forest (CIF), a variant of the popular Random Forest (RF) algorithm, that is well-suited for applications involving highly correlated predictors. CIF conducts tests of association in its tree-building process, making it useful for understanding underlying relationships and guarding against spurious correlations.
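The classification rule described above can be illustrated with a minimal Python sketch. Only the 0.001 threshold comes from the text; the function names `vulnerability_class` and `accuracy` are hypothetical helpers introduced here, and the accuracy measure (fraction of wells whose predicted class matches the class derived from the PB model) is an assumption about how agreement might be scored, not a description of the CIF implementation.

```python
VULNERABILITY_THRESHOLD = 0.001  # cutoff stated in the report text


def vulnerability_class(v):
    """Label a well 'V' (vulnerable) if v >= threshold, else 'NV'."""
    return "V" if v >= VULNERABILITY_THRESHOLD else "NV"


def accuracy(predicted, reference):
    """Fraction of wells whose predicted class equals the reference class.

    Hypothetical helper: 'reference' stands for classes derived from the
    physically based model, 'predicted' for metamodel output.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)
```

For example, `vulnerability_class(0.0009)` falls just below the cutoff and is labeled `"NV"`, whereas a value exactly at the cutoff is labeled `"V"`.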
The ML training/testing dataset used in our analysis consisted of vulnerability estimates for 316 domestic wells in Bradford County that were computed with the PB models reported by Soriano et al. (2020). In order to better encompass field conditions, we augmented the training/testing dataset with “seed wells” that were generated by dividing the PB model domain into a uniform 2 km x 2 km grid and randomly distributing 20 seed wells in each grid cell. We completed hundreds of CIF realizations, with each run employing 3000 trees. The average metamodel accuracy from the training/testing dataset was 97.1% (min: 94.9%, max: 98.7%). The high accuracy indicates that the metamodels successfully emulated the PB model behavior. A predictor combining information on topography, hydrology, and proximity to contaminant sources was found to be highly important for accurate metamodel predictions.
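The seed-well augmentation step can be sketched as follows. This is an illustrative sketch only, assuming a rectangular domain in projected coordinates expressed in kilometers; the function name `generate_seed_wells`, its parameters, and the choice to ignore partial cells at the domain edge are assumptions made here, not details taken from the study.

```python
import random


def generate_seed_wells(x_min, y_min, x_max, y_max,
                        cell_km=2.0, per_cell=20, seed=0):
    """Randomly place `per_cell` seed wells in each cell of a uniform grid.

    Coordinates are assumed to be projected and in kilometers. Only whole
    cells that fit inside the bounding box are used (an assumption).
    """
    rng = random.Random(seed)  # fixed seed keeps the layout reproducible
    n_cols = int((x_max - x_min) // cell_km)
    n_rows = int((y_max - y_min) // cell_km)
    wells = []
    for col in range(n_cols):
        for row in range(n_rows):
            x0 = x_min + col * cell_km
            y0 = y_min + row * cell_km
            for _ in range(per_cell):
                wells.append((rng.uniform(x0, x0 + cell_km),
                              rng.uniform(y0, y0 + cell_km)))
    return wells
```

With the defaults, a 4 km by 6 km box contains six 2 km x 2 km cells and therefore yields 120 seed wells.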
We used the trained metamodels to make predictions of vulnerability class for the 94 WATer Study wells in Bradford County that were sampled in the summer of 2018. We found statistically significant differences between the V and NV groundwater classes for constituents associated with UOG wastewaters, specifically, Ba, Br, Li, Cl, and gasoline range organics. We also compared geochemical frameworks that utilize elemental mass ratios to identify groundwater contaminated by UOG, and found that the majority of the WATer Study samples exhibiting geochemical signatures similar to shale gas wastewaters are predicted to be in class V.
These findings, which were recently published in Environmental Research Letters (Soriano et al., 2021), suggest that the physics-informed ML approach for vulnerability complements traditional geochemical fingerprinting techniques, and thus, may serve as a useful additional line of evidence in retrospective studies investigating suspected cases of contamination. The approach may also provide insights for designing long-term prospective studies (e.g., by identifying optimal locations for monitoring wells) or for regulating UOG extraction to protect groundwater (e.g., by minimizing UOG development near areas with vulnerable water wells).
2.1.3 Large-Scale, Multi-State Application of the Vulnerability Framework
Our most recent efforts involve expanding the vulnerability framework to accommodate a more substantial portion of the UOG-producing area within the Appalachian Basin. The upscaled domain for the analysis stretches across parts of OH, PA, and WV and encompasses an area of 104,000 km², within which 1.5 million people are served by domestic groundwater. We followed the same overall approach outlined in sections 2.1.1 and 2.1.2 for the smaller domain in PA (190 km²), but made several modifications to make estimation of vulnerability across large spatial scales tractable. Physically based estimates of vulnerability were computed by implementing an iterative ensemble smoother algorithm (PESTPP-IES) with a physically based model for groundwater flow (MODFLOW) and particle tracking (MODPATH). While preliminary, the results show that 5-8% of the population served by domestic groundwater within counties that host at least 100 UOG wells live in households with water wells that are vulnerable to contamination by UOG activities. We have proceeded by using the vulnerability calculations of the physically based model to train and test machine-learning models. We find that a metamodel trained in one part of the 104,000 km² domain can be used to predict vulnerability of water wells within another part of the domain. Although more testing is needed (and is ongoing), it appears that our metamodels are generalizable and hence broadly applicable for probing water-related contaminant exposure pathways and informing policies to safeguard drinking-water resources within the context of UOG. We anticipate completion of a manuscript that describes this large-scale vulnerability assessment in early 2022.
2.2 Objective 2: Drinking-Water Sampling and Analysis
The WATer Study team successfully completed water collection, chemical analysis, and participant result reports for three sampling campaigns: (1) Bradford County, PA (n=94 homes in 2018); (2) Belmont and Monroe Counties, OH (n=161 in 2019); and (3) Doddridge, Marshall, Tyler, and Wetzel Counties, WV (n=58 in 2020). Home visits involved administration of a detailed questionnaire to a head-of-household, followed by collection of untreated well water and treated water for those homes with water softening or other treatment systems, and collection of GPS coordinates of the homes and drinking-water wells. Analysis of the 2018, 2019, and 2020 water samples for nearly 100 inorganic and organic analytes has been completed.
Stakeholder Engagement. A study website (http://waterstudy.yale.edu) was created and updated throughout the project period to disseminate information about the study to federal and state agencies, as well as to the general public. The project PIs informed the Project Officer and officials at state agencies, including the Pennsylvania Department of Environmental Protection, the Ohio Department of Natural Resources, the Ohio Department of Public Health, and the West Virginia Office of Epidemiology and Prevention Services, of the project team’s intention to collect household water samples in their regions.
Participant Recruitment. To recruit participants for the sampling campaigns, informational postcards were created and mailed to households in several zip codes within targeted counties of PA, OH, and WV. In addition, project staff posted informational flyers at local businesses, distributed flyers at community events, posted to websites of local organizations and community groups, and placed advertisements in local newspapers. A Facebook page was also created as an outreach and recruitment tool. A call center at Yale was staffed to respond to inquiries from potential participants, screen individuals for study eligibility, and schedule household visits.
Household Water Collection and Surveys. Home visits in PA, OH, and WV were conducted by sampling teams of 2-3 in-person data collectors. Informed consent and a 50-question survey were administered in-person in PA and OH and by telephone in WV. The survey purpose was to gain demographic information, as well as information on home and drinking-water characteristics. The home visit also involved water collection. Untreated water samples were collected at every household, typically (but not always) at an outdoor spigot, while treated water samples from those households in PA and OH with treatment systems were usually taken at the kitchen tap. A total of 313 home visits were completed during the project.
Laboratory Analyses. Water samples collected in 2018, 2019, and 2020 were analyzed for major cations, major anions, trace metals, and dissolved inorganic carbon (DIC) at Yale University. Samples were analyzed at MIT to quantify levels of more than 60 different volatile- and semi-volatile organic compounds (VOCs and SVOCs), gasoline range organic (GRO) and diesel range organic (DRO) compounds, methane, ethane, and propane. Both the Yale and MIT laboratories developed new procedures and policies in 2020 to protect the health and safety of the study team while minimizing disruptions stemming from the COVID-19 pandemic.
Data Cleaning, Management, and Quality Assurance. The data management team implemented procedures to ensure complete, accurate, and high-quality datasets. Questionnaires and protocols were reviewed for clarity, consistency, and utility of response data. Codebooks were created for each data collection instrument. Survey data collected in the field on paper forms were manually checked and hand-coded to identify errors or missing responses. Data from paper forms were entered into electronic databases, which were designed with variable restrictions to minimize errors. After data entry, automated edit checks were run to flag inconsistencies, missing values, and response errors and thereby ensure data validity. Errant data were corrected wherever possible with documentation. The final datasets are stored on secure servers at Yale University.
Report-Back to Participants. Project participants were mailed a detailed report of the results of the chemical analyses of the water samples collected at their homes. Reports were designed to be accessible to a lay audience. Reports included both text and color-coded tables with a legend and instructions on how to interpret the report. The report included Frequently Asked Questions and links to additional resources and study contact information.
UOG Spatial Metrics Calculation. Using a geographic information systems analysis, we constructed several spatial metrics that capture proximity or density of UOG wells in relation to participant residences: number of UOG wells within a buffer around the home, distance to nearest UOG well, inverse distance weighted well count, and inverse distance-squared weighted well count. Locations of homes were obtained by interviewers who collected latitude and longitude at each home using GPS devices. Locations of UOG wells were obtained from PA Department of Environmental Protection Office of Oil and Gas Management, Ohio Department of Natural Resources Division of Oil and Gas, and West Virginia Department of Environmental Protection Office of Oil and Gas. Metrics that incorporate other UOG well attributes, topographical features, and hydrogeological characteristics were also developed.
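The four proximity and density metrics named above can be sketched in a few lines of Python. This is a minimal illustration, assuming home and well locations have already been projected to planar coordinates in kilometers (the study collected latitude and longitude, so a projection step is assumed here). The function name `uog_spatial_metrics`, restricting the weighted counts to wells inside the buffer, and skipping wells at zero distance are choices made for this sketch rather than details from the report.

```python
import math


def uog_spatial_metrics(home, wells, buffer_km):
    """Compute simple UOG proximity/density metrics for one residence.

    home      -- (x, y) of the residence, projected coordinates in km (assumed)
    wells     -- list of (x, y) UOG well locations in the same coordinates
    buffer_km -- radius of the buffer around the home, in km
    """
    distances = [math.hypot(wx - home[0], wy - home[1]) for wx, wy in wells]
    in_buffer = [d for d in distances if d <= buffer_km]
    # Wells at exactly zero distance are skipped in the weighted sums to
    # avoid division by zero (an assumption of this sketch).
    weighted = [d for d in in_buffer if d > 0]
    return {
        "well_count": len(in_buffer),
        "nearest_km": min(distances) if distances else None,
        "idw_count": sum(1.0 / d for d in weighted),
        "idw2_count": sum(1.0 / d ** 2 for d in weighted),
    }
```

Calling the function once per buffer size (for example 1, 2, 5, and 10 km) gives a family of metrics for each home; the nearest-well distance does not depend on the buffer.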
Analyses of Drinking-Water Chemistry Data. We conducted numerous analyses to characterize drinking-water quality within our Appalachian Basin study areas. In particular, we utilized inorganic mass-ratio frameworks to identify water samples with UOG produced water signatures, and we compared our measurements to historical water-quality records to identify recent changes in drinking-water quality that, based on UOG well-pad violation records, may be attributable to UOG activities (Soriano et al. 2021). We evaluated the occurrence of 64 UOG-related chemicals within our PA and OH water samples in the context of UOG spatial metrics (Clark et al., in review). Using a coupled flow and transport model, we examined potential sources of diesel-range organics (DROs) and gasoline-range organics (GROs) found at low levels in a subset of our PA water samples (Xiong et al., in review). In addition, we employed hierarchical cluster analysis to group OH and WV samples according to similarities in their inorganic-ion chemistry, which permitted inferences on the influences of coal mining, conventional oil and gas extraction (COG), fertilizer application, and road-salt application on household water quality (Siegel et al., in preparation). We also leveraged measurements of isotopic composition and hydrocarbon ratios to elucidate the most probable sources of methane present within our water samples (Li et al., in press). Finally, we quantified the frequencies at which concentrations of health-relevant chemicals exceeded EPA maximum contaminant levels (MCLs) and secondary maximum contaminant levels (SMCLs).
Our analysis of water samples collected from more than 300 households within the Appalachian Basin is one of the most comprehensive executed to date. We found that chemical concentrations in water samples were generally below federal and state standards and guidelines. Instances in which concentrations of one or more chemicals exceeded available health-based or aesthetic standards were rare, and most of these exceedances were associated with analytes that have natural sources (e.g., arsenic, barium). Wells from a small fraction of households yielded water that, while safe to drink according to current health guidelines, had chemical signatures consistent with UOG activities; however, natural or non-UOG related sources for these signatures cannot be excluded at the present time. Efforts aimed at strengthening inferences on chemical-source attribution are ongoing. We found limited correlations between presence or concentration of chemicals and spatial metrics commonly used in human exposure and health studies. These limited associations could indicate that UOG-related water contamination occurs rarely or episodically, that water contamination may be highly localized, that more complex metrics may be needed to capture drinking water exposure, or that metrics linked to adverse health outcomes better reflect exposures to other stressors.
2.3 Objective 3: Epidemiologic Analysis of Neonatal Health Outcomes
Overview. We are evaluating whether potential exposure to UOG-related water contaminants (as captured by traditional surrogate metrics and newly developed metrics) is associated with incidence of adverse birth outcomes in the PA and OH study area, while accounting for the potential influence of socioeconomic and other chemical and non-chemical stressors on adverse birth outcomes. To carry out this aim, we have designed and created a retrospective birth cohort study by assembling data from numerous sources. This effort required addressing spatial and temporal misalignments as data have different spatial and temporal resolutions and spatial units. It also required harmonizing variables over time and location, as information recorded on birth certificates differed by year, state, and hospital.
Study Population. Birth records were obtained from the Pennsylvania Department of Public Health for all live, singleton births occurring in PA from 2010-2017 (n=1,057,559). We excluded those with addresses unable to be geocoded to street level (n=53,560; 5%) and those with no information on the critical covariates or outcomes of gender, gestational age, or birthweight (n=10,158; 1%). This yielded a final PA cohort of 993,836. Birth records were obtained from the Ohio Department of Health for all live, singleton births occurring in OH from 2010-2017 (n=1,029,682). We excluded those with addresses unable to be geocoded to street level (n=60,070; 5.8%) and those with no information on the critical covariates or outcomes of gender, gestational age, or birthweight (n=2,500; 0.24%). This yielded a final OH cohort of 967,112.
Birth Outcome Assessment. We classified all births with respect to numerous outcomes. With regard to gestational age, we classified births as term (37-42 gestational weeks), preterm (<37 weeks), and postterm (≥42 weeks). We evaluated birth weight among term births as very low birth weight (<1500 g), low birth weight (<2500 g), normal birth weight (2500-4500 g), and high birth weight (>4500 g). We evaluated small for gestational age (<10th percentile for gender and gestational age) and large for gestational age (>90th percentile for gender and gestational age). We classified births into six categories of congenital anomalies that are immediately observable and diagnosable at delivery, thereby limiting the possibility of outcome misclassification due to a delayed diagnosis: 1) neural tube defects; 2) cyanotic congenital heart defects; 3) musculoskeletal defects (gastroschisis, omphalocele, limb reductions); 4) cleft lip with/without cleft palate; 5) Down syndrome/other chromosomal defects; and 6) hypospadias.
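The gestational-age and birth-weight cut-offs above translate directly into simple classification rules. The sketch below is illustrative only: it assumes term births span 37 to under 42 completed weeks (consistent with the stated preterm and postterm cut-offs) and treats the birth-weight categories as mutually exclusive by assigning the most specific label, so a weight under 1500 g is labeled very low rather than low. The function names are hypothetical.

```python
def classify_gestational_age(weeks):
    """Return 'preterm' (<37 wk), 'term' (37 to <42 wk), or 'postterm' (>=42 wk)."""
    if weeks < 37:
        return "preterm"
    if weeks >= 42:
        return "postterm"
    return "term"


def classify_birth_weight(grams):
    """Return the birth-weight category using the cut-offs in the text.

    Categories are made mutually exclusive here (an assumption): weights
    under 1500 g are 'very low' rather than also counting as 'low'.
    """
    if grams < 1500:
        return "very low"
    if grams < 2500:
        return "low"
    if grams <= 4500:
        return "normal"
    return "high"
```

Small- and large-for-gestational-age status depends on reference percentile tables by gender and gestational age, which are not reproduced in the report and are therefore left out of this sketch.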
UOG Exposure Data. Data pertaining to unconventional oil and gas well locations, permit, spud, and completion dates were obtained from relevant state agencies and cleaned and assembled. Well permit and production datasets were obtained from PA DEP (2000-2020), OH DNR (1924-2019), and WV DEP (1985-2019). These datasets included information such as location, type (e.g., oil or gas), target formation (i.e., conventional or unconventional), spud date, production dates, and production volumes per reporting period. Extensive data processing was performed to remove duplicates and confirm active status for UOG wells. Maternal residential address at birth was obtained and geocoded from birth records. Birth address was used to assign UOG exposures according to a series of spatial surrogates (distance to nearest well, inverse distance weighted well count, and inverse distance squared weighted well count) at several buffer sizes (1 km, 2 km, 5 km, and 10 km). Metrics were also calculated for different windows of susceptibility, including 3 months prior to conception through birth and trimester-specific metrics. Spatial surrogates were averaged over each exposure window. We are also applying water-specific metrics in these etiologic analyses, as developed from our modeling efforts.
Sociodemographic Data. Sociodemographic data from the US Census (decennial Census and American Community Survey) were obtained, cleaned, and assembled at the Census tract level (United States Census Bureau 2000-2017). Community-level variables related to income, education, employment, racial and ethnic distributions, age distributions, and others were assigned to each birth residence, based on census tracts. We also linked individuals to the Centers for Disease Control and Prevention Social Vulnerability Index, a composite metric representing various social conditions.
Model Construction and Statistical Analysis
We generated a list of a priori potential confounders informed by the etiologic literature, including sex, delivery route, race, ethnicity, maternal education, and socio-economic status. To account for other environmental exposures that could serve as positive or negative confounders in the etiologic analyses, we created exposure metrics for both air pollution and pesticide exposure. Logistic regression was used to test associations in univariate unadjusted models and multivariate models adjusted for potential confounders. Results for (i) birth defects and growth and (ii) gestational-age outcomes are being finalized and incorporated into two separate manuscripts anticipated to be completed in early 2022.
2.3.1 Analyses of Sociodemographic, Water Supply Complaint, and UOG-Related Data
While the health outcome data was being processed, we conducted an analysis utilizing the sociodemographic and UOG data, which complements and provides context for the exposure and health studies. We assessed associations between county-level socio-economic and demographic factors, oil and gas drilling, and three outcomes: number of oil and gas complaints filed by Pennsylvania (PA) residents and both the number and proportion of PA Department of Environmental Protection water supply investigations resulting in positive water supply determinations (i.e., water quality impairments). UOG development may be differentially distributed based on socio-economic or demographic factors. Filing complaints with state agencies is one mechanism by which citizens can register concerns and seek investigations. Understanding inequalities in reporting or response can inform surveillance, resource distribution, and identification of impacted communities. We used hierarchical Bayesian Poisson regression to calculate rate ratios (RR) and 95% credible intervals (CI) for count data and binomial regression to calculate odds ratios (OR) and 95% CI for proportions of positive determinations. Relationships between several socio-economic and demographic factors and complaints and determinations suggested potential environmental and procedural inequities for future investigation. These analyses and results are presented in Clark et al. 2020.
Journal Articles on this Report: 4 Displayed
Silva GS, Warren JL, Deziel NC. Spatial modeling to identify sociodemographic predictors of hydraulic fracturing wastewater injection wells in Ohio census block groups. Environmental Health Perspectives 2018;126(6):067008 (8 pp.).
Soriano Jr MA, Siegel HG, Gutchess KM, Clark CJ, Li Y, Xiong B, Plata DL, Deziel NC, Saiers JE. Evaluating Domestic Well Vulnerability to Contamination From Unconventional Oil and Gas Development Sites. Water Resources Research 2020;56(10):e2020WR028005.
Deziel NC, Brokovich E, Grotto I, Clark CJ, Barnett-Itzhaki Z, Broday D, Agay-Shay K. Unconventional oil and gas development and health outcomes: A scoping review of the epidemiological research. Environmental Research 2020;182:109124.
Soriano MA, Siegel HG, Johnson NP, Gutchess KM, Xiong B, Li Y, Clark CJ, Plata DL, Deziel NC, Saiers JE. Assessment of groundwater well vulnerability to contamination through physics-informed machine learning. Environmental Research Letters 2021;16(8):084013.
Supplemental Keywords:
drinking water, vulnerability index, epidemiology, environmental exposures, fate and transport, children’s health, spatial surrogates
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.