Final Report: Geographic Information System and Statistical Analysis Core
EPA Grant Number:
Subproject: this is subproject number 005, established and managed by the Center Director under
(EPA does not fund or establish subprojects; EPA awards and manages the overall grant for this center).
Center: Southern Center on Environmentally Driven Disparities in Birth Outcomes
Center Director: Miranda, Marie Lynn
Title: Geographic Information System and Statistical Analysis Core
Investigators: Ashley-Koch, Allison; Goodall, Jonathan; Miranda, Marie Lynn; Reiter, Jerome
EPA Project Officer:
Project Period: May 1, 2007 through April 30, 2012 (Extended to April 30, 2014)
Centers for Children’s Environmental Health and Disease Prevention Research (2005)
The overall objective of the GIS and Statistical Analysis (GISSA) Core was to support spatial and quantitative analysis needs of the Center research projects, as well as the Community Outreach and Translation Core. Our specific aims included:
Providing support for the development of environmental and social data layers needed to implement data analyses required for the research projects and the Community Outreach and Translation Core;
Providing statistical analysis, advice, and consulting on the broad range of statistical issues that arise in conjunction with the research projects, with a particular emphasis on data reduction methods and modeling spatial and spatio-temporal data within a Bayesian framework; and
Providing analysis for the unique needs of genetic data arising from the clinical and animal studies of the Center.
This support core facilitated the development of innovative quantitative methodology for children's environmental health research associated with the projects and cores. Equally important, it enhanced substantive collaboration between the statisticians and scientists involved in the research projects, yielding improved analyses of research core data as well as novel statistical modeling.
Over the project period, the GISSA Core built and maintained a spatially and temporally linked data architecture for maternal and child health outcomes from the prenatal period to early childhood. The central objective was to track mothers and offspring in their residential environments at varying time slices. While capitalizing on the extensive data warehouse that we had assembled since the Center's inception, we continued to integrate data layers into the architecture, such as metrics from EPA's Air Quality System (AQS), National-Scale Air Toxics Assessment, fused air pollution data combining modeled and monitored data, and in-house constructed road proximity measures, in addition to the most recently available years of North Carolina statewide administrative data on births, educational outcomes, and blood lead levels. Based on linking methods described in previous reporting periods, the unique individual-level identifying record enables connections across multiple administrative databases on births, blood lead surveillance, deaths, and educational outcomes. Each of these datasets can be examined separately and in various combinations according to the master linking file.
With the completion of participant recruitment in Project B in August 2011, GISSA staff focused on data quality control/quality assurance, along with finalizing the project analysis dataset and planning related studies with the participants. All of the participants have been integrated into a GIS with information on environmental exposures, factors of the built environment, and standard demographic data.
In addition to data acquisition, management, and georeferencing, the GISSA Core provided innovative statistical support to each of the Projects. In Project A (R833293C001), the GISSA Core developed spatial models to better characterize associations between birth outcomes and environmental exposures, including air pollution and the built environment. In Project B (R833293C002), the GISSA Core supported multiple imputation efforts to construct finalized imputed datasets based on the full study population.
The GIS team street geocoded all residential addresses in the 1990-2012 North Carolina Detailed Birth Record (DBR) data, with the exception of 2010 births, owing to transitions in the birth records. Street geocoding, which allows us to link births to Census data resolved at the block level, has been completed successfully for 80% of 1990-2009 birth records, with success rates rising to 82% by 1999. Success rates continued to improve in later years, with 86% of 2000-2009 records and 95% of 2011-2012 records geocoded.
The DBR is compiled from questionnaires obtained at the time of birth certificate filing and includes elements essential to our proposed analyses. Available variables include, inter alia: maternal residence and state and country of birth; marital status; maternal and paternal race, Hispanic ethnicity, and education; alcohol and tobacco use; plurality; parity; maternal complications; congenital anomalies; whether an infant death certificate was filed; and infant birth weight and gestational age. All 22 years of data have been integrated and standardized to facilitate data linkages and statistical analysis.
We developed methods for linking the North Carolina DBR data with other clinical and administrative datasets. These methods rely on the individually identifying variables provided in the DBR, including full name and date of birth of both infant and mother. We first applied this methodology to link DBR data with participant data from Project B, matching participants who delivered between 2005 and 2009 to their corresponding record in the DBR. This linkage allowed us to examine how accurately the administrative dataset (DBR) captures key information, as well as undertake analysis of residential mobility during pregnancy. Using the 1990-2007 DBR, we linked births occurring to the same mother. This linkage allowed us to examine internatal spacing and birth outcomes across pregnancies, and by further combining these data with the DBR-linked Project B data, we were able to capture the participants' subsequent pregnancy outcomes. In addition, we used this method to link the DBR data with an administrative dataset of educational outcomes at the individual child level, which allowed us to examine how disparities in birth outcomes may have long-term implications for child development.
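As a minimal sketch of the identifier-based linkage described above, the following Python example matches records from two datasets on normalized names and dates of birth for both mother and infant. The field names, the normalization rule, the exact-match strategy, and the example records are all assumptions made for illustration. The report does not specify the Core's actual matching algorithm, which may have used probabilistic or fuzzy matching.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Uppercase, strip accents, and drop non-letters so trivial variants agree."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed.upper() if "A" <= ch <= "Z")


def link_key(record):
    # Hypothetical field names; real birth-record layouts will differ.
    return (
        normalize_name(record["mother_name"]),
        record["mother_dob"],
        normalize_name(record["infant_name"]),
        record["infant_dob"],
    )


def link_records(birth_records, other_records):
    """Return (birth_id, other_id) pairs whose keys match exactly and uniquely."""
    index = defaultdict(list)
    for rec in birth_records:
        index[link_key(rec)].append(rec["id"])

    links = []
    for rec in other_records:
        candidates = index.get(link_key(rec), [])
        if len(candidates) == 1:  # ambiguous or missing keys are left unlinked
            links.append((candidates[0], rec["id"]))
    return links


# Invented records, for illustration only.
births = [
    {"id": "B1", "mother_name": "Ana María Lopez", "mother_dob": "1980-02-14",
     "infant_name": "Luis Lopez", "infant_dob": "2006-07-01"},
    {"id": "B2", "mother_name": "Jane O'Neil", "mother_dob": "1978-11-30",
     "infant_name": "Sam O'Neil", "infant_dob": "2007-03-15"},
]
study = [
    {"id": "S9", "mother_name": "ANA MARIA LOPEZ", "mother_dob": "1980-02-14",
     "infant_name": "Luis  Lopez", "infant_dob": "2006-07-01"},
    {"id": "S10", "mother_name": "Mary Smith", "mother_dob": "1982-05-05",
     "infant_name": "Eli Smith", "infant_dob": "2008-01-20"},
]
links = link_records(births, study)
print(links)
```

Leaving ambiguous keys unlinked, rather than picking one candidate, is a conservative design choice: a false link between two individuals is usually more damaging to downstream analysis than a missed link.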
We expanded the environmental data layers available through the Southern Center on Environmentally Driven Disparities in Birth Outcomes (SCEDDBO) data warehouse to include spatial data on road intensity, criteria air pollutants from EPA's AQS, water quality, environmental releases documented in the Toxics Release Inventory, and housing quality.
We genotyped 1,600 blood samples from pregnant women for 412 single nucleotide polymorphisms (SNPs) in 52 genes, primarily involved in either metabolism of heavy metals or immune response. In addition, we ran the Illumina African American Admixture Chip on samples from 1,016 non-Hispanic black (NHB) women.
The GISSA Core provided innovative statistical support to each of the Projects. For example, in Project A (references included under Project A), the GISSA Core developed statistical methods to obtain unbiased estimates of the effect of air pollution exposure on birth outcomes (Chang, et al., 2012) and to address measurement issues in aggregated estimates of ambient exposure to air pollution (Berrocal, et al., 2012; Gray, et al., 2011; Berrocal, et al., 2011). In Project B (references included under Project B), the GISSA Core produced a second publication on multiple imputation. This work extended previously developed imputation methods (Burgette, et al., 2012) to handle inconsistent laboratory measurements (Burgette, et al., 2012).
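Analyses run on multiply imputed datasets are commonly combined with Rubin's rules, which pool the per-dataset estimates and inflate the variance to reflect uncertainty about the missing values. The Python sketch below shows that standard pooling step under stated assumptions: the function name and the input numbers are hypothetical, and the report does not say which pooling procedure the Core used.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from >= 2 datasets")

    pooled = mean(estimates)            # average of the per-dataset estimates
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented numbers, for illustration only.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The `(1 + 1 / m)` factor accounts for using a finite number of imputations; as `m` grows, the total variance approaches the within-imputation variance plus the between-imputation variance.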
In support of both Projects A and B, the GISSA Core developed quantile regression techniques to examine the effect of risk factors of interest at varying quantiles along the outcome distribution, rather than limiting analyses to mean effects. As a continuation of the Center's work on joint outcome modeling (Lum and Gelfand, 2012; Burgette, et al., 2011; Burgette, et al., 2012), the GISSA Core also developed multivariate modeling techniques (Neelon, et al., 2011) to better understand the individual and shared risk factors of related health outcomes and to capture geographic variation in disease risk through spatial methods.
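To illustrate why quantile methods can reveal more than mean effects, the Python sketch below uses the check (pinball) loss that quantile regression minimizes. It is restricted to the simplest case, a single binary risk factor, where the quantile regression fit reduces to the difference between the two groups' quantiles. The function names and the birth-weight values are invented for illustration and are not the Core's models or data, which the report describes only in general terms.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def sample_quantile(values, tau):
    """A tau-quantile, found by minimizing total pinball loss over observed values."""
    return min(
        sorted(values),
        key=lambda q: sum(pinball_loss(v - q, tau) for v in values),
    )


def quantile_effects(unexposed, exposed, taus):
    """Exposed-minus-unexposed difference in quantiles at each tau.

    With one binary covariate the quantile regression objective separates by
    group, so these differences are the fitted exposure coefficients.
    """
    return {
        tau: sample_quantile(exposed, tau) - sample_quantile(unexposed, tau)
        for tau in taus
    }


# Invented birth weights in grams, for illustration only.
unexposed = [2900, 3100, 3300, 3500, 3700]
exposed = [2300, 2900, 3200, 3450, 3680]
effects = quantile_effects(unexposed, exposed, [0.1, 0.5, 0.9])
print(effects)
```

In this made-up example the difference between groups is much larger in the lower tail than at the median or upper tail, a pattern that a comparison of means alone would blur.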
Although the project period has ended, we will continue to develop and expand the geospatial data warehouse that supports analysis across projects. The GIS team will continue to identify additional environmental layers to integrate into our data architecture. With the spatio-temporal data architecture in place, we will continue to conduct analyses that leverage the spatial and longitudinal nature of the data, focusing on the quantile and multivariate approaches already developed by our team. We will continue analyses on approximately 1,600 Project B participants with complete pregnancy data, genetic results, and environmental results. These analyses will examine the joint impact of environmental, social, and host factors on birth outcomes, especially as they differ by and within race. Identifying such co-exposures could lead to the development and implementation of strategies to prevent adverse birth outcomes, ultimately decreasing or eliminating the racial disparity.
Supplemental Keywords: data fusion, meta-analysis, health disparities, spatial disaggregation, spatial interpolation, spatial modeling, environmental justice
Progress and Final Reports:
2009 Progress Report
Main Center Abstract and Reports:
Southern Center on Environmentally Driven Disparities in Birth Outcomes
Subprojects under this Center:
R833293C001 Research Project A: Mapping Disparities in Birth Outcomes
R833293C002 Research Project B: Healthy Pregnancy, Healthy Baby: Studying Racial Disparities in Birth Outcomes
R833293C003 Research Project C: Perinatal Environmental Exposure Disparity and Neonatal Respiratory Health
R833293C004 Community Outreach and Translation Core
R833293C005 Geographic Information System and Statistical Analysis Core