Office of Research and Development Publications

“Skill of Generalized Additive Model to Detect PM2.5 Health Signal in the Presence of Confounding Variables”


Garcia, V., P. Porter, E. Gego, AND S. Rao. “Skill of Generalized Additive Model to Detect PM2.5 Health Signal in the Presence of Confounding Variables”. 9th International Conference on Air Quality, Garmisch-Partenkirchen, GERMANY, March 24 - 28, 2014.


The National Exposure Research Laboratory (NERL) Atmospheric Modeling and Analysis Division (AMAD) conducts research in support of EPA's mission to protect human health and the environment. AMAD's research program develops and evaluates predictive atmospheric models on all spatial and temporal scales for forecasting air quality and for assessing changes in air quality and air pollutant exposures, as affected by changes in ecosystem management and regulatory decisions. AMAD is responsible for providing a sound scientific and technical basis for regulatory policies based on air quality models to improve ambient air quality. The models developed by AMAD are used by EPA, NOAA, and the air pollution community not only to understand and forecast the magnitude of the air pollution problem, but also to develop emission control policies and regulations for air quality improvements.


Summary. Measures of health outcomes are collinear with meteorology and air quality, making analysis of connections between human health and air quality difficult. The purpose of this analysis was to determine time scales and periods shared by the variables of interest (and, by implication, scales and periods that are not shared). Hospital admissions, meteorology (temperature and relative humidity), and air quality (PM2.5 and daily maximum ozone) for New York City during the period 2000-2006 were decomposed into temporal scales ranging from 2 days to more than 2 years using a complex wavelet transform. Health effects were modeled as functions of the wavelet components of meteorology and air quality using the generalized additive model (GAM) framework. The simulation study showed that the GAM is extremely successful at extracting and estimating a health effect embedded in a dataset. It also showed that, if the objective is to estimate the health signal rather than to fully explain it, a simple GAM with a single confounder (calendar time) whose smooth representation includes a sufficient number of constraints is as good as a more complex model.

Introduction. In the context of wavelet regression, confounding occurs when two or more independent variables interact with the dependent variable at the same frequency. Confounding also acts on a variety of time scales, changing the PM2.5 coefficient (magnitude and sign) and its significance according to the amount of variability at non-confounding time scales. Removing time scales that do not interact from an analysis of hospital admissions and PM2.5 has the potential to reduce bias and parameter uncertainty.
A simulation using the GAM as presented in epidemiological studies was designed to assess how collinearity between PM2.5 and four confounding variables (calendar time, daily maximum 8-hr ozone concentration, daily maximum temperature, and relative humidity) limits our ability to accurately quantify the health effect (β) of PM2.5 on daily counts of hospital admissions.

Main. The objective of the simulation was to assess the skill of the GAM at teasing out the health signal hidden in the data. In addition to this basic dataset, a factor variable indicating whether or not a given day falls within the 3-month (June to August) ozone season was also included to allow comparison of GAM results obtained when continuous time series or temporally unconnected seasons are used. So that the results would not give precedence to the confounding variables, the time series of all confounders were kept unchanged for all model runs, while the time series of PM2.5 and hospital admissions (HA) were systematically modified. The correlation coefficient increases with the number of knots in the time smooth function, which supports the view that the overall effect of PM2.5 pollution on hospital admissions is rather small in comparison to the effects of other factors such as seasonality. Increasing the number of knots, i.e., increasing the flexibility of the ‘time’ smooth term, quickly causes the significance of the other confounders to vanish. The latter finding suggests that a very simple GAM in which ‘time’ is defined as the only confounder, but with a large number of knots in the smooth term, would be more suitable than a more complicated model for teasing out the health signal embedded in the data.
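
The simulation idea described above can be illustrated with a minimal Python sketch. It is not the authors' code or data. It embeds a known PM2.5 effect in synthetic daily admission counts that share an annual cycle with PM2.5, then fits a Poisson log-linear model with and without a smooth function of calendar time. Everything here is an assumption made for illustration: the true β of 0.01, the baseline of 100 admissions per day, the 730-day record, and the function names. A harmonic (sine and cosine) basis stands in for the spline smooth of time used in a real GAM, with the number of harmonics playing the role of the number of knots.

```python
import math
import random


def poisson(rng, lam):
    # Knuth's multiplication method; adequate for the moderate means used here.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate(n_days=730, beta=0.01, seed=1):
    # Synthetic PM2.5 and hospital admissions that share an annual cycle.
    rng = random.Random(seed)
    pm, ha = [], []
    for t in range(n_days):
        season = math.cos(2 * math.pi * t / 365.0)
        pm_t = 15.0 + 4.0 * season + rng.gauss(0.0, 3.0)
        log_mu = math.log(100.0) + beta * pm_t + 0.2 * season  # seasonal confounding
        pm.append(pm_t)
        ha.append(poisson(rng, math.exp(log_mu)))
    return pm, ha


def design(pm, n_harmonics):
    # Columns: intercept, centered PM2.5, then harmonics of calendar time
    # (an assumed stand-in for a spline smooth of time).
    mean_pm = sum(pm) / len(pm)
    rows = []
    for t, v in enumerate(pm):
        row = [1.0, v - mean_pm]
        for k in range(1, n_harmonics + 1):
            ang = 2 * math.pi * k * t / 365.0
            row += [math.cos(ang), math.sin(ang)]
        rows.append(row)
    return rows


def solve(a, b):
    # Gaussian elimination with partial pivoting on a small dense system.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_poisson(X, y, iters=15):
    # Newton (IRLS) iterations for a log-link Poisson regression.
    p = len(X[0])
    coef = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(c * v for c, v in zip(coef, row))) for row in X]
        grad = [sum(row[j] * (yi - mi) for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        hess = [[sum(row[j] * row[k] * mi for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        coef = [c + s for c, s in zip(coef, solve(hess, grad))]
    return coef


TRUE_BETA = 0.01
pm, ha = simulate(beta=TRUE_BETA)
# Estimated PM2.5 coefficient for increasingly flexible time terms.
estimates = {k: fit_poisson(design(pm, k), ha)[1] for k in (0, 1, 2, 4)}
for k, b in estimates.items():
    print(f"harmonics={k}: beta_hat={b:.4f} (true {TRUE_BETA})")
```

In this toy setting the model with no time term attributes part of the seasonal cycle to PM2.5, while adding a flexible function of time brings the estimate close to the embedded value. A study of the kind summarized here would use penalized spline smooths and observed confounder series rather than this simplified basis.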



Record Details:

Product Published Date: 03/28/2014
Record Last Revised: 11/25/2015
OMB Category: Other
Record ID: 310418