2012 Progress Report: Improving Particulate Matter Source Apportionment for Health Studies: A Trained Receptor Modeling Approach with Sensitivity, Uncertainty and Spatial Analyses

EPA Grant Number: R833866
Title: Improving Particulate Matter Source Apportionment for Health Studies: A Trained Receptor Modeling Approach with Sensitivity, Uncertainty and Spatial Analyses
Investigators: Russell, Armistead G. , Klein, Mitchel , Mulholland, James , Sarnat, Stefanie Ebelt , Sarnat, Jeremy , Tolbert, Paige
Current Investigators: Russell, Armistead G. , Klein, Mitchel , Marmur, Amit , Mulholland, James , Sarnat, Stefanie Ebelt , Sarnat, Jeremy , Tolbert, Paige
Institution: Georgia Institute of Technology , Emory University
EPA Project Officer: Ilacqua, Vito
Project Period: December 1, 2008 through November 30, 2012 (Extended to November 30, 2013)
Project Period Covered by this Report: January 1, 2012 through December 31, 2012
Project Amount: $899,956
RFA: Innovative Approaches to Particulate Matter Health, Composition, and Source Questions (2007)
Research Category: Health Effects , Particulate Matter , Air


As discussed in detail in the 2010 and 2011 progress reports, the main objectives of this research are to test four hypotheses derived from ongoing source apportionment (SA)-based epidemiologic and air quality modeling studies:

  1. A receptor-based approach, trained using an ensemble of model results (including receptor and emissions-based models), can be developed that neither introduces excessive day-to-day variability nor suppresses an appropriate level of it.
  2. The method can be applied to long-term data sets for use in acute health effect studies.
  3. The method can be used to temporally interpolate between observations (e.g., for data available every third day) and spatially interpolate between urban and rural monitors.
  4. Uncertainties can be propagated from SA model inputs to health analysis outputs, with outputs most sensitive to source profile inputs.

To test the hypotheses, a three-step chemical mass balance (CMB) approach has been developed for particulate matter (PM) source apportionment that uses an ensemble of both source- and receptor-based methods to train a CMB method for use in longer-term applications. The three steps are:

  1. Averaging SA results, using weights based on method uncertainty, from four receptor models and one chemical transport model, the Community Multiscale Air Quality (CMAQ) model, to develop ensemble-based source impacts.
  2. Using the weighted source impacts (from Step 1) in an application of CMB with the Lipschitz Global Optimizer (CMB-LGO) to calculate nine ensemble-based source profiles (EBSPs). The source profiles developed are gasoline vehicles (GV), diesel vehicles (DV), dust (DUST), biomass burning (BURN), coal combustion (COAL), secondary organic carbon (SOC), SULFATE, NITRATE, and AMMONIUM.
  3. Using the EBSPs on a longer-term data set of observations to develop improved source impacts.

Progress Summary:

As detailed in the 2010 and 2011 progress reports, we have focused on using Step 1 of the ensemble method to gain new insights into the uncertainties of ensemble results and of source apportionment methods. One of the least understood aspects of source apportionment is uncertainty: neither the uncertainties in daily source impacts nor the overall method uncertainties have been well characterized. Furthermore, each SA method typically estimates uncertainty with its own intrinsic approach, which makes inter-comparison of SA method uncertainties difficult. We published the work on ensemble averaging of four methods (CMB-LGO, PMF, CMB-MM and CMAQ) using the methods refined in 2011. Because most CMB analyses do not use CMB-LGO, we performed a sensitivity analysis using CMB-RG in lieu of CMB-LGO. We also validated the ensemble results by comparing them with SOC estimates from an independent method and determined that mixed weighting is the most appropriate way to conduct the ensemble. In 2012, we developed a Bayesian method of ensemble averaging. Our results show that Bayesian-based ensemble averaging yields a higher correlation with levoglucosan, a tracer of biomass burning. This work was presented at the 2012 American Association for Aerosol Research conference. We are currently in the process of submitting this work to a top journal. It also should be noted that instead of using the Excel-based CMB-LGO, we developed a Matlab-based program that incorporates gas-based constraints; we refer to this method as CMB-GC (gas constraints). The input into the initial ensemble, however, still uses CMB-LGO results.

We have applied the ensemble method to July 2001, to represent summer, and January 2002, to represent winter, in a manner similar to Lee et al. (2009). Three features of this work distinguish it from that of Lee et al. (2009). First, we performed source apportionment using CMB-RG, CMB-LGO, and PMF on a data set for the Jefferson St. (JST) SEARCH site in Atlanta, GA, from January 1, 1999 through December 31, 2004. Missing data were treated in the same manner as in Marmur et al. (2005). We did not include several fitting species (Al, As, Ba, Sb, Sn, and Ti) because they were below the detection limit on the vast majority of days. Second, we focused on ensemble averaging using no weights (N=0) and inverse squared-uncertainty weights (N=2), where weights take the form 1/σ^N and σ is the daily source impact uncertainty (Lee et al. 2009 focused on weights of 1/σ). Finally, ensemble average uncertainties take into account the covariance of source impacts from the five SA methods.
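As a rough illustration of the weighting scheme described above, the following Python sketch averages one source's daily impact across several SA methods with weights of the form 1/σ^N. The function name, the normalization of the weights, and the example numbers are assumptions made for illustration; the covariance-aware ensemble uncertainty is not reproduced here.

```python
def ensemble_average(impacts, sigmas, n=2):
    """Weighted mean of per-method source impacts with weights 1/sigma**n.

    impacts: source impact from each SA method for one day (ug/m3)
    sigmas:  matching daily source impact uncertainties (ug/m3), all > 0
    n:       weight exponent; n=0 gives the unweighted mean
    """
    if len(impacts) != len(sigmas):
        raise ValueError("impacts and sigmas must have the same length")
    weights = [1.0 / (s ** n) for s in sigmas]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, impacts)) / total


# Hypothetical one-day impacts for a single source from three methods.
impacts = [1.0, 2.0, 4.0]
sigmas = [0.5, 1.0, 2.0]

unweighted = ensemble_average(impacts, sigmas, n=0)  # plain mean
weighted = ensemble_average(impacts, sigmas, n=2)    # favors low-sigma methods
```

With N=2 the method with the smallest uncertainty dominates, so the weighted average is pulled toward its estimate relative to the unweighted mean.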

In the Bayesian-based ensemble averaging method, a posterior distribution of uncertainties is determined by combining subjective prior information with the root mean square error (RMSE) between each method and the ensemble average, which serves as the updating information. In the previous approach, the uncertainty of each method, the RMSE, was constant. In the Bayesian approach, we treat each method's uncertainty as itself having uncertainty. That is, the RMSE represents the average uncertainty, which can take on values from a distribution formulated in a Bayesian context. Using this approach, method uncertainties can be sampled L times from the formulated distribution. This gives L sets of uncertainties that can be used for L realizations of ensemble results for any given day. Thus, if there are K days of method results, there can be K*L ensemble results. There are two major consequences of this. First, ensemble results are more variable than those from ensemble averaging using the RMSEs, because the Bayesian formulation results in different weights for each ensemble average. Second, having K*L ensemble averages results in a distribution of K*L source profiles. For each day in the long-term time series, M source profiles can be chosen from the distribution of K*L source profiles, resulting in a distribution of M source impacts for each day. The Bayesian formulation of SA method uncertainties, with subsequent random sampling from distributions, obviates the need for analytical propagation of errors in estimating uncertainties: random sampling and multiple realizations of ensemble averages, source profiles and final source impacts result in distributions that automatically propagate uncertainties.
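The sampling idea above can be sketched in a few lines of Python. The report does not state the form of the prior or posterior, so this sketch assumes a conjugate model: residuals between a method and the ensemble average are zero-mean normal, and the variance has an inverse-gamma prior. The inverse-variance weighting, the function names, and all numbers are likewise assumptions for illustration.

```python
import random


def posterior_sigma_samples(residuals, n_samples, a0=1.0, b0=1.0, rng=None):
    """Draw method-uncertainty samples from an inverse-gamma posterior.

    Assumption: residuals (method minus ensemble average) are zero-mean
    normal with unknown variance, and the variance has an
    inverse-gamma(a0, b0) prior. This conjugate model is illustrative only.
    """
    rng = rng or random.Random()
    alpha = a0 + len(residuals) / 2.0
    beta = b0 + sum(r * r for r in residuals) / 2.0
    # If X ~ Gamma(shape=alpha, scale=1/beta), then 1/X ~ InvGamma(alpha, beta).
    return [(1.0 / rng.gammavariate(alpha, 1.0 / beta)) ** 0.5
            for _ in range(n_samples)]


def bayesian_ensemble(daily_impacts, residuals_by_method, n_samples, seed=0):
    """Return K*L ensemble averages for K days and L uncertainty draws."""
    rng = random.Random(seed)
    # One list of L sigma draws per method.
    draws = [posterior_sigma_samples(res, n_samples, rng=rng)
             for res in residuals_by_method]
    results = []
    for day in daily_impacts:                    # K days
        for draw in range(n_samples):            # L draws per day
            # Assumed inverse-variance weights from the sampled sigmas.
            weights = [1.0 / draws[m][draw] ** 2 for m in range(len(day))]
            avg = sum(w * x for w, x in zip(weights, day)) / sum(weights)
            results.append(avg)
    return results


# Hypothetical inputs: K = 2 days, 3 methods, L = 5 draws.
daily_impacts = [[1.0, 2.0, 4.0], [0.5, 0.8, 1.1]]
residuals_by_method = [[0.1, -0.2], [0.5, -0.4], [1.0, 1.2]]
ensemble = bayesian_ensemble(daily_impacts, residuals_by_method, n_samples=5)
```

Each day yields L different ensemble averages because each draw produces a different set of weights, which is the source of the added variability noted above.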

Both non-informative and informative priors were tested. For each day of the short-term application of the four SA methods, source impact uncertainties are sampled from the Bayesian-based posterior distribution. These uncertainties are used as weights to determine an ensemble average. A Monte Carlo technique is used to estimate a distribution of Bayesian ensemble-based source impacts for each day in the ensemble. These distributions of source impacts are then used to determine distributions of two seasonally based source profiles. For each day in a long-term PM2.5 data set, 10 source profiles are sampled from these distributions and used in a CMB application, resulting in 10 SA results for each day. This formulation results in a distribution of daily source impacts rather than a single value with an estimated uncertainty. The average and standard deviation of the distribution are used as the final estimate of source impact and uncertainty, respectively.
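The last step, sampling profiles and summarizing the resulting daily source impacts, might look like the following Python sketch. A simple non-negative least-squares fit by projected gradient descent stands in for the CMB application; it is not CMB-LGO, and it omits the uncertainty weighting that CMB normally uses. The profile pool, observations, and function names are hypothetical.

```python
import random
import statistics


def nnls_projected_gradient(profile, conc, iters=2000):
    """Minimize ||profile @ s - conc||^2 subject to s >= 0.

    profile: rows are species, columns are sources (mass fractions)
    conc:    observed species concentrations for one day
    """
    n_species, n_sources = len(profile), len(profile[0])
    # The sum of squared entries bounds the largest eigenvalue of F^T F,
    # so its inverse is a safe gradient step size.
    step = 1.0 / sum(f * f for row in profile for f in row)
    s = [0.0] * n_sources
    for _ in range(iters):
        resid = [sum(profile[i][k] * s[k] for k in range(n_sources)) - conc[i]
                 for i in range(n_species)]
        grad = [sum(profile[i][k] * resid[i] for i in range(n_species))
                for k in range(n_sources)]
        # Gradient step followed by projection onto s >= 0.
        s = [max(0.0, s[k] - step * grad[k]) for k in range(n_sources)]
    return s


def daily_impact_distribution(profile_pool, conc, m=10, seed=0):
    """Sample m profiles and return (means, stds) of each source impact."""
    rng = random.Random(seed)
    fits = [nnls_projected_gradient(rng.choice(profile_pool), conc)
            for _ in range(m)]
    per_source = list(zip(*fits))
    means = [statistics.mean(v) for v in per_source]
    stds = [statistics.stdev(v) for v in per_source]
    return means, stds


# Hypothetical pool of profiles (3 species x 2 sources) and one day's data.
profile_pool = [
    [[0.60, 0.10], [0.30, 0.20], [0.10, 0.70]],
    [[0.55, 0.12], [0.33, 0.18], [0.12, 0.70]],
]
observed = [1.5, 1.2, 2.3]
means, stds = daily_impact_distribution(profile_pool, observed, m=10)
```

The returned means and standard deviations play the role of the final source impact estimates and their uncertainties for that day.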

Ensemble averaging reduces the number of zero impact days, provides results for every day of the data set, and reduces variability by averaging out excessively high and low source impact days. The ensemble averages and their overall uncertainties are consistent with the ensemble averages found in Balachandran et al. (2012). This is expected because the mean of the posterior distribution is approximately equal to the RMSE.

We used the ensemble results to determine new source profiles, one representative of summer and one representative of winter. To determine the source profiles, we ran CMB-LGO in "reverse," where the source impacts (i.e., the ensemble averages) were treated as the known quantity and the source profiles were treated as the unknown. We then ran CMB-LGO for a data set at JST spanning 8/31/98 to 12/31/07 using these new Bayesian-based source profiles (BBSPs) and, for comparison, measurement-based source profiles (MBSPs) (Marmur et al. 2005). The long-term source impacts obtained with these two sets of source profiles are highly correlated for all sources except biomass burning, coal combustion and, to a lesser extent, SOC. Using BBSPs resulted in similar values of the chi-squared statistic. Zero impact days for diesel vehicles also were reduced using BBSPs. However, zero impact days for SOC increased using EBSPs, although the majority of these zero impact days were in winter (October – March). The Bayesian-based biomass burning source impacts using profiles derived from non-informative priors correlated better with observed levoglucosan (R²=0.66) and water soluble potassium (R²=0.63) than source impacts estimated using measurement-based source profiles (R²=0.21 and 0.5, respectively) and positive matrix factorization (R²=0.016 and 0.26, respectively). The Bayesian approach led to closer agreement with total mass (predicted to observed PM2.5 ratio of 0.93) than other methods. The Bayesian approach also corrects for expected seasonal variation of biomass burning and secondary impacts.
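To make the "reverse" fit concrete, the Python sketch below treats daily source impacts as known and solves for the source profiles by ordinary least squares over all days. This is a simplified stand-in under stated assumptions: CMB-LGO uses constrained global optimization, whereas this sketch applies no non-negativity or mass constraints and no uncertainty weighting. All names and numbers are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_profiles(impacts, conc):
    """Least-squares source profiles given known source impacts.

    impacts: per-day source impacts (days x sources)
    conc:    per-day species concentrations (days x species)
    Returns profiles as species x sources mass fractions.
    """
    n_days = len(impacts)
    n_sources = len(impacts[0])
    n_species = len(conc[0])
    # Normal equations: (S^T S) f_i = S^T c_i for each species i.
    sts = [[sum(day[j] * day[k] for day in impacts) for k in range(n_sources)]
           for j in range(n_sources)]
    profiles = []
    for i in range(n_species):
        rhs = [sum(impacts[d][j] * conc[d][i] for d in range(n_days))
               for j in range(n_sources)]
        profiles.append(solve_linear(sts, rhs))
    return profiles


# Hypothetical data: 3 days, 2 sources, 3 species, generated from the
# profile [[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]] without noise.
impacts = [[2.0, 3.0], [1.0, 4.0], [3.0, 1.0]]
conc = [[1.5, 1.2, 2.3], [1.0, 1.1, 2.9], [1.9, 1.1, 1.0]]
profiles = fit_profiles(impacts, conc)
```

Because the example data are noise-free, the fit recovers the generating profile; with real ensemble averages the fitted fractions would need constraints to stay physically meaningful.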

Our health research partners at Emory University used these source apportionment results in their health models.

Future Activities:

We are in the process of applying the ensemble method to data sets from monitoring stations located in Yorkville, GA, and South Dekalb (Atlanta, GA) to assess regional differences and demonstrate applicability to other locations. Finally, we will use geospatial techniques to develop representative source impacts that can be used in epidemiologic modeling for metropolitan Atlanta, GA.

In addition, we will focus on the following issues: assessing variability, applying this method to other locations, and its use in epidemiologic modeling. We will investigate variability using a central-difference metric, which will be applied to the 9.5-year data set of SA results obtained using both EBSPs and MBSPs. We also will conduct time series filtering using a Fourier transform method to better understand variability.

Journal Articles on this Report:

Balachandran S, Pachon JE, Hu Y, Lee D, Mulholland JA, Russell AG. Ensemble-trained source apportionment of fine particulate matter and method uncertainty analysis. Atmospheric Environment 2012;61:387-394.

Supplemental Keywords:

ensemble, ensemble-trained CMB, source apportionment, health study