2010 Progress Report: Improving Particulate Matter Source Apportionment for Health Studies: A Trained Receptor Modeling Approach with Sensitivity, Uncertainty and Spatial Analyses

EPA Grant Number: R833866
Title: Improving Particulate Matter Source Apportionment for Health Studies: A Trained Receptor Modeling Approach with Sensitivity, Uncertainty and Spatial Analyses
Investigators: Russell, Armistead G.; Klein, Mitchel; Marmur, Amit; Mulholland, James; Sarnat, Stefanie Ebelt; Sarnat, Jeremy; Tolbert, Paige
Institution: Georgia Institute of Technology, Emory University
EPA Project Officer: Ilacqua, Vito
Project Period: December 1, 2008 through November 30, 2012 (Extended to November 30, 2013)
Project Period Covered by this Report: November 1, 2009 through October 31, 2010
Project Amount: $899,956
RFA: Innovative Approaches to Particulate Matter Health, Composition, and Source Questions (2007)
Research Category: Health Effects, Particulate Matter, Air


Objective:

As discussed in detail in the 2009 progress report, the main objective of this research is to test the following four hypotheses derived from ongoing source apportionment (SA)-based epidemiologic and air quality modeling studies:
  1. A receptor-based approach, trained using an ensemble of model results (including receptor and emissions-based models), can be developed that neither introduces excessive day-to-day variability nor suppresses an appropriate level of it.
  2. The method can be applied to long-term data sets for use in acute health effects studies.
  3. The method can be used to temporally interpolate between observations (e.g., for data available every third day) and spatially interpolate between urban and rural monitors.
  4. Uncertainties can be propagated from SA model inputs to health analysis outputs, with outputs most sensitive to source profile inputs.
To test the hypotheses, a three-step chemical mass balance (CMB) approach has been developed for particulate matter (PM) source apportionment (SA) that utilizes an ensemble of both source- and receptor-based approaches to train a CMB method for use in longer term applications. These three steps include:
  1. Averaging SA results, using weights based on method uncertainty, from four receptor models and one chemical transport model, the Community Multiscale Air Quality (CMAQ) model, to develop ensemble-based source impacts.
  2. Using the weighted source impacts (from Step 1) in an application of CMB with the Lipschitz Global Optimizer (CMB-LGO) to calculate nine ensemble-based source profiles (EBSPs); the source profiles developed include gasoline vehicles (GV), diesel vehicles (DV), dust (DUST), biomass burning (BURN), coal combustion (COAL), secondary organic carbon (SOC), SULFATE, NITRATE, and AMMONIUM.
  3. Using the EBSPs on a longer term data set of observations to develop improved source impacts.
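
As a rough illustration of the mass balance that underlies Step 3, the sketch below solves a generic CMB problem for one day by weighted least squares: measured species concentrations are modeled as source profiles multiplied by source impacts. It is not CMB-LGO (which uses a global optimizer) and ignores profile uncertainties; the function names, the two-source profiles, and all numeric values are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def cmb_source_impacts(profiles, conc, conc_unc):
    """Weighted least-squares source impacts for one day.

    profiles[i][j]: mass fraction of species i in emissions from source j
    conc[i]:        measured concentration of species i
    conc_unc[i]:    measurement uncertainty of species i
    """
    n_species, n_sources = len(profiles), len(profiles[0])
    w = [1.0 / u ** 2 for u in conc_unc]
    # Normal equations: (F^T W F) s = F^T W c
    lhs = [[sum(w[i] * profiles[i][j] * profiles[i][k] for i in range(n_species))
            for k in range(n_sources)] for j in range(n_sources)]
    rhs = [sum(w[i] * profiles[i][j] * conc[i] for i in range(n_species))
           for j in range(n_sources)]
    return solve_linear(lhs, rhs)


# Hypothetical two-source, three-species example (illustrative values only).
profiles = [[0.5, 0.1],
            [0.2, 0.6],
            [0.1, 0.2]]
true_impacts = [2.0, 3.0]
# Synthetic "observations" built from the profiles, so the fit should recover
# the impacts used to generate them.
conc = [sum(profiles[i][j] * true_impacts[j] for j in range(2)) for i in range(3)]
impacts = cmb_source_impacts(profiles, conc, conc_unc=[0.1, 0.1, 0.05])
print([round(s, 6) for s in impacts])
```

Because the synthetic concentrations are built directly from the profiles, the recovered impacts should match the generating values to within rounding. Real applications would also need non-negativity constraints and a treatment of profile uncertainty.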

Progress Summary:

Since our last progress report, we have focused on using the ensemble method’s Step 1 to gain new insights into uncertainties of ensemble results as well as source apportionment methods.  Uncertainties in daily source impacts and overall method uncertainties are among the least understood aspects of source apportionment and have not been well characterized.  Furthermore, these uncertainties are often estimated with different techniques, intrinsic to each SA method, which makes inter-comparison of SA method uncertainties difficult.
Here we have performed the ensemble method for July 2001, to represent summer, and January 2002, to represent winter, in a manner similar to Lee, et al. (2009).  Three features of this work distinguish it from that of Lee, et al. (2009).  First, we performed source apportionment using CMB-RG, CMB-LGO, and PMF using a data set for the Jefferson St. (JST) SEARCH site in Atlanta, Georgia, from January 1, 1999 through December 31, 2004.  Missing data were treated in the same manner as in Marmur, et al. (2005).  We did not include several fitting species because, on the vast majority of days, they were below the detection limit; these species include metals such as Al, As, Ba, Sb, Sn, and Ti.  Second, we focused on ensembling using weights of 1/σ^N, where σ is the daily source impact uncertainty, with no weighting (N=0) and inverse-squared-uncertainty weighting (N=2); Lee, et al. (2009) focused on weights of 1/σ.  Third, ensemble average uncertainties take into account the covariance of source impacts from the five SA methods.

We have developed a two-step method for determining source impact uncertainties.  First, we average the five individual source apportionment methods and determine uncertainties of the ensemble by using propagation of errors.  Next, we estimate an updated uncertainty for each SA method to be equal to the root mean square error (RMSE) between each SA method and the ensemble.  Subsequently, we estimate an updated uncertainty for the ensemble using these new SA uncertainties by propagation of errors.
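
The procedure just described can be sketched as follows for a single source, under simplifying assumptions. The weights are 1/σ^N as in the text, but this sketch treats the SA methods as independent when propagating errors (the work reported here accounts for covariance), and it reuses the original normalized weights when re-propagating the RMSE-based uncertainties, which is an assumption. All function names and numeric values are hypothetical.

```python
import math


def ensemble_average(impacts, sigmas, n_power):
    """Weighted ensemble for one source.

    impacts[k][d], sigmas[k][d]: impact and uncertainty of method k on day d.
    Weights are 1 / sigma**n_power, so n_power=0 gives a plain average.
    Returns (ensemble, ensemble_sigma, normalized_weights).
    """
    n_methods, n_days = len(impacts), len(impacts[0])
    ensemble, ens_sigma, weights = [], [], []
    for d in range(n_days):
        raw = [1.0 / sigmas[k][d] ** n_power for k in range(n_methods)]
        total = sum(raw)
        w = [r / total for r in raw]
        ensemble.append(sum(w[k] * impacts[k][d] for k in range(n_methods)))
        # Propagation of errors, assuming independent methods (no covariance).
        ens_sigma.append(math.sqrt(sum((w[k] * sigmas[k][d]) ** 2
                                       for k in range(n_methods))))
        weights.append(w)
    return ensemble, ens_sigma, weights


def rmse_uncertainties(impacts, ensemble):
    """Updated per-method uncertainty: RMSE between each method and the ensemble."""
    n_days = len(ensemble)
    return [math.sqrt(sum((row[d] - ensemble[d]) ** 2 for d in range(n_days)) / n_days)
            for row in impacts]


def updated_ensemble_sigma(weights, method_rmse):
    """Re-propagate errors using the constant RMSE-based method uncertainties."""
    return [math.sqrt(sum((w[k] * method_rmse[k]) ** 2
                          for k in range(len(method_rmse))))
            for w in weights]


# Hypothetical impacts (ug/m3) for 3 methods over 4 days; illustrative only.
impacts = [[1.0, 2.0, 0.0, 3.0],
           [1.4, 1.6, 0.6, 2.4],
           [0.6, 2.4, 0.3, 3.6]]
sigmas = [[0.5, 0.5, 0.5, 0.5],
          [0.2, 0.2, 0.2, 0.2],
          [1.0, 1.0, 1.0, 1.0]]

ens, ens_sig, w = ensemble_average(impacts, sigmas, n_power=2)
rmse = rmse_uncertainties(impacts, ens)    # one constant value per method
new_sig = updated_ensemble_sigma(w, rmse)  # updated ensemble uncertainty per day
print(ens, rmse, new_sig)
```

Calling `ensemble_average` with `n_power=0` reproduces the non-weighted case, in which every method contributes equally regardless of its stated uncertainty.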

One major consequence of setting the updated source impact uncertainties to the RMSE for the five individual SA methods is that, for each source and method, the updated uncertainty is the same on every day regardless of the magnitude of the source impact.  Thus, whereas traditional source apportionment results often have daily relative uncertainties that are constant, our work results in constant daily absolute uncertainties.  We calculate updated uncertainties this way because squared errors between each individual method and the ensemble do not, in general, correlate well with source impact, based on linear regression results.

Ensembling results in: 

  • fewer zero-impact days and results for every day of the data set, and
  • reduced variability, because excessively high and low source impact days are averaged out.
The choice of weighting does not significantly change the overall relative uncertainties (ORUs; taken here to mean the root mean square average of daily source impact uncertainties divided by average source impacts) in the ensemble averages for primary sources and SOC.  ORUs for SO4, NO3 and NH4 are significantly higher (by more than a factor of 2) in the non-weighted case in both seasons.  The ORUs for each individual SA method change significantly between the non-weighted and weighted cases.  The choice of weighting has a significant impact on source impacts and source impact uncertainties that vary significantly from method to method.
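
The overall relative uncertainty defined above can be computed as in the minimal sketch below, which divides the root mean square of the daily uncertainties by the mean daily source impact. The function name and the sample values are hypothetical.

```python
import math


def overall_relative_uncertainty(daily_impacts, daily_sigmas):
    """Root-mean-square daily uncertainty divided by the mean daily impact."""
    n = len(daily_impacts)
    rms_sigma = math.sqrt(sum(s ** 2 for s in daily_sigmas) / n)
    mean_impact = sum(daily_impacts) / n
    return rms_sigma / mean_impact


# Hypothetical daily values (ug/m3), for illustration only.
impacts = [1.0, 2.0, 3.0, 2.0]
sigmas = [0.5, 0.5, 0.5, 0.5]
oru = overall_relative_uncertainty(impacts, sigmas)
print(f"ORU = {oru:.0%}")  # prints "ORU = 25%"
```

With constant absolute daily uncertainties, as in the RMSE-based update, the numerator reduces to that constant, so the ORU is driven entirely by the average source impact.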

In summer, the ensemble, when weighted, has the lowest overall relative uncertainties for BURN (45%), COAL (48%), NH4 (3%), and SOC (39%) and has the second lowest overall relative uncertainties for GV (45%), DV (21%), DUST (61%), SO4 (3%), and NO3 (11%); for these categories, CMB-MM had the lowest overall relative uncertainties with 28%, 6%, 40%, 2% and 9%, respectively.  Without weighting, the ensemble has the lowest overall relative uncertainties for DV (37%), DUST (47%), BURN (35%), and COAL (41%).  For SO4 (11%), NO3 (47%), NH4 (12%) and SOC (34%), overall relative uncertainties are greater than those of three to four receptor models; this is due to the influence of large initial uncertainties in CMAQ.

In winter, the ensemble, when weighted, has the lowest overall relative uncertainties for GV (48%), DV (38%), BURN (61%), and SOC (56%) and has the second lowest overall relative uncertainties for DUST (92%), COAL (77%), SO4 (12%), NO3 (4%), and NH4 (2%); for these categories, PMF (61%), CMAQ (59%), CMB-RG/CMB-LGO (both at 11%), CMB-MM (3%), and CMB-LGO (1%), respectively, had the lowest overall relative uncertainties.  Without weighting, the ensemble ORUs do not change very much from the weighted case for primary sources and SOC:  GV (46%), DV (36%), DUST (99%), BURN (53%), COAL (65%), and SOC (45%).  In addition, the non-weighted ensemble did not have the lowest overall relative uncertainty for any category.  Nevertheless, the ensemble did not have the highest uncertainty for any source.  For example, all four receptor models have very high ORUs for DUST, ranging from 127% to 1037%, while CMAQ had an ORU of 73%.

The ensemble ORUs are stable for primary sources and SOC and are all within a factor of 2 between the weighted and non-weighted cases in both seasons.  This is not true for individual SA methods, for which the ORUs of a number of sources vary significantly between the weighted and non-weighted cases.

Future Activities:

We will use these current ensemble results to develop new source profiles for Step 2 and perform source apportionment for a 10-year data set at JST.  In addition, we will focus on the following issues:  assessing variability, applying this method to other locations, and using it in epidemiologic modeling.  We will investigate variability using a central-difference metric, which will be applied to the 9.5-year data set for SA results using both EBSPs and MBSPs.  We also will conduct time series filtering using a Fourier transform method to better understand variability.  We will apply the procedure to a simulated JST data set that mimics other data sets, which typically have speciated PM2.5 data only every 3 or 6 days, and develop a method to interpolate data for days without measurements.  Subsequently, we will apply the entire ensemble method to a data set in St. Louis, Missouri, to assess regional differences and demonstrate applicability to other locations.  Finally, we will use the results in epidemiologic modeling.

Journal Articles:

No journal articles submitted with this report.

Supplemental Keywords:

ensemble, ensemble-trained CMB, source apportionment, health study
