Grantee Research Project Results
2007 Progress Report: Evaluation of Regional Scale Receptor Modeling
EPA Grant Number: R832156
Title: Evaluation of Regional Scale Receptor Modeling
Investigators: Lowenthal, Douglas H., Chen, Lung-Wen Antony, Watson, John G., Koracin, Darko
Institution: Desert Research Institute
EPA Project Officer: Chung, Serena
Project Period: January 1, 2005 through December 31, 2007 (Extended to December 31, 2009)
Project Period Covered by this Report: October 1, 2006 through September 30, 2007
Project Amount: $436,687
RFA: Source Apportionment of Particulate Matter (2004)
Research Category: Air Quality and Air Toxics, Particulate Matter, Air
Objective:
Evaluate multivariate and trajectory-based receptor models for regional source apportionment relevant to the USEPA Regional Haze Rule. Document models currently in use, including classical factor analysis, Positive Matrix Factorization (PMF), UNMIX, and Trajectory Mass Balance Regression (TMBR). Review previous model applications and critically evaluate the results. Apply receptor models to synthetic data generated with an air quality model for two eastern IMPROVE sites: Brigantine National Wildlife Refuge (BRIG), NJ and Great Smoky Mountains National Park (GRSM), TN. Document the approach required to reproduce the known regional contributions to sulfate. Perform a "blind" test on a second simulated data set using this guidance. Apply the models to real-world IMPROVE data at these sites. Finalize guidance for systematic application and validation of these models in future regional-scale applications.
Progress Summary:
Generation of Synthetic IMPROVE Data Sets
This project is a cooperative effort with Drs. Naresh Kumar and Eladio Knipping of EPRI. EPRI's role is to provide synthetic IMPROVE data sets using the SMOKE/CMAQ/MM5 modeling system. EPRI subcontracted this task to Sonoma Technology, Inc. (STI), which generated synthetic IMPROVE data at BRIG and GRSM for summer (July-September) and winter (January-March) 2002.
Figure 1 shows the modeling domain divided into 7 source regions representing Regional Planning Organizations (RPOs) and subsets thereof.
Figure 1. Study domain for application of receptor models to synthetic data. The BRIG and GRSM sites are indicated by the heavy black dots in southern NJ and eastern TN.
Primary PM2.5 source profiles for 43 source categories were taken from EPA's SPECIATE and DRI's PM source profile libraries. The profiles were used in the CMAQ model to produce hourly multi-species IMPROVE-style concentration data. The meteorological input to CMAQ was high-resolution (12 km) data generated with the NCAR Mesoscale Meteorological Model (MM5). Forty-three additional variables (T1-T43) were added to the source profiles, one unique to each source and equal to the primary PM2.5 emitted by that source. This allows each source's primary PM2.5 contribution to be followed to each receptor site.
Contributions from each of the seven regions were estimated by sequentially running the model with 30% of a given region's anthropogenic emissions removed. For summer, the "true" regional contributions were provided to DRI as a guide in the receptor modeling analysis. The summer simulations were completed during the 2006 reporting period; however, errors were later discovered in the contributions for region 6 (CENRAP-N), and these were recalculated in 2007. The winter simulations were completed during 2007. The winter hourly average PM2.5 concentrations at BRIG and GRSM were given to DRI for a "blind" analysis, meaning that the "true" contributions were retained by EPRI and not provided to DRI. The primary goal of the receptor modeling analysis is to correctly apportion sulfate, the main contributor to light scattering at the two sites, among the seven source regions (Figure 1).
During the 2007 reporting period, issues related to the PMF and TMBR receptor models were addressed. Work also began on verifying the HYSPLIT trajectories used in the TMBR model with DRI's Lagrangian particle model, which accounts for turbulence in estimating vertical and horizontal motion.
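The regional "true" contributions come from model runs in which one region's anthropogenic emissions are reduced by 30% at a time. The report does not say how the 30% response is converted into a contribution; a minimal sketch, assuming a linear response so that the concentration change is simply scaled up by 1/0.30, could look like this. The function name and the region labels in the usage line are illustrative only.

```python
def regional_contributions(base, perturbed, reduction=0.30):
    """Estimate each region's contribution to a receptor concentration.

    base: concentration from the base-case run.
    perturbed: dict mapping region name -> concentration from the run in
        which that region's anthropogenic emissions were reduced.
    reduction: fractional emission reduction used in each run (0.30 here).
    Assumes a linear response, so the 30% change is scaled to 100%.
    """
    return {region: (base - conc) / reduction for region, conc in perturbed.items()}


# Hypothetical sulfate values (same units throughout) for two regions.
contributions = regional_contributions(10.0, {"MANE-VU": 8.5, "MIDWEST": 9.1})
```

Because sulfate formation is not strictly linear in SO2 emissions, contributions estimated this way need not sum exactly to the base-case concentration.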
PMF Results
The initial analysis, using CMAQ concentrations without measurement error, showed that PMF was not able to resolve regional contributions to sulfate. The PMF factors were instead interpretable in terms of the individual source categories (and mixtures thereof) whose speciated emissions profiles were used to construct the data. During the 2007 reporting period, PMF was run on simulated data with the proposed perturbations: 1) different temporal averaging periods, e.g., 1, 6, and 24 hours; 2) concentrations randomly perturbed according to the uncertainties of actual IMPROVE species concentrations; and 3) source profiles perturbed in post-processing to better resolve the regional contributions. Figure 2 compares the sulfate associated with each of 7 PMF factors with the true average sulfate contributions (from CMAQ) at BRIG and GRSM. Sulfate was included in the PMF analysis, which yielded a 7-factor solution. The factors were then associated with the regions one by one, starting with the smallest factor, so as to give the lowest root-mean-square (RMSQ) difference between the modeled and "true" sulfate contributions. At both BRIG and GRSM, PMF put most of the sulfate into a single factor regardless of averaging time, although decreasing the averaging time increased the amount of sulfate associated with this factor. At both sites, this PMF "sulfate" factor corresponded most closely with the region contributing the majority of sulfate.
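The factor-to-region association step can be read as a greedy matching. The sketch below is one interpretation, not the project's code: factors are taken smallest first, each is paired with the still-unassigned region whose true sulfate is closest to it, and the root-mean-square difference over all pairs is reported. The function name and the factor/region labels in the test data are hypothetical.

```python
import math


def match_factors_to_regions(factor_sulfate, true_sulfate):
    """Greedily pair PMF factors with source regions.

    factor_sulfate: dict factor name -> mean sulfate apportioned by PMF.
    true_sulfate: dict region name -> mean "true" (CMAQ) sulfate contribution.
    Returns (pairs, rms): pairs maps factor -> region, and rms is the
    root-mean-square difference between matched factor and region sulfate.
    """
    if len(factor_sulfate) != len(true_sulfate):
        raise ValueError("expects one factor per region")
    unassigned = dict(true_sulfate)
    pairs = {}
    # Smallest factor first; each takes the closest remaining region.
    for factor, value in sorted(factor_sulfate.items(), key=lambda kv: kv[1]):
        region = min(unassigned, key=lambda r: (unassigned[r] - value) ** 2)
        pairs[factor] = region
        del unassigned[region]
    diffs = [factor_sulfate[f] - true_sulfate[r] for f, r in pairs.items()]
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return pairs, rms
```

A greedy pass is not guaranteed to find the globally lowest RMS; with seven factors, an exhaustive search over all 5040 permutations would also be cheap.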
Figure 2. Comparison of PMF sulfate apportionment with true regional contributions to sulfate with 1, 6, and 24-hour data averaging times.
To examine the effect of applying PMF to synthetic data with realistic uncertainties, actual IMPROVE data from BRIG and GRSM for January 2002 through May 2006 were analyzed. The mean squared minimum detection limit (MDL) was calculated for each species. Synthetic average concentrations equal to zero were replaced with MDL/3, and their uncertainties were set to the same value. For each species, the data from BRIG and GRSM were sorted into tertiles (three segments). The average fractional uncertainty in each tertile of the actual IMPROVE data was applied to the corresponding tertile of the synthetic data, with fractional uncertainties in the synthetic data capped at 0.6. The synthetic concentrations were then randomly perturbed according to these uncertainties, assuming that the uncertainties were log-normally distributed.
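The uncertainty procedure above can be sketched for a single species as follows. Two details are assumptions not stated in the text: tertile cut points come from `statistics.quantiles` with its default method, and the log-normal draw is parameterized so that its arithmetic mean equals the synthetic concentration and its standard deviation equals the assigned uncertainty. The function names and the example numbers are hypothetical.

```python
import bisect
import math
import random
import statistics

MAX_FRACTIONAL_UNCERTAINTY = 0.6  # cap on fractional uncertainty, as in the text


def tertile_index(value, cuts):
    """Return 0, 1 or 2 for the lower, middle or upper tertile."""
    return bisect.bisect_right(cuts, value)


def tertile_fractional_uncertainty(actual_conc, actual_unc):
    """Mean fractional uncertainty (unc / conc) in each tertile of real data, capped."""
    cuts = statistics.quantiles(actual_conc, n=3)
    groups = [[], [], []]
    for conc, unc in zip(actual_conc, actual_unc):
        if conc > 0:  # fractional uncertainty is undefined for non-positive values
            groups[tertile_index(conc, cuts)].append(unc / conc)
    # Assumption: an empty tertile falls back to the cap.
    return [min(statistics.mean(g), MAX_FRACTIONAL_UNCERTAINTY) if g
            else MAX_FRACTIONAL_UNCERTAINTY for g in groups]


def perturb_synthetic(synthetic_conc, tertile_frac, mdl, rng):
    """Return (perturbed concentration, uncertainty) pairs for one species."""
    cuts = statistics.quantiles(synthetic_conc, n=3)
    out = []
    for conc in synthetic_conc:
        if conc == 0:
            conc = unc = mdl / 3.0  # zero replaced by MDL/3; uncertainty set equal
        else:
            unc = tertile_frac[tertile_index(conc, cuts)] * conc
        # Assumed log-normal parameterization: arithmetic mean = conc, SD = unc.
        sigma2 = math.log(1.0 + (unc / conc) ** 2)
        mu = math.log(conc) - sigma2 / 2.0
        out.append((rng.lognormvariate(mu, math.sqrt(sigma2)), unc))
    return out


# Illustrative values only.
frac = tertile_fractional_uncertainty([0.2, 0.5, 1.1, 1.9, 2.4, 3.3],
                                      [0.1, 0.2, 0.3, 0.4, 0.4, 0.5])
pairs = perturb_synthetic([0.0, 0.8, 1.5, 2.2, 3.1, 4.0], frac, mdl=0.03,
                          rng=random.Random(42))
```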
The effect of source-profile variability was examined by changing the Se content of the coal source profile in post-processing. This is possible because the primary PM2.5 contributions from each of the 43 source categories were tagged in CMAQ. Coal combustion was a major contributor to primary PM2.5 at both sites and the major source of SO2 emissions in the modeling domain, and Se is highly enriched in the coal combustion profile. Accordingly, Se in the coal profile was multiplied by 10 in regions 2, 3, 4, and 5 for BRIG and in regions 2, 4, and 5 for GRSM (see Figure 1). This increased the ambient Se concentrations when coal contributions to primary PM2.5 from these regions were strong. Since the main regional contributor to sulfate at both sites was the local region, i.e., MANE-VU at BRIG and VISTAS-W at GRSM, it was hoped that this perturbation would improve PMF's ability to distinguish the contributions of the non-local regions to sulfate at the two receptor sites.
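The post-processing arithmetic behind the Se perturbation can be sketched as follows, assuming (hypothetically) that the coal-combustion primary PM2.5 reaching the receptor is available separately for each region and that Se is a fixed mass fraction of that profile. Scaling the Se fraction by 10 then adds 9 times the coal-derived Se from the selected regions to the ambient value. The function name, inputs, and numbers are illustrative.

```python
def enhance_selenium(ambient_se, coal_pm_by_region, se_fraction, regions, factor=10.0):
    """Ambient Se after scaling Se in the coal profile for selected regions.

    ambient_se: original ambient Se concentration at the receptor.
    coal_pm_by_region: dict region -> coal-combustion primary PM2.5 at the
        receptor (hypothetical per-region tag; same units as ambient_se).
    se_fraction: Se mass fraction in the coal-combustion source profile.
    regions: regions whose coal Se is scaled (e.g. 2, 3, 4, 5 for BRIG).
    """
    coal_pm = sum(coal_pm_by_region[r] for r in regions)
    # Only the extra (factor - 1) share is added; the original Se is already in ambient_se.
    return ambient_se + (factor - 1.0) * se_fraction * coal_pm


new_se = enhance_selenium(0.5, {1: 2.0, 2: 1.0, 3: 0.5}, se_fraction=0.01, regions=[2, 3])
```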
PMF results from perturbing 6-hour average concentrations according to realistic uncertainties and from increasing the Se content of primary PM2.5 coal combustion emissions are shown in Figure 3. The associations between sulfate from the PMF factors and the true regional sulfate contributions were determined as described above. An additional PMF run was performed on the perturbed data with the sulfate uncertainty increased by a factor of 1000. This was equivalent to removing sulfate from the PMF analysis and estimating its contribution to the factor profiles by multiple linear regression (MLR) against the modeled G matrix. The reference PMF run used concentrations without measurement uncertainty. However, since PMF requires each species concentration to be weighted by an uncertainty, all species in this run were weighted alike, with uncertainties set to 1% of each species' mean concentration.
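PMF chooses non-negative factor contributions G and profiles F that minimize Q, the sum of squared residuals each divided by its uncertainty. The sketch below only evaluates Q for given G and F (it does not fit them) to show why multiplying the sulfate uncertainty by 1000 effectively removes sulfate from the fit: that species' share of Q drops by a factor of 10^6. The random data, the array sizes, and treating column 0 as sulfate are illustrative assumptions.

```python
import numpy as np


def species_q(X, U, G, F):
    """Per-species contribution to the PMF objective Q.

    X, U: (samples x species) concentrations and uncertainties.
    G: (samples x factors) contributions; F: (factors x species) profiles.
    Q is the sum over samples and species of ((X - G F) / U) ** 2.
    """
    scaled_residuals = (X - G @ F) / U
    return (scaled_residuals ** 2).sum(axis=0)


# Illustrative data only: 50 samples, 3 factors, 5 species.
rng = np.random.default_rng(0)
G = rng.uniform(0.1, 1.0, size=(50, 3))
F = rng.uniform(0.0, 1.0, size=(3, 5))
X = G @ F + rng.normal(0.0, 0.05, size=(50, 5))
U = np.full_like(X, 0.05)

SULFATE = 0  # hypothetical column index for sulfate
U_down = U.copy()
U_down[:, SULFATE] *= 1000.0  # down-weight sulfate

q_ref = species_q(X, U, G, F)
q_down = species_q(X, U_down, G, F)
```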
Figure 3. Comparison of PMF sulfate apportionment with true regional contributions to sulfate for perturbed 6-hour average concentrations: perturbation around random measurement uncertainties and 10-fold increase in Se content of the coal source profile for emissions in selected regions. Case b represents down-weighting sulfate, as described in the text.
Figure 3 shows that for the baseline case (no measurement uncertainty), most of the sulfate in the PMF solution was associated with a "secondary sulfate factor" whose contribution was much larger than any of the regional contributions to sulfate at either site. Factor analysis often produces such a factor because sulfate, as a secondary species, does not necessarily correlate well with any primary species or cluster of species that may come from a particular source type or region. A stand-alone "sulfate factor" therefore yields relatively low chi-square values in the PMF fitting. Using data with perturbed uncertainties decreases the weighting of sulfate in the PMF fitting. The extreme case presented above shows that when the sulfate uncertainty was set to an unrealistically high level, the dominant "sulfate factor" disappeared. In this case, sulfate was apportioned to the PMF factors using MLR. However, as shown for case "b" in Figure 3, the MLR did not apportion a substantial amount of sulfate to any factor because sulfate was not well correlated with the factors; much of it was associated with a large intercept. Large MLR intercepts have typically been interpreted as "background". This is misleading because all of the ambient sulfate comes from pollution sources. The intercept represents an average of concentrations that are not well correlated with the independent variables, in this case the PMF factors. Increasing the Se content of coal combustion emissions by a factor of 10 appeared to increase the sulfate associated with region 5 (MIDWEST) at BRIG but not at GRSM, probably because the latter site is surrounded by regions with high coal-combustion emissions. The result at BRIG demonstrates the effect of strengthening the relationship between Se, which is enriched in coal-combustion emissions, and secondary sulfate from that source.
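The role of the MLR intercept can be illustrated with ordinary least squares. In this sketch, sulfate is regressed on PMF factor contributions with an intercept; any part of sulfate that does not co-vary with the factors (here a constant, for simplicity) ends up in the intercept rather than in any factor. For OLS with an intercept, mean sulfate equals the intercept plus the sum of coefficient times mean factor contribution, so the intercept's share of mean sulfate is directly visible. The function name and the synthetic data are illustrative assumptions.

```python
import numpy as np


def regress_on_factors(sulfate, G):
    """OLS regression of sulfate on PMF factor contributions, with intercept.

    sulfate: (samples,) sulfate concentrations.
    G: (samples x factors) factor contributions from a PMF run without sulfate.
    Returns (intercept, coefficients).
    """
    A = np.column_stack([np.ones(len(sulfate)), G])
    coef, *_ = np.linalg.lstsq(A, sulfate, rcond=None)
    return coef[0], coef[1:]


# Illustrative data: sulfate tied to two of three factors plus an uncorrelated part.
rng = np.random.default_rng(1)
G = rng.uniform(0.0, 1.0, size=(100, 3))
unexplained = 4.0  # sulfate not correlated with any factor (constant here)
sulfate = 2.0 * G[:, 0] + 3.0 * G[:, 1] + unexplained
intercept, coefs = regress_on_factors(sulfate, G)
```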
This scenario resembles conditions 20-30 years ago, when power generation in the Northeast and Midwest was dominated by oil- and coal-fired power plants, respectively. Thurston and Spengler (Harvard School of Public Health) presented factor analysis results from Boston in the early 1980s showing a secondary sulfate factor with a high loading for Se, which they interpreted as representing Midwestern coal combustion.
Trajectory Mass Balance Regression (TMBR)
HYSPLIT trajectories were calculated every 3 hours using EDAS and high-resolution MM5 wind fields, starting at 100, 200, 500, 1000, 1500, and 3000 m above each site. The TMBR model regressed daily sulfate concentrations on the number of 1-hour trajectory endpoints over each region. In the last progress report, TMBR analysis was based on ordinary least squares regression, which produced negative regression coefficients, and thus negative regional source contributions, mainly when the "true" average regional contributions to sulfate were small. It has long been recognized that, since negative source contributions do not exist, a non-negativity constraint should be applied to receptor model estimates. For example, the PMF model constrains both source contributions and source profiles to non-negative values. The following results are based on non-negative least squares, in which the regression coefficients were constrained to be zero or greater. This limits the influence of large estimation errors for small source contributions when TMBR results based on different meteorological inputs (EDAS vs. MM5) and starting altitudes are compared. Table 1 compares TMBR results for BRIG and GRSM with EDAS and MM5 inputs at six different trajectory starting elevations. The index of comparison is the weighted average absolute error (AAE):
$$\mathrm{AAE} = 100\% \times \frac{\sum_i \mathrm{true}_i \, \frac{|\mathrm{true}_i - \mathrm{estimated}_i|}{\mathrm{true}_i}}{\sum_i \mathrm{true}_i} = 100\% \times \frac{\sum_i |\mathrm{true}_i - \mathrm{estimated}_i|}{\sum_i \mathrm{true}_i}$$
where $\mathrm{true}_i$ and $\mathrm{estimated}_i$ are the average true (CMAQ) and estimated (TMBR) sulfate from the ith regional source. Weighting by the true values reduces the influence of small source contributions on the AAE; the lower the AAE, the better the fit. The best fits (lowest AAEs) at BRIG and GRSM were for starting altitudes of 100 and 500 m, respectively, with both EDAS and MM5 inputs. However, the results were not always consistent. For example, with MM5 input at BRIG, AAEs were also relatively low for the 1000 m (49%) and 3000 m (51%) starting altitudes. With MM5 input at GRSM, AAEs were similar for the 200 m (41%) and 1500 m (39%) starting altitudes. These ambiguities may stem from the assumption underlying TMBR that the residence time of a back trajectory over an areal source is proportional to that source's contribution at the receptor. They may also reflect the inability of the HYSPLIT trajectory model to correctly simulate the vertical motion of the trajectory. We hope to gain insight into the latter effect by comparing HYSPLIT trajectories with trajectories estimated with the Lagrangian particle model.
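A minimal sketch of TMBR with the non-negativity constraint, using SciPy's `scipy.optimize.nnls`: daily sulfate is regressed on counts of 1-hour trajectory endpoints per region, and each region's average contribution is its coefficient times its mean endpoint count. The absence of an intercept and the synthetic endpoint data are assumptions made for illustration; the report does not specify these details.

```python
import numpy as np
from scipy.optimize import nnls


def tmbr_nnls(sulfate, endpoints):
    """Trajectory Mass Balance Regression with non-negative coefficients.

    sulfate: (days,) daily sulfate concentrations at the receptor.
    endpoints: (days x regions) counts of 1-hour trajectory endpoints over each region.
    Returns (coefficients, mean_contributions): coefficients are sulfate per
    endpoint (all >= 0); mean_contributions are coefficient * mean endpoint count.
    """
    coef, _residual_norm = nnls(endpoints, sulfate)
    return coef, coef * endpoints.mean(axis=0)


# Illustrative data: 60 days, 4 regions, sulfate built from known coefficients.
rng = np.random.default_rng(2)
endpoints = rng.integers(0, 25, size=(60, 4)).astype(float)
true_coef = np.array([0.2, 0.0, 0.5, 0.1])
sulfate = endpoints @ true_coef
coef, contrib = tmbr_nnls(sulfate, endpoints)
```

With ordinary least squares, noisy data can push the coefficient of a weakly contributing region below zero; NNLS instead holds it at zero.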
Table 1. Comparison (AAE %) of TMBR results at BRIG and GRSM with EDAS and MM5 as inputs to HYSPLIT at 6 starting elevations.
| Elevation (m) | BRIG, EDAS | BRIG, MM5 | GRSM, EDAS | GRSM, MM5 |
|---|---|---|---|---|
| 100 | 52 | 45 | 43 | 48 |
| 200 | 56 | 65 | 42 | 41 |
| 500 | 67 | 72 | 38 | 33 |
| 1000 | 86 | 49 | 50 | 40 |
| 1500 | 90 | 77 | 62 | 39 |
| 3000 | 75 | 51 | 56 | 55 |
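The weighted average absolute error used for Table 1 can be computed as below, assuming it is each region's relative error weighted by that region's true contribution. That weighting reduces to the total absolute error divided by the total true sulfate, in percent. The function name and the region labels and values are hypothetical.

```python
def weighted_aae(true, estimated):
    """Weighted average absolute error, in percent.

    true, estimated: dicts region -> average sulfate contribution.
    Each region's relative error |true - est| / true is weighted by true,
    which reduces to sum(|true - est|) / sum(true).
    """
    total = sum(true.values())
    abs_err = sum(abs(true[r] - estimated[r]) for r in true)
    return 100.0 * abs_err / total


score = weighted_aae({"A": 4.0, "B": 1.0}, {"A": 3.0, "B": 2.0})
```

Under this form a 1-unit error counts the same whether it falls on a large or a small region, so a large relative error on a small region carries little weight.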
Lagrangian Particle Modeling
In order to enhance the back-trajectory analysis, we used a Lagrangian random particle model (Koracin et al., 2007) in “inverse mode” to estimate the most probable source areas starting from the receptor locations. By using inverse modeling, we will be able to establish probabilities for the receptor-source relationship between a single receptor and many source elements. The inverse modeling leads to explicit calculation of a source-receptor relationship in terms of linear transformations (i.e., in a matrix form) to describe the relative importance of specific subsets of the source to the impact at the receptor site. The elements of the receptor-source matrix simply represent the residence times spent in the respective source grid cells by the particles released from the receptor.
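A minimal sketch of how receptor-source matrix elements could be tallied as residence times, assuming each particle released from the receptor has its backward positions recorded at a fixed time step on a regular grid with planar coordinates in km. It counts particle-steps per grid cell and converts them to hours per released particle. This is a simplification for illustration, not the DRI particle model (Koracin et al., 2007); the function name and inputs are hypothetical.

```python
from collections import Counter


def residence_times(trajectories, cell_size, dt_hours):
    """Residence time (hours per released particle) in each grid cell.

    trajectories: list of particle paths, each a list of (x, y) positions in km
        at successive backward time steps.
    cell_size: grid spacing in km; dt_hours: time step in hours.
    """
    counts = Counter()
    for path in trajectories:
        for x, y in path:
            # Floor division maps a position to the index of its grid cell.
            counts[(int(x // cell_size), int(y // cell_size))] += 1
    n_particles = len(trajectories)
    return {cell: k * dt_hours / n_particles for cell, k in counts.items()}


paths = [[(1.0, 1.0), (5.0, 5.0), (15.0, 5.0)], [(2.0, 2.0), (25.0, 5.0)]]
cells = residence_times(paths, cell_size=10.0, dt_hours=1.0)
```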
A 7-day backward simulation was conducted with the particle model, starting on 20 July 2002 at 00 UTC (local time: 19 July 2002 at 1900 EST). The highest 24-hour average CMAQ-derived sulfate concentration at BRIG occurred on 19 July (19.0 μg/m³). The largest regional contributions estimated by CMAQ were from region 5 (MIDWEST), region 1 (MANE-VU), and region 3 (VISTAS-W). Figure 4 shows the instantaneous positions of the hypothetical particles at -168 hours obtained from the particle model. The direction of particle transport corresponds well with the HYSPLIT back trajectories.
Figure 4. Instantaneous positions of the particles obtained from the Lagrangian particle dispersion model after 7 days in inverse mode (for clarity, particles over the ocean and water bodies are excluded from the plot). The HYSPLIT-derived back trajectories from the BRIG site at different starting heights (100, 200, 500, 1000, and 1500 m) are overlaid on the plot.
Preliminary analysis showed that the correlation coefficients between the CMAQ-simulated average daily sulfate concentrations and the particle concentrations from the particle model for regions 1 to 7 lie in the range 0.78-0.97. The spread of the particles indicates a range of probabilities for determining the most likely source areas. Since the stochastic particle model accounts for uncertainties in the prediction of turbulence, the relative ratios among the regions (not the occurrence of individual particles) should be assessed to identify the most likely source areas. This will be the subject of continuing analysis.
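The relative ratios among regions could be obtained by summing per-cell particle residence times within each source region and normalizing. The sketch below assumes a hypothetical lookup from grid cell to region and ignores cells outside every region (for example, over the ocean). It is one possible way to express the regional comparison, not the analysis the project will perform.

```python
def region_fractions(cell_residence, cell_to_region):
    """Fraction of total in-region particle residence time spent in each region.

    cell_residence: dict grid cell -> residence time (any consistent units).
    cell_to_region: dict grid cell -> region name; unmapped cells are skipped.
    """
    totals = {}
    for cell, t in cell_residence.items():
        region = cell_to_region.get(cell)
        if region is not None:
            totals[region] = totals.get(region, 0.0) + t
    grand_total = sum(totals.values())
    return {region: t / grand_total for region, t in totals.items()}


fractions = region_fractions(
    {(0, 0): 1.5, (1, 0): 0.5, (2, 0): 0.5, (9, 9): 2.0},
    {(0, 0): "MANE-VU", (1, 0): "MIDWEST", (2, 0): "MIDWEST"},
)
```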
Future Activities:
This project has been extended to December 31, 2008 under a no-cost extension. In the remaining time, results will be synthesized for a final report and publications. Results from PMF, UNMIX, and TMBR applied to the summer synthetic data will be used to describe the ability of these models to resolve regional contributions to sulfate when the actual regional contributions are known, and the strengths and weaknesses of the models will be summarized. The winter synthetic data will be used in a "blind" test, in which the true source contributions will be unknown to the PMF/TMBR modelers at DRI. PMF will be applied to real-world IMPROVE summer concentrations from 2000-2006 at BRIG and GRSM, and the results will be compared with those obtained from the summer synthetic data and from previous studies at these sites. TMBR will be applied using EDAS and MM5 input for 6-, 12-, and 24-hour averaging times of ambient (synthetic) sulfate concentrations and starting altitudes of 100, 200, 500, 1000, 1500, and 3000 m above the sites. Uncertainties of single-realization HYSPLIT trajectories that do not account for turbulence will be estimated using the Lagrangian random particle dispersion model operated in inverse mode. The CMAQ emissions and simulated concentrations at the BRIG and GRSM receptors during episodes of elevated sulfate will be used as references.
Journal Articles on this Report:
Watson JG, Chen L-WA, Chow JC, Doraiswamy P, Lowenthal DH. Source apportionment: findings from the U.S. Supersites Program. Journal of the Air & Waste Management Association 2008;58(2):265-288.
Supplemental Keywords:
RFA, Scientific Discipline, Air, Ecosystem Protection/Environmental Exposure & Risk, particulate matter, Air Quality, Environmental Chemistry, Monitoring/Modeling, Environmental Monitoring, Atmospheric Sciences, Environmental Engineering, atmospheric dispersion models, atmospheric measurements, model-based analysis, area of influence analysis, source receptor based methods, source apportionment, chemical characteristics, emissions monitoring, environmental measurement, airborne particulate matter, air quality models, air quality model, air sampling, particulate matter mass, analytical chemistry, modeling studies, real-time monitoring, aerosol analyzers, chemical speciation sampling, particle size measurement
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.