Final Report: Applying Data Assimilation and Adjoint Sensitivity to Epidemiological and Policy Studies of Airborne Particulate Matter

EPA Grant Number: R833865
Title: Applying Data Assimilation and Adjoint Sensitivity to Epidemiological and Policy Studies of Airborne Particulate Matter
Investigators: Stanier, Charles; Carmichael, Gregory R.; Field, R. William; Krewski, Daniel; Kumar, Naresh; Oleson, Jacob J.
Institution: University of Iowa, University of Ottawa
EPA Project Officer: Ilacqua, Vito
Project Period: February 1, 2009 through January 31, 2013 (Extended to January 31, 2014)
Project Amount: $899,401
RFA: Innovative Approaches to Particulate Matter Health, Composition, and Source Questions (2007)
Research Category: Health Effects, Particulate Matter, Air

Objective:

Broadly stated, the objectives of the grant were to explore novel combinations of model-simulated fine particulate matter (and its components) with observations, in order to make more accurate and finer spatial resolution estimates of air pollution. Furthermore, these model-simulated and hybrid concentration values are linked with available epidemiological datasets to (a) determine and communicate best practices for working with modeled exposure data in epidemiological studies, and (b) gain new insights about the health effects of fine particulate matter, source-resolved fine particulate matter, and speciated particulate matter. Finally, the project proposed to demonstrate the potential for target-oriented modeling using adjoints of 3D chemical transport models such as CMAQ.

Summary/Accomplishments (Outputs/Outcomes):

Large-scale multi-year CMAQ simulations:
 
All of the project objectives rely on large-scale air quality modeling over North America for the period 2002-2006. The modeling domains for this project are shown in Figure 1. To enable the air quality modeling, 5 years of meteorology simulations were completed using the Weather Research and Forecasting (WRF) model and MCIP for the CONUS 36 km parent domain and four subdomains. In addition to showing the model domains, Figure 1 also shows the areas (black numbers 1-28) used for evaluation of meteorological model skill. An optimized WRF setup was identified for the project. Meteorology outputs were evaluated for the year 2002 to establish that meteorological model skill was similar to that of other state-of-the-science models for the U.S. The WRF configuration and its evaluation are discussed in section 2.1 and in Appendix A1.
 
 
Jaemeen Baek (postdoctoral researcher) performed emissions preprocessing (using the SMOKE emissions model) for the domains shown in Figure 1. CMAQ PM2.5 results are available at 3-h temporal resolution for the modeling period and domains. As an example, the annual average 36 km PM2.5 simulation result for 2002 is shown in Figure 2.
 
 
The model result was evaluated to establish that its model-observation skill was similar to that of other contemporaneous CONUS implementations of CMAQ. The model exhibited negative bias in summer and positive bias in winter, a characteristic of many contemporary air quality models. A model evaluation summary is shown in Table 1.
 
 
 
Links to the American Cancer Society Cohort Study
 
Air pollution concentrations predicted by CMAQ were linked to the American Cancer Society (ACS) study—specifically the concentrations and health data used in the ACS Extended Reanalysis Study by Krewski et al. (2007). The linkage was performed at the Metropolitan Statistical Area (MSA) level and at the individual participant level. An example of MSA areas used in the ACS analysis is shown in Figure 3.
 
 
Model skill, after aggregation to the MSA level, is consistent with the levels of bias and error shown in Table 1. A scatterplot of observed PM2.5 concentrations (x-axis) vs. modeled PM2.5 (y-axis) is shown in Figure 4.
 
 
A flowchart demonstrating the linkage process is found in Figure 5. We have successfully matched concentrations to 669,253 subjects during the ACS follow-up period 1982-2004. This includes 237,253 mortality cases and 84,161 cardiovascular mortality cases. Preliminary hazard ratios for PM2.5 for selected causes of death have been calculated using the standard Cox survival model, adjusting for individual-level covariates and stratifying the baseline hazard function by age, gender, and race. Ongoing analysis examines the influence of the source of the exposure estimate (measurements averaged at the MSA level, model averaged at the MSA level, model value at 36 km, model value at highest available resolution, etc.) and of varying levels of aggregation in the cohort. The analysis will be completed using total PM2.5 and source-resolved PM2.5 aggregated into simplified groupings of 9, 7, 6, or 4 source categories. For example, the 4-source analysis will use the categories (i) organic primary PM2.5 (mobile + biomass/open burning); (ii) dust/crustal; (iii) mainly inorganic/crustal/metal primary PM2.5 (stationary combustion + industrial point sources); and (iv) total secondary PM2.5.
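For readers less familiar with the survival model named above, a stratified Cox proportional hazards model is commonly written as follows. This is a generic textbook form, not the specification used in the ACS analysis; the symbols (stratum index s, exposure x_i, covariate vector z_i, coefficients beta and gamma, increment Delta) are notation assumed here for illustration.

```latex
% Hazard for subject i in stratum s (e.g., an age-gender-race cell):
h_{s}(t \mid x_i, \mathbf{z}_i) = h_{0s}(t)\,
    \exp\!\left(\beta\, x_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i\right)

% Hazard ratio associated with an exposure increment \Delta
% (for example, \Delta = 10~\mu\mathrm{g\,m^{-3}} of PM_{2.5}):
\mathrm{HR}(\Delta) = \exp(\beta\,\Delta)
```

Each stratum has its own unspecified baseline hazard h_{0s}(t), while the exposure coefficient beta is shared across strata; changing the source of the exposure estimate changes x_i but not the form of the model.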
 
 
Data assimilation
 
Two types of data assimilation have been developed as part of the project. The first combines satellite data from the MODIS sensor with CMAQ using optimal interpolation. An example of the model-data fusion is shown in Figure 5.
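The optimal interpolation (OI) update referred to here can be sketched in a few lines. The example below is a minimal illustration of the textbook OI analysis equation, x_a = x_b + K (y - H x_b) with K = B H^T (H B H^T + R)^-1, and is not the project's implementation. The 5-cell domain, the error variances, the 20 km correlation length, and all names are hypothetical values chosen for the example.

```python
import numpy as np


def optimal_interpolation(x_b, y, H, B, R):
    """Textbook OI analysis: x_a = x_b + B H^T (H B H^T + R)^-1 (y - H x_b)."""
    innovation = y - H @ x_b            # observation minus background at obs sites
    S = H @ B @ H.T + R                 # innovation covariance
    # Solve S z = innovation rather than forming an explicit inverse.
    return x_b + B @ H.T @ np.linalg.solve(S, innovation)


# Hypothetical 1-D domain: 5 grid cells spaced 10 km apart.
coords_km = np.arange(5) * 10.0
x_b = np.full(5, 10.0)                  # background (model) PM2.5, ug/m3

# Assumed background error covariance: variance 4, exponential decay over 20 km.
dist = np.abs(coords_km[:, None] - coords_km[None, :])
B = 4.0 * np.exp(-dist / 20.0)

# One observation located in cell 2, with assumed error variance 1.
H = np.zeros((1, 5))
H[0, 2] = 1.0
R = np.array([[1.0]])
y = np.array([15.0])

x_a = optimal_interpolation(x_b, y, H, B, R)
print(np.round(x_a, 3))
```

The analysis moves the observed cell most of the way toward the observation and spreads a smaller correction to neighboring cells according to the assumed background error correlation; the choice of B and R is what "optimized settings" would tune in practice.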
 
 
By using optimized settings for data assimilation of CMAQ PM2.5 and MODIS aerosol optical depth (AOD), a domain-wide average improvement in fractional error from 1.2 to 0.97 at IMPROVE monitoring sites was demonstrated, and from 0.99 to 0.89 at STN monitoring sites. Somewhat larger improvements to fractional bias were observed. However, for 38% of the month-region combinations, MODIS OI degraded the forward model skill, due to biases and outliers in MODIS AOD. Negative bias in CMAQ for coarse aerosols (i.e., wind-blown dust) was found to have a much smaller influence than positive bias in MODIS AOD. Use of newer Level 3 MODIS products, which are post-processed to screen out or reduce known biases and errors, gave somewhat better results.
 
The second type of data assimilation combined surface PM2.5 measurements with the CMAQ model product using optimal interpolation. This yields superior results when evaluated with PM2.5 measurements withheld from the data assimilation. An example of this interpolation process is shown in Figure 6. Evaluated against data randomly withheld from the model-data fusion, this type of assimilation improved r2 from 0.36 to 0.76, reduced fractional error from 0.43 to 0.15, reduced normalized mean error from 36% to 13%, and reduced RMSE from 5.4 to 2.3 μg m-3.
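The skill statistics quoted above can be computed from paired model and observed values as sketched below. The formulas are the definitions commonly used in air quality model evaluation; the report does not state its exact formulas, so treating them as identical is an assumption, and the three sample values are made up for illustration.

```python
import numpy as np


def skill_metrics(model, obs):
    """Common model-observation statistics for paired, positive concentrations."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = model - obs
    return {
        # Fractional bias and error are bounded by -2..2 and 0..2 respectively.
        "fractional_bias": 2.0 * np.mean(diff / (model + obs)),
        "fractional_error": 2.0 * np.mean(np.abs(diff) / (model + obs)),
        # Normalized mean error: summed absolute error over summed observations.
        "normalized_mean_error": np.sum(np.abs(diff)) / np.sum(obs),
        "rmse": np.sqrt(np.mean(diff ** 2)),
        # Squared Pearson correlation between model and observations.
        "r2": np.corrcoef(model, obs)[0, 1] ** 2,
    }


# Hypothetical withheld observations and co-located model values (ug/m3).
obs = [10.0, 20.0, 30.0]
model = [12.0, 18.0, 33.0]
metrics = skill_metrics(model, obs)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a withheld-data evaluation, `obs` would hold only measurements excluded from the assimilation, so the statistics measure skill at locations the analysis has not seen.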
 
 
Understanding the information content of MODIS satellite aerosol measurements
 
The statistical properties of MODIS aerosol optical depth (AOD) relative to surface PM2.5 measurements were examined and described in the in-press article entitled "On the Spatio-Temporal Relationship Between MODIS AOD and PM2.5 Particulate Matter Measurements." The paper demonstrates that (using Chicago air quality measurements taken between 2007 and 2008) AOD and PM2.5 can be efficiently linked statistically using a weekly hierarchical model. We determined an optimal space-time window in the small-scale correlation study for imputing PM2.5 values around AOD measurements. We also demonstrated that AOD measurements aggregated weekly can be used to predict PM2.5 measurements. A 5,000 to 6,000 meter radius with an 8- to 12-hour timeframe was determined suitable for collecting AOD measurements around a given PM2.5 value when attempting to perform localized predictions. Spatio-temporal structure within the dataset was not significant enough to be exploited by the statistical model. Even a very flexible latent spatio-temporal process (exponential spatial decay with a separable autoregressive temporal process) did not detect any systematic spatio-temporal variability of PM2.5 around a mean structure containing only AOD.
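The space-time window described above amounts to a simple filter on AOD retrievals around each PM2.5 measurement. The sketch below is illustrative only: the record layout, the function names, the coordinates, and the reading of the timeframe as a maximum absolute time difference are all assumptions, not details taken from the paper.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def aod_in_window(pm_record, aod_records, radius_m=6000.0, max_lag_hours=12.0):
    """Keep AOD retrievals within radius_m and max_lag_hours of a PM2.5 record."""
    max_lag = timedelta(hours=max_lag_hours)
    return [
        rec for rec in aod_records
        if haversine_m(pm_record["lat"], pm_record["lon"], rec["lat"], rec["lon"]) <= radius_m
        and abs(rec["time"] - pm_record["time"]) <= max_lag
    ]


# Hypothetical monitor reading and three hypothetical AOD retrievals.
pm = {"lat": 41.88, "lon": -87.63, "time": datetime(2007, 7, 1, 12), "pm25": 14.2}
aod = [
    {"lat": 41.907, "lon": -87.63, "time": datetime(2007, 7, 1, 14), "aod": 0.31},  # ~3 km, 2 h
    {"lat": 41.98, "lon": -87.63, "time": datetime(2007, 7, 1, 13), "aod": 0.45},   # ~11 km away
    {"lat": 41.88, "lon": -87.63, "time": datetime(2007, 7, 2, 8), "aod": 0.12},    # 20 h later
]
selected = aod_in_window(pm, aod)
print([rec["aod"] for rec in selected])
```

Only the first retrieval falls inside both the radius and the time window; the other two are rejected on distance and on time lag respectively.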
 
Working with source-resolved PM2.5
 
The University of Iowa project includes ongoing work to simulate source-resolved PM2.5 and link the results to the ACS cohort to identify more and less hazardous types of aerosols. This is being done by adding tracers for primary aerosols categorized into 20 sources, such as wildfires, fireplaces, natural gas combustion, etc. The list of source categories can be found in Table 2. Primary PM2.5 (EC, primary OC, and unspecified PM2.5) from specific sources is added as tracers in CMAQ. As of the end of the grant period, the emissions files have been processed, but the source-resolved simulations are incomplete.
 
Figure 8 shows the emissions for the L2 source category (non road diesel).
 
 
 
 
Efforts towards development of the adjoint model of CMAQ
 
Target-oriented modeling has significant potential for recovery and visualization of the influence of spatially resolved emissions on user-defined targets (giving the name to target-oriented modeling). The power of the target-oriented approach is (a) the flexibility to define the target; and (b) the ability to calculate the influence of sources in all model grid cells to the target. Example targets can be simple (average concentration over the domain) or complicated (number of NAAQS ozone violations in a specific state). Relevant to the air pollution PM-health questions are population weighted aerosol exposures, possibly weighted according to source-specific health risks.
 
An example of a target-oriented calculation is the spatially resolved influence of NOx emissions on an ozone violation index. Such a map is possible using the computational machinery (called the adjoint model) associated with variational data assimilation. It is much more difficult to construct a similar target-oriented sensitivity map using traditional forward sensitivity methods. The adjoint allows efficient calculation of the grid cells that have (at a previous time step) influence on the current concentration at a receptor point. With forward sensitivity calculations, the influences are instead propagated from source to receptor.
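The contrast between forward and adjoint sensitivity can be made concrete with a toy linear transport model. The sketch below is not CMAQ or its adjoint: the 6-cell domain, the transport matrix, the number of steps, and the choice of target (concentration in one receptor cell) are all hypothetical. It only illustrates that one backward (adjoint) sweep returns the sensitivity of a scalar target to emissions in every cell, whereas the forward approach needs one model run per source cell.

```python
import numpy as np

n_cells, n_steps = 6, 8

# Toy one-step operator: 50% of mass stays in a cell, 40% moves one cell
# downwind (cell i -> cell i+1), and 10% is lost to deposition.
M = 0.5 * np.eye(n_cells) + 0.4 * np.eye(n_cells, k=-1)

# Target J = w . c_final: here, the final concentration in receptor cell 4.
w = np.zeros(n_cells)
w[4] = 1.0


def run_forward(emissions):
    """Integrate c_{k+1} = M c_k + emissions from a clean initial state."""
    c = np.zeros(n_cells)
    for _ in range(n_steps):
        c = M @ c + emissions
    return c


# Forward approach: one full model run per source cell (exact for a linear model).
unit_sources = np.eye(n_cells)
forward_sens = np.array([w @ run_forward(unit_sources[i]) for i in range(n_cells)])

# Adjoint approach: a single backward sweep with the transposed operator.
lam = w.copy()
adjoint_sens = np.zeros(n_cells)
for _ in range(n_steps):
    adjoint_sens += lam     # emissions enter at every step, so accumulate
    lam = M.T @ lam         # carry the receptor's influence one step upwind

print(np.round(adjoint_sens, 4))
print(np.allclose(forward_sens, adjoint_sens))
```

Both approaches give the same sensitivity map, with zero influence from the cell downwind of the receptor. The cost differs: the forward approach scales with the number of source cells, while the adjoint sweep scales with the number of targets, which is why a single target such as a population-weighted exposure favors the adjoint.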
 
As the particle component toxicity question is unraveled, the target-oriented approach will be invaluable for the evaluation and implementation of effective air quality regulations. If a target function could be identified connecting excess mortality with aerosol concentrations (or an ensemble of targets to account for uncertainty in the health effects), then maps can be generated relating emission reductions in specific sectors and locations to associated decreases in the burden of morbidity and mortality caused by air pollution. The targets can be customized to focus on specific locations and/or on sensitive subpopulations, and could target gas phase precursors of secondary aerosols. By combining health and climate endpoints in one target, climate-health co-benefits could be optimized.
 
The proposal called for application of a 3D chemical transport model with an adjoint model (including gas, transport, and aerosol processes). This was a necessary component of the demonstration of target-oriented sensitivity calculations. We chose to join a CMAQ adjoint development group, led by Amir Hakami and Daven Henze, in 2010. The initial thinking was that a community adjoint model could be completed in about 2 years. Dr. Baek took on the task of building an adjoint model of aqueous chemistry (AQCHEM). Dr. Baek's approaches to the aqueous chemistry adjoint were promising (and are described herein) but to date have yielded neither a suitable match to the native forward CMAQ model nor a sufficiently accurate and robust tangent linear or adjoint model.
 
CMAQ solves the aqueous chemistry as a series of equilibrium equations, removal, SOA formation, and kinetic reactions. The default CMAQ version solves the aqueous chemistry via the bisection method (for pH) and the Euler method (for aqueous chemical kinetics). Other solvers give more accurate solutions than the Euler method, but they are not used in CMAQ because they are computationally demanding.
 
Our approach for developing the CMAQ adjoint model was to replace the default CMAQ solver with a Rosenbrock solver based on code generated by the Kinetic PreProcessor (KPP). KPP generates Fortran code that solves the set of chemical reactions using the Rosenbrock solver. The use of KPP was anticipated to lead to higher accuracy and to the efficient, automated generation of an adjoint by KPP. Forward, tangent linear, and adjoint models were developed as part of the project. Output for sulfate from the University of Iowa KPP/Rosenbrock solver is compared to the bisection/Euler solver of CMAQ in Figure 9.
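The motivation for moving from an Euler step to a Rosenbrock-type step can be illustrated on a small stiff system. The sketch below is not the AQCHEM mechanism and not KPP output: it uses a made-up first-order conversion of S(IV) to S(VI) with a hypothetical rate constant, and it uses the simplest one-stage linearly implicit (Rosenbrock-Euler) step rather than the multi-stage Rosenbrock schemes that KPP provides.

```python
import numpy as np

K_OX = 50.0   # hypothetical first-order oxidation rate, 1/s
H_STEP = 0.1  # time step in s; H_STEP * K_OX = 5, so the problem is stiff at this step


def f(y):
    """Rate of change for y = [S(IV), S(VI)] under first-order oxidation."""
    s4, _ = y
    return np.array([-K_OX * s4, K_OX * s4])


def jac(y):
    """Jacobian df/dy (constant here because the toy system is linear)."""
    return np.array([[-K_OX, 0.0], [K_OX, 0.0]])


def explicit_euler_step(y, h):
    return y + h * f(y)


def rosenbrock_euler_step(y, h):
    """One-stage linearly implicit step: solve (I - h J) k = f(y), then y + h k."""
    k = np.linalg.solve(np.eye(len(y)) - h * jac(y), f(y))
    return y + h * k


y_euler = np.array([1.0, 0.0])
y_ros = np.array([1.0, 0.0])
for _ in range(10):
    y_euler = explicit_euler_step(y_euler, H_STEP)
    y_ros = rosenbrock_euler_step(y_ros, H_STEP)

print("explicit Euler  :", y_euler)
print("Rosenbrock-Euler:", y_ros)
```

At this step size the explicit Euler solution oscillates and grows to nonphysical values, while the linearly implicit step decays smoothly toward complete conversion. Both schemes conserve total sulfur in this toy case. The Jacobian used in the Rosenbrock step is also the ingredient that tangent linear and adjoint codes are built from.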
 
 
Although the two solutions are qualitatively and quantitatively very similar (see Figure 9), very precise agreement between the forward models is a requirement (see, for example, the differences over Lake Erie), and we were not able to achieve sufficient agreement in the 3D forward model. Furthermore, when running the 3D adjoint model (as opposed to the forward model shown in Figure 9), unresolved runtime errors, infeasibly small integrator step sizes, and nonphysical values were encountered.
 
Additional discussion of the AQCHEM KPP model can be found in Appendix 4.
 
Effects of resolution
 
Because both meteorological and PM2.5 models were available for this project at 36-km, 12-km, and 4-km resolution, it was natural to study the effect of resolution on (i) meteorology skill, (ii) total and source-resolved PM2.5, and (iii) epidemiological results. The first two of these three were completed. As discussed in Appendix A1, increased meteorology resolution usually (but not always) increases meteorology skill using the WRF configuration adopted in this work.
 
The effect of resolution on PM2.5 concentrations is shown by example in Figures 10 and 11. High emission areas become resolved as resolution increases to 12 km (Figure 10) and to 4 km (Figure 11).
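One way to see why high emission areas only become resolved at finer resolution is to average a fine grid up to a coarse one. The sketch below is a generic block-averaging illustration with made-up concentrations; the report does not describe how (or whether) fields were regridded between resolutions, so this is an assumption-laden example rather than the project's method.

```python
import numpy as np


def block_mean(fine, factor):
    """Average non-overlapping factor x factor blocks of a 2-D field."""
    ny, nx = fine.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    return fine.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))


# Hypothetical 9 x 9 patch of 4 km cells covering a single 36 km cell:
# a uniform 8 ug/m3 background with one 35 ug/m3 hot spot near a source.
fine = np.full((9, 9), 8.0)
fine[4, 4] = 35.0

coarse = block_mean(fine, 9)
print("4 km value at the hot spot:", fine[4, 4])
print("36 km cell average        :", round(float(coarse[0, 0]), 2))
```

A receptor located in the hot-spot cell would be assigned the peak value at 4 km but only the diluted cell average at 36 km, which is consistent with modeled concentrations at receptors rising as resolution increases.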
 
 
 
The impact of the increased resolution is shown in Table 3. Mean modeled concentrations at all of the receptors increase with the higher resolution. Since the model is already biased high in 3 of 4 seasons, this causes a small decrease in performance. The project team is continuing to examine whether similar results are shown for primary emission species and in the various 36 km vs. 12 km comparisons. Note that while Table 3 is an "apples-to-apples" comparison, Table 1 is not: the 12 km and 36 km simulations described in Table 1 do not share the same domain boundaries.
 
 
 
Table 4 summarizes key project outcomes and results.
 
 
The proposal called for application of a 3D chemical transport model with an adjoint model (including gas, transport, and aerosol processes). This was a necessary component of the demonstration of target-oriented sensitivity calculations. We chose to join a CMAQ adjoint development group, led by Amir Hakami and Daven Henze, in 2010. The initial thinking was that a community adjoint model could be completed in about two years; however, four years into the project, a CMAQ adjoint modeling system suitable for the project goals is not yet available.
 
Large-scale multi-year CMAQ simulations:
 
Table 5 includes detailed settings for the WRF v3.1.1 model run for this project (see also Appendix 1).
 
 
Emissions were processed using the Sparse Matrix Operator Kernel Emissions (SMOKE) model version 2.5. Additional detail on emission modeling choices is found in Table 6.
 
 
The photochemical model version was CMAQ version 4.7.1. The computing environment used for the modeling was primarily the Helium cluster at the University of Iowa. It has a total of 1,600 cores, of which 16-48 were used for the model runs. The processors are Intel(R) Xeon(R) CPU X5550 @ 2.67GHz with 8 MB cache (4 cores, 48 GB memory). Helium was running Linux version 2.6.18 and Intel Fortran version 11.1. Parallelization was done with Open MPI/MPICH.
 
Data assimilation of satellite AOD and surface PM2.5 measurements using Optimal Interpolation

Details of methods selection and results for satellite data assimilation are attached as Appendix 2. Appendix 3 contains methodological details and results for data assimilation of surface PM2.5 measurements.

Efforts towards development of the adjoint model of CMAQ

Details of methods and results for our development work on a new version of the CMAQ module AQCHEM, using the Kinetic PreProcessor (KPP) tool with automatic generation of adjoint code, can be found in Appendix 4.

 

Conclusions:

Progress has been made toward all of these goals, but progress to date unfortunately remains short of our proposed outcomes. As the formal grant period is over, we are summarizing progress and outcomes to date in this report; however, we continue to work to achieve a greater fraction of the proposed goals and to publish more manuscripts on the work related to our EPA grant. We will notify EPA of additional publications resulting from this project as they are accepted. Reasons for the incomplete objectives are discussed in the report.

This research demonstrates that computer models of atmospheric particulate matter, such as the CMAQ model, have a role to play in studies of the health effects of air pollution. However, biases in concentrations and composition of particulate matter need to be considered and addressed, if possible, using assimilation of measured data (the method studied in this work) or by other means such as improvement of models and model inputs.
 
One of the potential advantages demonstrated in this work of using modeled concentrations rather than measured concentrations is the ability to extend epidemiological studies to a wider population. In our work, we were able to estimate concentrations for 37% more subjects than in the original work that was restricted to locations close to monitors; furthermore, the added subjects are a relatively less studied group (since they live in lower population density areas and often do not have nearby monitors).
 
Of the data assimilation techniques studied in our project, the assimilation of surface PM2.5 measurements is quite feasible. Optimal interpolation, although a simple method, is straightforward to implement, fast, and yields significant improvement in PM2.5 model-observation skill. The alternate method, assimilation of aerosol optical depth measured from the orbiting MODIS sensor, showed less improvement of model skill. Satellite aerosol measurements will continue to be an important part of health and climate studies; however, their incremental information content to inform outdoor surface concentrations is limited, especially in areas with dense surface observation networks and well-developed emissions inventories and meteorology networks for assimilation into transport or weather models.

 

References:

Krewski D, Jerrett M, Burnett RT, Ma R, Hughes E, Shi Y, Turner MC, Pope CA, Thurston G, Calle EE, Thun MJ, Beckerman B, DeLuca P, Finkelstein N, Ito K, Moore DK, Newbold KB, Ramsay T, Ross Z, Shin H, Tempalski B. 2007. Extended Follow-up and Spatial Analyses of the American Cancer Society Study Linking Particulate Air Pollution and Mortality. Cambridge MA, Health Effects Institute.
 
Morris RE, McNally DE, Tesche TW, Tonnesen G, Boylan JW, Brewer P. 2005. Preliminary evaluation of the Community Multiscale Air Quality model for 2002 over the southeastern United States. Journal of the Air & Waste Management Association 55(11):1694-1708.


Journal Articles on this Report: 9 Displayed

Baxter LK, Dionisio KL, Burke J, Sarnat SE, Sarnat JA, Hodas N, Rich DQ, Turpin BJ, Jones RR, Mannshardt E, Kumar N, Beevers SD, Ozkaynak H. Exposure prediction approaches used in air pollution epidemiology studies: key findings and future recommendations. Journal of Exposure Science & Environmental Epidemiology 2013;23(6):654-659. Projects: R833865 (Final); R834799 (2014, 2015, 2016, Final); R834799C004 (2013, 2014, 2015, Final).

Fahey KM, Carlton AG, Pye HOT, Baek J, Hutzell WT, Stanier CO, Baker KR, Appel KW, Jaoui M, Offenberg JH. A framework for expanding aqueous chemistry in the Community Multiscale Air Quality (CMAQ) model version 5.1. Geoscientific Model Development 2017;10(4):1587-1605. Projects: R833865 (Final); R835041 (Final).

Kumar N, Chu AD, Foster AD, Peters T, Willis R. Satellite remote sensing for developing time and space resolved estimates of ambient particulate in Cleveland, OH. Aerosol Science and Technology 2011;45(9):1090-1108. Projects: R833865 (Final).

Liang D, Kumar N. Time-space Kriging to address the spatiotemporal misalignment in the large datasets. Atmospheric Environment 2013;72:60-69. Projects: R833865 (Final).

Porter AT, Oleson JJ, Stanier CO. On the spatio-temporal relationship between MODIS AOD and PM2.5 particulate matter measurements. Journal of Data Science 2014;12(2):255-275. Projects: R833865 (2012, Final).

Zhang HL, Ying Q. Secondary organic aerosol from polycyclic aromatic hydrocarbons in Southeast Texas. Atmospheric Environment 2012;55:279-287. Projects: R833865 (Final).

Zhang HL, Li JY, Ying Q, Guven BB, Olaguer EP. Source apportionment of formaldehyde during TexAQS 2006 using a source-oriented chemical transport model. Journal of Geophysical Research-Atmospheres 2013;118(3):1525-1535. Projects: R833865 (Final).

Zhang H, Li J, Ying Q, Yu JZ, Wu D, Chen Y, He K, Jiang J. Source apportionment of PM2.5 nitrate and sulfate in China using a source-oriented chemical transport model. Atmospheric Environment 2012;62:228-242. Projects: R833865 (Final); R833864 (2011, 2012, 2013, Final).

Zhang H, Chen G, Hu J, Chen S-H, Wiedinmyer C, Kleeman M, Ying Q. Evaluation of a seven-year air quality simulation using the Weather Research and Forecasting (WRF)/Community Multiscale Air Quality (CMAQ) models in the eastern United States. Science of the Total Environment 2014;473-474:275-285. Projects: R833865 (Final); R833864 (2013, Final).

Supplemental Keywords:

Air, health effects, human health, modeling
