Grantee Research Project Results
2004 Progress Report: Integrating Numerical Models and Monitoring Data
EPA Grant Number: R829402C002
Subproject: this is subproject number 002, established and managed by the Center Director under grant R829402
(EPA does not fund or establish subprojects; EPA awards and manages the overall grant for this center).
Center: Center for Integrating Statistical and Environmental Science
Center Director: Stein, Michael
Title: Integrating Numerical Models and Monitoring Data
Investigators: Stein, Michael; Kotamarthi, V. Rao; Lesht, Barry; Schwab, David; Beletsky, Dmitry; Stroud, Jonathan; Nakamura, Noboru
Current Investigators: Stein, Michael; Kotamarthi, V. Rao; Lesht, Barry; Schwab, David; Beletsky, Dmitry; Stroud, Jonathan; Chen, Li; Nakamura, Noboru; Amit, Yali; Zhang, Zepu
Institution: University of Chicago, National Oceanic and Atmospheric Administration, University of Michigan, Argonne National Laboratory, University of Pennsylvania
Current Institution: University of Chicago, Argonne National Laboratory, National Oceanic and Atmospheric Administration, University of Michigan, University of Pennsylvania
EPA Project Officer: Packard, Benjamin H
Project Period: March 12, 2002 through March 11, 2007
Project Period Covered by this Report: March 12, 2004 through March 11, 2005
RFA: Environmental Statistics Center (2001)
Research Category: Environmental Statistics, Ecological Indicators/Assessment/Restoration, Human Health, Aquatic Ecosystems, Air
Objective:
This project addresses statistical approaches for using both monitoring data and output from physical models to assess the state of the physical environment. The work is organized into eight sub-projects (A through H), and several sections of this report are divided into corresponding parts. The sub-projects cover a broad range of environmental applications, including air pollution monitoring, evaluation of the Community Multiscale Air Quality (CMAQ) model, stratospheric ozone, adjustment of emissions inventories, and hydrodynamics and sediment transport in Lake Michigan. The development of statistical models and methods for spatial-temporal processes is central to much of the work of the Center for Integrating Statistical and Environmental Science (CISES), and to environmental statistics generally. To this end, we have been addressing the theoretical, computational, and practical problems that arise, with each of these perspectives challenging and supporting the others. Sub-projects B and D (see the lettered sections below) have active collaborations with U.S. Environmental Protection Agency (EPA) scientists, and sub-project C grew out of work by an EPA scientist, Alice Gilliland, although there has been no active collaboration in recent months. Graduate students associated with the project have visited EPA in Research Triangle Park, and we will continue to encourage these visits. We would welcome visits to CISES by our present EPA collaborators as well as by new potential collaborators.
Progress Summary:
A. Space-time Covariance Functions
We are studying theoretical properties of covariance functions for processes varying in space and time, a topic fundamental to the statistical analysis of environmental data. Two specific problems we have focused on in the past year are models for processes on the surface of a sphere and models for processes that are not symmetric in space-time (that is, processes whose covariance structure changes if one reverses the direction of time). In addition to model development, we have been working on computational methods for fitting these models and on diagnostics for assessing goodness of fit. Principal investigator (PI): Michael Stein. Graduate student: Mikyoung Jun.
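To make space-time asymmetry concrete, the sketch below uses a simple textbook model that is not taken from this project's work: a "frozen field" covariance C(h, u) = sigma2 * exp(-|h - v*u| / rho), which describes a stationary spatial pattern carried along at a constant velocity v. Space is one-dimensional here for brevity (the sphere is not treated), and the function names and parameter values are illustrative assumptions. Full symmetry would require C(h, u) = C(-h, u) at every spatial lag h and time lag u.

```python
import math

def frozen_field_cov(h, u, velocity=1.0, sigma2=1.0, rho=2.0):
    """C(h, u) = sigma2 * exp(-|h - velocity * u| / rho).

    Illustrative model (an assumption, not the project's): the covariance of
    Z(x, t) = Y(x - velocity * t), where Y is a stationary process on the line
    with exponential covariance. h is the spatial lag, u the time lag.
    """
    return sigma2 * math.exp(-abs(h - velocity * u) / rho)

def is_fully_symmetric(cov, lags, tol=1e-12):
    """True if C(h, u) == C(-h, u) (equivalently C(h, -u)) at every lag pair tested."""
    return all(abs(cov(h, u) - cov(-h, u)) <= tol for h, u in lags)

lags = [(h, u) for h in (0.5, 1.0, 2.0) for u in (0.0, 1.0, 2.0)]
print(frozen_field_cov(1.0, 1.0), frozen_field_cov(-1.0, 1.0))  # 1.0 along the flow vs exp(-1) against it
print(is_fully_symmetric(frozen_field_cov, lags))                                   # False
print(is_fully_symmetric(lambda h, u: frozen_field_cov(h, u, velocity=0.0), lags))  # True
```

With no transport (velocity 0) the model collapses to a fully symmetric one, which is why transport-dominated quantities such as air pollution are natural candidates for asymmetric models.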
B. Comparing CMAQ Output to Monitoring Data
Work on this sub-project is currently focused on developing and fitting some of the space-time models described in sub-project A to sulfate concentrations both before and after removing the variation in these data that can be explained by CMAQ output. In particular, we are studying how using the CMAQ output changes the nature of the space-time covariances. This work has close connections to the problem of space-time mapping of pollution levels in the Chicago area that is part of the CISES project “Air Quality and Reported Asthma Incidence in Illinois.” PIs: Rao Kotamarthi, Michael Stein. Graduate student: Mikyoung Jun. EPA collaborators: Peter Finkelstein, Robin Dennis.
C. Correcting Emissions by Comparing Model Output and Monitoring Data
Persistent disagreements between CMAQ output and monitoring data may be largely due to problems with the emissions inventory. In particular, this may be the case for ammonia, whose emissions are highly uncertain. We can then view the problem of finding emissions fields that produce the best agreement between CMAQ output and observations as an especially difficult inverse problem. One reason for the difficulty is that solving the forward problem (running CMAQ to go from emissions to concentrations) is computationally intensive. Our approach to reducing the computational burden is to develop simplified, relatively fast-running single- and multiple-tracer versions of CMAQ whose output, using statistical methods, can be combined with a very small number of full CMAQ runs to provide accurate surrogates for the concentration fields one would obtain if it were possible to do a large number of full CMAQ runs. Additional work on this sub-project includes statistical inference for area averages of spatial processes, with application to estimating region-wide corrections in emissions. PIs: Rao Kotamarthi, Michael Stein. Graduate student: Hae-Kyung Im. EPA collaborator: Alice Gilliland.
D. Statistical Issues Arising in the Study of High-Resolution Versions of CMAQ
Our EPA collaborator, Jason Ching, has been developing a version of CMAQ that runs at very high resolution. We have begun investigating methods for describing the space-time variations in air pollution from this model and for comparing this high-resolution output with lower resolution outputs. These methods focus on the use of empirical variograms and related quantities to describe the differences between the space-time variations of models at different resolutions. One specific application of this work is to quantify and understand subgrid scale variability in model output.
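The empirical semivariogram, gamma(k) = average of (z[i+k] - z[i])^2 / 2 over all pairs at lag k, is the basic quantity behind such comparisons. The sketch below computes it for a regular one-dimensional transect; it is a generic textbook estimator rather than the project's code, and the transect, the block-averaging used to mimic a coarser model grid, and all names are hypothetical. Evaluating the fine transect at lag `factor` and the coarse one at lag 1 compares the two resolutions at the same physical distance.

```python
import math

def empirical_semivariogram(z, max_lag):
    """gamma(k) = sum of (z[i+k] - z[i])**2 over pairs / (2 * number of pairs), regular 1-D spacing."""
    gamma = {}
    for k in range(1, max_lag + 1):
        sq = [(z[i + k] - z[i]) ** 2 for i in range(len(z) - k)]
        gamma[k] = sum(sq) / (2 * len(sq))
    return gamma

def block_average(z, factor):
    """Mimic a coarser model grid by averaging `factor` adjacent fine cells (hypothetical coarsening)."""
    return [sum(z[i:i + factor]) / factor for i in range(0, len(z) - factor + 1, factor)]

# Hypothetical high-resolution transect: smooth signal plus cell-to-cell variation.
fine = [math.sin(0.2 * i) + 0.3 * math.cos(2.5 * i) for i in range(128)]
coarse = block_average(fine, 4)
gamma_fine = empirical_semivariogram(fine, 8)
gamma_coarse = empirical_semivariogram(coarse, 2)
# Lag 4 on the fine transect and lag 1 on the coarse one span the same physical distance.
print(gamma_fine[4], gamma_coarse[1])
```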
With the new PI, Noboru Nakamura, we seek to gain a better understanding of model errors by theoretical and numerical study of simple advection-diffusion models. PIs: Noboru Nakamura, Michael Stein. Graduate student: Xiaofeng Shao. EPA collaborator: Jason Ching.
E. Data Assimilation in Hydrodynamic Models
We are addressing a number of challenges in applying data assimilation methods to combine remotely sensed observations with a sediment transport model for Lake Michigan. These challenges include spatially dependent observation errors, the strongly nonlinear relationship between observed reflectances and sediment levels, the potentially large number of observations at a single time (over 10,000), and the critical impact of resuspension on sediment levels. We are also considering data assimilation for the hydrodynamic model whose output is an input to the sediment transport model. The measurements available for this purpose are hourly current observations at 11 moorings, 10 of which are in the southeast corner of the lake. Ultimately, one would like to carry out a coupled data assimilation of the meteorology, hydrodynamics, and sediment transport that makes simultaneous use of the satellite images, observed currents, and meteorological measurements. PIs: Dmitry Beletsky, Dave Schwab, Michael Stein, Jon Stroud.
F. Estimating Deformations of Isotropic Random Processes
Many environmental processes show evidence of nonstationarity in space. Most efforts to date on estimating such nonstationarities assume one has many replicates of the process over time. However, with remotely sensed or other large datasets for processes that are isotropic after a smooth spatial deformation, it may be feasible to estimate this deformation from a single realization of the process, thus removing the need to assume stationarity and/or independence across time. We are developing methods for representing spatial deformations, statistical approaches for estimating these deformations based on a single dense realization of the process, and fast algorithms for computing these estimates. We now have most of the pieces in place (both methods and software) for automatic and fast estimation of such deformations when the original observations are on a regular grid. PI: Michael Stein. Graduate student: Ethan Anderes.
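A minimal illustration of the modeling assumption follows; it does not show the estimation procedure, which is the hard part. A process that is isotropic after a smooth deformation f has covariance C(x, y) = C0(|f(x) - f(y)|). The one-dimensional deformation f(x) = x + 0.5 sin(x) and the exponential C0 are arbitrary choices made for illustration. Two pairs of points with the same separation then have different correlations, which is exactly the nonstationarity that the deformation encodes.

```python
import math

def deformation(x):
    """Hypothetical smooth, strictly increasing map of the line: f'(x) = 1 + 0.5*cos(x) > 0."""
    return x + 0.5 * math.sin(x)

def deformed_cov(x, y, rho=1.0):
    """C(x, y) = exp(-|f(x) - f(y)| / rho): exponential (isotropic) in the deformed coordinates."""
    return math.exp(-abs(deformation(x) - deformation(y)) / rho)

# Two pairs with the same separation (0.5) in the original coordinates:
print(deformed_cov(0.0, 0.5))                # f stretches space near 0 (f' = 1.5): weaker correlation
print(deformed_cov(math.pi, math.pi + 0.5))  # f compresses space near pi (f' = 0.5): stronger correlation
```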
G. Statistical Analysis of Phytoplankton in Lake Michigan
Except for completing the process of turning Leah Welty’s doctoral thesis into papers, no new work is being done on this project. PIs: Barry Lesht, Michael Stein. Graduate student: Leah Welty.
H. Combining Physical Models and Total Ozone Mapping Spectrometer (TOMS) Ozone Data for Assessing Stratospheric Ozone Trends
This sub-project fits more naturally into the project “The Detection of a Recovery in Stratospheric and Total Ozone,” and it is described in more detail there. However, the notion of combining physical models for the stratosphere with observations does fit in with the theme of this project. PIs: Michael Stein, Don Wuebbles. Postdoctoral research associate: Serge Guillas. Graduate student: Junjie Xia.
Results to Date
A. Space-time Covariance Functions. We have developed new ways of modeling space-time processes on the sphere, a topic that has received almost no attention to date in the statistical literature but is clearly fundamental to modeling environmental processes on the global scale. A particular focus of our work is on processes that are not space-time symmetric, which will essentially always be the case for air pollution (and most other environmental processes) on appropriate scales in space and time. A paper on this topic is nearing completion.
Another major focus is on the development of modeling strategies, computational methods, and diagnostics for monitoring data collected at regular time intervals over a long period. Bringing together ideas from multiple time series and spatial statistics is key to this work. In particular, we have introduced an approach to modeling monitoring data that is spectral in the time domain and spatial in the space domain, yielding a flexible and relatively easy-to-apply method for analyzing regular monitoring data. We view this approach as an important alternative to the elaborate parametric modeling of space-time covariance functions that has been the focus of most recent statistical research in this area (including our own). We have studied the effectiveness of spectral and time-domain approximations to the likelihood function for regular monitoring data. We have also developed spectral and time-domain diagnostics for assessing the goodness of fit of models for space-time covariance functions, and we have demonstrated their effectiveness at revealing model misfits that cannot be seen through standard diagnostics from spatial statistics. A paper on this topic has recently been completed (Stein, 2004).
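One basic ingredient of an analysis that is spectral in time and spatial in space is the cross-periodogram matrix of the monitoring series at each Fourier frequency. The sketch below computes it directly from the definition (a direct O(T) sum per frequency for clarity; an FFT would be used in practice). It is a generic construction written for illustration, not the project's code, and the two synthetic series are hypothetical. The phase of an off-diagonal entry reflects a time lag between sites, which is one place where space-time asymmetry shows up in the spectral domain.

```python
import cmath
import math

def dft(series, j):
    """Discrete Fourier transform of `series` at the Fourier frequency 2*pi*j/T (direct sum)."""
    T = len(series)
    omega = 2 * math.pi * j / T
    return sum(x * cmath.exp(-1j * omega * t) for t, x in enumerate(series))

def cross_periodogram(sites, j):
    """I[a][b] = d_a * conj(d_b) / (2*pi*T) at Fourier frequency j.

    `sites` holds one equal-length, regularly spaced time series per monitor.
    The matrix is Hermitian and its diagonal holds each site's periodogram.
    """
    T = len(sites[0])
    d = [dft(s, j) for s in sites]
    return [[da * db.conjugate() / (2 * math.pi * T) for db in d] for da in d]

# Hypothetical data: the same periodic signal at two sites, the second 3 time steps later.
T = 64
site_a = [math.cos(2 * math.pi * 5 * t / T) for t in range(T)]
site_b = [math.cos(2 * math.pi * 5 * (t - 3) / T) for t in range(T)]
I5 = cross_periodogram([site_a, site_b], 5)
print(cmath.phase(I5[0][1]))  # about 1.47 = 3 * (2*pi*5/T): the lag times the frequency
```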
B. Comparing CMAQ Output to Monitoring Data. We have completed work on numerical and graphical summaries for comparing CMAQ output with monitoring data, which has been published in Atmospheric Environment (Jun and Stein, 2004). We have been applying some of the models described in sub-project A to describe the relationship between modeled and observed sulfate levels. We find that, although the CMAQ output does explain some of the space-time dependence in the observations, substantial space-time structure remains in the residual variation.
C. Correcting Emissions by Comparing Model Output and Monitoring Data. We have completed Bayesian analyses of the average difference between observed and CMAQ-modeled monthly ammonia concentrations, which can then be used as a global correction factor for ammonia emissions in a month. However, we are unsatisfied with using the same correction factor throughout the study region and have thus moved our focus to methods that will allow a spatially varying emissions correction.
To this end, we have developed single- and multiple-tracer versions of the CMAQ model for inverse modeling applications. The CMAQ model uses anywhere from 70 to 100 different gas and aerosol species, depending on the chemical scheme configured in the model setup. In order to make a large number of runs quickly and with modest deviation from the full model, a single-tracer version of the CMAQ model was implemented at CISES. This model uses ammonia as the only trace gas, with no other reactive or inert tracers. As a result, the model does not compute any gas-phase or aerosol-phase chemistry. The single trace gas is removed from the atmosphere by dry and wet deposition, the largest sinks for ammonia in the atmosphere. The model takes approximately 10 minutes for a 1-day simulation on the CISES Linux cluster, which makes seasonal and year-long runs possible in a short wall-clock time.
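Why a single-tracer model with only deposition sinks is cheap, and why its output scales linearly with emissions, can be seen in a zero-dimensional caricature: one well-mixed box with a constant source E and first-order dry and wet removal rates. Treating deposition as first-order loss, the units, and the rate values are simplifying assumptions for illustration; the actual model resolves transport on the CMAQ grid. The second helper converts the quoted cost of roughly 10 minutes per simulated day into wall-clock hours.

```python
import math

def box_concentration(t_hours, emission_rate, k_dry, k_wet, c0=0.0):
    """Exact solution of dC/dt = E - (k_dry + k_wet) * C in one well-mixed box.

    Hypothetical units: C in ppb, E in ppb per hour, removal rates per hour.
    The solution is linear in (E, c0), which is what makes tagged tracers additive.
    """
    k = k_dry + k_wet
    c_eq = emission_rate / k                   # steady state where emission balances deposition
    return c_eq + (c0 - c_eq) * math.exp(-k * t_hours)

def wall_clock_hours(simulated_days, minutes_per_day=10.0):
    """Wall-clock cost at the quoted rate of roughly 10 minutes per simulated day."""
    return simulated_days * minutes_per_day / 60.0

print(box_concentration(24.0, emission_rate=2.0, k_dry=0.05, k_wet=0.15))  # after one simulated day
print(wall_clock_hours(365))  # a year-long run: roughly 61 hours
```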
A key component in the development of an inversion scheme is the development of source-receptor relations for the various regions within the model domain, in order to better evaluate the measurements of wet-deposited ammonia that are used to constrain the ammonia emissions in the model. To address this issue, we have implemented a 100-tracer version of the CMAQ model, with 100 “colored” ammonia-like tracers. All 100 tracers behave like ammonia in the single-tracer version of the model and undergo wet and dry deposition. The wet- and dry-deposited mass of each of these tracers is accounted for separately in the model in order to develop the source-receptor relations. The 100 “tagged” tracers have sources only in predefined regions within the model domain, with each tracer emitted exclusively from a single selected region. Initial, boundary, and emission files for this pseudo-Lagrangian-type model have been developed, and the model is now being tested.
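If the tagged tracers respond linearly to their emissions (plausible when, as described above, they undergo no chemistry and are removed only by deposition), the deposition at receptor i can be written as the sum over regions j of S[i][j] * a[j], where S[i][j] is the deposition from tracer j alone and a[j] scales region j's emissions. The sketch below estimates the scaling factors by ordinary least squares. It is a hypothetical illustration of how a source-receptor matrix can feed an inversion, not the project's inversion scheme; it ignores the observation error structure and positivity constraints, and the matrix and factors are made up.

```python
def solve_linear(A, b):
    """Solve the small dense system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):                          # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def estimate_scaling(S, observed):
    """Least-squares scaling factors a minimizing sum_i (sum_j S[i][j] * a[j] - observed[i])**2.

    S[i][j]: deposition at receptor i from the tracer emitted only in region j
    (run with inventory emissions). Solves the normal equations S^T S a = S^T y.
    """
    k = len(S[0])
    StS = [[sum(row[j] * row[l] for row in S) for l in range(k)] for j in range(k)]
    Sty = [sum(row[j] * y for row, y in zip(S, observed)) for j in range(k)]
    return solve_linear(StS, Sty)

# Hypothetical source-receptor matrix: 4 receptors x 2 source regions.
S = [[1.0, 0.2], [0.6, 0.5], [0.1, 0.9], [0.3, 0.3]]
true_scaling = [0.8, 1.3]
observed = [sum(s * a for s, a in zip(row, true_scaling)) for row in S]
print(estimate_scaling(S, observed))  # recovers approximately [0.8, 1.3]
```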
We have begun developing methods for the following statistical problem that arose out of this project: estimating the spectral density of an isotropic process observed at scattered locations. The goal is to create a flexible alternative to parametric approaches that are generally used in fitting spatial covariance functions while still getting estimates that are sensible (e.g., smooth) in the spectral domain. Our approach is to model the isotropic spectral density using splines at lower frequencies, assuming that the spectral density decays algebraically at high frequencies. We have implemented a fast Hankel transformation to calculate the covariance function from the spectral density rapidly, and we will soon apply this approach to the problem of estimating the average difference between observed and CMAQ-modeled monthly ammonia concentrations.
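For an isotropic process in the plane, the covariance at distance r is obtained from the spectral density f by an order-zero Hankel transform, C(r) = 2π ∫₀^∞ f(ω) J0(ωr) ω dω. The sketch below evaluates this integral by brute-force midpoint quadrature, truncated at a finite maximum frequency. It is not the fast Hankel transform mentioned above and does not fit splines; it is the kind of slow reference calculation one might use to check a fast implementation. The test density f(ω) = α / (2π (α² + ω²)^{3/2}), which decays algebraically like ω^{-3}, is chosen because its transform is known exactly: C(r) = exp(-αr). The J0 routine is our own simplification; in practice a library function such as scipy.special.j0 would be used.

```python
import math

def bessel_j0(x):
    """J0(x): power series for |x| <= 20, leading terms of the Hankel asymptotic expansion beyond.

    A simplified stand-in for a library routine (e.g. scipy.special.j0); accurate to roughly 1e-6.
    """
    x = abs(x)
    if x <= 20.0:
        q = x * x / 4.0
        term = total = 1.0
        for k in range(1, 100):
            term *= -q / (k * k)       # term is now (-1)^k (x/2)^(2k) / (k!)^2
            total += term
            if abs(term) < 1e-16:
                break
        return total
    chi = x - math.pi / 4.0
    p = 1.0 - 9.0 / (128.0 * x * x)
    q = -1.0 / (8.0 * x) + 75.0 / (1024.0 * x ** 3)
    return math.sqrt(2.0 / (math.pi * x)) * (p * math.cos(chi) - q * math.sin(chi))

def covariance_from_spectrum(f, r, omega_max=400.0, n=20000):
    """C(r) = 2*pi * integral_0^omega_max f(w) J0(w*r) w dw, by the midpoint rule (2-D isotropic case)."""
    h = omega_max / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * h
        total += f(w) * bessel_j0(w * r) * w
    return 2.0 * math.pi * total * h

def exponential_spectrum(w, alpha=1.0):
    """2-D spectral density of exp(-alpha * r); decays algebraically, like w**-3."""
    return alpha / (2.0 * math.pi * (alpha * alpha + w * w) ** 1.5)

for r in (0.0, 1.0, 2.0):
    # Truncating at omega_max biases C(0) low by roughly alpha / omega_max.
    print(r, covariance_from_spectrum(exponential_spectrum, r), math.exp(-r))
```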
D. Statistical Issues Arising in the Study of High-Resolution Versions of CMAQ. Following up on work done last year, we have found theoretical arguments to support our empirical finding that a simple bilinear interpolation of low-resolution output gives noticeably better agreement with high-resolution output than assuming that pollution levels are constant within each grid cell at the lower resolution. A paper on this topic is nearing completion. At our recommendation, an EPA contractor carried out CMAQ runs that will allow us to investigate how changing model resolution and other aspects of CMAQ affect the output. Preliminary analyses of NO2 levels show intriguing patterns in the role of initial conditions, depending on the time of day considered.
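The comparison described above can be mimicked on a toy problem: take a smooth, known field as the "high-resolution" output, keep only its values at the centres of coarse cells, and reconstruct the fine field either by holding each coarse value constant over its cell or by bilinear interpolation between cell centres. The field, the grid size, and the use of centre values (rather than cell averages) are assumptions for illustration only; this is not the project's analysis.

```python
import math

def true_field(x, y):
    """Hypothetical smooth 'high-resolution' pollution field."""
    return math.sin(x) * math.cos(0.7 * y)

def coarse_grid(n, H):
    """Low-resolution values: the field at the centres of an n x n grid of cells of size H."""
    return [[true_field((i + 0.5) * H, (j + 0.5) * H) for j in range(n)] for i in range(n)]

def piecewise_constant(grid, H, x, y):
    """Treat the value as constant within each coarse cell."""
    n = len(grid)
    return grid[min(int(x // H), n - 1)][min(int(y // H), n - 1)]

def bilinear(grid, H, x, y):
    """Bilinear interpolation between the four surrounding coarse cell centres."""
    n = len(grid)
    u, v = x / H - 0.5, y / H - 0.5          # position in cell-centre index units
    i = min(max(int(math.floor(u)), 0), n - 2)
    j = min(max(int(math.floor(v)), 0), n - 2)
    s, t = u - i, v - j
    return ((1 - s) * (1 - t) * grid[i][j] + s * (1 - t) * grid[i + 1][j]
            + (1 - s) * t * grid[i][j + 1] + s * t * grid[i + 1][j + 1])

def mse(method, grid, H, m=80):
    """Mean squared error against the true field on an m x m fine grid spanning the cell centres."""
    n = len(grid)
    lo, hi = 0.5 * H, (n - 0.5) * H
    pts = [lo + (hi - lo) * k / (m - 1) for k in range(m)]
    errs = [(method(grid, H, x, y) - true_field(x, y)) ** 2 for x in pts for y in pts]
    return sum(errs) / len(errs)

H, n = 0.5, 12
grid = coarse_grid(n, H)
print(mse(piecewise_constant, grid, H), mse(bilinear, grid, H))
```

For a smooth field the constant-within-cell error is first order in the cell size H while the bilinear error is second order, which is one elementary reason to expect the interpolated version to agree better.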
We have begun putting together a list of models and associated software needed to undertake a deeper study of model errors. Simple advection-diffusion equations with many point sources allow a great deal of exact analysis, are very fast to simulate, and produce output that in many ways mimics what is seen for a model such as CMAQ, so these equations may provide a good case study for investigating model error.
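As an example of the exact analysis such models allow, the one-dimensional advection-diffusion equation dC/dt + v dC/dx = D d²C/dx² with instantaneous point releases has a closed-form solution: by linearity, a sum of Gaussians centred at x0 + v t with variance 2 D t. The sketch below is a textbook solution written for illustration; the one-dimensional setting, the instantaneous releases, and all parameter values are assumptions, and the models the project eventually studies may differ.

```python
import math

def plume(x, t, sources, velocity, diffusivity):
    """Exact solution of dC/dt + v dC/dx = D d2C/dx2 on the real line (t > 0).

    `sources` is a list of hypothetical (x0, mass) pairs released at t = 0; the
    solution is a sum of Gaussians centred at x0 + v*t with variance 2*D*t.
    """
    var = 2.0 * diffusivity * t
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    return sum(mass * norm * math.exp(-(x - x0 - velocity * t) ** 2 / (2.0 * var))
               for x0, mass in sources)

sources = [(0.0, 1.0), (5.0, 2.0), (12.0, 0.5)]   # hypothetical (release point, mass) pairs
dx = 0.01
xs = [-20.0 + dx * k for k in range(6001)]
field = [plume(x, 2.0, sources, velocity=1.0, diffusivity=0.5) for x in xs]
total_mass = sum(field) * dx                       # mass is conserved: 1.0 + 2.0 + 0.5 = 3.5
print(total_mass)
```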
E. Data Assimilation in Hydrodynamic Models. We now have a fully operational ensemble Kalman filter algorithm in place for combining a two-dimensional (2-D) sediment transport model with Sea-Viewing Wide Field-of-View Sensor (SeaWiFS) satellite images. The method allows for very high-dimensional data (10,000 or more observations at a time) and nonlinear observational relationships, and it provides real-time forecasts of unobserved sediment levels and probability-based measures of uncertainty. We have also implemented an ensemble Kalman smoothing algorithm for producing hindcasts of sediment levels based on the full sequence of images. We have made substantial strides in developing realistic statistical models for the observation errors and physical model errors. These models are generated by running the numerical model with ensembles of parameter values and perturbed advections and shear stress fields. Finally, we have made major advances in developing computational methods for on-line sequential parameter estimation in high-dimensional models.
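The core of an ensemble Kalman filter with a nonlinear observation operator is the analysis step below, written for a single scalar state (one sediment value) so that it fits in a few lines. It is the standard perturbed-observation variant, not necessarily the one used in this project; the saturating link from sediment to reflectance, the ensemble size, and the noise levels are hypothetical. With many state variables and observations, the scalar covariance and variance become cross-covariance matrices, which is where much of the computational difficulty arises at the scale of 10,000 observations.

```python
import math
import random
import statistics

def obs_operator(sediment):
    """Hypothetical saturating link from sediment concentration to reflectance (an assumption)."""
    return 1.0 - math.exp(-0.5 * sediment)

def enkf_update(ensemble, y_obs, obs_sd, rng):
    """Perturbed-observation ensemble Kalman filter analysis step for one scalar state."""
    n = len(ensemble)
    hx = [obs_operator(x) for x in ensemble]          # predicted observations, member by member
    mx, mh = sum(ensemble) / n, sum(hx) / n
    cov_xh = sum((x - mx) * (h - mh) for x, h in zip(ensemble, hx)) / (n - 1)
    var_h = sum((h - mh) ** 2 for h in hx) / (n - 1)
    gain = cov_xh / (var_h + obs_sd ** 2)             # ensemble estimate of the Kalman gain
    # Each member assimilates its own randomly perturbed copy of the observation.
    return [x + gain * (y_obs + rng.gauss(0.0, obs_sd) - h) for x, h in zip(ensemble, hx)]

rng = random.Random(0)
prior = [rng.gauss(2.0, 0.5) for _ in range(500)]    # hypothetical forecast ensemble
y = obs_operator(3.0)                                  # synthetic observation from a "true" state of 3.0
posterior = enkf_update(prior, y, obs_sd=0.05, rng=rng)
print(statistics.fmean(prior), statistics.fmean(posterior))
print(statistics.variance(prior), statistics.variance(posterior))
```

The analysis ensemble moves toward states consistent with the observation and its spread shrinks; the spread of the analysis ensemble is what provides the probability-based uncertainty measures.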
We have undertaken some Empirical Orthogonal Function (EOF) analyses of observed and modeled currents at the available mooring sites. Interestingly, the second and third modeled EOFs match the corresponding observed EOFs better than the first modeled EOF does. The first empirical mode closely resembles the first dynamic mode in southern Lake Michigan. Therefore, the results might indicate a bias or error in the wind direction, which is responsible for large-scale circulation patterns. Comparisons of the relative eigenvalues between the observed and modeled covariance matrices of the currents indicate substantial discrepancies (the ratio of the largest-to-smallest relative eigenvalue is over 300) that deserve further attention.
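EOFs are the eigenvectors of the covariance matrix of the current records across moorings, and a relative eigenvalue is an eigenvalue divided by the total variance (the trace). The sketch below computes them with plain power iteration and deflation, which is adequate for the small matrices that a handful of moorings produce. It is a generic illustration, not the analysis code used here; treating each mooring's record as a single scalar series (rather than two velocity components) and the sample records are assumptions.

```python
def covariance_matrix(series):
    """Sample covariance across moorings; `series[s]` is the time series at mooring s."""
    m, T = len(series), len(series[0])
    means = [sum(s) / T for s in series]
    return [[sum((series[a][t] - means[a]) * (series[b][t] - means[b]) for t in range(T)) / (T - 1)
             for b in range(m)] for a in range(m)]

def leading_eofs(cov, k, iters=500):
    """Top-k (eigenvalue, eigenvector) pairs of a symmetric matrix by power iteration with deflation."""
    m = len(cov)
    A = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(m)]          # generic starting vector
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(m)) for i in range(m)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(m)) for i in range(m))  # Rayleigh quotient
        pairs.append((lam, v))
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(m)] for i in range(m)]   # deflate
    return pairs

def relative_eigenvalues(cov, pairs):
    """Fraction of total variance (trace of the covariance matrix) carried by each EOF."""
    trace = sum(cov[i][i] for i in range(len(cov)))
    return [lam / trace for lam, _ in pairs]

# Hypothetical records at three moorings (arbitrary numbers, for illustration only).
records = [[0.1, 0.3, -0.2, 0.4, 0.0], [0.2, 0.5, -0.1, 0.6, 0.1], [0.0, -0.1, 0.1, 0.2, -0.3]]
cov = covariance_matrix(records)
print(relative_eigenvalues(cov, leading_eofs(cov, 3)))
```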
F. Estimating Deformations of Isotropic Random Processes. Our thinking about deformation estimation has evolved substantially in the last year, and we now have a much better understanding of the statistical, geometric, and computational issues associated with this problem. We now have implemented code for most aspects of this problem, and we hope to have a fully implemented, automatic procedure for estimating deformations in the near future. We hope these methods will be broadly useful in modeling of non-stationary spatial and spatial-temporal processes, which occur commonly in environmental applications. They are likely to be of particular value with large spatial data sets such as those often obtained with remote sensing.
G. Statistical Analysis of Phytoplankton in Lake Michigan. This sub-project is winding down and there are no new results to report.
H. Combining Physical Models and TOMS Ozone Data for Assessing Stratospheric Ozone Trends. We have demonstrated that model output of stratospheric ozone from the most recent version of the University of Illinois at Urbana-Champaign 2-D (UIUC 2-D) model can explain the temporal variations in total column ozone substantially better than a more empirical approach, especially at lower frequencies. More importantly, we have shown that by making use of model runs under different emissions scenarios, we can reduce the time needed to detect a turnaround in stratospheric ozone by up to a factor of 2 at some latitudes. The methods we developed should have broad applicability in using physical models to improve trend estimates. A paper on this topic (Guillas et al., 2004) has recently been published.
Future Activities:
A. Space-time Covariance Functions
Although we plan to continue our development of new models for space-time covariance functions and our investigation of their properties, we intend to shift our focus to inferential and computational issues that arise in the application of these models. In particular, we will work on likelihood approximations and diagnostic methods for large datasets compiled from fixed, irregularly sited monitoring networks. Our work to date describes effective approaches to these problems when there are no missing data, and we seek to extend and modify these approaches so that they can be used when the fraction of missing data is relatively modest.
B. Comparing CMAQ Output to Monitoring Data
We will apply the methods we have developed to different pollutants and different versions of CMAQ to assess the various versions’ abilities to capture spatial-temporal variations in pollution levels. We plan to relate errors in CMAQ output to the meteorological inputs in order to learn more about conditions that give CMAQ particular trouble and to quantify the fraction of CMAQ errors that can be attributed to errors in meteorological inputs.
C. Correcting Emissions by Comparing Model Output and Monitoring Data
We will complete work on estimating the average difference between observed and modeled ammonia concentrations, including a comparison between the parametric and nonparametric approaches to covariance function estimation. Starting with relatively straightforward regression methods, we will investigate how well the output from full CMAQ runs under multiple modified emissions scenarios can be predicted from a large number of single-tracer model runs and a very small number of full CMAQ runs. Once we have a working multiple-tracer version of the CMAQ model, we will compare how well its runs predict the output of full CMAQ runs with how well the many runs of the single-tracer model do. Once we have some understanding of whether the multiple- or single-tracer approach works better, we will concentrate on developing more sophisticated statistical approaches for predicting the results of full CMAQ runs. The ultimate goal is to use this ability to handle the forward problem (going from emissions to deposition) to estimate corrections to the emissions scenarios by solving the inverse problem.
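The simplest version of the regression idea above is a linear map, fitted on the few scenarios where both models were run, that converts single-tracer output into a prediction of full CMAQ output. The sketch below fits one intercept and slope pooled over grid cells and scenarios; that pooling, the function names, and the numbers are hypothetical, and the project's regressions may well be cell-specific or more elaborate.

```python
def fit_linear_surrogate(cheap, full):
    """Ordinary least-squares fit of full ~ a + b * cheap from paired model runs.

    `cheap` and `full` are flat lists of concentrations, stacked over grid cells
    and over the few scenarios for which both models were run (hypothetical setup).
    """
    n = len(cheap)
    mean_c = sum(cheap) / n
    mean_f = sum(full) / n
    sxy = sum((c - mean_c) * (f - mean_f) for c, f in zip(cheap, full))
    sxx = sum((c - mean_c) ** 2 for c in cheap)
    b = sxy / sxx
    a = mean_f - b * mean_c
    return a, b

def predict_full(a, b, cheap_field):
    """Surrogate 'full CMAQ' output for a scenario that was only run with the cheap model."""
    return [a + b * c for c in cheap_field]

# Hypothetical paired output: 2 scenarios x 3 grid cells, with an exactly linear relation.
cheap_pairs = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5]
full_pairs = [0.5 + 0.8 * c for c in cheap_pairs]
a, b = fit_linear_surrogate(cheap_pairs, full_pairs)
print(predict_full(a, b, [4.0, 5.0]))  # surrogate predictions for an unpaired cheap run
```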
D. Statistical Issues Arising in the Study of High-Resolution Versions of CMAQ
We will continue to develop interpretable and useful statistical (both numerical and graphical) descriptions of what is lost when using lower resolution models. Understanding subgrid variability is key to these efforts. Specifically, it is essential to capture the spatial and temporal dependence in the subgrid variability to, for instance, give sensible inferences about the conditional distribution of extreme values of high-resolution output given lower resolution output. One interesting possibility we want to investigate is the use of a high-resolution emissions inventory in conjunction with a lower resolution model output to make predictions about maximum pollution levels.
A second focus of this sub-project is to conduct extensive further analyses of the extra CMAQ runs we now have for elucidating how runs at different resolutions diverge when given identical initial and boundary conditions. This work will be complemented by our in-depth study of model errors in simple advection-diffusion models. We expect that studying such simple models will give us insight into better statistical descriptions of model errors, for example, for statistical models of displacement errors (errors in location and/or time), rather than the more standard additive error models.
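A toy example shows why displacement errors can be a more natural description than additive ones: a modeled plume that is correct in shape but misplaced produces a sizeable pointwise (additive) error, yet the error essentially vanishes once the model field is shifted. The sketch estimates the shift by minimizing the mean squared difference over integer grid offsets; the Gaussian plume, the grid, and the search method are illustrative assumptions rather than the statistical models the project intends to develop.

```python
import math

def best_shift(model, obs, max_shift):
    """Integer grid shift s minimizing the mean squared difference between model[i + s] and obs[i]."""
    n = len(obs)
    best = None
    for s in range(-max_shift, max_shift + 1):
        idx = [i for i in range(n) if 0 <= i + s < n]   # indices where both fields overlap
        err = sum((model[i + s] - obs[i]) ** 2 for i in idx) / len(idx)
        if best is None or err < best[1]:
            best = (s, err)
    return best

x = [0.1 * i for i in range(200)]
obs = [math.exp(-((xi - 10.0) ** 2)) for xi in x]      # hypothetical observed plume, peak at x = 10
model = [math.exp(-((xi - 10.5) ** 2)) for xi in x]    # same plume, misplaced by 0.5 (5 grid steps)
additive_mse = sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs)
shift, shifted_mse = best_shift(model, obs, max_shift=20)
print(additive_mse, shift, shifted_mse)
```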
E. Data Assimilation in Hydrodynamic Models
With our new ensemble filtering algorithm in place, we plan to compare its performance to a Newtonian nudging scheme adapted by Schwab and Beletsky (2004). We expect the ensemble method to produce superior forecasts of sediment in coastal regions where the spatial gradient is sharpest and where the most missing data occur. To compare the procedures, we plan to look at accuracy in predicting both sediment levels and spatial gradients. We will also continue to develop and test new statistical models for the observation and model errors, and we plan to refine our computational algorithms for parameter estimation. We also plan to develop approximate likelihood methods for model selection and parameter estimation in high-dimensional space-time models.
We also hope to follow up on our work comparing observed and modeled currents with the goals of understanding the strengths and weaknesses of the model and providing error structures for data assimilation methods. In order to make substantial progress on this aspect of the sub-project, we will most likely need to recruit a new student or postdoctoral researcher to the effort. One of the new postdoctoral researchers has shown an interest in this project.
F. Estimating Deformations of Isotropic Random Processes
We should soon have an implemented version of our procedure for estimating deformations of an isotropic random process from a single realization. We plan to study the properties of our procedure through both theory and simulations, and as we learn more about how it works, we will modify its details to improve its statistical performance and computational efficiency. We also plan to apply our procedure to environmental data such as high-resolution model output or satellite-based measurements that would have the density of observations needed for the method to be feasible.
G. Statistical Analysis of Phytoplankton in Lake Michigan
This research was completed some time ago except for some writing activities that are now nearly complete.
H. Combining Physical Models and TOMS Ozone Data for Assessing Stratospheric Ozone Trends
Now that we have successfully demonstrated the effectiveness of using numerical model runs in trend estimation, we plan to explore the use of other models for the stratosphere to see how the results compare. It would be especially intriguing to try three-dimensional stratospheric models. We also plan to study the impact of the choice of emissions scenarios in the model runs on the conclusions.
Journal Articles on this Report: 4
Other subproject views: All 29 publications | 28 publications in selected types | All 18 journal articles
Other center views: All 120 publications | 74 publications in selected types | All 52 journal articles
Guillas S, Stein ML, Wuebbles DJ, Xia J. Using chemistry transport modeling in statistical analysis of stratospheric ozone trends from observations. Journal of Geophysical Research 2004;109(D22303), doi:10.1029/2004JD005049.
Jun M, Stein ML. Statistical comparison of observed and CMAQ modeled daily sulfate levels. Atmospheric Environment 2004;38(27):4427-4436.
Stein ML. Space-time covariance functions. Journal of the American Statistical Association 2005;100(469):310-321.
Stein ML. Statistical methods for regular monitoring data. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2005;67(5):667-687.
Supplemental Keywords:
RFA, Economic, Social, & Behavioral Science Research Program, Health, Scientific Discipline, PHYSICAL ASPECTS, Air, Geographic Area, Ecosystem Protection/Environmental Exposure & Risk, particulate matter, Applied Math & Statistics, Ecosystem/Assessment/Indicators, Ecosystem Protection, Health Risk Assessment, Risk Assessments, Monitoring/Modeling, Ecological Effects - Environmental Exposure & Risk, Environmental Monitoring, Physical Processes, Environmental Statistics, Ecological Risk Assessment, Engineering, Chemistry, & Physics, Environmental Engineering, EPA Region, Great Lakes, particulates, risk assessment, ecological effects, monitoring, health risk analysis, watersheds, emissions monitoring, ecological health, ozone, particulate, stratospheric ozone, ozone, sediment transport, computer models, exposure, air pollution, chemical transport modeling, chemical transport, trend monitoring, statistical models, human exposure, ecological risk, water, ecosystem health, environmental indicators, PM, ecological models, chemical transport models, Region 5, data models, air quality, statistical methodology, human health risk, stochastic models
Relevant Websites:
http://www.stat.uchicago.edu/~cises/
Progress and Final Reports:
Original Abstract
Main Center Abstract and Reports:
R829402 Center for Integrating Statistical and Environmental Science
Subprojects under this Center:
R829402C001 Detection of a Recovery in Stratospheric and Total Ozone
R829402C002 Integrating Numerical Models and Monitoring Data
R829402C003 Air Quality and Reported Asthma Incidence in Illinois
R829402C004 Quasi-Experimental Evidence on How Airborne Particulates Affect Human Health
R829402C005 Model Choice, Stochasticity, and Ecological Complexity
R829402C006 Statistical Approaches to Detection and Downscaling of Climate Variability and Change
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.
Project Research Results
18 journal articles for this subproject
Main Center: R829402
120 publications for this center
52 journal articles for this center