
Final Report: Statistical Methods for Ecological Assessment of Riverine Systems by Combining Information from Multiple Sources

EPA Grant Number: R828674
Title: Statistical Methods for Ecological Assessment of Riverine Systems by Combining Information from Multiple Sources
Investigators: Handcock, Mark S.
Institution: University of Washington
EPA Project Officer: Hahn, Intaek
Project Period: January 1, 2001 through December 31, 2003 (Extended to December 31, 2004)
RFA: Environmental Statistics (1999)
Research Category: Aquatic Ecosystems, Environmental Statistics, Human Health

Objective:

The assessment of environmental risk and the evaluation of environmental policies require information from diverse sources to be collected, organized, and combined. Evolutionary improvements in geographic information systems (GIS) now routinely allow the management and mapping of spatial-temporal information. However, there is a dearth of statistical methodology, not only to represent the complexities of the information but also to allow the uncertainty of the resulting inference to be quantified.

The development of statistical models to combine information of different types and spatial support is of vital importance. The primary objective of this research was to improve understanding of the biological integrity of stream and river systems in the United States Mid-Atlantic Region by combining information from separate monitoring surveys, available contextual information on hydrologic units, and remote sensing information.

Summary/Accomplishments (Outputs/Outcomes):

This project:

Developed a hierarchical spatial statistical model for environmental indicators of stream and river systems that can be adapted and applied to other regions of the United States;
Developed models that can be used to estimate indicators throughout a riverine system based on information from multiple sources and aggregate scales and provide a mechanism for combining information from separate monitoring surveys, available contextual information on hydrologic units, and remote sensing information;
Completed a case study in which the model was applied to the United States Mid-Atlantic Region, based on the information underlying the Landscape Atlas of the Mid-Atlantic Region produced by the U.S. Environmental Monitoring and Assessment Program (EMAP);
Added capabilities to the model to combine information from two overlapping monitoring surveys, the EMAP Stream and River Survey and the Maryland Biological Streams Survey;
Developed a general framework for comparative distributional analysis based on the concept of a relative spatial distribution; and
Completed a case study of the comparative distributional analysis in which a spatial model is used to predict spatial distributions and relative spatial distributions for a watershed in the United States Mid-Atlantic Region.

Technical Advances

The project developed statistical modeling methods for combining information from multiple sources. The purpose of these models is to create a stochastic representation for the measurement at each location on a riverine system. The underlying modeling approach is hierarchical, so that complex structure can be represented by a hierarchy of relatively simple model specifications. The idea is to model the spatial dependence indirectly through latent stochastic processes. This research builds directly on previous studies, including Besag (1974, 1975), Cressie (1995), Molliè and Richardson (1991), and Bernardinelli and Montomoli (1992). Let R be the set of locations on the riverine system. Let W(x) represent the hydrologic unit (watershed) to which location x belongs, and let {W_i : i = 1, ..., H} represent the set of all hydrologic units. The units form a partition of R. Let Z(x) be a measure at each location in R. We model the measure at each location as:

$$Z(x) = L(x)\beta_1 + C(x)\beta_2 + S(x)\beta_3 + \phi_{W(x)} + \eta(x) + \epsilon(x), \qquad x \in R,$$

where the first three terms capture variation due to differences in covariates, the φ and η terms capture residual spatial variation, and the last term captures the unexplained variation. The terms are:
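To make the additive structure concrete, the following minimal sketch assembles Z(x) for a handful of locations. It assumes the covariates, coefficients, and latent effects are already known and that ε(x) is independent mean-zero Gaussian noise; all array names and toy values are hypothetical, and this is an illustration of the model's form, not the project's software.

```python
import numpy as np

def linear_predictor(L, C, S, beta1, beta2, beta3, phi, watershed, eta, element):
    # Covariate part: location-specific, contextual, and complete-coverage terms.
    fixed = L @ beta1 + C @ beta2 + S @ beta3
    # Latent part: between-watershed effect phi_{W(x)} plus within-watershed
    # effect eta(x), which is constant over the stream element s(x).
    return fixed + phi[watershed] + eta[element]

def simulate_measure(mean, sigma, rng):
    # Residual eps(x): independent mean-zero Gaussian noise with standard deviation sigma.
    return mean + rng.normal(0.0, sigma, size=mean.shape)

# Toy example (hypothetical values): 4 locations, 2 watersheds, 3 stream elements.
L = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # intercept + survey indicator
C = np.array([[2.0], [2.0], [3.0], [3.0]])                      # e.g. stream order
S = np.array([[0.5], [0.2], [0.9], [0.1]])                      # e.g. riparian cover fraction
beta1, beta2, beta3 = np.array([1.0, -0.5]), np.array([0.3]), np.array([2.0])
phi = np.array([0.4, -0.4])          # one effect per watershed
watershed = np.array([0, 0, 1, 1])   # W(x) for each location
eta = np.array([0.1, -0.1, 0.0])     # one effect per stream element
element = np.array([0, 1, 2, 2])     # s(x) for each location

mu = linear_predictor(L, C, S, beta1, beta2, beta3, phi, watershed, eta, element)
# mu is approximately [3.1, 1.8, 3.3, 1.2]
z = simulate_measure(mu, sigma=0.2, rng=np.random.default_rng(0))
```

Indexing `phi[watershed]` and `eta[element]` is what ties each location to the effect of its watershed and stream element.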

L(x): row vector of location-specific covariates at location x; these may vary spatially within a neighborhood of x. These measures are required to be known at each location in R, which restricts the set of usable covariates. Examples of covariates are latitude, longitude, and elevation. Indicators for the monitoring survey that provided the measurement should also be included here, so that systematic differences between the measurements of the surveys can be identified. These differences could be caused by variations in collection protocol or incongruent calibration. If more complicated calibration issues are envisaged, they can also be added here or in the stochastic components.
C(x): row vector of contextual covariates related to location x. These measures are also required to be known at each location in R, which restricts this set as well, but they can be areal; that is, they can be a characteristic of an area associated with location x. Examples are characteristics of the reaches available in the stream database, such as stream order and stream level. Variables on demographic characteristics, air pollution, agricultural usage, and the human use index from the Landscape Atlas database are included here.

The effects of political divisions can be investigated using contextual variables that indicate whether the location is within a given political division. The most direct example is the state (or states) in which the watershed resides. Although watersheds do not necessarily respect state boundaries, state and local government regulations may directly influence environmental conditions and human activities. Hence, the relative comparison of state-level effects is an important way of assessing the role of institutions at the state level. This approach can be applied to other divisions, such as labor-market regions and counties.

S(x): row vector of complete-coverage covariates related to location x. These measures are assumed to be known at every location in the region, including each location in R. Examples are biophysical features such as soil types from the U.S. Department of Agriculture Natural Resources Conservation Service soils database, forest habitat, riparian cover, and human population patterns available from the Landscape Atlas database, as well as other satellite-based landscape indicators.
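As an illustration of how survey and state indicators could enter the covariate rows, the sketch below builds L(x) and C(x) for a single location. The survey labels, state list, column order, and dummy-coding choices are assumptions made for the example; the report does not specify its coding.

```python
import numpy as np

# Hypothetical level sets; the report does not list its exact coding.
SURVEYS = ["EMAP", "MBSS"]  # EMAP Stream and River Survey, Maryland Biological Streams Survey
STATES = ["DE", "MD", "PA", "VA", "WV"]

def survey_indicator(survey):
    # Dummy coding with EMAP as the baseline: the coefficient on this column
    # estimates a systematic MBSS-minus-EMAP difference (protocol or calibration).
    return [1.0 if survey == level else 0.0 for level in SURVEYS[1:]]

def location_row(lat, lon, elev, survey):
    # L(x): intercept, latitude, longitude, elevation, survey indicator(s).
    return np.array([1.0, lat, lon, elev] + survey_indicator(survey))

def contextual_row(stream_order, human_use_index, states):
    # C(x): areal characteristics plus one indicator per state; a watershed
    # that straddles a state boundary gets a 1 for each state it touches.
    # If every location lies in exactly one state, these columns sum to the
    # intercept in L(x), so one state should then be dropped as a baseline.
    state_cols = [1.0 if s in states else 0.0 for s in STATES]
    return np.array([stream_order, human_use_index] + state_cols)

row_L = location_row(39.0, -76.5, 120.0, "MBSS")
row_C = contextual_row(3, 0.25, {"MD", "PA"})
```

The coefficients on the state columns then give the relative state-level comparison described above.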

Note that this division between location-specific, contextual, and complete-coverage covariates is not a requirement of the model. Although the division is artificial from a modeling perspective, it serves the theme of combining data sources via a model by clarifying the precise linkage of the contextual, complete-coverage, and location-specific data types with the random field Z(x). Within the model, the components are treated similarly; the taxonomy mainly aids the identification of factors from the component surveys and groups the factors for interpretation. Each of these terms enters in linear form with regression coefficient vectors β₁, β₂, and β₃; the covariates themselves can be transformed so that this linear form is appropriate. Note that the spatial variation terms represent the effects of unadjusted-for or unobserved covariates as well as the effects of spatial proximity. Whether we believe in the existence of true spatial proximity effects depends on the philosophical interpretation of the model. If we believe the model is a causal representation, then the latent-variables-only approach is compelling. If we believe the model is descriptive, then there is room for residual spatial proximity effects.

In addition to these effects, we explicitly model the spatial variation between and within watersheds.

φ_i: latent between-watershed effect for watershed W_i. Location x receives the effect φ_{W(x)}, so each location within the same watershed receives the same effect. These effects represent the overall level differences between the units. We consider two models for the φ_i. The first represents them as fixed but unknown environmental characteristics (i.e., a classical fixed-effects specification). This representation is of interest because the watersheds are unchanging over the time scale of the study and the watershed effects are themselves of direct scientific interest. Under the second specification, the φ_i form a spatial lattice random field. The simplest such model treats the values as mutually independent; here we use a neighborhood-based lattice pairwise-difference model (Cressie, 1993; Anselin and Florax, 1995). Consider a neighborhood system for the watersheds based on spatial contiguity; that is, units that share a common boundary are neighbors. We capture this effect with a class of nonstationary Gaussian intrinsic autoregressions (Besag et al., 1991; Bernardinelli and Montomoli, 1992). Let v_ij be prescribed nonnegative weights, with v_ij = 0 unless watersheds i and j are neighbors, and let λ_γ be a scale parameter. The conditional distribution of φ_i given the effects of the other watersheds is specified to be Gaussian:

$$\phi_i \mid \{\phi_j : j \neq i\} \sim N\left( \frac{\sum_{j \in WN_i} v_{ij}\,\phi_j}{v_{i+}},\; \frac{\lambda_\gamma}{v_{i+}} \right),$$

where WN_i represents the watersheds j that are neighbors of i and v_i+ is the sum of v_ij over j ∈ WN_i. The joint distribution of the between-watershed effects is then an intrinsic Gaussian random field. The basic neighborhood scheme is contiguity, although alternative weighting schemes can clearly and fruitfully be considered, for example, the length of the common boundary or the percentage of common boundary. The parameter γ includes λ_γ and any others necessary to further specify the weights.
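The intrinsic autoregression can be checked numerically. The sketch below computes the Gaussian full conditional of one watershed effect from a weight matrix V and also forms the corresponding joint precision matrix, whose rows sum to zero (hence "intrinsic"). It assumes λ_γ acts as a variance scale, so the conditional variance is λ_γ / v_i+; the four-watershed chain is a hypothetical example.

```python
import numpy as np

def icar_conditional(phi, V, i, lam):
    """Full conditional of phi_i under the intrinsic autoregression.

    Returns (mean, variance) with mean = sum_j v_ij phi_j / v_i+ and
    variance = lam / v_i+, treating lam (lambda_gamma) as a variance scale.
    """
    v_plus = V[i].sum()
    mean = V[i] @ phi / v_plus  # V[i, i] = 0, so phi_i does not enter its own mean
    return mean, lam / v_plus

def icar_precision(V, lam):
    # Joint (improper) precision matrix Q = (diag(v_+) - V) / lam. Its rows sum
    # to zero, so Q is singular: this is what makes the field "intrinsic".
    return (np.diag(V.sum(axis=1)) - V) / lam

# Four watersheds in a chain 0-1-2-3 with 0/1 contiguity weights. Shared
# boundary lengths could be stored in V instead to get a length-weighted scheme.
V = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
phi = np.array([0.5, -0.2, 0.1, 0.3])
mean, var = icar_conditional(phi, V, i=1, lam=2.0)  # mean 0.3, variance 1.0
```

The two functions agree: for a Gaussian with precision Q, the conditional mean of φ_i is minus the off-diagonal part of row i applied to the other effects, divided by Q_ii, which reduces to the weighted neighbor average above.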

η(x): latent within-watershed effects within the watershed of location x. We model each as a spatial random field on the riverine system within each watershed. For simplicity, we initially specify that the inter-watershed dependence is captured by the φ_i and that the within-watershed spatial fields are independent between watersheds. This assumption can be relaxed if doing so explains significant additional variability. The model within each watershed is a pairwise-difference model (Besag, 1989). For example, consider a neighborhood system for x based on being on the same stream segment (according to RF3); that is, two locations are neighbors if, and only if, they belong to the same stream segment. One would expect that, all else being equal, two locations on the same stream are more likely to have close values of a measure than two locations on separate streams. We capture this effect with a modified class of nonstationary Gaussian intrinsic autoregressions. The riverine system represented by RF3 is composed of a finite, albeit large, number of elements. Let s(x) be the stream element that x is on, and let M be the (finite) number of such elements. We specify that η(x) is constant over the stream element s(x). Although a continuum random field on the riverine system is more appealing in principle, the hybrid irregular lattice version proposed below is designed to parsimoniously capture the stream-to-stream spatial variation. The main disadvantage of the continuum approach based on geostatistical models is the difficulty of specifying the variogram, because of a lack of information at local scales. However, for general processes the geostatistical approach has many advantages, as Zimmerman and Harville (1991) show with an application to agricultural experiments. Progress on continuum models has been made (Kelsall and Wakefield, 1997; Moller, 1998; Ecker and Gelfand, 1997; Best et al., 1998). See Besag and Higdon (1999) for additional references and a discussion of these issues.

Returning to our model, let w(x,y) be prescribed nonnegative weights, with w(x,y) = 0 unless x and y are neighbors. As there are M stream elements, the values of w(x,y) form a symmetric M × M matrix W. The conditional distribution of η(x) given the other effects in the watershed is specified to be Gaussian:

$$\eta(x) \mid \{\eta(y) : y \neq x\} \sim N\left( \frac{\sum_{y \in N_1(x)} w(x,y)\,\eta(y)}{w(x,+)},\; \frac{\lambda_\nu}{w(x,+)} \right),$$

where N₁(x) represents the stream elements that are neighbors of x, λ_ν is a scale parameter, and w(x,+) is the sum of w(x,y) over y ∈ N₁(x). The joint distribution of the within-watershed effects for each stream element is then an intrinsic Gaussian random field. The simplest choice of weights is w(x,y) = 1 if x and y are on the same stream segment. However, we can explore choosing weights proportional to those from a continuous, geostatistically motivated semivariogram model to additionally capture the decay with distance between the locations (Raftery and Banfield, 1991; and the discussion of Besag and Higdon, 1999).
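A minimal sketch of the within-watershed weights: stream elements on the same segment get w(x,y) = 1, and optionally that 1 is replaced by exp(−d/range), the correlation function implied by an exponential semivariogram, to add distance decay. The segment labels, coordinates, range, and the exponential choice itself are hypothetical illustrations, not the project's settings.

```python
import numpy as np

def same_segment_weights(segment_of, coords=None, range_=None):
    """W[x, y] = 1 if elements x != y lie on the same stream segment (scheme N1).

    If coords and range_ are given, the 1 is replaced by exp(-d(x, y) / range_),
    a hypothetical distance-decay choice based on an exponential correlation function.
    """
    seg = np.asarray(segment_of)
    W = (seg[:, None] == seg[None, :]).astype(float)
    np.fill_diagonal(W, 0.0)  # an element is not its own neighbour
    if coords is not None:
        xy = np.asarray(coords, dtype=float)
        d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
        W *= np.exp(-d / range_)
    return W

def eta_conditional(eta, W, x, lam):
    # Same form as the between-watershed case: weighted neighbour mean and
    # variance lam / w(x,+), treating lam (lambda_nu) as a variance scale.
    w_plus = W[x].sum()
    if w_plus == 0:
        raise ValueError("element has no neighbours under this scheme")
    return W[x] @ eta / w_plus, lam / w_plus

W = same_segment_weights([0, 0, 0, 1, 1])
m, v = eta_conditional(np.array([0.2, -0.1, 0.4, 0.0, 0.6]), W, x=0, lam=1.0)
Wd = same_segment_weights([0, 0, 0, 1, 1],
                          coords=[(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)],
                          range_=2.0)
```

Both weight matrices are symmetric by construction, as the model requires.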

A number of neighborhood schemes could drive the spatial variation. For example:

N₁ segment: locations belong to the same stream segment;
N₂ stream: locations belong to the same stream, at the same order;
N₃ siblings: locations belong to the same stream, but at different orders of the stream; and
N₄ cousins: locations belong to different streams, but have the same order and source.

The above model generalizes to these schemes by adjusting the weights accordingly. The parameter ν defines the structure of the within-watershed spatial variation and includes λ_ν and any others necessary to further specify the weights. We expect that the precise form of these neighborhood schemes and weights depends on the nature of the spatial variation identified during the data analysis; this will be explored further in future work.
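One way to combine the four neighborhood schemes is to classify each pair of locations by its most specific relationship and assign a weight per scheme. The attribute names below stand in for RF3 reach fields, and the weight values are placeholders; both are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Location:
    # Hypothetical attributes standing in for RF3 reach fields.
    segment: str
    stream: str
    order: int
    source: str

def neighbour_scheme(a: Location, b: Location) -> Optional[str]:
    """Return the most specific scheme under which a and b are neighbours."""
    if a.segment == b.segment:
        return "N1"  # same stream segment
    if a.stream == b.stream and a.order == b.order:
        return "N2"  # same stream, same order
    if a.stream == b.stream:
        return "N3"  # siblings: same stream, different orders
    if a.order == b.order and a.source == b.source:
        return "N4"  # cousins: different streams, same order and source
    return None      # not neighbours under any scheme

# Placeholder weights that shrink as the relationship becomes looser.
SCHEME_WEIGHTS = {"N1": 1.0, "N2": 0.5, "N3": 0.25, "N4": 0.1}

def pair_weight(a: Location, b: Location) -> float:
    scheme = neighbour_scheme(a, b)
    return 0.0 if scheme is None else SCHEME_WEIGHTS[scheme]
```

The checks are ordered from most to least specific because the schemes overlap: two locations on the same segment are usually also on the same stream at the same order.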

ε(x): residual variation. The residual variation is assumed to be independent of the other factors in the model. Its form depends on the models specified for the spatial dependence. If an auto-normal model is used for the other terms, then ε(x) will be assumed to be mean-zero Gaussian with standard deviation σ.
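Given the covariate terms and latent effects, the Gaussian residual yields the data-level likelihood used in inference. The sketch below assumes, in addition, that the ε(x) are independent across locations, so the log-likelihood is a sum of univariate normal terms; `mu` stands for the linear predictor L(x)β₁ + C(x)β₂ + S(x)β₃ + φ_{W(x)} + η(x).

```python
import numpy as np

def conditional_loglik(z, mu, sigma):
    """log p(z | mu, sigma) when z(x) = mu(x) + eps(x), eps(x) iid N(0, sigma^2)."""
    r = np.asarray(z, dtype=float) - np.asarray(mu, dtype=float)
    n = r.size
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * np.sum(r**2) / sigma**2

ll = conditional_loglik([1.0], [0.0], 1.0)
```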

The project developed inference for the model parameters β₁, β₂, β₃, γ, and ν based on the likelihood function under the Bayesian paradigm. In addition to Bayesian inference for the parameters, researchers will usually be interested in the posterior distributions of the latent effects φ_i and η(x). Using the methods developed in this project, these can be plotted spatially and used to create maps summarizing knowledge about Z(x) over the stream network. Of particular practical interest is inference for areal measures.
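To show how posterior summaries of latent effects and areal measures can be obtained, here is a minimal single-site Gibbs sampler for a heavily simplified version of the model: z = φ_{W(x)} + ε(x) with no covariates and no within-watershed effect, λ_γ and σ treated as known, and λ_γ acting as a variance scale. These simplifications are assumptions for illustration; this is not the project's implementation.

```python
import numpy as np

def gibbs_phi(z, watershed, V, lam, sigma, n_iter=2000, burn=500, seed=0):
    """Single-site Gibbs sampler for phi given z_k = phi_{W(k)} + eps_k.

    Simplified model (assumption): no covariates, no eta term, lam and sigma known.
    """
    rng = np.random.default_rng(seed)
    H = V.shape[0]
    n = np.bincount(watershed, minlength=H)               # observations per watershed
    s = np.bincount(watershed, weights=z, minlength=H)    # sum of observations per watershed
    v_plus = V.sum(axis=1)
    phi = np.zeros(H)
    draws = np.empty((n_iter - burn, H))
    for it in range(n_iter):
        for i in range(H):
            # Combine the ICAR prior conditional with the Gaussian data for watershed i.
            prior_prec = v_plus[i] / lam
            prior_mean = V[i] @ phi / v_plus[i]
            prec = prior_prec + n[i] / sigma**2
            mean = (prior_prec * prior_mean + s[i] / sigma**2) / prec
            phi[i] = rng.normal(mean, 1.0 / np.sqrt(prec))
        if it >= burn:
            draws[it - burn] = phi
    return draws

def areal_summary(draws, members, probs=(0.05, 0.95)):
    # Posterior of the average effect over a group of watersheds (an areal measure).
    area = draws[:, members].mean(axis=1)
    return area.mean(), np.quantile(area, probs)

V = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
z = np.array([1.0, 1.2, 0.8, -0.5, -0.3, -0.6])   # hypothetical measurements
ws = np.array([0, 0, 1, 2, 2, 3])                 # watershed of each measurement
draws = gibbs_phi(z, ws, V, lam=0.5, sigma=0.3)
post_mean = draws.mean(axis=0)
area_mean, (area_lo, area_hi) = areal_summary(draws, [0, 1])
```

Although the intrinsic prior is improper, the posterior here is proper because the data pin down the overall level. Per-watershed posterior means and intervals are what would be mapped, and averaging draws over a group of watersheds gives an areal measure with its uncertainty.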

Summary

This project developed methodology to complement the mapping presented in the U.S. Environmental Protection Agency EMAP Landscape Atlas with new hierarchical spatial statistical models for environmental indicators on the streams and rivers that capture the spatial variation in the measures.

These models can be used to estimate the indicators throughout riverine systems based on information from multiple sources and aggregate scales, and they allow the uncertainty in the estimates to be quantified.

The research also developed methods to visualize the resulting estimates and uncertainties. Finally, the project developed a general framework for comparative distributional analysis based on the concept of a relative spatial distribution.
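The report does not detail the spatial version of the relative distribution, so the sketch below shows only the basic non-spatial construction it builds on. Each comparison value y is mapped to r = F₀(y), where F₀ is the empirical CDF of a reference sample. If the two distributions match, the relative data are uniform on [0, 1] and the relative density is flat at 1. The sample data are simulated for illustration.

```python
import numpy as np

def relative_data(comparison, reference):
    """Grade transformation r_i = F0(y_i), F0 the empirical CDF of the reference sample."""
    ref = np.sort(np.asarray(reference, dtype=float))
    # Fraction of reference values <= y (right-continuous empirical CDF).
    return np.searchsorted(ref, np.asarray(comparison, dtype=float), side="right") / ref.size

def relative_density(r, bins=10):
    # Histogram estimate on [0, 1]; a flat density at 1 means the distributions match.
    dens, edges = np.histogram(r, bins=bins, range=(0.0, 1.0), density=True)
    return dens, edges

rng = np.random.default_rng(0)
r_same = relative_data(rng.normal(size=20000), rng.normal(size=20000))
dens_same, _ = relative_density(r_same, bins=5)
r_shift = relative_data(rng.normal(1.0, 1.0, 5000), rng.normal(0.0, 1.0, 5000))
```

When the comparison distribution is shifted upward, the relative data concentrate near 1, which is how the relative density displays location and shape differences on a common scale.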

References:

Anselin L, Florax R. New directions in spatial econometrics. Berlin: Springer-Verlag, 1995.

Besag J. Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society B 1974;36:192-236.

Besag J. Statistical analysis of non-lattice data. The Statistician 1975;24:179-195.

Besag J. Towards Bayesian image analysis. Journal of Applied Statistics 1989:395-407.

Besag J, York J, Molliè A. Bayesian image restoration, with two applications in spatial statistics (with discussion). Annals of the Institute of Statistical Mathematics 1991:1-59.

Besag J, Higdon D. Bayesian analysis of agricultural field trials (with discussion). Journal of the Royal Statistical Society B 1999;61:691-746.

Bernardinelli L, Montomoli C. Empirical Bayes versus fully Bayesian analysis of geographical variation in disease risk. Statistics in Medicine 1992:983-1007.

Best NG, Ickstadt K, Wolpert RL. Spatial Poisson regression for health and exposure data measured at disparate resolutions. Institute of Decision Sciences, Duke University, 1998, Discussion Paper 98-36.

Cressie NAC. Statistics for spatial data. New York: Wiley, 1993.

Cressie NAC. Bayesian smoothing of rates in small geographic areas. Journal of Regional Science 1995:659-673.

Cwik J, Mielniczuk J. Estimating density ratios with application to discriminant analysis. Communications in Statistics 1989:3057-3069.

Ecker MD, Gelfand AE. Bayesian variogram modeling for an isotropic process. Journal of Agricultural, Biological, and Environmental Statistics 1997:347-369.

Kelsall JE, Wakefield JC. Spatial modeling of disease risk (manuscript, 1997).

Molliè A, Richardson S. Empirical Bayes estimates of cancer mortality rates using spatial models. Statistics in Medicine 1991:95-112.

Moller J. Log Gaussian Cox processes. Scandinavian Journal of Statistics 1998;25:451-482.

Raftery AE, Banfield JD. Stopping the Gibbs sampler, the use of morphology, and other issues in spatial statistics. Annals of the Institute of Statistical Mathematics 1991:1-59.

Zimmerman DL, Harville DA. A random field approach to the analysis of field-plot experiments and other spatial experiments. Biometrics 1991:223-239.

Journal Articles on this Report:


Marzluff JM, Millspaugh JJ, Hurvitz P, Handcock MS. Relating resources to a probabilistic measure of space use: forest fragments and Steller’s Jays. Ecology 2004;85(5):1411-1427. (R828674, Final)

Supplemental Keywords:

spatial statistics, GIS, hierarchical models, relative distribution, EMAP, Bayesian statistics, Monte Carlo methods, environmental social science, relative distribution methods

Relevant Websites:

http://www.stat.washington.edu/handcock
http://www.stat.washington.edu/handcock/RelDist


The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.