EPA Science Inventory

On the Effect of Preferential Sampling in Spatial Prediction

Citation:

Gelfand, A., S. Sahu, AND D. Holland. On the Effect of Preferential Sampling in Spatial Prediction. ENVIRONMETRICS. John Wiley & Sons, Ltd., Indianapolis, IN, 23:565-578, (2013).

Description:

The choice of sampling locations in a spatial network is often guided by practical demands. In particular, locations are typically chosen preferentially to capture high values of a response, for example, air pollution levels in environmental monitoring. Model estimation and prediction of the exposure surface can then become biased because of the selective sampling. Since prediction is often the main purpose of the modeling, we suggest that the effect of preferential sampling matters more in the resulting predictive surface than in parameter estimation. Our contribution is a direct, simulation-based approach to assessing the effects of preferential sampling. We compare two predictive surfaces over the study region: one arising from the notion of an "operating" intensity that drives the selection of monitoring sites, the other arising under complete spatial randomness. We can consider a range of response models: they may reflect the operating intensity, introduce alternative informative covariates, or simply propose a flexible spatial model. We then generate data under the given model, fit the model, and interpolate (kriging) to obtain the two predictive surfaces to compare. Two points are important: suitable metrics are needed to compare the surfaces, and because the predictive surfaces are random, the comparisons must be made in expectation.
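The simulation loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Assumed here: the response is a zero-mean Gaussian process with exponential covariance on a 15 x 15 grid over the unit square; the hypothetical "operating" intensity makes each grid cell's selection probability proportional to exp(beta * S(x)), where S is the latent surface; and, as a simplification, the covariance parameters are treated as known, so the "fit" step is skipped and simple kriging is used directly. The names `exp_cov`, `simple_krige`, `one_replicate`, `beta`, and `n_sites` are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=0.2):
    """Exponential covariance sigma2 * exp(-d / phi) between point sets a (n, 2) and b (m, 2)."""
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    return sigma2 * np.exp(-d / phi)

def simple_krige(sites, y, grid, mu=0.0, sigma2=1.0, phi=0.2, nugget=1e-6):
    """Simple kriging with a known mean mu and known (assumed) covariance parameters."""
    K = exp_cov(sites, sites, sigma2, phi) + nugget * np.eye(len(sites))
    k = exp_cov(grid, sites, sigma2, phi)
    return mu + k @ np.linalg.solve(K, y - mu)

def one_replicate(rng, grid, chol, n_sites=30, beta=2.0, mu=0.0):
    """Simulate one latent surface, draw both designs, and krige from each."""
    s = mu + chol @ rng.standard_normal(len(grid))       # latent surface on the grid
    # Hypothetical operating intensity: selection probability proportional to exp(beta * s).
    w = np.exp(beta * (s - s.max()))                      # subtract the max for numerical stability
    pref_idx = rng.choice(len(grid), size=n_sites, replace=False, p=w / w.sum())
    # Complete spatial randomness: every grid cell equally likely.
    csr_idx = rng.choice(len(grid), size=n_sites, replace=False)
    pred_pref = simple_krige(grid[pref_idx], s[pref_idx], grid, mu)
    pred_csr = simple_krige(grid[csr_idx], s[csr_idx], grid, mu)
    return pred_pref, pred_csr

rng = np.random.default_rng(2013)
g = np.linspace(0.0, 1.0, 15)
grid = np.array([(x, y) for x in g for y in g])           # 225 cells on the unit square
chol = np.linalg.cholesky(exp_cov(grid, grid) + 1e-6 * np.eye(len(grid)))

# The predictive surfaces are random, so compare them in expectation over replicates.
n_rep = 50
avg_diff = np.empty(n_rep)
for r in range(n_rep):
    pred_pref, pred_csr = one_replicate(rng, grid, chol)
    avg_diff[r] = np.mean(pred_pref - pred_csr)

se = avg_diff.std(ddof=1) / np.sqrt(n_rep)
print(f"E[mean(pref - CSR)] ~ {avg_diff.mean():.3f} (Monte Carlo s.e. {se:.3f})")
```

Setting `beta = 0` makes the selection probabilities uniform, so the two designs coincide in distribution; larger values of `beta` concentrate the monitors where the surface is high.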

Purpose/Objective:

The choice of sampling locations in a spatial network is often guided by practical demands, such as the need to monitor air pollution levels near their most likely sources and in areas of high population density. Air pollution surfaces constructed solely from data obtained from these networks are likely to be biased if they are not adjusted for the effects of how the monitoring sites were chosen. For example, if monitors tend to record high exposure levels because of where they are located, interpolated levels for low-population-density areas, or for locations away from sources such as power stations, are likely to be biased upward. That is, if the sampling locations are preferentially chosen to capture high (or low) values of a response, for example, air pollution levels, then subsequent model estimation and prediction of the exposure surface can become biased due to the selective sampling. In what follows, we use the term "bias" informally, to mean departure from the exposure surface we would obtain by interpolation if the locations had been selected under complete spatial randomness (see, e.g., Diggle, 2003). We introduce metrics to assess this departure. Indeed, since prediction is often the main purpose of the modeling, we suggest that the effect of preferential sampling matters more in the resulting predictive surface than in parameter estimation.
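Because "bias" is defined here as departure from the surface obtained under complete spatial randomness, one way to quantify it is to compare replicated predictive surfaces from the two designs. The sketch below is illustrative only: the metrics chosen (a pointwise expected-difference map, a spatially averaged bias with its Monte Carlo standard error, and a root mean squared difference) are assumptions and are not necessarily the metrics introduced in the article. The function name `surface_bias_metrics` is hypothetical.

```python
import numpy as np

def surface_bias_metrics(pref, csr):
    """Compare predictive surfaces from two sampling designs (hypothetical helper).

    pref, csr: arrays of shape (R, G) holding R simulated replicates of a
    predictive surface at G grid locations, under the preferential design
    and under complete spatial randomness (CSR), respectively.
    """
    pref = np.asarray(pref, dtype=float)
    csr = np.asarray(csr, dtype=float)
    if pref.shape != csr.shape or pref.ndim != 2:
        raise ValueError("pref and csr must both have shape (R, G)")
    diff = pref - csr                     # signed departure from the CSR surface
    per_replicate = diff.mean(axis=1)     # spatial average within each replicate
    n_rep = diff.shape[0]
    return {
        # expected pointwise departure: shows where the bias concentrates
        "bias_map": diff.mean(axis=0),
        # expected spatially averaged departure
        "mean_bias": per_replicate.mean(),
        # Monte Carlo standard error of mean_bias (undefined for a single replicate)
        "mean_bias_se": per_replicate.std(ddof=1) / np.sqrt(n_rep) if n_rep > 1 else np.nan,
        # expected root mean squared departure, which does not cancel across signs
        "rms_diff": np.sqrt((diff ** 2).mean()),
    }

# Toy input: 2 replicates, 2 grid locations (made-up numbers for illustration).
m = surface_bias_metrics([[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [1.0, 4.0]])
print(m["mean_bias"])  # 0.75
```

A positive `mean_bias` corresponds to the upward bias described above, while `rms_diff` also captures departures that happen to cancel when averaged over the region.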

URLs/Downloads:

HOLLAND ORD-000031 FINAL JOURNAL ARTICLE..PDF (PDF, NA pp, 640.675 KB)

Record Details:

Record Type: DOCUMENT (JOURNAL/PEER REVIEWED JOURNAL)
Start Date: 07/17/2013
Completion Date: 07/17/2013
Record Last Revised: 07/31/2013
Record Created: 07/17/2013
Record Released: 07/17/2013
OMB Category: Other
Record ID: 257804

Organization:

U.S. ENVIRONMENTAL PROTECTION AGENCY

OFFICE OF RESEARCH AND DEVELOPMENT

NATIONAL EXPOSURE RESEARCH LAB

ENVIRONMENTAL SCIENCES DIVISION