Grantee Research Project Results
2003 Progress Report: National Research Program on Design-Based/Model-Assisted Survey Methodology for Aquatic Resources
EPA Grant Number: R829096
Center: Center for Air, Climate, and Energy Solutions
Center Director: Robinson, Allen
Title: National Research Program on Design-Based/Model-Assisted Survey Methodology for Aquatic Resources
Investigators: Stevens, Don L., Urquhart, N. Scott, Herlihy, Alan T., Hughes, Robert, Lesser, Virginia
Current Investigators: Stevens, Don L., Urquhart, N. Scott, Herlihy, Alan T., Lesser, Virginia
Institution: Oregon State University
Current Institution: Oregon State University, Colorado State University
EPA Project Officer: Packard, Benjamin H
Project Period: October 15, 2001 through October 14, 2005 (Extended to October 13, 2006)
Project Period Covered by this Report: October 15, 2002 through October 14, 2003
Project Amount: $2,989,884
RFA: Research Program on Statistical Survey Design and Analysis for Aquatic Resources (2001)
Research Category: Ecological Indicators/Assessment/Restoration, Tribal Environmental Health Research, Watersheds, Water, Aquatic Ecosystems
Objective:
The objectives of this research project are to develop and implement design-based/model-assisted statistical methods for aquatic surveys.
Progress Summary:
Overall Program Status
The Designs and Models for Aquatic Resources Surveys (DAMARS) program is very close to its planned schedule. Our specific objectives for the end of Year 2 of the project were to:
- establish working relationships with several state agencies;
- identify data sets to use in developing statistical methodology; and
- complete at least two published or accepted peer-reviewed articles and at least three presentations at professional meetings during the year.
All of these specific objectives have been met. We have a strong working relationship with the Oregon Department of Fish and Wildlife (ODFW), and are using their data to address several research topics identified in our proposal. Particular topics include: imputation of missing data, trend detection, and incorporation of nonsurvey data into the analyses of probability survey data. We are working with the San Francisco Estuary Institute (SFEI) on the continuing implementation of the Regional Monitoring Program for the San Francisco Estuary. The Program Director also is a member of the Core Development Team for the California Rapid Assessment Method (CRAM) for wetland condition. The Core Development Team includes representatives from the U.S. Environmental Protection Agency (EPA) Region 9, SFEI, the Southern California Coastal Water Research Project (SCCWRP), the California Conservation Core, the California Coastal Commission, and the University of California at Los Angeles (UCLA). We have established collaboration with the Great Lakes Environmental Indicators (GLEI) Project (part of the EPA Science To Achieve Results [STAR] program) and have initiated several joint research efforts.
All of the efforts to develop statistical analysis methodology are on schedule, with preliminary findings presented at the Eastern North American Region of the International Biometric Society meeting and at the Joint Statistical Meetings. We are almost at targeted staffing levels, although we still have an unfilled Graduate Research Assistant position. To fulfill our obligation to train and develop a cadre of future Environmental Statisticians, we anticipate involving several younger faculty members and see a role for them as postdoctoral fellows.
DAMARS, in cooperation with the Space-Time Aquatic Resource Modeling and Analysis Program (STARMAP), sponsored the Second Annual Conference on Statistical Survey Design and Analysis for Aquatic Resources at Oregon State University (OSU), August 11-13, 2003. This conference involved major contributions from both programs, the programs’ Science Advisory Committee, persons from the EPA, other federal agencies, and other interested participants, mainly from OSU. Students from both programs had major roles in this conference, including presenting posters and serving as speakers.
Members of the DAMARS team have been active in publications and presentations, having completed three publications, with another three manuscripts accepted. The team has made 35 presentations, and has a number of manuscripts in various stages of development and publication.
Project 1: Direction and Administration
The Program Director monitored the progress of Projects 2-5, including oversight of their budgets, staffing, and coordination; planned and monitored subcontracts to Colorado State University, Washington State University, and Iowa State University; assembled and submitted quarterly and annual reports; and coordinated various matters with STARMAP, the Colorado State University program, and the EPA Project Officer.
The Director also organized the Second Annual Conference on Statistical Survey Design and Analysis for Aquatic Resources, Oregon State University, Corvallis, OR, August 11-12, 2003. That program is documented at http://oregonstate.edu/dept/statistics/epa_program/meeting.html.
Project 2: Integration and Extramural Outreach
The specific objective of this project is to develop and extend the expertise on design and analyses to states and tribes, via a combination of distance learning, seminars, workshops, and demonstrations. Both the DAMARS and STARMAP programs have the same objectives and share funding and personnel. To minimize overlap, the STARMAP program has concentrated on developing learning materials, while DAMARS has concentrated on developing demonstration projects. Both programs have sought to develop collaborative research with EPA Laboratories and other EPA STAR projects. Noteworthy accomplishments include:
- DAMARS has established a firm working relationship with the GLEI project, and the EPA STAR program. In several visits to GLEI, DAMARS has identified areas where its skills could complement the GLEI project, and initiated joint research on: (1) species-area curves and wetland loss, (2) probability distribution of condition indicators, and (3) statistical basis of ecological indicators.
- We continued working with the San Francisco Estuary Regional Monitoring Program (RMP) for Trace Substances. The Director participated in the redesign of the monitoring plan for the San Francisco Bay, as part of a design team consisting of representatives from the SFEI, EPA Region 9, DAMARS, the U.S. Geological Survey, and the San Francisco Bay Area Regional Water Resource Control Board. The design is an excellent example of using prior information to guide design. Separate designs were put in place for water column and sediment. The sediment design applies Rotating Panel Generalized Random Tessellation Stratified (GRTS) methodology. A draft report on redesign of the RMP is being circulated for internal review, and should be released early in 2004. Plans are to hold an analysis workshop later this fall/winter.
- The Director has continued working with the Core Development Team for the CRAM for wetland condition. The Core Development Team includes representatives from EPA Region 9, SFEI, the SCCWRP, the California Conservation Core, the California Coastal Commission, and UCLA. The CRAM is modeled on the Ohio Rapid Assessment Method, and is being extended to cover wetland types in California (e.g., salt marshes, and wetlands with tidal influence). The Core Development Team is working on metric/indicator development, planning for the verification/validation study, and pilot assessment.
Both Projects 3 and 4 are using data from the ODFW program that is sampling coho salmon in Oregon coastal streams. Salmon viability is a high-visibility issue in the Pacific Northwest, with multiple states and federal agencies involved (OR, WA, CA, U.S. Department of Interior–Bureau of Land Management, Bonneville Power Administration, U.S. Forest Service, U.S. Fish and Wildlife Service, National Oceanic and Atmospheric Administration, and EPA). A rotating panel GRTS is the basic sampling design for the Oregon Plan for Salmon and Watersheds. The demonstration has been very productive, and has led to state-agency personnel from the ODFW, the Oregon Watershed Enhancement Board, and the Oregon Department of Environmental Quality becoming advocates of spatially balanced probability sampling. The demonstration also has produced ideal data sets for developing statistical methodology: historical data are available from both probability and convenience samples, the population has a dynamic frame, there are substantial missing data (both ignorable and nonignorable), ancillary data are available at various spatial scales, and the design is a rotating panel through time. Program Directors Don L. Stevens and N. Scott Urquhart have continued their frequent contact to coordinate the OSU and the Colorado State University (CSU) programs, and each has made external contacts on the part of both the DAMARS and STARMAP programs. Highlights include:
- Tribal needs assessment by Water Quality Technology continues; report was delivered and has been posted on the STARMAP Web Site.
- Contacts were made with several tribal personnel and Deborah Patton, Tribal Coordinator, Inter-Tribal Council of Arizona at the National Water Quality Monitoring Council Meeting in Phoenix, AZ.
- Both Directors attended the All-Estuarine and Great Lakes Program (EaGLes) meeting, December 4-5, 2002, Annapolis, MD.
- Dr. Stevens attended and made presentations at several workshops with an aquatic monitoring focus.
- Development of browser-based learning materials has continued with the design of the “site” by CSU’s Office of Instructional Services. An Alaskan-Native student in CSU’s Statistics Department has begun developing learning materials on “Why Sample?” These materials will be tested at a workshop held in conjunction with the Joint Program Meeting.
Project 3: Survey Design Methodology for Aquatic Resources
Steve Carroll completed development of a Bayesian hierarchical model that adapts spatial interpolation techniques to obtain predicted values of environmental variables in smaller regions where observed data are sparse, using the Environmental Monitoring and Assessment Program (EMAP) Mid-Atlantic Highlands Area (MAHA)/MAIA data set. The research was carried out in collaboration with Tony Olsen, EPA-Western Ecology Division (WED), and Dr. Mark Handcock, University of Washington. Steve Carroll incorporated a measure of the degree of watershed overlap in the covariance structure and modified the traditional spatial interpolation methodology to account for the uncertainty in covariance parameter estimates by using a Bayesian formulation. Dr. Stevens presented a preliminary method for using species abundance data to develop a condition metric based on expected abundance at the All-EaGLes Conference, Baltimore, MD, December 4-6, 2002. The approach estimates the distribution of individuals on landscape metrics. The conditional distribution can be updated based on observed site characteristics and species counts to give a posterior distribution, which in turn can be used to calculate a condition metric. The approach will be refined when Dr. Stevens visits the GLEI program in February.
Dr. Stevens also has developed a method for combining probability survey and convenience survey data. The method is an extension of post-stratification, and uses selection functions to infer a probability structure for the nonprobability survey. The selection functions are estimated as ratios of kernel density estimators of an ancillary variable using the complete population and the nonprobability sample elements. Preliminary application to data from the EMAP Northeast Lakes pilot study yielded promising results using Secchi depth as the response variable and lake area as the ancillary variable. Results were presented at the Spring Meeting of the Eastern North America Region of the International Biometric Society, Tampa, FL, March 30-April 2, 2003.
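The selection-function idea described above can be illustrated with a minimal sketch. It is not the investigators' implementation: the Gaussian kernel, the fixed bandwidth, the simulated population, and the function names `gaussian_kde` and `selection_weighted_mean` are all assumptions made for illustration. The selection function is estimated as the ratio of the kernel density of the ancillary variable in the convenience sample to its density in the full population, and each sampled unit is then weighted by the inverse of that ratio.

```python
import numpy as np

def gaussian_kde(points, bandwidth):
    """Return a function that evaluates a 1-D Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    norm = len(points) * bandwidth * np.sqrt(2.0 * np.pi)

    def density(x):
        z = (np.asarray(x, dtype=float)[:, None] - points[None, :]) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

    return density

def selection_weighted_mean(x_pop, x_samp, y_samp, bandwidth):
    """Weight convenience-sample responses by the inverse estimated selection function."""
    f_pop = gaussian_kde(x_pop, bandwidth)    # density of ancillary variable, population
    f_samp = gaussian_kde(x_samp, bandwidth)  # density of ancillary variable, sample
    selection = f_samp(x_samp) / f_pop(x_samp)
    weights = 1.0 / selection
    return np.sum(weights * y_samp) / np.sum(weights)

# Simulated example (hypothetical data): the ancillary variable x is known for
# every population unit; the convenience sample over-represents large x.
rng = np.random.default_rng(0)
x_pop = rng.normal(0.0, 1.0, 5000)
y_pop = 2.0 + 1.5 * x_pop + rng.normal(0.0, 0.5, 5000)
inclusion = 0.2 / (1.0 + np.exp(-1.5 * x_pop))
chosen = rng.random(5000) < inclusion
x_s, y_s = x_pop[chosen], y_pop[chosen]

naive_mean = y_s.mean()
adjusted_mean = selection_weighted_mean(x_pop, x_s, y_s, bandwidth=0.4)
true_mean = y_pop.mean()
```

In this simulated setting the weighted mean should sit closer to the population mean than the unweighted convenience-sample mean, because the response depends on the ancillary variable that drives selection. Any bias tied to variables not included in the selection function would remain.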
Rubén Smith is developing a Bayesian hierarchical spatial model for counts of juvenile coho salmon in Oregon coastal streams. The ODFW conducts annual surveys using a sampling protocol developed with EMAP. The model developed by Rubén Smith will use these data to estimate coho abundance and create annual maps of relative abundance. The model presently includes a constant mean function and two random components to capture spatial variability and random error. The current effort is concentrated on selecting appropriate prior distributions and implementing the Markov chain Monte Carlo (MCMC) estimation algorithm. Once computational algorithms have been tested, the model will be extended to include multiple years, temporal covariance, and covariates. Phil Larsen (EPA-WED) is cooperating in this effort.
Graduate student Cynthia Cooper has been working on compiling historical data on coho spawner abundance and ancillary data on ocean and watershed conditions. We expect to use these data in building trend detection models that build on Rubén Smith’s Bayesian spatial models.
Preliminary maps of spawner abundance for 1998-2001 are being produced from a model that includes only a constant mean (the systematic part), a spatial random component, and a nonspatial random component.
Dr. Stevens presented preliminary results on using selection functions to combine probability survey and convenience survey data at the Spring Meeting of the Eastern North America Region of the International Biometric Society, Tampa, FL, March 30-April 2, 2003. The selection functions are estimated as ratios of kernel density estimators of an ancillary variable using the complete population and the nonprobability sample elements. The method was applied to data from the EMAP Northeast Lakes pilot using Secchi depth as the response variable and lake area and location as the ancillary variables. The convenience survey data have substantial bias, most of which the method was able to remove. Currently, we are developing a data set with a measure of human disturbance or development that may account for much of the remaining bias.
Don L. Stevens, Breda Munoz-Hernandez, and Rubén Smith visited GLEI to discuss collaboration between GLEI, DAMARS, and STARMAP. Several topics and associated data sets were identified, including the relationship between species-area curves and loss of wetlands, development of an avian species richness indicator, and exploring the statistical nuances of indicator development. GLEI has supplied several data sets on avian abundance to DAMARS. The species-area/wetland loss work complements ongoing species-area curve research that focuses on using species-area curves to explore the composition and species richness of amphibians, reptiles, mammals, birds, and fish in EMAP hexagon (660 sq km) data in Oregon and Washington and five Middle Atlantic States. Questions of interest include: (1) How do species-area curves differ by taxa, geographic region, and accumulation method? and (2) How do species-area curves determined from these data differ from those in other studies? Denis White and Josh Lawler (EPA-WED) are cooperating on the species-area curve research.
Graduate Student Susan Hornsby has begun looking at fitting logistic regression models to data collected by David Marks and Steven Price. Marks and Price are graduate students of Bob Howe, University of Wisconsin. Their work is being supported under the GLEI program.
Cynthia Cooper also has begun research on the convergence of design-based and model-based approaches to spatial sampling. Both approaches can incorporate prior information in sample design.
We have investigated cost models for ranked set sampling (RSS) procedures. These models take into account the cost of ranking versus the cost of direct measurement of a sample. Both balanced and unbalanced sampling designs, and two types of mean estimators (distribution-free [DF], and best linear unbiased estimator [BLUE]) have been considered for the following distributions: normal, lognormal, and exponential. The minimum cost ratio necessary for RSS to be as cost effective as simple random sampling (SRS) depends on the underlying distribution of the data, as well as the allocation and type of estimator used. Most minimum necessary cost ratios are in the range of 1.0-6.0, and are lower for BLUEs than for DF estimators. The greater the prior knowledge of the distribution underlying the data, the lower the minimum necessary cost ratio and the more attractive RSS is relative to SRS. These methods were applied to Oregon stream habitat area data and were found to give lower cost ratio estimates than ones that had been previously calculated (previous estimates used only balanced allocations). RSS has excellent potential for many ecological/environmental field situations (e.g., where samplers like to incorporate human judgment as to where to sample, and where field logistics make sampling difficult).
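The trade-off between ranking cost and measurement cost can be illustrated with a small simulation. This is a sketch under stated assumptions, not the cost models used in the project: it covers only balanced RSS with perfect ranking and the distribution-free mean estimator, and it assumes a simple linear cost model in which each ranked unit costs `c_r` and each measured unit costs `c_m`. Under that assumption, RSS with set size m beats SRS at equal total cost when c_m / c_r exceeds m / (RP - 1), where RP is the relative precision Var(SRS mean) / Var(RSS mean) at the same number of measurements. The function names are hypothetical.

```python
import numpy as np

def rss_mean(draw, set_size, cycles, rng):
    """Balanced ranked set sample mean, assuming perfect ranking."""
    measured = []
    for _ in range(cycles):
        for rank in range(set_size):
            candidates = np.sort(draw(rng, set_size))  # rank a fresh set of units
            measured.append(candidates[rank])          # measure only one of them
    return np.mean(measured)

def relative_precision(draw, set_size, cycles, reps, rng):
    """Monte Carlo estimate of Var(SRS mean) / Var(RSS mean) for equal measurements."""
    n = set_size * cycles
    srs = [draw(rng, n).mean() for _ in range(reps)]
    rss = [rss_mean(draw, set_size, cycles, rng) for _ in range(reps)]
    return np.var(srs) / np.var(rss)

def break_even_cost_ratio(rp, set_size):
    """Smallest c_m / c_r at which RSS matches SRS under the assumed linear cost model."""
    return set_size / (rp - 1.0)

def normal_draw(rng, size):
    return rng.normal(0.0, 1.0, size)

rng = np.random.default_rng(42)
rp_normal = relative_precision(normal_draw, set_size=3, cycles=4, reps=4000, rng=rng)
min_ratio = break_even_cost_ratio(rp_normal, set_size=3)
```

Swapping `normal_draw` for a lognormal or exponential generator shows how the break-even ratio shifts with the underlying distribution. Unbalanced allocations and BLUEs would require extending the sketch.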
As part of this project, we are evaluating different model-assisted sampling designs for spring chinook salmon. We have started evaluating different sampling approaches using a census of chinook salmon redd locations in the Middle Fork of the Salmon River conducted since 1995.
Project 4: Parametric Model Assisted Survey Methods for Environmental Surveys
This project has made substantial progress on an issue that is critical to EPA’s monitoring programs—how to adjust the analyses of probability survey data for data that are missing, especially data that are missing because of the lack of legal access permission. Such missing data often do not satisfy the Missing at Random assumption, necessitating sophisticated imputation techniques.
The empirical orthogonal function (EOF) method was applied to data from the Mid-Atlantic stream water chemistry survey from 1998 to 2000. Results were summarized in the draft paper titled, Design-based EOF Model for Environmental Monitoring Data Analysis, by Breda Munoz, Virginia M. Lesser, and Fred L. Ramsey.
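For readers unfamiliar with EOFs, the basic decomposition can be sketched as a singular value decomposition of a centered time-by-site data matrix. This sketch shows only the plain, unweighted EOF calculation on simulated data. It does not reproduce the design-based model in the draft paper, and the function name `eof_decomposition` and the synthetic field are assumptions for illustration.

```python
import numpy as np

def eof_decomposition(field):
    """Plain EOF analysis of a (n_times, n_sites) array via SVD of the anomalies."""
    anomalies = field - field.mean(axis=0)        # remove each site's time mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    eofs = vt                                     # rows are spatial patterns
    amplitudes = u * s                            # columns are time series per pattern
    explained = s ** 2 / np.sum(s ** 2)           # fraction of variance per mode
    return eofs, amplitudes, explained, anomalies

# Synthetic field: one dominant spatial pattern plus noise (hypothetical data).
rng = np.random.default_rng(3)
pattern = np.linspace(-1.0, 1.0, 12)
signal = np.outer(3.0 * rng.normal(size=30), pattern)
field = signal + rng.normal(scale=0.3, size=(30, 12))

eofs, amplitudes, explained, anomalies = eof_decomposition(field)
reconstruction = amplitudes @ eofs                # all modes recover the anomalies
```

Keeping only the leading modes gives a low-rank summary of the space-time variation. A survey-weighted version would need to bring the sampling design into the decomposition, which this sketch does not attempt.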
Work is progressing on developing a user manual for environmental scientists who use probability surveys in which missing data are encountered. The initial version of the manual will address two adjustment techniques (weighting class adjustment and poststratification) for data that are missing at random but not missing completely at random. The user manual will contain definitions of the types of nonresponse, methods to determine which types of nonresponse may affect the survey data, approaches to deal with the nonresponse, and examples of nonresponse adjustment approaches on sample data sets from the ODFW coastal salmon survey.
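A weighting class adjustment can be sketched in a few lines. Within each class, the design weights of responding units are inflated so that they carry the total weight of all sampled units in that class. The record layout, the class labels, and the function name `weighting_class_adjust` below are hypothetical and are not taken from the manual.

```python
from collections import defaultdict

def weighting_class_adjust(records):
    """Inflate respondent weights so each class keeps its full sampled weight.

    Each record is a dict with keys 'cls', 'weight', and 'responded'.
    """
    total = defaultdict(float)       # sum of design weights, all sampled units
    responding = defaultdict(float)  # sum of design weights, respondents only
    for rec in records:
        total[rec["cls"]] += rec["weight"]
        if rec["responded"]:
            responding[rec["cls"]] += rec["weight"]

    adjusted = []
    for rec in records:
        if rec["responded"]:
            factor = total[rec["cls"]] / responding[rec["cls"]]
            adjusted.append({**rec, "adj_weight": rec["weight"] * factor})
    return adjusted

# Hypothetical sample: access is always granted on public land, but only half
# of the sites on private land could be visited.
sample = (
    [{"cls": "public", "weight": 10.0, "responded": True} for _ in range(3)]
    + [{"cls": "private", "weight": 10.0, "responded": True} for _ in range(2)]
    + [{"cls": "private", "weight": 10.0, "responded": False} for _ in range(2)]
)
adjusted = weighting_class_adjust(sample)
adjusted_total = sum(rec["adj_weight"] for rec in adjusted)
```

The adjustment removes nonresponse bias only if, within a class, responding and nonresponding units are alike on the response of interest, which is the missing-at-random condition the manual targets. Poststratification follows the same pattern but scales to known population totals rather than to the sampled weight totals.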
An approach for handling missing data via multiple imputation using geostatistical models has been developed. This technique applies multiple imputation using kriging to data from the coastal salmon survey conducted by the ODFW.
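The general shape of kriging-based multiple imputation can be sketched as follows. This is an illustration under simplifying assumptions, not the method developed for the ODFW data: it uses simple kriging with an exponential covariance whose sill, range, and nugget are treated as known, a Gaussian response, simulated site locations, and a simple-random-sampling variance for each completed data set. A proper multiple imputation would also propagate uncertainty in the covariance parameters. All function names are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sill, corr_range):
    """Exponential covariance between two sets of 2-D coordinates."""
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return sill * np.exp(-dist / corr_range)

def kriging_imputations(xy_obs, y_obs, xy_mis, sill, corr_range, nugget, m, rng):
    """Draw m imputations for missing sites from the kriging predictive distribution."""
    mu = y_obs.mean()  # plug-in mean (an assumption of this sketch)
    c_oo = exp_cov(xy_obs, xy_obs, sill, corr_range) + nugget * np.eye(len(xy_obs))
    c_mo = exp_cov(xy_mis, xy_obs, sill, corr_range)
    c_mm = exp_cov(xy_mis, xy_mis, sill, corr_range) + nugget * np.eye(len(xy_mis))
    w = np.linalg.solve(c_oo, c_mo.T).T            # kriging weights
    cond_mean = mu + w @ (y_obs - mu)
    cond_cov = c_mm - w @ c_mo.T
    cond_cov = 0.5 * (cond_cov + cond_cov.T)       # enforce symmetry numerically
    return rng.multivariate_normal(cond_mean, cond_cov, size=m)

def rubin_combine(estimates, variances):
    """Combine completed-data estimates and variances with Rubin's rules."""
    estimates = np.asarray(estimates)
    m = len(estimates)
    q_bar = estimates.mean()
    within = np.mean(variances)
    between = estimates.var(ddof=1)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, within, between, total

# Simulated survey: 60 sites, 15 of which have missing responses.
rng = np.random.default_rng(1)
n = 60
xy = rng.uniform(0.0, 10.0, size=(n, 2))
y = 4.0 + rng.multivariate_normal(np.zeros(n), exp_cov(xy, xy, 1.0, 3.0) + 0.05 * np.eye(n))
missing = np.zeros(n, dtype=bool)
missing[rng.choice(n, 15, replace=False)] = True

draws = kriging_imputations(xy[~missing], y[~missing], xy[missing],
                            sill=1.0, corr_range=3.0, nugget=0.05, m=20, rng=rng)
estimates, variances = [], []
for draw in draws:
    completed = y.copy()
    completed[missing] = draw
    estimates.append(completed.mean())
    variances.append(completed.var(ddof=1) / n)
q_bar, within, between, total = rubin_combine(estimates, variances)
```

The between-imputation component is what distinguishes multiple imputation from filling in the kriging predictions once: it carries the prediction uncertainty at the unvisited sites into the final variance.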
Two approaches to handle nonignorable nonresponse are being investigated. Bayesian geostatistical models are used simultaneously with data augmentation algorithms to estimate the parameters of a generalized linear mixed model. The models consider missing data, spatial components, systematic parts, and random nonspatial components. We have estimated parameters using MCMC. In addition, we presently are working on the expectation maximization-MCMC constrained approach to get the maximum likelihood estimates of the parameters.
Project 5: Nonparametric Model Assisted Survey Estimation for Aquatic Resources
Small Area Estimation. Jay Breidt and Ph.D. student M.J. Delorey are working on the broad topic of small area estimation, investigating methods for incorporating spatial relationships. The small areas of interest, for now, are the 8-digit hydrologic unit code watersheds (HUC8s) in the Mid-Atlantic Highlands Region, and the responses focus on characteristics of water quality. They currently are using Bayesian methods to construct a set of ensemble estimates for all watersheds. Plans are to investigate the same problem using nonparametric/semiparametric methods.
Jay Breidt and M.S. student S. Everson-Stewart have investigated nonparametric regression estimators for two-stage samples, in which auxiliary information is available at the level of the primary sampling units. This work used the EMAP Northeast Lakes data set. These nonparametric methods may be useful for estimation in regions with small, but not extremely small, sample sizes.
Jay Breidt and colleague Jean Opsomer at Iowa State are collaborating on semiparametric model-assisted regression estimation, as well as model-based estimation. These methods can be useful for small area estimation, and will be compared to fully parametric procedures. This project has made substantial progress in developing model-assisted estimates of distribution functions, primarily through the M.S. project of Alicia Johnson (who completed her degree and now is employed in an environmental position with the Center for Communicable Diseases). The estimator compares favorably to other parametric and nonparametric alternatives. Results have been presented at several statistical meetings, and a paper on this work is being prepared for submission to Biometrika.
As an extension of the nonparametric model-assisted estimation project, Jean Opsomer and Jay Breidt have begun research in applying the nonparametric regression methodology to the small area estimation context, and are applying it to the Northeastern Lakes survey. Preliminary results based on this work were presented at the STARMAP/DAMARS Meeting at Oregon State University in August. A simulation study was conducted by M.S. student S. Everson-Stewart, who has completed her degree and taken a job with Amgen.
The choice of the smoothing parameter has an important effect on nonparametric regression estimators, and this also is true for model-assisted estimators. Jean Opsomer and Curtis Miller, a graduate student at Iowa State University, have been developing a cross validation-based method for selecting the smoothing parameter, and presented preliminary results at the Joint Statistical Meetings in August.
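To show where the smoothing parameter enters a nonparametric model-assisted estimator, the sketch below computes a population total as the sum of fitted values over the population plus the design-weighted sum of sample residuals. The local linear smoother, the Gaussian kernel, the simple random sample, and the simulated population are assumptions chosen for illustration. The sketch does not implement the cross validation-based selection method under development, and the function names are hypothetical.

```python
import numpy as np

def local_linear(x_s, y_s, pi_s, x_eval, h):
    """Design-weighted local linear fit with a Gaussian kernel of bandwidth h."""
    fitted = np.empty(len(x_eval))
    for k, x0 in enumerate(x_eval):
        d = x_s - x0
        w = np.exp(-0.5 * (d / h) ** 2) / pi_s      # kernel weight times design weight
        design = np.column_stack([np.ones_like(d), d])
        xtw = design.T * w
        beta = np.linalg.solve(xtw @ design, xtw @ y_s)
        fitted[k] = beta[0]                         # intercept is the fit at x0
    return fitted

def model_assisted_total(x_pop, x_s, y_s, pi_s, h):
    """Sum of population fits plus inverse-probability-weighted sample residuals."""
    m_pop = local_linear(x_s, y_s, pi_s, x_pop, h)
    m_s = local_linear(x_s, y_s, pi_s, x_s, h)
    return m_pop.sum() + np.sum((y_s - m_s) / pi_s)

# Simulated population with an auxiliary variable known for every unit.
rng = np.random.default_rng(2)
N, n = 1000, 100
x_pop = rng.uniform(0.0, 1.0, N)
y_pop = 5.0 + 2.0 * np.sin(2.0 * np.pi * x_pop) + rng.normal(0.0, 0.3, N)
idx = rng.choice(N, n, replace=False)               # simple random sample
pi_s = np.full(n, n / N)                            # inclusion probabilities

totals = {h: model_assisted_total(x_pop, x_pop[idx], y_pop[idx], pi_s, h)
          for h in (0.05, 0.1, 0.3)}
true_total = y_pop.sum()
```

Comparing the entries of `totals` shows how the estimate moves with the bandwidth: a very large bandwidth approaches an ordinary linear regression estimator, while a very small one makes the fit follow sample noise.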
Future Activities:
DAMARS will assist STARMAP in organizing the Graybill Conference at CSU in June 2004. We expect to organize at least one session and contribute several presentations. In addition, DAMARS expects to make two presentations at the joint meeting of the International Environmetrics Society and the Symposium on Spatial Accuracy in June 2004. We also plan on participating in the EMAP meeting to be held in May 2004. The joint STARMAP/DAMARS Meeting will be hosted by STARMAP and held in September 2004, in Fort Collins, CO. Future activities for the various projects are described below.
Journal Articles: 16 Displayed
- Andrews B, Davis RA, Breidt FJ. Maximum likelihood estimation for all-pass time series models. Journal of Multivariate Analysis 2006;97(7):1638-1659.
  Reports: R829096 (2003), R829095 (Final), R829095C002 (2003), R829095C002 (2004)
- Breidt FJ, Hsu N-J. Best mean square prediction for moving averages. Statistica Sinica 2005;15(2):427-446.
  Reports: R829096 (2003), R829096 (2005), R829095 (Final), R829095C002 (2003), R829095C002 (2004), R829095C002 (2005)
- Breidt FJ, Hsu N-J, Coar W. A diagnostic test for autocorrelation in increment-averaged data with application to soil sampling. Environmental and Ecological Statistics 2008;15(1):15-25.
  Reports: R829096 (2005)
- Buchanan RA, Conquest LL, Courbois J-Y. A cost analysis of ranked set sampling to estimate a population mean. Environmetrics 2005;16(3):235-256.
  Reports: R829096 (2002), R829096 (2003), R829096 (2004), R829096 (2005)
- Cooper C. Sampling and variance estimation on continuous domains. Environmetrics 2006;17(6):539-553.
  Reports: R829096 (2005)
- Courbois JP, Urquhart NS. Comparison of survey estimates of the finite population variance. Journal of Agricultural, Biological, and Environmental Statistics 2004;9(2):236-251.
  Reports: R829096 (2003), R829096 (2004), R829096 (2005), R829095 (2004), R829095 (2005), R829095 (Final), R829095C003 (2003), R829095C003 (2004)
- Da Silva DN, Opsomer JD. Properties of the weighting cell estimator under a nonparametric response mechanism. Survey Methodology 2004;30(1):45-55.
  Reports: R829096 (2004), R829096 (2005), R829095 (2003), R829095 (2004), R829095 (2005), R829095 (Final), R829095C002 (2004), R829095C002 (2005)
- Montanari GE, Ranalli MG. Nonparametric model calibration estimation in survey sampling. Journal of the American Statistical Association 2005;100(472):1429-1442.
  Reports: R829096 (2004), R829096 (2005), R829095 (Final), R829095C002 (2004), R829095C002 (2005)
- Munoz B, Lesser VM. Adjustment procedures to account for non-ignorable missing data in environmental surveys. Environmetrics 2006;17(6):653-662.
  Reports: R829096 (2005)
- Munoz B, Lesser VM, Ramsey FL. Design-based empirical orthogonal function model for environmental monitoring data analysis. Environmetrics 2008;19(8):805-817.
  Reports: R829096 (2003), R829096 (2004), R829096 (2005)
- Opsomer JD, Botts C, Kim JY. Small area estimation in a watershed erosion assessment survey. Journal of Agricultural, Biological, and Environmental Statistics 2003;8(2):139-152.
  Reports: R829096 (2004), R829096 (2005), R829095 (2004), R829095 (2005), R829095 (Final), R829095C002 (2004)
- Opsomer JD, Breidt FJ, Moisen GG, Kauermann G. Model-assisted estimation of forest resources with generalized additive models. Journal of the American Statistical Association 2007;102(478):400-409.
  Reports: R829096 (2003), R829096 (2004), R829096 (2005), R829095 (2004), R829095 (2005), R829095 (Final), R829095C002 (2004), R829095C002 (2005)
- Opsomer JD, Claeskens G, Ranalli MG, Kauermann G, Breidt FJ. Non-parametric small area estimation using penalized spline regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2008;70(1):265-286.
  Reports: R829096 (2005), R829095C002 (2005)
- Stevens Jr. DL, Olsen AR. Variance estimation for spatially balanced samples of environmental resources. Environmetrics 2003;14(6):593-610.
  Reports: R829096 (2003), R829096 (2005)
- Stevens Jr. DL, Olsen AR. Spatially-balanced sampling of natural resources. Journal of the American Statistical Association 2004;99(465):262-278.
  Reports: R829096 (2002), R829096 (2004)
- Thomas DL, Johnson D, Griffith B. A Bayesian random effects discrete-choice model for resource selection: population-level selection inference. Journal of Wildlife Management 2006;70(2):404-412.
  Reports: R829096 (2005), R829095 (Final)
Supplemental Keywords:
public policy, decisionmaking, community-based, monitoring, risk assessment, watersheds, streams, rivers, estuaries, economic social and behavioral science research program, ecosystem protection, environmental exposure, risk, applied math, statistics, aquatic ecosystems, estuarine research, chemical mixtures, human health, economics, engineering, urban and regional planning, environmental biology, Bayesian approach, Environmental Monitoring and Assessment Program, EMAP, aquatic resources, empirical orthogonal functions, landscapes, model assisted estimation, model-based analysis, modeling, spatial analysis, statistical methodology, statistical tools, surface water, survey, salmon, chinook, coho, RFA, Scientific Discipline, Ecosystem Protection/Environmental Exposure & Risk, Aquatic Ecosystems & Estuarine Research, Aquatic Ecosystem, Environmental Monitoring, EMAP, estuarine research, risk assessment, ecosystem monitoring, statistical survey design, spatial and temporal modeling, aquatic ecosystems, Environmental Monitoring and Assessment Program
Relevant Websites:
http://oregonstate.edu/dept/statistics/epa_program/meeting.html
Progress and Final Reports:
Original Abstract
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.