Grantee Research Project Results
Final Report: Model-based Clustering for Classification of Aquatic Systems and Diagnosis of Ecological Stress
EPA Grant Number: R831368
Title: Model-based Clustering for Classification of Aquatic Systems and Diagnosis of Ecological Stress
Investigators: Smith, Eric; Orth, Donald J.; Yagow, Gene; Berkson, Jim; Brannan, Kevin; Mostaghimi, Saied; Bates, Samantha
Institution: Virginia Tech
EPA Project Officer: Packard, Benjamin H
Project Period: November 10, 2003 through November 9, 2006
Project Amount: $843,771
RFA: Development of Watershed Classification Systems for Diagnosis of Biological Impairment in Watersheds and Their Receiving Water Bodies (2003)
Research Category: Watersheds, Water
Objective:
The objectives of this research are to develop methodologies for classifying watersheds and to evaluate the ability of this classification system to delineate areas of biological stress. The novel aspect of our classification system is the grouping of watersheds, or collections of stream segments, by the empirical relationships between watershed attributes and aquatic ecosystem conditions.
We proposed a classification system derived through model-based cluster analysis, a statistical approach that groups empirical relationships. In contrast to classification systems that group sites by the similarity of their attribute values, we will group sites by the similarity of their empirical stressor-effect relationships. The classification procedure will comprise an indirect approach as well as a more direct approach based on methods such as canonical correspondence analysis, which form regression relationships between abundance data and variables of interest. The classification system will be tested at a variety of levels using data on fish and benthic macroinvertebrates. The stressor-response relationships will be based on parametric and nonparametric regression analyses as well as multivariate analysis.
Model-based approaches will also be used to evaluate water quality, comparing different methods for testing whether a site is impaired. By using information from previous studies or other locations, the ability to detect impairment may be improved. The effect of information availability on Total Maximum Daily Load (TMDL) analysis will also be investigated.
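One way to see how information from previous studies can enter an impairment decision is a Beta-Binomial model of the exceedance rate. The sketch below is an illustration under stated assumptions, not the project's method: it assumes samples are independent, treats "impaired" as a true exceedance rate above 10%, and uses a hypothetical Beta(3, 17) prior to stand in for results from related sites. Integer prior parameters let the Beta tail probability be computed exactly with binomial sums, so only the standard library is needed.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by exact summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def beta_sf(x, a, b):
    """P(theta > x) for theta ~ Beta(a, b) with integer a, b >= 1.

    Uses the identity P(theta <= x) = P(Binomial(a + b - 1, x) >= a),
    so P(theta > x) = P(Binomial(a + b - 1, x) <= a - 1).
    """
    return binom_cdf(a - 1, a + b - 1, x)


def posterior_prob_impaired(exceedances, n, prior_a=1, prior_b=1, threshold=0.10):
    """Posterior P(true exceedance rate > threshold) in a Beta-Binomial model.

    prior_a and prior_b (hypothetical) encode earlier studies or related
    sites; the default Beta(1, 1) prior is flat, i.e. no prior information.
    """
    return beta_sf(threshold, prior_a + exceedances, prior_b + n - exceedances)


# Same hypothetical monitoring record: 2 of 10 samples exceed the criterion.
flat = posterior_prob_impaired(2, 10)                          # about 0.91
informed = posterior_prob_impaired(2, 10, prior_a=3, prior_b=17)
```

The prior acts like additional pseudo-samples from related sites, so the same record of 2 exceedances in 10 samples leads to a different, and more tightly constrained, posterior statement once prior information is included.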
Summary/Accomplishments (Outputs/Outcomes):
Clustering approaches were developed using the classification likelihood, the mixture likelihood, and a randomization approach with a model index. Using the mixture and classification likelihoods, we developed an algorithm that clusters regression relationships. The method was applied to data from Ohio, using the Index of Biotic Integrity (IBI) as the biological response and several environmental measurements as stressors. The application yielded two main clusters, one associated with the northwest region of the state and a second with the remaining locations. The clusters are differentiated primarily by the average value of IBI and by a habitat metric. A region associated with a single basin formed a separate cluster. The clustering program is part of a macro that lets users take advantage of the software's capabilities for data entry, manipulation, and graphics. The algorithm uses a Markov chain Monte Carlo approach, which allows the uncertainty associated with the clustering to be estimated.
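To make "clustering regression relationships" concrete, the following is a minimal sketch of a mixture-likelihood fit in which each cluster has its own stressor-response regression. It swaps in the EM algorithm for simplicity, whereas the project's algorithm uses Markov chain Monte Carlo; the function name, the Gaussian error assumption, and the synthetic data are all illustrative assumptions, not the Ohio analysis. Taking the most probable cluster for each site at the end gives the hard labels that a classification-likelihood approach works with directly.

```python
import numpy as np


def fit_mixture_of_regressions(X, y, k=2, n_iter=200, seed=0, tol=1e-8):
    """EM for a k-component Gaussian mixture of linear regressions (sketch).

    X : (n, p) design matrix, including a column of ones for the intercept.
    y : (n,) biological response (e.g. an index such as IBI).
    Returns (weights, beta, sigma2, resp, loglik), where resp[i, j] is the
    estimated probability that site i follows cluster j's regression.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    resp = rng.dirichlet(np.ones(k), size=n)  # random soft start
    prev_ll = -np.inf
    for _ in range(n_iter):
        # M-step: weighted least-squares fit and error variance per cluster.
        weights = resp.mean(axis=0)
        beta = np.empty((k, p))
        sigma2 = np.empty(k)
        for j in range(k):
            w = resp[:, j]
            XtW = X.T * w                      # X^T W without forming W
            beta[j] = np.linalg.solve(XtW @ X + 1e-9 * np.eye(p), XtW @ y)
            r = y - X @ beta[j]
            sigma2[j] = max(np.sum(w * r**2) / np.sum(w), 1e-9)
        # E-step: posterior cluster probabilities from the normal likelihoods.
        resid = y[:, None] - X @ beta.T        # (n, k)
        log_dens = (np.log(weights) - 0.5 * np.log(2 * np.pi * sigma2)
                    - 0.5 * resid**2 / sigma2)
        log_norm = np.logaddexp.reduce(log_dens, axis=1)
        resp = np.exp(log_dens - log_norm[:, None])
        ll = np.sum(log_norm)                  # mixture log-likelihood
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, beta, sigma2, resp, ll


# Synthetic example (not the Ohio data): two groups of sites whose
# response declines or rises with a single stressor.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
group = rng.integers(0, 2, 200)
y = np.where(group == 0, 50 - 2 * x, 20 + 1 * x) + rng.normal(0, 1, 200)
X = np.column_stack([np.ones_like(x), x])

# Keep the best of several random starts, since EM finds local optima.
best = max((fit_mixture_of_regressions(X, y, k=2, seed=s) for s in range(10)),
           key=lambda fit: fit[-1])
labels = best[3].argmax(axis=1)   # most probable (hard) cluster for each site
```

Unlike this point-estimate sketch, an MCMC approach yields a posterior distribution over cluster memberships, which is what allows the uncertainty of the clustering to be quantified.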
Another approach for clustering stressor-response relationships is based on randomization and tessellations. In this approach, a spatial region is randomly divided into non-overlapping subregions of different sizes, and within each subregion a model relating stressors to biological responses is fitted. The procedure is repeated a large number of times, and the optimal solution is selected using a summary criterion. The modeling approach is flexible in that univariate methods, such as parametric or nonparametric regression analysis, or multivariate models may be used. These methods were applied to a number of data sets. In one example, the method was used to subdivide the eastern United States into regions based on logistic regression models of brook trout presence and extirpation; this improved prediction of brook trout by roughly 10% over standard models. The analysis is used to select subwatersheds for preservation and restoration.
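The randomize-fit-score loop described above can be sketched as follows. Several details are assumptions for illustration only: the random partition is a Voronoi tessellation generated from uniformly placed seed points, each subregion gets an ordinary least-squares fit (the brook trout example used logistic regression), and the summary criterion is a Gaussian BIC that penalizes each extra subregion's coefficients. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def region_labels(coords, seeds):
    """Assign each site to its nearest seed point (a Voronoi tessellation)."""
    d2 = ((coords[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def score_partition(X, y, labels, min_sites):
    """BIC of separate least-squares fits in each region (lower is better).

    Returns np.inf if any region has too few sites to fit reliably.
    """
    n, p = X.shape
    rss_total, n_params = 0.0, 0
    for r in np.unique(labels):
        idx = labels == r
        if idx.sum() < min_sites:
            return np.inf
        beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        rss_total += float(((y[idx] - X[idx] @ beta) ** 2).sum())
        n_params += p
    # Gaussian BIC assuming a common error variance across regions.
    return n * np.log(rss_total / n) + n_params * np.log(n)


def random_tessellation_search(coords, X, y, max_regions=5, n_trials=2000,
                               min_sites=10, seed=0):
    """Randomly tessellate the study area many times; keep the best split."""
    rng = np.random.default_rng(seed)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    best = (np.inf, None)
    for _ in range(n_trials):
        k = rng.integers(1, max_regions + 1)            # number of subregions
        seeds = rng.uniform(lo, hi, size=(k, coords.shape[1]))
        labels = region_labels(coords, seeds)
        s = score_partition(X, y, labels, min_sites)
        if s < best[0]:
            best = (s, labels)
    return best


# Synthetic example: the stressor-response slope flips sign east to west.
rng = np.random.default_rng(2)
coords = rng.uniform(0, 1, size=(300, 2))
stressor = rng.uniform(0, 5, 300)
slope = np.where(coords[:, 0] < 0.5, 2.0, -2.0)
y = 10 + slope * stressor + rng.normal(0, 1, 300)
X = np.column_stack([np.ones(300), stressor])
best_score, best_labels = random_tessellation_search(coords, X, y, n_trials=500)
```

Because the per-region model is isolated in `score_partition`, a logistic, nonparametric, or multivariate model could be substituted without changing the search loop, which mirrors the flexibility the approach is described as having.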
Model-based approaches to the evaluation of water quality were investigated using Bayesian and frequentist approaches. We showed that information from previous studies or from related sites may be used to improve the ability to detect impairment. Evidence from our studies suggests that the traditional approach to assessment (the 10% rule) has poor statistical properties, and that an approach based on tolerance limits is superior.
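The poor statistical behavior of the 10% rule can be illustrated by computing how often it lists a site that is actually at the acceptable boundary. The sketch assumes the rule lists a site when more than 10% of samples exceed the criterion and that samples are independent. For comparison it uses an exact binomial test on the exceedance count; this is a simpler frequentist stand-in chosen for illustration, not the authors' tolerance-limit procedure.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def raw_ten_percent_rule(exceedances, n):
    """List the site if more than 10% of samples exceed the criterion."""
    return 10 * exceedances > n          # integer form of k / n > 0.10


def binomial_test_rule(exceedances, n, alpha=0.05):
    """List only if H0 'true exceedance rate <= 10%' is rejected at level alpha."""
    p_value = sum(binom_pmf(j, n, 0.10) for j in range(exceedances, n + 1))
    return p_value <= alpha


def prob_listed(rule, n, true_rate):
    """Probability that `rule` lists a site with the given true exceedance rate."""
    return sum(binom_pmf(k, n, true_rate) for k in range(n + 1) if rule(k, n))


# A site exactly at the 10% boundary, sampled 10 times:
false_raw = prob_listed(raw_ten_percent_rule, 10, 0.10)    # about 0.264
false_test = prob_listed(binomial_test_rule, 10, 0.10)     # about 0.013
```

With 10 samples, the raw rule lists a boundary site roughly one time in four, while the test-based rule keeps that error below its nominal 5% level; the trade-off is that more evidence is needed before a truly impaired site is listed.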
In another study, we investigated the impacts of alternative land use sources, reference watersheds, and water quality models on the final TMDL for watersheds with benthic impairments. Questions considered in this research included: Do different land use sources (Digital Orthophoto Quarter Quadrangles [DOQQ] and National Land Cover Database [NLCD]) result in different stressor loadings? Do alternative water quality models (Generalized Watershed Loading Functions [GWLF] and Soil and Water Assessment Tool [SWAT]) result in different stressor loadings? Do stressor loadings differ when different reference watersheds are used? Stroubles Creek, a benthic-impaired water body on Virginia’s 1998 303(d) list, was selected for study. Sediment is the primary benthic stressor and therefore the target for stressor reductions. Study results showed that the land use source used to determine land use parameters, the model used to determine sediment loads, and the reference watershed selected to determine the target load all have marked effects on the resulting stressor load reduction requirements. Regardless of the reference watershed, using different land use sources produced required stressor reductions that differed by more than 10%. With respect to water quality model selection, in two of the three scenarios considered, using different water quality models produced differences of more than 10% in stressor load reduction requirements. In one scenario, GWLF modeling required 2.8 times greater reductions than SWAT modeling. Finally, different reference watersheds produced differences of as much as 73% in the required sediment reductions for the impaired watershed. Because TMDL reports become legal documents, it is crucial to be able to determine the required reductions of stressor loading in an impaired watershed consistently and scientifically.
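The arithmetic behind a reference-watershed reduction requirement helps explain why each choice above propagates into the TMDL. The sketch assumes the target load is the reference watershed's unit-area load scaled to the impaired watershed's area, ignores margins of safety and other allocations, and uses hypothetical loads and areas that are not results from the Stroubles Creek study. Because "different by more than 10%" can be read either as percentage points or as a relative change, the comparison reports both a percentage-point difference and a ratio.

```python
def target_load(ref_load, ref_area, impaired_area):
    """Area-adjusted target: scale the reference watershed's unit-area load
    (e.g. t/ha/yr) to the impaired watershed's area."""
    return ref_load / ref_area * impaired_area


def required_reduction(existing_load, target):
    """Fraction of the existing load that must be removed to meet the target."""
    if existing_load <= 0:
        raise ValueError("existing_load must be positive")
    return max(0.0, (existing_load - target) / existing_load)


def compare_scenarios(red_a, red_b):
    """Express the gap between two scenarios as percentage points and as a ratio."""
    lo, hi = min(red_a, red_b), max(red_a, red_b)
    ratio = float("inf") if lo == 0 else hi / lo
    return {"pp_difference": 100 * (hi - lo), "ratio": ratio}


# Hypothetical sediment loads (t/yr) and areas (ha); NOT study results.
# Each model is run on both the impaired and the reference watershed.
impaired_area, ref_area = 2000.0, 1500.0
red_model_a = required_reduction(1000.0, target_load(300.0, ref_area, impaired_area))
red_model_b = required_reduction(800.0, target_load(450.0, ref_area, impaired_area))
summary = compare_scenarios(red_model_a, red_model_b)
```

Since the model, land use source, or reference watershed changes both the existing load and the area-adjusted target, even moderate differences in simulated loads can translate into large differences in the required reduction.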
We also developed a new model, the Dynamic Agricultural Non-point Source Assessment Tool (DANSAT), which is a distributed-parameter, process-oriented, continuous-simulation, watershed-scale model. DANSAT simulates the long-term impacts of agricultural best management practices (BMPs) on hydrology and water quality, and it considers both spatial and temporal changes in BMPs. A user interface was used to derive the intensive spatial input parameters from data in ArcView ASCII format. The input GIS data include a digital elevation model (DEM), a detailed soil survey map, and surveyed land use data. DANSAT was applied to an agricultural watershed in Virginia to validate the model components and evaluate the model's capabilities. Hydrology was calibrated using data from the entire watershed. The parameters used to calibrate sediment load for internal subwatersheds were channel-related, including the soil percentage of the stream bed and the fraction of unerodible channel soil. The model, calibrated at the watershed outlet, appropriately simulates the internal subwatersheds.
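The report does not name the goodness-of-fit statistics used for calibration, so the following is only a sketch of one common way to check an outlet-calibrated model at internal gauges. It assumes the Nash-Sutcliffe efficiency (NSE) and percent bias, two statistics widely used in watershed modeling, and hypothetical flow values; the gauge names and the `evaluate_gauges` helper are illustrative.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency (NSE): 1 is a perfect fit, 0 means the model
    does no better than the observed mean, negative values are worse."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def percent_bias(observed, simulated):
    """Percent bias; with this sign convention, positive means the model
    under-predicts on average (conventions differ between authors)."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)


def evaluate_gauges(gauges):
    """Score each gauge; `gauges` maps a gauge name to (observed, simulated).

    Parameters are assumed to be calibrated at the outlet only, so the
    internal gauges act as a split-site check without recalibration.
    """
    return {name: {"NSE": nash_sutcliffe(o, s), "PBIAS": percent_bias(o, s)}
            for name, (o, s) in gauges.items()}


# Hypothetical daily flows (m^3/s) at the outlet and one internal gauge.
gauges = {
    "outlet":   ([1.2, 3.4, 2.2, 5.1, 1.9], [1.1, 3.0, 2.5, 4.8, 2.0]),
    "internal": ([0.4, 1.1, 0.8, 1.9, 0.6], [0.5, 1.0, 0.7, 1.6, 0.7]),
}
scores = evaluate_gauges(gauges)
```

Reporting the same statistics at the outlet and at internal gauges makes the claim that an outlet-calibrated model also simulates internal subwatersheds appropriately directly checkable.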
Journal Articles on this Report: 2
Boone E, Ye K, Smith E. Using data augmentation via the Gibbs Sampler to incorporate missing covariate structure in linear models for ecological assessments. Environmental and Ecological Statistics 2009;16(1):75-87. (R831368, Final)
Boone E, Ye K, Smith E. Assessing environmental stressors via Bayesian Model Averaging in the presence of missing data. Environmetrics 2011;22(1):13-22. (R831368, Final)
Supplemental Keywords:
water, ecological effects, stressors, heavy metals, ecosystem indicators, model-based clustering, modeling, watershed management, RFA, Scientific Discipline, Water, ECOSYSTEMS, Ecosystem Protection/Environmental Exposure & Risk, Water & Watershed, Aquatic Ecosystems & Estuarine Research, Monitoring/Modeling, Aquatic Ecosystem, Terrestrial Ecosystems, Environmental Monitoring, Ecological Risk Assessment, Ecology and Ecosystems, Watersheds, risk assessment, ecosystem modeling, anthropogenic stress, watershed classification, watershed, ecosystem monitoring, decision making, water quality, model based cluster analysis, ecological risk, aquatic ecosystems, environmental stress, stressor effect relationships, ecological indicators, ecology assessment models, ecosystem stress, watershed assessment, ecological models, water monitoring, adaptive implementation modeling, stress response
Relevant Websites:
http://www.stat.vt.edu/~strclstr/index.shtml
Progress and Final Reports:
Original Abstract
The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.