Final Report: A Biologically Driven National Classification Scheme for U.S. Streams and Rivers

EPA Grant Number: R829498
Title: A Biologically Driven National Classification Scheme for U.S. Streams and Rivers
Investigators: Herlihy, Alan T., Hughes, Robert, Pan, Yangdong
Institution: Oregon State University, Portland State University
EPA Project Officer: Hiscock, Michael
Project Period: February 1, 2002 through January 31, 2005 (Extended to January 31, 2006)
Project Amount: $747,541
RFA: Development of National Aquatic Ecosystem Classifications and Reference Conditions (2001)
Research Category: Ecosystems, Water, Aquatic Ecosystems


Analyzing stream biological assemblage data at a national scale is rarely attempted because compiling the necessary database is so difficult. Our goal was to assemble a national database of stream/river fish, macroinvertebrate, and periphyton assemblages for the 48 conterminous U.S. states, derived from regional-scale synoptic surveys. Our objectives were to 1) use our national database to develop 10-30 biologically driven national "classes" of stream systems; 2) within each class, separate natural from anthropogenic effects on stream ecological condition; and 3) establish quantitative relationships between catchment and riparian condition and water body condition (structure and function).

Summary/Accomplishments (Outputs/Outcomes):

1.0 Approach and Overview
National-scale fish and macroinvertebrate lotic assemblage databases were compiled from available Environmental Monitoring and Assessment Program (EMAP), Regional EMAP (REMAP), state agency, and U.S. Geological Survey (USGS) National Water Quality Assessment Program (NAWQA) data (Section 2). Cluster analysis (Bray-Curtis distance) was used to group sites by assemblage, and indicator species analysis was used to identify and describe the resulting clusters. Analyses were done on all sites and, separately, on least-disturbed reference sites only, to investigate the effect of human disturbance on classification. Biological classification strength was also compared to the classification strength of commonly used ecoregional, hydrological, and political spatial classifications (Section 3). In doing these large-scale analyses, it became apparent that several macroinvertebrate issues required further investigation. In-depth analyses were therefore done on the effect of macroinvertebrate taxonomic resolution and sample habitat type on regional bioassessment (Section 4). For the last objective of this project, landscape-biotic interactions were examined for stream periphyton in California, Western fish assemblages, and large river fish assemblages (Section 5).
2.0 National Database Creation (Herlihy et al., 2006)
Biological assemblage data from federal and state agencies were obtained for 12,685 different sites throughout the 48 conterminous United States (Figure 1). All told, data were obtained from 16 different state agencies, 9 REMAP surveys, 4 EMAP surveys, and NAWQA. Along with the biological data, we also compiled available site-level physical and chemical habitat data. Availability of this information varied enormously across the surveys, both in what was measured and in the field methods used. This was a major limiting factor in what data we could relate to the biologically derived cluster information. In addition, from site latitude/longitude, we used available GIS layers to obtain site elevation and spatial classification information (e.g., ecoregion, hydrologic unit, or physiographic class) for every site. For sites that had multiple sampling visits, we used the most recent sampling as our index sample so that each site was represented by only one sampling event.
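The index-sample rule described above (one visit per site, the most recent) can be sketched as follows. This is a minimal illustration, not the project's actual data pipeline; the site identifiers, dates, and tuple layout are hypothetical.

```python
from datetime import date


def select_index_samples(visits):
    """Return one visit per site: the most recent one.

    `visits` is an iterable of (site_id, visit_date, payload) tuples.
    The result maps site_id -> (visit_date, payload).
    """
    latest = {}
    for site_id, visit_date, payload in visits:
        # Keep this visit only if it is newer than the one already stored.
        if site_id not in latest or visit_date > latest[site_id][0]:
            latest[site_id] = (visit_date, payload)
    return latest


# Hypothetical visits: site OR001 was sampled twice, PA017 once.
visits = [
    ("OR001", date(1998, 7, 14), "sample A"),
    ("OR001", date(2001, 8, 2), "sample B"),
    ("PA017", date(1995, 6, 30), "sample C"),
]
index_samples = select_index_samples(visits)
```

Here `index_samples` keeps only the 2001 visit for OR001, so each site contributes exactly one sampling event to later analyses.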
Lotic vertebrate assemblage data were available for 6,336 of the national sites and contained 647 different species. We decided to analyze only data for fish (not amphibians) and only for sites that had > 75% of the individuals identified to species. The working fish data had 530 different species from 5,951 unique sampling sites. Of these, 63 species were “non-rare” (found at > 5% of the sites) and were used in most multivariate analyses.
Macroinvertebrate data were available for 9,040 sites in 26 different surveys. Six of these surveys were not suitable for our purposes because they used very different field sampling methods (Hester-Dendy plates, rock bags), had insufficient taxonomic resolution (mostly with the Chironomidae), or collected only presence/absence data. It was also necessary to reconcile taxa names across databases to a consistent level of distinct taxa, both to resolve ambiguous taxa and to standardize naming conventions. Originally, there were 9,647 different taxa names in the combined data. One of the most time-consuming activities of this project was making this taxa list consistent across surveys, using ITIS as a common denominator. This process reduced the number of different taxa to 3,481; 17% of them were identified to species, 70% to genus, and 95% to family. Data for most taxa were aggregated to genus or family for analysis, depending on the taxonomic resolution that was typical (over 85% of the samples) for that group. Rare taxa (those at <5% of sites) were not used in the analysis. The final database had 4,949 unique sites with 173 non-rare macroinvertebrate taxa.
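The rare-taxa screen applied to both the fish and macroinvertebrate data can be sketched as below. This is an illustrative sketch only: the 5% cutoff comes from the text, but the strict ">" treatment of the boundary, the data layout, and the example sites are assumptions.

```python
def non_rare_taxa(site_taxa, min_fraction=0.05):
    """Return the taxa found at more than `min_fraction` of sites.

    `site_taxa` maps site id -> set of taxon names present at that site.
    """
    n_sites = len(site_taxa)
    counts = {}
    for taxa in site_taxa.values():
        for taxon in taxa:
            counts[taxon] = counts.get(taxon, 0) + 1
    return {t for t, c in counts.items() if c / n_sites > min_fraction}


# Hypothetical data: 40 sites, one taxon everywhere, one at 3 sites, one at 1 site.
site_taxa = {f"site{i}": {"Baetis"} for i in range(40)}
for i in range(3):
    site_taxa[f"site{i}"].add("Hydropsyche")   # 3/40 = 7.5% of sites
site_taxa["site0"].add("Rhyacophila")          # 1/40 = 2.5% of sites
kept = non_rare_taxa(site_taxa)
```

In this made-up example, the taxon found at a single site falls below the cutoff and is dropped before any multivariate analysis.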
To examine the effects of human disturbance on national biological clusters, we also did the analyses using only least-disturbed reference sites. Reference site designations were obtained in two ways. For state and NAWQA data, sites identified in those databases as reference were considered reference in our study. For EMAP/REMAP data, a series of chemical and physical habitat screens were used to drop sites as reference candidates. Sites meeting all screening criteria were considered reference in these data. There were 1,184 reference sites used for the fish analyses and 1,123 reference sites used in the macroinvertebrate analyses.
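The screening logic for EMAP/REMAP sites (a site is reference only if it passes every screen) can be sketched as follows. The report does not list the actual screening variables or limits, so the variable names and thresholds here are entirely hypothetical placeholders.

```python
# Hypothetical screening limits; the report does not give the real criteria.
SCREENS = {
    "chloride_ueq_l": 100.0,
    "total_phosphorus_ug_l": 20.0,
    "riparian_disturbance_index": 1.0,
}


def is_reference(site, screens=SCREENS):
    """A site remains a reference candidate only if it passes every screen."""
    return all(site[name] <= limit for name, limit in screens.items())


# Two made-up sites: the second fails the (hypothetical) chloride screen.
site_a = {"chloride_ueq_l": 40.0, "total_phosphorus_ug_l": 8.0,
          "riparian_disturbance_index": 0.2}
site_b = {"chloride_ueq_l": 250.0, "total_phosphorus_ug_l": 8.0,
          "riparian_disturbance_index": 0.2}
reference_flags = [is_reference(site_a), is_reference(site_b)]
```

Failing any single screen is enough to drop a site, which matches the "meeting all screening criteria" wording above.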
Figure 1. Location of sample sites with biological assemblage information in our national database.
3.0 Clustering of Biotic Assemblages and Their Relationship to Existing Spatial Classification Schemes and Human Disturbance
3.1 National Fish Assemblages (Herlihy et al., 2006)
Using the national-scale lotic fish assemblage data, we used cluster analysis (Bray-Curtis distance) and indicator species analysis to cluster the data, identify clusters, and describe them. We developed 12 national clusters of fish assemblage groups that were well-described by indicator fish species and predicted using both discriminant function analysis and classification tree analysis. We also examined the relationship of existing landscape classification schemes to fish assemblage similarity. Existing schemes captured about half the within-group similarity expressed in biologically derived clusters. Schemes based on ecoregion, physiography, hydrologic units, and geopolitical boundaries had very similar mean within-group fish assemblage similarities. Cluster and mean similarity analyses were not strongly influenced by using data subsets that removed non-native fish species and disturbed sites. This suggests that the underlying mechanisms responsible for controlling fish assemblage patterns at the national scale are fairly robust to the effects of non-native species and anthropogenic disturbances.
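To make the comparison between biological clusters and existing spatial schemes concrete, the sketch below computes Bray-Curtis dissimilarity and the mean within-group similarity for two alternative groupings of the same sites. It is a simplified illustration, assuming similarity is taken as 1 minus Bray-Curtis dissimilarity; the exact classification-strength statistic used in the study is not given here, and the sites, species, and group labels are made up.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dicts (taxon -> count)."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.values()) + sum(b.values())
    return diff / total if total else 0.0


def mean_within_group_similarity(samples, groups):
    """Average of (1 - Bray-Curtis) over all site pairs sharing a group label."""
    sims = [
        1.0 - bray_curtis(samples[i], samples[j])
        for i, j in combinations(sorted(samples), 2)
        if groups[i] == groups[j]
    ]
    return sum(sims) / len(sims) if sims else float("nan")


# Hypothetical fish counts at four sites.
samples = {
    "s1": {"trout": 10, "sculpin": 5},
    "s2": {"trout": 8, "sculpin": 7},
    "s3": {"bass": 12, "sunfish": 3},
    "s4": {"bass": 9, "sunfish": 6},
}
biological = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}  # cluster labels
spatial = {"s1": "X", "s3": "X", "s2": "Y", "s4": "Y"}     # e.g., a regional scheme

bio_strength = mean_within_group_similarity(samples, biological)
spatial_strength = mean_within_group_similarity(samples, spatial)
```

A grouping whose mean within-group similarity is close to that of the biological clusters captures most of the assemblage pattern; in this toy case the spatial grouping captures none of it.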
3.2 National Macroinvertebrate Assemblages (Sifneos et al., 2005 - NABS Poster)
From the national-scale database of lotic macroinvertebrate assemblages, we developed 15 national-scale clusters of macroinvertebrate assemblages that were well-described by indicator species and predictable using both discriminant function analysis and classification tree analysis. Cluster and mean similarity analyses were not strongly influenced by using data subsets that removed disturbed sites. This suggests that the underlying mechanisms controlling the pattern in national scale macroinvertebrate assemblages have strong natural drivers and are fairly robust to the effects of anthropogenic disturbance. Existing landscape classification schemes captured about half of the within-group similarity expressed in biologically derived clusters. Schemes based on ecoregions, basins, and political boundaries had almost identical mean within-group assemblage similarities.
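Indicator species analysis scores how well a taxon characterizes a cluster. The sketch below assumes the commonly used indicator value formulation (relative abundance in the group times relative frequency in the group, scaled to 0-100); the report does not state which formulation was used, and the sites and abundances are hypothetical.

```python
def indicator_values(abundance, groups):
    """Return {group: indicator value (0-100)} for a single taxon.

    `abundance` maps site -> abundance of the taxon (missing means 0);
    `groups` maps site -> cluster label.
    """
    members = {}
    for site, g in groups.items():
        members.setdefault(g, []).append(abundance.get(site, 0))
    means = {g: sum(v) / len(v) for g, v in members.items()}
    total_mean = sum(means.values())
    result = {}
    for g, values in members.items():
        # Specificity: share of the taxon's mean abundance found in this group.
        specificity = means[g] / total_mean if total_mean else 0.0
        # Fidelity: fraction of the group's sites where the taxon occurs.
        fidelity = sum(1 for x in values if x > 0) / len(values)
        result[g] = 100.0 * specificity * fidelity
    return result


# Hypothetical taxon: abundant at every site in cluster A, nearly absent in B.
groups = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B", "s6": "B"}
abundance = {"s1": 10, "s2": 6, "s3": 8, "s5": 2}
iv = indicator_values(abundance, groups)
```

A taxon scores high for a cluster only when it is both concentrated in that cluster and present at most of its sites, which is why such taxa are useful for describing clusters.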
3.3 Western Aquatic Vertebrate Assemblages (Hughes et al., in preparation)
Our objective in this study was to use data from the EMAP Western Pilot to examine the effects of three spatial scales on the fish clusters and their explanatory variables. We examined patterns at the scales of the 12 conterminous western USA states, the western mountains, and the mountains of western Oregon. Cluster analysis yielded 12-15 groups described by indicator species and predicted by discriminant function analysis and classification tree analysis. State, basin, and ecoregion classifications of the sites, as well as site and catchment quantitative data, each captured less than half the within-group similarity expressed in the biologically derived clusters. The explanatory variables most useful for explaining the group patterns varied by scale and included both site and catchment (including basin and ecoregion) data.
3.4 Longitudinal Classification of Oregon River Fish Assemblages (McGarvey and Hughes, in review)
The aim of this study was to detect longitudinal zonation within riverine fish assemblages, to determine whether these longitudinally structured assemblages (‘ichthyofaunal zones’) constitute distinct, segment-scale species pools, and to apply these results to species-area and assemblage saturation tests. Our study areas were the Willamette, Umpqua, Deschutes, and John Day river basins (Oregon, U.S.A.). Relative abundance data from the Environmental Protection Agency’s Environmental Monitoring and Assessment Program (EMAP) were used to characterize longitudinal patterns in fish assemblage structure along each of the four rivers; Bray-Curtis ordinations and cluster analyses identified distinct fish assemblages, and mean similarity tests verified the significance of these zonation patterns. Discrete, regional species pools were then compiled in each basin by combining EMAP data with long-term, geo-referenced records from the Oregon State University ichthyology museum, and determining which species occurred in each longitudinal zone. We examined the effects of these within-basin species pools on species-area and species-volume relationships by comparing plots that discriminated amongst longitudinally structured species pools with plots that did not. We also compared assemblage saturation (local versus regional richness) plots that recognized longitudinal zonation with ones that did not. Three distinct, longitudinally structured assemblages (‘lower,’ ‘middle’ and ‘upper’ ichthyofaunal zones) were detected in each basin. This zonation, which included consistent patterns in habitat use, feeding, and reproductive behavior, was coincident with longitudinal changes in physical habitat (mean gradient, discharge, and temperature). Species-area tests showed total basin water volume to be a better predictor of total basin richness than basin area. 
Region-specific water volume (approximate volume of water within each longitudinal zone) was, however, the most robust and most significant predictor of regional species pool richness. Assemblage saturation tests revealed little evidence of saturation when all regional pools were combined. When regional pools were stratified by longitudinal zone, however, we found evidence of saturation within the middle and upper assemblages. We also found that the lower assemblages may be approaching saturation because of an influx of non-native species. We concluded that longitudinally structured fish assemblages in Oregon rivers are, in fact, distinct regional species pools. We also found that discriminating among within-basin, regional species pools can significantly enhance species-area and assemblage saturation analyses by providing researchers with more robust datasets and allowing them to detect differential (zone-specific) patterns.
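Comparing predictors of richness, such as basin area versus water volume, can be sketched with a log-log regression. The power-law form S = c * x^z is a standard species-area model but is an assumption here, since the study's exact model is not stated; the numbers below are synthetic and chosen to follow the curve exactly.

```python
import math


def fit_power_law(x, richness):
    """Least-squares fit of log10(S) = log10(c) + z * log10(x).

    Returns (c, z, r_squared). Inputs must be positive.
    """
    lx = [math.log10(v) for v in x]
    ls = [math.log10(s) for s in richness]
    n = len(lx)
    mx, ms = sum(lx) / n, sum(ls) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - ms) for a, b in zip(lx, ls))
    syy = sum((b - ms) ** 2 for b in ls)
    z = sxy / sxx
    c = 10 ** (ms - z * mx)
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 1.0
    return c, z, r_squared


# Synthetic example: richness follows S = 2 * V**0.25 exactly.
volume = [1, 16, 81, 256]
richness = [2, 4, 6, 8]
c, z, r_squared = fit_power_law(volume, richness)
```

Fitting the same richness values against two candidate predictors and comparing `r_squared` is one simple way to judge which predictor is stronger.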
4.0 Macroinvertebrate Data Analysis Issues
4.1 Effects of Taxonomic Resolution on Interpreting Regional Survey Data (Waite et al., 2004)
We used the EMAP mid-Atlantic Highlands (MAH) stream data set to evaluate the importance of differing levels of macroinvertebrate taxonomic resolution in bioassessments by comparing the ability of family versus genus to detect differences among sites classified by type and magnitude of human impact and by stream size. We divided the MAH into two physiographic regions: the Appalachian Plateau where mine drainage (MD) and acidic deposition are major stressors, and the Ridge and Valley where nutrient enrichment is a major stressor. Stream sites were classified into 3 or 4 impact classes based on water chemistry and habitat. We used stream order (1st–3rd Strahler order) in each region as a coarse estimate of stream size. Ordination, 2x2 Chi-square and richness metrics were used to compare the ability of family and genus to detect differences among both stressor and size classes. With one notable exception, there were only a small number of different genera per family (interquartile range = 1-4). Chironomidae, however, contained 123 different genera. As a result, significant information loss occurred when this group was only classified to family. The Chironomidae did not discriminate among the predefined classes but many chironomid genera did: by chi-square analysis, 10 and 28 chironomid genera were significant in discriminating MD and nutrient impacts, respectively. Family and genus data were similar in their ability to distinguish among the coarse impacts (e.g., most severe versus least severe impact classes) for all cases. Though genus data in many cases distinguished the subtler differences (e.g., mixed/moderate impacts versus high or low impacts) better than family, differences in significance levels between family and genus analyses were relatively minor. However, genus data detected differences among stream orders in ordination analyses that were not revealed at the family level. 
In the ordinations, both family and genus levels of analysis responded to similar suites of environmental variables. Our results suggest that identification to the family level is sufficient for some bioassessment purposes. However, identifications to genus do provide more information in genera-rich families like Chironomidae. Genus or finer levels of identification are important for investigating natural history, stream ecology, biodiversity, and indicator species. Decisions about the taxonomic level of identification need to be study specific and depend on available resources (cost) and study objectives.
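The 2x2 chi-square comparisons used in Section 4.1 to test whether a taxon discriminates between impact classes can be sketched as follows. This assumes a Pearson chi-square without continuity correction, which the report does not specify; the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical genus: present at 18 of 20 least-impacted sites
# and at 4 of 20 most-impacted sites.
stat, p_value = chi_square_2x2(18, 2, 4, 16)
```

Running the same test on family-level and genus-level presence data is one way to see whether the finer resolution reveals discriminating taxa that the coarser level hides.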
4.2 Effects of Sample Habitat on Regional Macroinvertebrate Assessments (Gerth and Herlihy, 2006)
One of the dilemmas in designing any large-scale macroinvertebrate bioassessment is deciding where to sample within streams. Streams contain a wide variety of habitats with varying macroinvertebrate assemblages, yet there has to be consistency in sampling protocol in order to interpret results across sites in a region. The Environmental Monitoring and Assessment Program (EMAP) conducted large regional probability surveys in the mid-Atlantic (1993-1998) and the western United States (2000-2001). In these surveys, EMAP collected 2 macroinvertebrate sample-types at each site, pool and riffle in the mid-Atlantic, and reachwide and riffle in the West. We analyzed data from sites where both types of samples were collected (206 mid-Atlantic and 293 Western) to examine the effects of sample-type on typical metric and multivariate analyses done in bioassessments. EPT taxon richness was higher in mid-Atlantic riffle samples than in corresponding pool samples and total taxon richness was higher in western reachwide samples than in riffle-only samples. Assemblage dissimilarities between sample-types were detectable in the mid-Atlantic data, but were considerably less in the western US. Nonetheless, in ordination analyses, sample-type differences did not obscure the overall pattern, nor did they influence detection of important environmental gradients. In addition, bioassessments based on EPT richness showed that regional assessments differed little with sample-type. Our analyses indicate that typical bioassessment methods are relatively robust with respect to sample-type in regional surveys.
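EPT taxon richness, the metric compared between sample types above, is the number of distinct taxa in the orders Ephemeroptera, Plecoptera, and Trichoptera. A minimal sketch, with hypothetical taxa lists for a paired riffle and pool sample:

```python
EPT_ORDERS = {"Ephemeroptera", "Plecoptera", "Trichoptera"}


def ept_richness(sample):
    """Count distinct taxa belonging to the EPT orders.

    `sample` is an iterable of (taxon_name, order_name) pairs.
    """
    return len({taxon for taxon, order in sample if order in EPT_ORDERS})


# Hypothetical paired samples from one site.
riffle = [
    ("Baetis", "Ephemeroptera"),
    ("Isoperla", "Plecoptera"),
    ("Hydropsyche", "Trichoptera"),
    ("Chironomus", "Diptera"),
    ("Baetis", "Ephemeroptera"),  # duplicate record, counted once
]
pool = [
    ("Caenis", "Ephemeroptera"),
    ("Chironomus", "Diptera"),
]
riffle_ept = ept_richness(riffle)
pool_ept = ept_richness(pool)
```

Computing the metric separately for each sample type at the same site gives the paired values needed to compare sample types across a region.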
5.0 Landscape-Biota Interactions
5.1 Diatom Assemblage Patterns in California Central Valley REMAP Streams (Pan et al., 2006)
Streams and rivers in the California Central Valley ecoregion have been substantially modified by human activities. This study examined distributional patterns of benthic diatom assemblages in relation to environmental characteristics in these streams and rivers. Benthic diatoms, water quality, and physical habitat conditions were characterized at 53 randomly selected sites. The stream sites were characterized by low mid-channel canopy cover and high channel substrate embeddedness. Sampled sites varied considerably in stream type and in physical and chemical habitat. Only 28% of sampled sites were natural stream or river channels (modified to relatively unmodified), while 66% of all sites were man-made waterways (ditches and drains). The waters at these sites were enriched with minerals, and turbidity varied from 1.3 to 185.0 NTU with an average of 13.5 NTU. A total of 249 diatom taxa were identified. Average taxa richness was 41 with a range of 7–76. The assemblages were dominated by Staurosira construens (11%), Epithemia sorex (8%), Cocconeis placentula (7%), and Nitzschia amphibia (6%). Multivariate analyses (cluster analysis, classification tree analysis, and canonical correspondence analysis) all showed that benthic diatom assemblages were mainly affected by channel morphology, in-stream habitat, and riparian conditions. The 1st CCA axis was negatively correlated with mean wetted channel width (r = -0.66) and thalweg depth (r = -0.65) (Table 4). The 2nd axis was correlated with % coarse substrates (r = 0.60). Our results indicate that benthic diatoms in the Central Valley are governed more by physical than by chemical habitat.
5.2 Anthropogenic and Natural Controls of Pacific Northwest Fish Assemblages (Kaufmann et al., 2005)
Physical habitat degradation has been implicated as a major contributor to the historic decline of salmonids in Pacific Northwest streams. Native aquatic vertebrate assemblages in the Oregon and Washington Coast Range consist primarily of coldwater salmonids, cottids, and amphibians. This region has a dynamic natural disturbance regime, in which mass failures, debris torrents, fire, and tree-fall are driven by weather but are subject to human alteration. The major land uses in the region are logging, dairy farming, and roads, but there is disagreement concerning the effects of those activities on habitat and fish assemblages. To evaluate those effects, we examined associations among physical and chemical habitat, land use, geomorphology, and aquatic vertebrate assemblage data from a regional survey. In general, those data showed that most variation in aquatic vertebrate assemblage composition and habitat characteristics is predetermined by drainage area, channel slope, and basin lithology. To reveal anthropogenic influences, we first modeled the dominant geomorphic influences on aquatic biotic assemblages and physical habitat in the region. Once those geomorphic controls were factored out, associations with human activities were clarified. Streambed instability and excess fines were associated with riparian disturbance and road density, as was a vertebrate assemblage index of biotic integrity (IBI). Low stream IBI values, reflecting lower abundances of salmonids and other sediment-intolerant and coldwater fish and amphibian taxa, were associated with excess streambed fines, bed instability, higher water temperature, higher dissolved nutrient concentrations, and lack of deep pools and cover complexity. Anthropogenic effects were more pronounced in streams draining erodible sedimentary bedrock than in those draining more resistant volcanic terrain. 
Our findings suggest that the condition of fish and amphibian assemblages in Coast Range streams would be improved by reducing watershed activities that exacerbate erosion and mass-wasting of sediment; protecting and restoring multilayered structure and large, old trees in riparian zones; and managing landscapes so that large wood is delivered along with sediment in both natural and anthropogenic mass-wasting events. These three measures are likely to increase relative bed stability and decrease excess fines by decreasing sediment inputs and increasing energy-dissipating roughness from inchannel large wood and deep residual pools. Reducing sediment supply and transport to sustainable rates should also ensure adequate future supplies of sediment. In addition, these measures would provide more shade, bankside cover, pool volume, colder water, and more complex habitat structure.
5.3 Historic Changes in Fish Assemblages (Hughes et al., 2005)
The objective of this synthesis was to summarize patterns in historical changes in the fish assemblages of selected large American rivers, to document causes for those changes, and to suggest rehabilitation measures. Although not a statistically representative sample of large rivers, the book chapters indicated that physical and biological stressors usually had a greater impact on fish assemblages than chemical stressors (where point sources were treated). In particular, flow and channel regulation combined with invasive species were key factors affecting large river fish assemblages, and these factors were most pronounced in southwestern U.S. rivers.


Conclusions and Contribution to Environmental Understanding and Problem Solving:

Conducting biological assessments of stream condition at a national scale requires a classification scheme to report results, define reference conditions and expectations, and interpret data.  Given that such assessments are based on stream biota, we think that the classification should be driven by biology as well.  In this study, we examined site groupings based on clustering national fish and macroinvertebrate data sets. Our findings include:

  • Biological clusters did not form spatially adjacent groupings that can be mapped as discrete units.
  • Biological clusters were definable and predictable from indicator species.
  • Existing spatial classifications (e.g., ecoregions, drainage basins), as well as site and landscape data, explained less than half of the biological variability that can be explained by biological classifications.
  • Spatial schemes based on ecoregion, drainage basins, physiography, and political boundaries all had very similar classification strengths with respect to defining biological similarity, indicating that the similarity was most strongly driven by site proximity.
  • Predicting biological clusters worked best when both site and catchment scale data were included.
  • Results were not strongly influenced by using data subsets that removed disturbed sites and alien species suggesting that the underlying mechanisms controlling the pattern in national scale fish and macroinvertebrate assemblages have strong natural drivers and are fairly robust to the effects of anthropogenic disturbance and introduced species.

Recently, EPA conducted a National Wadeable Streams Assessment (EPA 841-B-06-002).  Results and databases from this STAR project were used to help derive reporting units, reference conditions, and reference site data for that national EPA assessment.

Journal Articles on this Report:

  • Gerth WJ, Herlihy AT. Effect of sampling different habitat types in regional macroinvertebrate bioassessment surveys. Journal of the North American Benthological Society 2006;25(2):501-512. Project document sources: R829498 (2004), R829498 (Final).
  • Munn MD, Waite IR, Larsen DP, Herlihy AT. The relative influence of geographic location and reach-scale habitat on benthic invertebrate assemblages in six ecoregions. Environmental Monitoring and Assessment 2009;154(1-4):1-14. Project document sources: R829498 (Final).
  • Waite IR, Herlihy AT, Larsen DP, Urquhart NS, Klemm DJ. The effects of macroinvertebrate taxonomic resolution in large landscape bioassessments: an example from the Mid-Atlantic Highlands, U.S.A. Freshwater Biology 2004;49(4):474-489. Project document sources: R829498 (2003), R829498 (Final), R829095 (Final), R829095C003 (2004).
  • Pan Y, Hill BH, Husby P, Hall RK, Kaufmann PR. Relationships between environmental variables and benthic diatom assemblages in California Central Valley streams (USA). Hydrobiologia 2006;561(1):119-130. Project document sources: R829498 (Final).

Supplemental Keywords:

Streams, rivers, fish, macroinvertebrates, periphyton, aquatic indicators, stream ecology, EMAP, classification, monitoring, RFA, Scientific Discipline, Ecosystem Protection/Environmental Exposure & Risk, Hydrology, Aquatic Ecosystems & Estuarine Research, Aquatic Ecosystem, Ecology and Ecosystems, anthropogenic stress, bioassessment, classifying reference conditions, anthropogenic impact, national classification system, aquatic ecosystems, water quality, biological indicators, ecological classification
