Analyzing Data
M.1. Scatter Plots
- 1. What are scatter plots?
- 2. How do I use scatter plots in Stressor Identification?
- 3. Can I use scatter plots with my data?
- 4. Helpful tips
- Authors
- P. Shaw-Allen
- G.W. Suter II
- S.M. Cormier
- All CADDIS authors, contributors, and reviewers
M.1.1. What Are Scatter Plots?
Scatter plots are graphical displays of matched data, with one variable plotted on the x-axis and the other on the y-axis. Measures of an influential parameter (the independent variable) go on the x-axis, and measures of an attribute that may respond to that parameter (the dependent variable) go on the y-axis.
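A minimal sketch of the idea in Python, assuming measurements are stored per site and that matplotlib is available for drawing. The site IDs, values, and function names are hypothetical and serve only to illustrate how matched data are paired before plotting.

```python
def match_pairs(x_by_site, y_by_site):
    """Return (x, y) pairs for sites that have both measurements."""
    shared = sorted(set(x_by_site) & set(y_by_site))
    return [(x_by_site[s], y_by_site[s]) for s in shared]


def plot_scatter(pairs, x_label, y_label, path):
    """Draw a basic scatter plot and save it to `path`."""
    # Imported here so match_pairs works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    xs, ys = zip(*pairs)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(x_label)  # independent variable
    ax.set_ylabel(y_label)  # dependent variable
    fig.savefig(path)
    plt.close(fig)


# Hypothetical measurements keyed by site ID (illustrative values only).
percent_fines = {"S1": 5.0, "S2": 20.0, "S3": 35.0, "S4": 50.0}
ept_richness = {"S1": 18, "S2": 14, "S3": 9, "S5": 11}

# Only sites S1, S2, and S3 have both measurements, so only they are plotted.
pairs = match_pairs(percent_fines, ept_richness)
```

Calling `plot_scatter(pairs, "Percent fines", "EPT species richness", "scatter.png")` would then write the figure to a file. Sites with only one of the two measurements are dropped, because a scatter plot requires matched observations.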
Scatter plots are a useful first step in any analysis because they help you:
- Choose which relationships to model: A scatter of points that suggests the attribute responds to changes in the independent variable would be explored further using correlation or regression methods, while a scatter of points without any apparent relationship is unlikely to provide insights into relationships, even using multivariate analyses.
- Select a model: The distribution of points in a scatter plot may suggest whether the relationship is, for example, (A) linear, (B) a higher-order polynomial (quadratic shown), (C) exponential, or (D) logarithmic (Figure M.1-1). The distribution of points also may reveal apparent thresholds or discontinuities in the relationship.
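One rough way to supplement the visual judgment is to check which transformation makes the points fall closest to a straight line. The sketch below is an assumption-laden illustration, not a method prescribed here: it scores only the linear, exponential, and logarithmic forms (the quadratic form is omitted), it requires positive data, and R² values computed on different transformed scales are only loosely comparable. The function names and the synthetic data are hypothetical.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. R^2 of a straight-line fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def compare_forms(xs, ys):
    """Score three candidate forms by how straight the data look after transforming.

    Requires all xs > 0 and all ys > 0 because of the log transforms.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    return {
        "linear": r_squared(xs, ys),          # y = a + b*x
        "exponential": r_squared(xs, log_y),  # log(y) = a + b*x
        "logarithmic": r_squared(log_x, ys),  # y = a + b*log(x)
    }


# Synthetic data generated from an exponential curve (illustrative only).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

scores = compare_forms(xs, ys)
best = max(scores, key=scores.get)  # form with the highest score
```

A score like this is a screening aid at most. Thresholds and discontinuities, which the plot itself may reveal, would not be detected by any of these three forms.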
M.1.2. How Do I Use Scatter Plots in Stressor Identification?
For stressor identification, the independent variable in a scatter plot most commonly represents the candidate causal stressor, such as temperature or ionic strength. Other useful independent variables represent sources or other links in the causal chain. The dependent variable is usually a measure of a biological attribute related to the impairment, such as the number of brook trout or of Ephemeroptera, Plecoptera and Trichoptera (EPT) species. Scatter plots can also be used to explore relationships among environmental variables along a conceptual model pathway (e.g., nutrients and dissolved oxygen) or the influence of natural or background factors that could be minimized through classifying sites or normalizing data.
Plots of typical field-collected data, such as Figure M.1-2, rarely show tight linear associations, yet they contain much extractable information. The tight cluster of points circled in red at zero percent fines may indicate an unusual condition associated with low sediment supply or high-powered streams with resistant bedrock substrate. The observation-free areas denoted by the yellow triangles and the changing breadth of EPT species richness over the stressor gradient (e.g., broadest at 20% fines) should be considered when selecting analysis options.
Scatter plots also may reveal features of the data that can interfere with effective modeling or lead to misleading results, such as outliers or non-normal data distributions. For example, Figure M.1-3 presents a statistically significant linear regression. However, the wedge-shaped scatter of data points and the wedge-shaped data-free area suggest that the residual variance is not homogeneous. Instead, the variability in the number of EPT taxa (Y-axis) changes along the chemical oxygen demand gradient (X-axis) and appears to depend upon the mean value of the response variable. Generalized linear models or quantile regression may be more appropriate.
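The wedge pattern described above can be checked numerically in a crude way by comparing the spread of regression residuals at the low and high ends of the stressor gradient. The following sketch assumes an ordinary least-squares line and a simple split of the data into two halves. It is an informal diagnostic rather than a formal test of heteroscedasticity, and the data and function names are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_spread_by_half(xs, ys):
    """Residual standard deviation in the lower and upper halves of the x-ordered data."""
    intercept, slope = fit_line(xs, ys)
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    resid = [ys[i] - (intercept + slope * xs[i]) for i in order]
    mid = len(resid) // 2
    return statistics.pstdev(resid[:mid]), statistics.pstdev(resid[mid:])


# Synthetic wedge: the range of y shrinks as x increases (illustrative only).
xs, ys = [], []
for x in range(1, 11):
    for frac in (0.1, 0.5, 0.9):
        xs.append(x)
        ys.append(frac * (22 - 2 * x))

low_sd, high_sd = residual_spread_by_half(xs, ys)
```

When the two values differ substantially, as they do for this synthetic wedge, the constant-variance assumption behind ordinary linear regression is doubtful, which is the situation in which generalized linear models or quantile regression may be considered.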
Step 2: Listing Candidate Causes
A matrix of scatter plots, as in Figure M.1-4, may be used to identify stressors as candidate causes. The scatter in each cell can be visually inspected to determine if a relationship between the stressor and the biological attribute exists (see the plots circled in blue). In practice, many more relationships would be included in the matrix. As with correlation analysis, this approach helps to ensure that an investigator does not overlook a plausible candidate causal stressor. However, scatter plots should not be used to remove a candidate causal stressor from the list if other evidence supports its inclusion.
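A matrix like this can be assembled as a grid of small panels, one row per biological attribute and one column per stressor. The sketch below assumes matplotlib is available and that the data are held in a dictionary of equal-length lists. The column names and function names are hypothetical.

```python
def panel_grid(stressors, responses):
    """List (row, col, stressor, response) for each cell of the matrix."""
    return [
        (row, col, stressor, response)
        for row, response in enumerate(responses)
        for col, stressor in enumerate(stressors)
    ]


def plot_matrix(data, stressors, responses, path):
    """Save a grid of scatter plots: one row per response, one column per stressor."""
    # Imported here so panel_grid works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(
        len(responses), len(stressors), squeeze=False,
        figsize=(3 * len(stressors), 3 * len(responses)),
    )
    for row, col, stressor, response in panel_grid(stressors, responses):
        ax = axes[row][col]
        ax.scatter(data[stressor], data[response], s=12)
        ax.set_xlabel(stressor)
        ax.set_ylabel(response)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Hypothetical column names for two stressors and one biological attribute.
cells = panel_grid(["temperature", "conductivity"], ["ept_richness"])
```

Each cell is still inspected by eye. The grid only ensures that every stressor-attribute combination is drawn, so that no plausible candidate is overlooked.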
Step 3: Evaluating Data from the Case
Scatter plots using data from the case can provide evidence of stressor-response relationships or steps in the causal pathway. Plot observations of biological attributes against a stressor gradient to determine whether any attributes change in the expected direction. For example, the left plot of Figure M.1-5 does not suggest that minimum dissolved oxygen (DO) causes a change in relative fish weight, while the middle and right plots suggest that increases in minimum DO decrease the proportion of DELT anomalies and increase the proportion of mayflies (Little Scioto Creek case study). These plots provide evidence that supports DO as a candidate cause of increased DELT anomalies and a decreased proportion of mayflies. Data supporting or weakening a causal pathway, such as the relationship between nutrients and dissolved oxygen, can be evaluated in a similar fashion.
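Whether an attribute changes in the expected direction can also be summarized with a rank correlation, which looks only at the ordering of the points and so does not assume a straight-line relationship. The sketch below uses Spearman's rank correlation as one possible choice. The data are hypothetical values, not the Little Scioto observations, and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def changes_as_expected(xs, ys, expected_sign):
    """True when the rank correlation has the expected sign (+1 or -1)."""
    return spearman_rho(xs, ys) * expected_sign > 0


# Hypothetical observations along a minimum DO gradient (mg/L).
min_do = [2.1, 3.0, 3.8, 4.5, 5.2, 6.0]
pct_delt = [9.0, 7.5, 6.0, 4.0, 2.5, 1.0]
pct_mayflies = [1, 3, 3, 8, 12, 15]

delt_ok = changes_as_expected(min_do, pct_delt, -1)         # expect a decrease
mayfly_ok = changes_as_expected(min_do, pct_mayflies, +1)   # expect an increase
```

A matching sign is supporting evidence only. The scatter plot should still be examined for outliers or clusters that could drive the correlation.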
Step 4: Evaluating Data from Elsewhere
Scatter plots can also be used to generate evidence of stressor-response relationships from other field studies and from laboratory studies. Scatter plots of regional monitoring data can reveal whether biological attributes of interest change over stressor gradients occurring in the region. Comparing data from the assessment site with these plots can identify whether the biological response at the assessment site is consistent with biological responses at regional sites with similar levels of a given stressor. Scatter plots of laboratory data assembled from one or more sources may be used to confirm that the biological attribute changes over a stressor gradient in the absence of other stressors.
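The comparison between the assessment site and regional sites can be sketched as a simple range check: collect the regional sites whose stressor level is close to that of the assessment site and see whether the site's response falls inside the range they span. The window width, data values, and function names below are hypothetical, and other definitions of "similar" and "consistent" (for example, quantile bounds) would be equally reasonable.

```python
def regional_range(regional, site_stressor, window):
    """Min and max response among regional sites within +/- window of the stressor level."""
    nearby = [y for x, y in regional if abs(x - site_stressor) <= window]
    if not nearby:
        return None  # no regional sites with a similar stressor level
    return min(nearby), max(nearby)


def is_consistent(regional, site, window):
    """True when the site's response falls inside the range seen at similar regional sites."""
    bounds = regional_range(regional, site[0], window)
    return bounds is not None and bounds[0] <= site[1] <= bounds[1]


# Hypothetical regional data: (conductivity, EPT species richness) per site.
regional = [(100, 20), (150, 18), (200, 15), (250, 14), (300, 10), (350, 9), (400, 6)]

bounds = regional_range(regional, 310, 50)          # regional sites at 260 to 360
inside = is_consistent(regional, (310, 9), 50)      # response within the regional range
outside = is_consistent(regional, (310, 8), 50)     # response below the regional range
```

Plotting the assessment site as a distinct marker on top of the regional scatter shows the same comparison visually.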
M.1.3. Can I Use Scatter Plots with My Data?
Creating scatter plots is a simple first step in data exploration that can be used with any matched data.
M.1.4. Helpful Tips
- Using points with different colors or shapes in a scatter plot helps distinguish between categories of data, such as data from different seasons or tributaries. This can also help identify appropriate classification strategies when dealing with regional data by revealing how data from different ecoregions or altitudes differ or overlap.
- Parametric or nonparametric regression lines may be overlaid upon scatter plots to help clarify relationships.
- Scatter plots do not need to be limited to two dimensions. A third axis can be used to explore whether two environmental variables potentially interact. This type of outcome, in turn, may influence method selection, classification, or normalization approaches.
- If it is not feasible to plot in three dimensions, it may be possible to categorize points according to differing levels of the second environmental parameter and plot these in different colors or shapes to reveal interactions.
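The last tip can be sketched as follows: bin the second environmental variable into levels, then draw each level with its own marker and color. The example assumes matplotlib is available. The cutoffs, variable roles, and function names are hypothetical.

```python
from bisect import bisect_right


def level_index(value, cutoffs):
    """Index of the level that `value` falls in, given ascending cutoffs."""
    return bisect_right(cutoffs, value)


def group_by_level(xs, ys, zs, cutoffs):
    """Split (x, y) points into groups according to the level of a second variable z."""
    groups = {}
    for x, y, z in zip(xs, ys, zs):
        groups.setdefault(level_index(z, cutoffs), []).append((x, y))
    return groups


def plot_groups(groups, labels, path):
    """Plot each level with its own marker; matplotlib cycles the colors."""
    # Imported here so the grouping helpers work without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    markers = ["o", "s", "^", "D"]
    fig, ax = plt.subplots()
    for level in sorted(groups):
        gx, gy = zip(*groups[level])
        ax.scatter(gx, gy, marker=markers[level % len(markers)], label=labels[level])
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Hypothetical data: x = nutrient level, y = dissolved oxygen, z = temperature.
groups = group_by_level([1, 2, 3, 4], [8, 7, 6, 3], [12, 15, 18, 24], [15, 20])
```

Calling `plot_groups(groups, ["low", "medium", "high"], "levels.png")` would then draw the three temperature levels separately. If the groups follow visibly different trends, the two environmental variables may interact.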