Analyzing Data
DA.6. Using Statistics Responsibly
Authors:
- G.W. Suter II
- S.B. Norton
- S.M. Cormier
- P. Shaw-Allen
- All CADDIS authors, contributors, and reviewers
Once data have been gathered, checked through quality assurance review, and, where appropriate, grouped or normalized, analysis of trends and relationships can begin. This page offers advice on using statistical analysis appropriately within the framework of Stressor Identification (SI).
DA.6.1. Know the Data
Knowing your data helps you avoid errors when applying analytical methods or interpreting their output.
Summary statistics and graphics facilitate comparisons, reveal the distribution of the data, help you decide whether to transform the data, and suggest which analyses to use. Graphical methods (e.g., scatter plots and box plots) often help show whether the data support or weaken a candidate cause.
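As a minimal sketch of this first step, the Python example below computes summary statistics and sample skewness for a set of hypothetical total phosphorus concentrations. The values are invented for illustration (chosen so that their log10 values are symmetric), and the function names are our own. Comparing skewness before and after a log10 transformation is one quick check of whether a transformation makes the distribution more symmetric; it supplements, and does not replace, looking at histograms and box plots.

```python
import math
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numeric observations."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "n": len(data),
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "mean": statistics.fmean(data),
        "sd": statistics.stdev(data),
    }


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness; > 0 means a long right tail."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    factor = n / ((n - 1) * (n - 2))
    return factor * sum(((x - mean) / sd) ** 3 for x in values)


# Hypothetical total phosphorus concentrations (mg/L) from ten sites.
concentrations = [0.01, 0.02, 0.025, 0.05, 0.1, 0.1, 0.2, 0.4, 0.5, 1.0]
log_concentrations = [math.log10(x) for x in concentrations]

print(summarize(concentrations))
print(f"skewness, raw:   {skewness(concentrations):.2f}")
print(f"skewness, log10: {skewness(log_concentrations):.2f}")
```

Here the raw values are strongly right-skewed while the log10 values are symmetric, which would argue for analyzing the log-transformed concentrations. With real data, the decision should also weigh the graphics and the analysis you intend to run.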
Quantify the relationships between effect variables and measures of candidate causes, and among variables representing steps in a causal sequence. Correlation analysis measures the degree of association between variables, while regression analysis is the foundational method for quantifying how one variable changes with another. Other methods, such as species sensitivity distributions (SSDs), predicting environmental conditions from biological observations, and data normalization, rely on regression.
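A minimal sketch of both ideas in plain Python, using hypothetical data: `pearson_r` measures the strength of the linear association, and `ols_fit` estimates the intercept and slope of a simple least-squares regression of a biological response on a measure of a candidate cause. The variable names and values (percent fine sediment and EPT taxa richness) are illustrative assumptions, not CADDIS data.

```python
import math


def _centered(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    dx, dy = _centered(x), _centered(y)
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)


def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    dx, dy = _centered(x), _centered(y)
    slope = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return intercept, slope


# Hypothetical paired observations: stressor measure and biological response.
pct_fine_sediment = [5, 10, 20, 30, 40, 50]
ept_richness = [24, 22, 19, 15, 12, 9]

r = pearson_r(pct_fine_sediment, ept_richness)
intercept, slope = ols_fit(pct_fine_sediment, ept_richness)
print(f"r = {r:.3f}")
print(f"EPT richness = {intercept:.2f} + ({slope:.3f} x % fines)")
```

In practice, plot the data first: a correlation coefficient or fitted line can be dominated by a few points or can hide a nonlinear relationship.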
DA.6.2. Interpreting Differences
Interpret differences between site and reference observations, or changes along a stressor gradient, with caution. Judge differences by their magnitude and consistency rather than by statistical significance. Be especially cautious about testing hypotheses that site observations differ from reference observations, or that a biological response changes along a stressor gradient, because:
- Statistical significance does not equal biological significance.
- Statistical significance:
  - Only tells us whether the observed effect is larger than we would expect from random variation alone,
  - Does not account for sources of variance other than sampling error,
  - Does not tell us whether variability in the observations is caused by the stressor being analyzed, and
  - Does not tell us whether that variability is biologically relevant.
- Lack of statistical significance may reflect “beta error” rather than the absence of a causal relationship.
  Beta (Type II) error occurs when sampling variability is so high relative to the sample size that a statistical hypothesis test fails to detect a biologically relevant difference between sites. If you must apply hypothesis testing, first consider whether the minimum detectable difference (MDD) for your data is appropriate. The MDD is the smallest difference between sites that would lead to rejecting the null hypothesis that site differences arise only from random sampling variation. If the MDD is much larger than a biologically relevant difference, your analysis may fail to detect biologically relevant effects. If the MDD is much smaller than a biologically relevant difference, your analysis may show that sites differ statistically even though the difference is not biologically relevant.
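The MDD reasoning can be sketched numerically. The Python example below assumes a two-sample comparison of site and reference means with pooled standard deviation s_p and sample sizes n1 and n2, and uses the common form MDD = z(1 - alpha/2) * s_p * sqrt(1/n1 + 1/n2). The normal (z) critical value is a simplifying large-sample assumption; with small samples a Student t critical value, which is larger, is more appropriate, so the MDD shown for small n is an underestimate. The standard deviation of 4 taxa, the biologically relevant difference of 5 taxa, and the "much smaller" cutoff are hypothetical.

```python
import math
from statistics import NormalDist


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def minimum_detectable_difference(sd1, n1, sd2, n2, alpha=0.05):
    """Smallest difference in means a two-sided test at level alpha would call
    significant. Uses a normal (z) critical value as a large-sample
    approximation; a t critical value would give a somewhat larger MDD."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_crit * pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)


# Hypothetical values: SD of taxa richness among samples and the smallest
# difference considered biologically relevant.
SD_TAXA = 4.0
RELEVANT_DIFFERENCE = 5.0

for n in (3, 10, 200):
    mdd = minimum_detectable_difference(SD_TAXA, n, SD_TAXA, n)
    if mdd > RELEVANT_DIFFERENCE:
        note = "may miss biologically relevant differences (beta error)"
    elif mdd < RELEVANT_DIFFERENCE / 2:  # hypothetical cutoff for "much smaller"
        note = "may flag differences too small to matter biologically"
    else:
        note = "roughly matched to the relevant difference"
    print(f"n = {n:3d} per group: MDD = {mdd:.2f} taxa; {note}")
```

The MDD shrinks as sample size grows, which is why very large data sets can produce statistically significant but biologically trivial differences, while very small ones risk beta error.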
DA.6.3. Jumping to Conclusions
Concluding that a candidate stressor is or is not the cause based on hypothesis testing results or the strength of a statistical relationship (e.g., a correlation coefficient) is inappropriate because:
- Stressors often covary with each other and with natural environmental attributes. A strong relationship between the biological response and candidate cause could reflect a covarying stressor or natural factor other than the candidate cause,
- Hypothesis testing was designed for interpreting controlled experiments with replicates and random assignment of treatments, and
- Field data from observational studies rarely include replicates, and "treatments" are not randomly assigned; therefore,
- Even strong associations do not prove causation.
Rather than relying on a statistical result, use all available types of evidence in the CADDIS inferential logic.
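To illustrate the first reason listed above (covarying stressors), the hypothetical Python sketch below generates a biological response from one stressor only (`fine_sediment`) plus a second, covarying variable (`candidate`) that, by construction, has no effect on the response. The candidate is still strongly correlated with the response. A first-order partial correlation that controls for the covarying stressor drops to about zero here, but such an adjustment is only possible when the confounding factor has been measured, and it does not by itself establish causation. All data and names are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Hypothetical data. The response depends ONLY on fine_sediment; the candidate
# tracks fine_sediment plus small deviations and has no effect of its own.
fine_sediment = [1, 2, 3, 4, 5, 6, 7, 8]
candidate_dev = [1, -1, -1, 1, 1, -1, -1, 1]
response_dev = [1, 1, -1, -1, -1, -1, 1, 1]

candidate = [s + d for s, d in zip(fine_sediment, candidate_dev)]
response = [30 - 3 * s + d for s, d in zip(fine_sediment, response_dev)]

print(f"r(candidate, response)          = {pearson_r(candidate, response):.2f}")
print(f"partial r, controlling sediment = {partial_r(candidate, response, fine_sediment):.2f}")
```

The deviations were chosen to be uncorrelated with each other and with the sediment gradient, so the partial correlation is exactly zero in this toy case. Real field data are never this tidy, and unmeasured covarying factors cannot be adjusted for at all.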