Analyzing Data
M.7. Quantile Regression
- 1. What is quantile regression?
- 2. How do I use quantile regression in Stressor Identification?
- 3. Can I use quantile regression with my data?
- 4. Helpful tips
- Authors: P. Shaw-Allen, G.W. Suter II, S.M. Cormier, L.L. Yuan
- All CADDIS authors, contributors, and reviewers
M.7.1. What is Quantile Regression?
Quantile regression models the relationship between a specified quantile of a dependent (response) variable and an independent (explanatory) variable. For example, modeling the 0.50 quantile (the 50th percentile) of a response variable produces the median line, under which 50% of the observed responses lie, and modeling the 0.90 quantile (the 90th percentile) produces a line under which 90% of the observed responses lie (Figure M.7-1).
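The idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce Figure M.7-1: it fits a straight line for a chosen quantile by minimizing the check (pinball) loss, using a brute-force search over lines through pairs of data points (for a straight-line fit, a minimizing line can always be found among these). The data are synthetic, and the names `pinball_loss`, `fit_quantile_line`, and `fraction_at_or_below` are hypothetical helpers introduced here.

```python
import random
from itertools import combinations


def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def fit_quantile_line(x, y, tau):
    """Return (intercept, slope) minimizing the total pinball loss.

    Brute force: try every line through two data points with distinct x
    values and keep the one with the smallest loss. Suitable only for
    small data sets (cost grows roughly with n cubed).
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # vertical line, not a valid regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(
            pinball_loss(yk - (intercept + slope * xk), tau)
            for xk, yk in zip(x, y)
        )
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def fraction_at_or_below(x, y, line):
    """Share of observations lying on or under the fitted line."""
    intercept, slope = line
    hits = sum(yk <= intercept + slope * xk + 1e-9 for xk, yk in zip(x, y))
    return hits / len(x)


# Synthetic wedge-shaped data (assumed for illustration only):
# the upper limit of the response declines as the stressor increases.
random.seed(0)
x = [random.uniform(0, 100) for _ in range(60)]
y = [random.uniform(0, 1) * (30 - 0.25 * xi) for xi in x]

median_line = fit_quantile_line(x, y, 0.50)
upper_line = fit_quantile_line(x, y, 0.90)

print("median line (intercept, slope):", median_line)
print("0.90 line (intercept, slope):", upper_line)
print("share at or below 0.90 line:", fraction_at_or_below(x, y, upper_line))
```

With real data, a statistical package's quantile regression routine would normally replace the brute-force search; the sketch only shows what "a line under which 90% of the observed responses lie" means in practice.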
M.7.2. How Do I Use Quantile Regression in Stressor Identification?
The chief application of quantile regression in a causal assessment is in Step 4: Evaluate data from elsewhere, where it may be used to provide evidence of stressor-response relationships from other field studies. Quantile regression provides a means of estimating the location of the upper boundary of a scatter plot (e.g., the 90th percentile line in Figure M.7-1). This upper boundary may approximate the effect of a single stressor in data where many different stressors co-occur and all of them have negative effects on the biological response: points near the boundary are those where the other stressors presumably contribute little, whereas points well below it reflect additional declines caused by those stressors. Inference is based on the proximity of observations from the impaired site to this upper boundary.
Inferences based on quantile regression are qualitative and comparative. In the example shown in Figure M.7-2, data from the impaired site (open red circles) are plotted on scatter plots comparing regional EPT richness with two candidate stressors (increased percent sand/fines and increased total nitrogen). Because the plots show the impaired site closer to the upper boundary of the percent sand/fines relationship compared to the total nitrogen relationship, we conclude that percent sand/fines exerts a stronger influence on the observed EPT richness at the site in question. This analysis would support the case for percent sand/fines as the cause of the observed impairment and weaken the case for total nitrogen.
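Although the inference is qualitative, the comparison of proximity can be made explicit. The sketch below is one possible way to do so, under stated assumptions: the upper-boundary lines have already been fitted, and proximity is expressed as the shortfall of the site's response below the boundary, scaled by the boundary value so that two stressors measured in different units can be compared. The coefficients and site measurements are hypothetical and are not read from Figure M.7-2; `boundary_gap` is a helper name introduced here.

```python
def boundary_gap(intercept, slope, stressor_value, observed_response):
    """Relative shortfall of the observed response below the upper boundary.

    0 means the site sits on the boundary, values approaching 1 mean it
    sits far below, and negative values mean it sits above the boundary.
    """
    predicted_upper = intercept + slope * stressor_value
    if predicted_upper <= 0:
        raise ValueError("upper-boundary prediction must be positive to scale the gap")
    return (predicted_upper - observed_response) / predicted_upper


# Hypothetical 0.90-quantile lines as (intercept, slope); assumed values.
upper_lines = {
    "percent sand/fines": (30.0, -0.25),
    "total nitrogen": (28.0, -4.0),
}

# Hypothetical measurements at the impaired site.
site_stressors = {"percent sand/fines": 80.0, "total nitrogen": 1.5}
site_ept_richness = 9.0

gaps = {
    name: boundary_gap(a, b, site_stressors[name], site_ept_richness)
    for name, (a, b) in upper_lines.items()
}
closest = min(gaps, key=gaps.get)

print("relative gap to upper boundary:", gaps)
print("site lies closest to the boundary for:", closest)
```

A smaller gap is read as support for that candidate stressor, in the comparative sense described above. A clearly negative gap places the site above the regional boundary, in which case the comparability of the site and regional data should be checked before drawing any inference.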
M.7.3. Can I Use Quantile Regression with my Data?
Quantile regression requires matched data points (each biological response paired with a measurement of the stressor) and the assumption that the wedge shape of the data results from other stressors that co-occur with the modeled stressor and cause additional declines in the biological response along the stressor gradient.
M.7.4. Helpful Tips
- While quantile regression is a parametric analysis, assumptions of normality and homogeneity of variance are relaxed. Quantile regression is robust to outliers in the dependent (Y-axis) variable, but is sensitive to points sparsely distributed toward the extremes of the independent (X-axis) variable. In cases where such leverage points are present, use a weighted quantile regression.
- The influence of outliers, censored data, data clusters, and leverage points may be evaluated by comparing plots after removing (or, in the case of leverage points, weighting) these points. Any data pruning of this nature must be transparently described. The points should remain on the plot with flags indicating whether they were weighted or omitted from the model.
- If data from the impaired site are located far outside the upper boundary determined from regional data, it may be an indication that the comparison to the regional data is not valid. This situation can arise for a variety of reasons. For example, field sampling methods applied at the impaired site may differ significantly from those applied to collect the regional data. In general, large outliers should be inspected carefully to determine whether they can be usefully compared to regional data.
- Classification and normalization decisions made to control competing stressors or natural variability may influence the results of quantile regression. This influence can be seen by comparing quantile regression results before and after data pruning. In cases where site classification has been applied, plots using different colors or symbols for the different classes can help identify and characterize potentially influential clusters or points.
- CADDIS includes examples of quantile regression analyses for community metrics, sediments, and metals in the Field Stressor-Response Association Gallery.
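The tips on leverage points and on comparing results before and after weighting can be illustrated with a short sketch. It assumes a straight-line model, a weighted check (pinball) loss, and a brute-force search over lines through pairs of points; the data, the choice of which point to down-weight, and the weight of 0.1 are all hypothetical, and the source does not prescribe how weights should be chosen. `fit_weighted_quantile_line` is a helper name introduced here.

```python
from itertools import combinations


def fit_weighted_quantile_line(x, y, tau, weights=None):
    """Return (intercept, slope) minimizing the weighted pinball loss.

    Each observation's check loss is multiplied by its weight, so a
    down-weighted leverage point pulls the fitted line less strongly.
    Brute force over lines through pairs of points; small data sets only.
    """
    n = len(x)
    w = weights if weights is not None else [1.0] * n
    best = None
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue  # skip vertical lines
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = 0.0
        for xk, yk, wk in zip(x, y, w):
            r = yk - (intercept + slope * xk)
            loss += wk * (tau * r if r >= 0 else (tau - 1) * r)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: twelve sites along the stressor gradient plus one
# sparsely located point at an extreme stressor value (the last entry).
x = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 120]
y = [28, 12, 25, 8, 22, 15, 19, 5, 16, 9, 12, 3, 20]

# Assumed weighting: full weight everywhere except the leverage point.
weights = [1.0] * 12 + [0.1]
# Flags so the point stays on the plot and its treatment is documented.
flags = ["weighted" if wk < 1.0 else "kept" for wk in weights]

tau = 0.90
unweighted_line = fit_weighted_quantile_line(x, y, tau)
weighted_line = fit_weighted_quantile_line(x, y, tau, weights)

print("unweighted 0.90 line (intercept, slope):", unweighted_line)
print("weighted 0.90 line (intercept, slope):  ", weighted_line)
print("flags:", flags)
```

Comparing the two fitted lines shows how much the extreme point influences the estimated upper boundary. Keeping the `flags` list alongside the data makes it straightforward to mark weighted or omitted points on the plot and to describe the pruning transparently.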