Science Inventory

Evaluation of censoring-related bias in the mean

Citation:

George, BJ, K. Broms, L. Gains-Germain, K. Black, J. Simmons, AND M. Hays. Evaluation of censoring-related bias in the mean. Symposium on Detection Limits, Washington, DC, October 25 - 26, 2018.

Impact/Purpose:

Bias in means is of concern because modern environmental data are characterized by many values near or below reporting limits where additive and nonadditive effects may occur.

Description:

Censoring at detection and reporting limits focuses on repeatable detection by protecting against false positives at the potential cost of bias. This work examines censoring-related bias in estimates of the mean through a case study of non-censored dibenzo[a,h]anthracene data (n=47) from a recent EPA study of polycyclic aromatic hydrocarbons (PAH) in particulate matter emissions from residential cookstoves. Bias in means is of concern because modern environmental data are characterized by many values near or below reporting limits where additive and nonadditive effects may occur. In this case study, data were censored at two levels: the method detection limit (MDL) (n=39 detected) and the calibration curve lowest value (CCLV) (n=21 detected). Means were estimated using conventional approaches: complete case (omitting non-detect observations), substitution of MDL/2 or CCLV/2, maximum likelihood estimation (a parametric approach), robust regression on order statistics (a semi-parametric approach), and Kaplan-Meier analysis (a non-parametric approach). With 17% of the data censored at the MDL, substitution of MDL/2 yielded the least-biased mean while complete case, maximum likelihood estimation, and regression on order statistics analyses yielded the most-biased means. With 55% of the data censored at the CCLV, substitution of CCLV/2 and maximum likelihood estimation yielded the least-biased means while complete case and Kaplan-Meier analyses yielded the most-biased means. A simulation study extended the case study using two lognormal distributions, one mildly skewed and the other highly skewed to correspond to the dibenzo[a,h]anthracene data. For each combination of sample size (n=20 and n=50) and proportion of non-detects (30% and 50%), 1000 data sets were simulated, means were estimated, and bias was calculated to assess censoring relative to using all available data. Substitution of MDL/2 or CCLV/2 was least-biased for the dibenzo[a,h]anthracene experimental data whereas maximum likelihood estimation was generally least-biased in the larger simulation study. (This abstract does not reflect EPA policy.)
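
The kind of simulation described above can be illustrated with a minimal sketch. It is not the authors' code. It assumes a lognormal distribution with log-scale mean 0, a censoring limit fixed at the population quantile matching the target non-detect fraction, and an EM fit standing in for maximum likelihood estimation. All function names, the default of 200 data sets, and the seed are hypothetical choices made for illustration. Regression on order statistics and Kaplan-Meier are omitted for brevity.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for pdf, cdf and quantiles


def censored_lognormal_mean(detected, n_censored, limit, iterations=200):
    """Estimate a lognormal mean when n_censored values fall below `limit`.

    Fits a left-censored normal to the log data by EM (an assumed stand-in
    for the abstract's maximum likelihood estimation).
    """
    logs = [math.log(x) for x in detected]
    n = len(logs) + n_censored
    sum_obs = sum(logs)
    sum_sq_obs = sum(v * v for v in logs)
    # Start from the detected-only estimates on the log scale.
    mu = fmean(logs)
    sigma = max(math.sqrt(fmean([(v - mu) ** 2 for v in logs])), 1e-6)
    c = math.log(limit)
    for _ in range(iterations):
        if n_censored == 0:
            break  # nothing censored: the starting values are already the MLE
        alpha = (c - mu) / sigma
        lam = STD_NORMAL.pdf(alpha) / max(STD_NORMAL.cdf(alpha), 1e-300)
        # E-step: first and second moments of a log value known to be below c.
        e1 = mu - sigma * lam
        e2 = sigma ** 2 * (1 - alpha * lam - lam ** 2) + e1 ** 2
        # M-step: update mu and sigma from observed plus expected moments.
        mu_new = (sum_obs + n_censored * e1) / n
        var = (sum_sq_obs + n_censored * e2) / n - mu_new ** 2
        mu, sigma = mu_new, math.sqrt(max(var, 1e-12))
    return math.exp(mu + sigma ** 2 / 2)


def simulate_bias(n=50, censored_fraction=0.3, sigma=1.0, n_sets=200, seed=1):
    """Average bias of each estimator relative to the mean of the full data."""
    rng = random.Random(seed)
    # Assumption: the limit is the population quantile for the target fraction.
    limit = math.exp(sigma * STD_NORMAL.inv_cdf(censored_fraction))
    totals = {"complete_case": 0.0, "half_limit": 0.0, "mle": 0.0}
    used = 0
    for _ in range(n_sets):
        data = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        detected = [x for x in data if x >= limit]
        n_cens = n - len(detected)
        if len(detected) < 2:
            continue  # too few detects to fit anything
        full_mean = fmean(data)
        totals["complete_case"] += fmean(detected) - full_mean
        totals["half_limit"] += (sum(detected) + n_cens * limit / 2) / n - full_mean
        totals["mle"] += censored_lognormal_mean(detected, n_cens, limit) - full_mean
        used += 1
    return {name: total / used for name, total in totals.items()}


if __name__ == "__main__":
    for name, bias in simulate_bias().items():
        print(f"{name}: {bias:+.4f}")
```

Raising `sigma` makes the simulated distribution more skewed, and `n` and `censored_fraction` can be set to the combinations named in the abstract. The complete-case estimate can never fall below the full-data mean, because it discards only the smallest values.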

Record Details:

Record Type: DOCUMENT (PRESENTATION/SLIDE)
Product Published Date: 10/25/2018
Record Last Revised: 11/01/2018
OMB Category: Other
Record ID: 343035