STATISTICAL SAMPLING AND DATA ANALYSIS
The overall objective of the chemometrics and environmetrics program, and of this task, is to examine and evaluate the statistical procedures and methods used in the measurement or experimentation process and, where those procedures prove inadequate, to improve them by investigating, developing, and evaluating statistical methods, algorithms, and software that reduce data uncertainty. The measurement or experimentation process encompasses decision objectives and design, sampling design, sampling, experimental design, quality control, data collection, signal processing and data manipulation, data analysis, validation, and decision analysis. A further objective of the program is to evaluate existing, developed, or potential performance measurements for information content, relevancy, and cost-effectiveness. The objective of the sampling research area is to provide the Agency with improved state-of-the-science guidance, strategies, and techniques for collecting solid particulate field and laboratory subsamples more accurately and effectively, so that those subsamples best represent the extent and degree of contamination at a given site.
Research is being conducted to develop approaches that improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other heterogeneous particulate solids are being investigated to obtain more representative subsamples and to reduce errors that commonly occur during sample collection and handling. The sampling research will evaluate the Pierre Gy particulate sampling theory for both laboratory and field subsampling practices. Robust statistical methods are being developed to better analyze and interpret data and to reduce data uncertainty. These methods focus on approaches that allow a more accurate assessment of the dominant population or distribution of a data set while removing the influence of individual data points, or groups of data points (outliers), that do not belong within that dominant population (i.e., the influence function approach), with an emphasis on graphical visualization. The methods are applied to outlier testing, distribution testing, principal component analysis, discriminant analysis, censored (truncated) data, statistical intervals, regression, parallel coordinates analysis, and geostatistical analyses, as well as to other commonly used classical chemometric and environmetric methods. Such methods have broad application and are useful in many environmental contexts, including characterization and monitoring, ecological studies, risk assessment, exposure models, decisions on the extent of contamination, and remediation strategy and evaluation. Because of their complexity, these methods are integrated into the Scout software package so that users can apply them easily in their own work.
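To make the Gy theory evaluation concrete, the sketch below computes the relative variance of the fundamental sampling error from Gy's standard formula, s_FE^2 = C d^3 (1/M_S - 1/M_L), where C = f g c l is the sampling constant built from shape, granulometric, mineralogical, and liberation factors. This is a generic textbook illustration, not the program's procedure; all parameter values shown are hypothetical.

```python
def gy_fundamental_error(d_cm, m_sample_g, m_lot_g,
                         f=0.5, g=0.25, c=100.0, liberation=1.0):
    """Relative standard deviation of the fundamental sampling error.

    d_cm       -- top particle size of the material (cm)
    m_sample_g -- mass of the subsample (g)
    m_lot_g    -- mass of the lot being sampled (g)
    f, g, c, liberation -- Gy's shape, granulometric, mineralogical
                           composition, and liberation factors
                           (hypothetical default values)
    """
    C = f * g * c * liberation                      # sampling constant
    variance = C * d_cm**3 * (1.0 / m_sample_g - 1.0 / m_lot_g)
    return variance ** 0.5

# Example: a 500 g subsample of 0.2 cm top-size material from a 100 kg lot.
rsd = gy_fundamental_error(0.2, 500.0, 100_000.0)
print(f"relative standard deviation of fundamental error: {rsd:.4f}")
```

The formula captures the practical point of Gy's theory for subsampling: for a fixed lot, the fundamental error shrinks as the subsample mass grows and as the particle top size is reduced (e.g., by grinding before splitting).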
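The idea of characterizing the dominant population while limiting the pull of outliers can be illustrated with a minimal robust screen: estimate location and scale with the median and the scaled median absolute deviation (MAD), which individual outliers barely influence, then flag points with large robust z-scores. This is a generic textbook sketch, not Scout's implementation, and the cutoff and data are hypothetical.

```python
import statistics

def robust_z_scores(data):
    """Robust z-scores based on the median and the scaled MAD."""
    med = statistics.median(data)
    mad = statistics.median(abs(x - med) for x in data)
    scale = 1.4826 * mad   # consistent with the std. dev. under normality
    return [(x - med) / scale for x in data]

def flag_outliers(data, cutoff=3.0):
    """Split data into (dominant_population, outliers) at |z| > cutoff."""
    zs = robust_z_scores(data)
    dominant = [x for x, z in zip(data, zs) if abs(z) <= cutoff]
    outliers = [x for x, z in zip(data, zs) if abs(z) > cutoff]
    return dominant, outliers

# A mostly well-behaved data set with two gross outliers.
values = [9.8, 10.1, 10.0, 9.9, 10.2, 9.7, 10.3, 55.0, 10.0, 72.0]
dominant, outliers = flag_outliers(values)
print("outliers:", outliers)   # the two gross values are flagged
```

Because the median and MAD are computed from ranks rather than sums, the two extreme values cannot drag the estimates toward themselves, so the dominant population is recovered cleanly; a mean-and-standard-deviation screen on the same data would be distorted by the very points it is trying to detect.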
Various measurement designs, geostatistical strategies, and computer algorithms are also being developed to improve the cost-effectiveness of sampling, the estimation of concentration values at locations between known sampling points, and the decision process in characterizing and remediating solid wastes.
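Estimating concentrations between known points can be illustrated with inverse distance weighting, one of the simplest spatial interpolators: each measured value is weighted by an inverse power of its distance to the target location. Geostatistical work of the kind described above would typically use kriging, which also quantifies estimation uncertainty; IDW is shown here only because it is compact and self-contained, and the sample data are hypothetical.

```python
import math

def idw_estimate(known_points, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    known_points -- iterable of (x, y, value) tuples for sampled locations
    target       -- (x, y) location to estimate
    power        -- distance-decay exponent (2 is a common choice)
    """
    num = den = 0.0
    for x, y, value in known_points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value            # target coincides with a sample point
        w = d ** -power
        num += w * value
        den += w
    return num / den

# Hypothetical soil contaminant concentrations (mg/kg) at four locations.
samples = [(0.0, 0.0, 120.0), (10.0, 0.0, 80.0),
           (0.0, 10.0, 95.0), (10.0, 10.0, 60.0)]
print("estimate at (5, 5):", idw_estimate(samples, (5.0, 5.0)))
```

At the center of this grid all four distances are equal, so the estimate reduces to the simple mean of the measured values; nearer to any one sample point, that point's value dominates.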