Impact/Purpose:
The overall objective of the chemometrics and environmetrics program and this task is to examine and evaluate the statistical procedures and methods used in the measurement or experimentation process and to improve those procedures and methods (if deemed inadequate) by investigating, developing, and evaluating statistical methods, algorithms, and software to reduce data uncertainty. The measurement or experimentation process encompasses: decision objectives and design, sampling design, sampling, experimental design, quality control, data collection, signal processing and data manipulation, data analysis, validation, and decision analysis. Other general objectives of the program are to evaluate certain existing, developed, or potential performance measurements for information content, relevancy, and cost-effectiveness. The objectives of the sampling research area are to provide the Agency with improved state-of-the-science guidance, strategies, and techniques to more accurately and effectively collect solid particulate field and laboratory subsamples that best represent the extent and degree of contamination at a given site.
Description:
Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other heterogeneous particulate solids are being investigated in order to obtain more representative subsamples and to reduce errors that commonly occur during sample collection and handling processes. The sampling research will evaluate the Pierre Gy particulate sampling theory for both laboratory and field subsampling practices. Robust statistical methods are being developed to better analyze and interpret data, and to reduce data uncertainty. The robust methods will focus on approaches that allow for a more accurate assessment of the dominant population or distribution of a data set while removing the influence of individual, or groups of, data points (outliers) that do not belong within that dominant population (i.e., the influence function approach), with an emphasis on graphical visualization. Those methods are applied to: outlier testing, distribution testing, principal component analysis, discriminant analysis, censored (truncated) data, statistical intervals, regression, parallel coordinates analysis, and geostatistical analyses, as well as other commonly used classical chemometric and environmetric methods. Such methods have general application and are useful to many environmental applications, such as: characterization and monitoring, ecological studies, risk assessment, exposure models, decisions on the extent of contamination, and remediation strategy and evaluation. Because of their complexity, those methods are integrated into a developed software package, Scout, so that users can easily apply those methods to their work. 
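The idea of assessing the dominant population while limiting the influence of outliers can be illustrated with a minimal sketch. It is not the influence function (PROP) procedure developed under this task and not code from Scout; it assumes a simple median and median-absolute-deviation (MAD) rule as a stand-in, and the function names, cutoff, and concentration values are hypothetical.

```python
import statistics

# Illustrative stand-in only: a median/MAD rule, not the PROP influence
# function procedure or any Scout routine. All names and data are hypothetical.

def classical_estimates(values):
    """Mean and sample standard deviation; both are sensitive to outliers."""
    return statistics.mean(values), statistics.stdev(values)

def robust_estimates(values):
    """Median and scaled MAD; both resist a minority of discordant points."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return center, 1.4826 * mad

def flag_outliers(values, cutoff=3.0):
    """Return values lying more than `cutoff` robust scale units from the median."""
    center, scale = robust_estimates(values)
    if scale == 0:
        return []
    return [v for v in values if abs(v - center) / scale > cutoff]

# Hypothetical concentrations with one discordant measurement.
concentrations = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4, 25.0]

if __name__ == "__main__":
    print("classical:", classical_estimates(concentrations))
    print("robust:   ", robust_estimates(concentrations))
    print("flagged:  ", flag_outliers(concentrations))
```

In this made-up data set the single high value pulls the classical mean well above the bulk of the measurements, while the median stays with the dominant population and the high value is the only one flagged.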
Various measurement designs, geostatistical strategies, and computer algorithms are also being developed for improving: the cost-effectiveness of sampling, the estimation of concentration values in areas between known points, and the decision process in characterizing and remediating solid wastes.
Keywords:
CHEMOMETRICS, ENVIRONMETRICS, STATISTICAL DATA ANALYSIS, ROBUST STATISTICS, OUTLIERS, GEOSTATISTICS, SAMPLING ERROR, REPRESENTATIVE SAMPLING, DECISION ANALYSIS, HAZARDOUS WASTE, STATISTICS
Project Information:
Progress
:Evaluation of Pierre Gy Sampling and Subsampling Errors for Laboratory Samples: Experimental designs have been developed for the laboratory subsampling pilot studies to examine the effectiveness of Pierre Gy's particulate sampling theory on environmental samples. Several sets of laboratory experiments were done as a collaborative effort between the U.S. EPA NERL ESD-LV and the U.S. EPA NEIC in Denver, Colorado, to investigate the effect of various sample size reduction practices on estimating analyte concentrations of particulate samples. Particulate mixtures of known composition were prepared and repeatedly subsampled and analyzed. Results indicated that: grab sampling showed extremely poor performance and is not recommended; sectorial splitting gave the best performance and is a recommended method; and incremental subsampling, riffle splitting, and paper cone riffle splitting gave good results and performed better than cone-and-quartering or fractional shoveling. Controlled experiments showed that the fundamental sampling error accounts for about 50% to 70% of the total measurement error. The summary experimental data agree with Gy sampling theory and demonstrate that Gy theory should be followed if one wants to meet preset goals with respect to precision and bias when particulate samples are involved in environmental studies. Several reports, peer-reviewed papers, and a guidance document on obtaining representative laboratory subsamples from particulate materials have been completed. An invited critical review of representative sampling was submitted for publication in a special issue of Environmental Forensics dedicated to the theme of representativeness (see the publications list, below).
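For readers unfamiliar with the fundamental sampling error, the following minimal sketch uses the commonly cited form of Gy's formula, in which the relative variance of the fundamental error is C * d^3 * (1/M_S - 1/M_L), with C a material-dependent sampling constant (g/cm^3), d the nominal top particle size (cm), M_S the subsample mass (g), and M_L the lot mass (g). The formula is not stated in this record, and the function names and numerical values below are hypothetical, chosen only to show how the error depends on particle size and subsample mass.

```python
import math

# Sketch of the commonly cited Gy relation for the fundamental sampling error.
# Function names and the example values are hypothetical, not from the studies.

def fse_relative_variance(sampling_constant, top_size_cm, subsample_mass_g, lot_mass_g):
    """Relative variance of the fundamental sampling error: C * d^3 * (1/M_S - 1/M_L)."""
    return sampling_constant * top_size_cm ** 3 * (1.0 / subsample_mass_g - 1.0 / lot_mass_g)

def min_subsample_mass(sampling_constant, top_size_cm, lot_mass_g, target_rsd):
    """Smallest subsample mass (g) whose fundamental error meets a target relative std. dev."""
    cd3 = sampling_constant * top_size_cm ** 3
    # Rearranged from target_rsd^2 = C * d^3 * (1/M_S - 1/M_L).
    return 1.0 / (target_rsd ** 2 / cd3 + 1.0 / lot_mass_g)

if __name__ == "__main__":
    # Hypothetical case: C = 50 g/cm^3, 1 mm top size, 10 g taken from a 1000 g lot.
    variance = fse_relative_variance(50.0, 0.1, 10.0, 1000.0)
    print("relative std. dev.:", math.sqrt(variance))
    # Halving the top particle size (e.g., by grinding) cuts the variance eightfold.
    print("after grinding:", math.sqrt(fse_relative_variance(50.0, 0.05, 10.0, 1000.0)))
    print("mass for 5% RSD (g):", min_subsample_mass(50.0, 0.1, 1000.0, 0.05))
```

Because the variance scales with the cube of the particle size and inversely with the subsample mass, reducing particle size or taking a larger subsample are the two direct ways to lower this error component.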
Robust Statistical Methods Development: Simulations were run to assess the accuracy of the critical values for some of the distance metrics previously developed for the PROP-type influence procedures. Software was written to perform those tests, to test the ruggedness of the PROP procedure, and to compare its performance to other robust methods under a variety of conditions. So far, those tests have been done only for small samples (fewer than 60 observations) from univariate normal distributions. Several draft manuscripts are being written and submitted for publication.
Scout for Windows Software Development: The code for the Scout for Windows software has been optimized for efficiency and interaction. Robust principal component, discriminant, regression, censored data, and parallel axes modules based on our developed theory are being incorporated into Scout. The latest version (v. 2.5.8) is ready for alpha testing. More meetings were held with the world's leading authority on parallel coordinates, Professor Alfred Inselberg (Tel Aviv University), to develop the parallel coordinates module. The Technology Support Bulletin, "Scout: A Data Analysis Program," was revised.
Guidance on Robust Statistics: Several chapters (about 40% complete) for a guidance document on robust statistics (much of which is based on our research) have been prepared in draft form. Research to support this guidance is ongoing.
Sponsored and Produced Conferences: The proceedings with selected papers of the Third International Conference on Chemometrics and Environmetrics were published in a special conference edition of Chemometrics and Intelligent Laboratory Systems. The Fourth International Conference on Environmetrics and Chemometrics was held in Las Vegas, NV, on September 18-20, 2000. The proceedings and selected chemometrics papers were published in a special conference edition of Chemometrics and Intelligent Laboratory Systems, and selected environmetrics papers were published in two volumes of Environmetrics (see conferences listing, below). We also co-sponsored the Eighth International CAC conference in September 2002, and selected conference papers were published in a special conference edition of Analytica Chimica Acta. Two papers were pr
Relevance
:Environmental data quality could be vulnerable if the use of statistical design procedures is limited, leading to an inability to fully evaluate the quality (total uncertainty) of analytical data submitted for decision making. One of the keys to improving the effectiveness and efficiency of Agency programs is the development of cost-effective integrated chemometric and environmetric robust methods and procedures that can be implemented in any experiment or measurement study that provides scientifically and legally defensible data. The development and implementation of such methods should: improve environmental data quality at the U.S. EPA, improve the usefulness of statistical design procedures, improve the Agency's ability to fully evaluate the quality (total uncertainty) of analytical data submitted for decision making, and reduce data uncertainty. Basic research in this area continues to be needed to address the Agency's mission. General, theoretical, statistically designed, and experimentally verified approaches should improve and provide a better understanding of our data and assure the quality of our performance measurements. Certainly, a thorough understanding of the quality and usefulness of data is pivotal to any measurement, characterization, or monitoring program and for decision analysis.
Relationship to NERL and ORD Research Strategies. Research will be conducted under this task to address several of ORD's high-priority research needs and ORD's long-term goals and objectives categories. Research will be performed "to develop scientifically sound approaches to assessing and characterizing risks to human health and the environment" (ORD long-term goal #1) associated with improper sample collection and handling techniques (ORD Strategic Plan, 1997). Through the development of proven state-of-the-science protocols for the selection of where to sample (i.e., network and measurement design), how to collect a representative subsample (i.e., Gy theory), and how to statistically analyze the resultant data (i.e., robust statistical methods and Scout software), decision makers will be provided with more accurate and precise data, resulting in more cost-effective and accurate decisions. Further, through the development of improved representative subsampling techniques, our ability to more accurately detect, assess, characterize, and monitor the extent of hazards in the environment will be markedly improved. The resulting data can then, in turn, be applied to validate and verify source-exposure models where accurate data are critical to perform meaningful and useful health risk assessments. Improved soil sampling methods and measurement designs are being developed "to provide common sense, cost-effective approaches for preventing and managing risks" (ORD long-term goal #3). One of the primary objectives of this task is to develop improved and statistically valid sampling designs and networks that allow site characterization to be done more effectively and cheaply without sacrificing the accuracy of site assessment. The exchange of scientific information through technical transfer is an integral function of the research performed under this project.
Such efforts are well described by ORD long-term goal #4, "to provide credible, state-of-the-science risk assessments, methods, models, and guidance," and ORD long-term goal #5, "to exchange reliable scientific, engineering, and risk assessment/risk management information among private and public stakeholders." Technical transfer will be accomplished through conferences, journal articles, reports, guidance documents, books, courses, and software. Because of the complexity of the methods developed under this research, it is critical that software be developed for the clients; otherwise, there would be little motivation to apply those methods. Research being conducted under this task also addresses issues in ORD's high-priority category of
Clients
:Dana Tulis, OERR, All 10 Regional Offices; Nancy Wentworth - OEI; John Warren - OEI; Jan Young - OSW; Carl Daly - OSW; Deanna Crumbling - TIFS/OSRTI/EPA
Project IDs:
ID Code
:4879
Project type
:OMIS