Science Inventory

Scout 2008 Version 1.0 User Guide

Citation:

Singh, A., R. Maichle, N. Armbya, A. K. Singh, and J. M. Nocerino. Scout 2008 Version 1.0 User Guide. U.S. Environmental Protection Agency, Washington, DC, EPA/600/R-08/038, 2008.

Impact/Purpose:

The Scout 2008 upgrade contains the ProUCL 4.0 and ParallAX software, plus a wide range of other statistical tests for environmental use by the agency and the worldwide scientific community. Development and implementation of such methods should improve environmental data quality, the usefulness of statistical procedures, and the ability to evaluate the quality (total uncertainty) of analytical data submitted for decision making, and should reduce data uncertainty.

Description:

The Scout 2008 version 1.0 software package provides a wide variety of classical and robust statistical methods that are not typically available in other commercial software packages. A major part of Scout deals with classical, robust, and resistant univariate and multivariate outlier identification and robust estimation methods that have been available in the statistical literature over the last three decades. Outliers in a data set are those observations that do not follow the pattern displayed by the majority (bulk) of the data. All of the outlier identification methods are meant to identify outliers in a data set that typically represents a single population. They are not meant to be used on clustered data sets representing mixtures of populations, especially when more than two clusters may be present in the data set. On data sets having several clusters, other methods such as cluster analysis and principal component analysis may be used. The robust estimation and outlier identification methods incorporated into Scout 2008 include the iterative classical method, the iterative influence function (e.g., Biweight, Huber, PROP)-based M-estimates method, the multivariate trimming (MVT) method, the least median-of-squared residuals (LMS) regression method, and the minimum covariance determinant (MCD) method. Several choices of initial estimates for the iterative estimation of location and scale are also included in Scout 2008: the orthogonalized Kettenring and Gnanadesikan (OKG) method; the median, median absolute deviation (MAD), or interquartile range (IQR)-based methods; and the MCD method. Scout offers classical and robust methods to estimate multivariate location and scale, univariate robust intervals, multiple linear regression parameters, principal components (PCs), and discriminant (Fisher, linear, and quadratic) functions (DFs).
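As a rough illustration of the median and MAD-based idea mentioned above, the following Python sketch flags univariate observations that lie far from the bulk of the data. It is not Scout's implementation: the function name `mad_outliers`, the cutoff of 3.0, and the sample values are assumptions made for this example, and 1.4826 is the usual constant that makes the MAD comparable to a standard deviation for normally distributed data.

```python
import statistics


def mad_outliers(values, cutoff=3.0):
    """Return the values whose robust z-score exceeds `cutoff`.

    Hypothetical helper for illustration only; the cutoff is an assumption.
    """
    med = statistics.median(values)
    abs_dev = [abs(v - med) for v in values]
    # Scale the MAD so it estimates the standard deviation under normality.
    mad = 1.4826 * statistics.median(abs_dev)
    if mad == 0:
        # More than half the values are identical; no robust scale is available.
        return []
    return [v for v in values if abs(v - med) / mad > cutoff]


# Made-up data: six similar measurements and one discordant value.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 25.0]
flagged = mad_outliers(data)
print(flagged)
```

Because the median and the MAD are barely affected by the single discordant value, that value stands out clearly, whereas a mean and standard deviation computed from the same data would be inflated by it.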
The discriminant analysis module of Scout can perform cross validation using several methods, including leave-one-out (LOO), split samples, M-fold validation, and bootstrap methods. Below-detection-limit observations, or nondetect (ND) data, are inevitable in many environmental and chemometrics applications. Scout has several univariate graphical and inferential methods that can be used on uncensored full data sets and also on left-censored data sets with observations below the detection limit (DL). Specifically, Scout can be used to compute various interval estimates, perform typical univariate goodness-of-fit (GOF) tests, and perform single and two-sample hypothesis tests on uncensored data sets and left-censored data sets with nondetect observations. Classical univariate statistical inference methods (e.g., intervals and hypothesis testing) in Scout 2008 can also handle data sets with below-detection-limit observations. In Scout 2008, emphasis is given to graphical displays of multivariate data sets. Both two-dimensional and three-dimensional graphs can be generated using Scout. The classical and robust methods listed above are supplemented with formal multivariate classical and robust graphical displays, including the quantile-quantile (Q-Q) plots of the Mahalanobis distances (MDs), index plots of the MDs, distance-distance (D-D) plots, and scatter plots of raw data, PC scores, and DF scores with prediction or tolerance ellipsoids superimposed on the respective scatter plots. Those graphical displays can be drawn using the critical values of the MDs obtained from either the exact scaled beta distribution of the MDs or an approximate chi-square distribution of the MDs. Some graphical tools for comparing classical and robust methods are also available in Scout so that one can graphically compare the performances of those methods.
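To make the role of the Mahalanobis distances and their critical values concrete, the Python sketch below computes classical squared MDs for bivariate data and compares them with an approximate chi-square critical value. It is an illustration under stated assumptions, not Scout's code: the function names and the sample points are hypothetical, only the classical (non-robust) mean and covariance are used, and only the chi-square approximation is shown, using the closed form of the chi-square quantile that holds for two degrees of freedom. The exact scaled beta critical values mentioned above are not implemented here.

```python
import math


def mahalanobis_sq_2d(points):
    """Classical squared Mahalanobis distances for (x, y) pairs.

    Hypothetical helper: uses the sample mean and the sample covariance
    (divisor n - 1), inverting the 2x2 covariance matrix analytically.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        dists.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return dists


def chi2_quantile_2df(p):
    """Chi-square quantile for exactly 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - p)


# Made-up bivariate data; the last point departs from the linear trend.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.0),
          (5.0, 9.9), (6.0, 12.1), (7.0, 14.0), (8.0, 2.0)]
d2 = mahalanobis_sq_2d(points)
critical = chi2_quantile_2df(0.95)
for point, dist in zip(points, d2):
    print(point, round(dist, 3), "exceeds" if dist > critical else "within")
```

With classical estimates, an outlier also distorts the mean and covariance used to measure it, so its distance can be understated; this masking effect is the motivation for replacing them with robust estimates of location and scale.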
Scout can be used to display tolerance ellipsoids or prediction ellipsoids for the various outlier identification methods on the same graph and to display robust regression lines for the various regression methods on the same graph. Scout 2008 also offers some GOF test statistics to assess multivariate normality. Several GOF test statistics, including multivariate kurtosis, skewness, and the correlation coefficient between the ordered MDs and the scaled beta (or chi-square) distribution quantiles, are displayed on a Q-Q plot of the MDs. The associated critical values of those GOF test statistics (obtained via extensive simulation experiments) are also displayed on the Q-Q plots of the MDs. Two standalone software packages, ProUCL 4.00.02 and ParallAX, have also been incorporated into Scout 2008. ProUCL 4.00.02 is a statistical software package developed to address environmental applications, whereas the ParallAX software offers graphical and classification tools to analyze multivariate data using parallel coordinates.

Record Details:

Record Type: DOCUMENT (PUBLISHED REPORT/USER'S GUIDE)
Product Published Date: 03/31/2008
Record Last Revised: 12/10/2009
OMB Category: Other
Record ID: 189812