Science Inventory

ESTIMATING UNCERTAINTIES IN FACTOR ANALYTIC MODELS

Citation:

EBERLY, S. I., P. PAATERO, AND P. K. HOPKE. ESTIMATING UNCERTAINTIES IN FACTOR ANALYTIC MODELS. Presented at American Association for Aerosol Research 2005, Atlanta, GA, February 7-11, 2005.

Impact/Purpose:

To deliver improved, documented, and tested receptor models for use by State and local air pollution staff as tools for State Implementation Plan (SIP) development.

Description:

When interpreting results from factor analytic models used in receptor modeling, it is important to quantify the uncertainties in those results. For example, if the presence of a species on one of the factors is needed to interpret that factor as originating from a certain emission source category or source region, then the species should be present on that factor with high certainty. In this work, we examine three methods for determining the uncertainties of the computed factors, F, and their respective time series, G, in the non-negatively constrained factor analytic model X = GF + E (where E is the matrix of residuals), as applied in the atmospheric sciences. The methods are linear error propagation of the uncertainties of X to G and to F separately, bootstrapping, and bootstrapping enhanced with random rotational forcing to provoke rotations in G and F.
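The rotational ambiguity that motivates the third method can be made concrete with a small example. This is a minimal sketch, assuming the weighted least-squares objective Q commonly used for this kind of model (for example, in Positive Matrix Factorization), with known uncertainties s_ij for each element of X. The helper names (`matmul`, `weighted_q`, `is_nonnegative`) and all matrices are hypothetical. It shows that replacing G by GT and F by T^-1 F leaves the fitted product, and therefore Q, unchanged while both factors stay non-negative, so the data alone cannot distinguish the two solutions. The code does not implement rotational forcing itself.

```python
# Minimal sketch (stdlib only): the weighted objective Q for X = G F + E and
# the rotational ambiguity G F = (G T)(T^-1 F). All numbers are hypothetical.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def weighted_q(x, g, f, s):
    """Q = sum over i, j of ((x_ij - (G F)_ij) / s_ij) ** 2."""
    gf = matmul(g, f)
    return sum(((x[i][j] - gf[i][j]) / s[i][j]) ** 2
               for i in range(len(x)) for j in range(len(x[0])))


def is_nonnegative(m):
    """True if every entry of the matrix is >= 0."""
    return all(v >= 0 for row in m for v in row)


# Hypothetical example: 3 samples, 2 factors, 3 species.
G = [[2.0, 1.0], [1.0, 3.0], [4.0, 2.0]]  # contributions (samples x factors)
F = [[1.0, 2.0, 1.0], [1.0, 1.0, 3.0]]    # profiles (factors x species)
E = [[0.5, -0.25, 0.0], [0.0, 0.25, -0.5], [-0.25, 0.0, 0.25]]  # residuals
GF = matmul(G, F)
X = [[GF[i][j] + E[i][j] for j in range(3)] for i in range(3)]
S = [[0.5] * 3 for _ in range(3)]         # assumed uncertainties of X

# A non-orthogonal transformation T and its inverse (hypothetical choice).
T = [[1.0, 0.0], [-0.25, 1.0]]
T_inv = [[1.0, 0.0], [0.25, 1.0]]
G_rot = matmul(G, T)      # factor-1 contributions minus 0.25 * factor-2 ones
F_rot = matmul(T_inv, F)  # profile 2 plus 0.25 * profile 1

print(weighted_q(X, G, F, S), weighted_q(X, G_rot, F_rot, S))  # 3.0 3.0
print(is_nonnegative(G_rot), is_nonnegative(F_rot))            # True True
```

Both solutions fit X equally well, which is why an uncertainty method that assumes a unique solution can miss this source of uncertainty.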

Linear error propagation is simple to describe and implement. However, because the errors in X are not always well known in environmental applications, and because the factor analytic model is non-linear by definition (the product GF is bilinear in the unknowns), the uncertainty estimates from linear error propagation are questionable. Bootstrapping is also simple to describe, although computationally intensive to implement. Like linear error propagation, however, it assumes that the solution is unique and therefore ignores the uncertainty that arises from rotational ambiguity in the results.
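The contrast between linear error propagation and bootstrapping can be sketched on the simplest sub-problem: one species column f_j of F, estimated by weighted least squares with G held fixed. This is a simplification under stated assumptions, not the full-model procedure described here: two factors only, no non-negativity constraint, G treated as known, and hypothetical simulated data. Linear propagation gives Cov(f_j) = (G^T W G)^-1 with W = diag(1/s_i^2); the bootstrap resamples samples (rows) with replacement and refits. The function names `wls_two_factor` and `bootstrap_se` are hypothetical.

```python
import random
import statistics


def wls_two_factor(g_rows, x_col, s_col):
    """Weighted least squares for one species column f_j with G held fixed.

    Minimizes sum_i ((x_i - g_i . f) / s_i) ** 2 for two factors via the 2x2
    normal equations (non-negativity is NOT enforced). Returns (f, se), where
    se comes from linear error propagation: Cov(f) = (G^T W G)^-1,
    W = diag(1 / s_i ** 2).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (g1, g2), x, s in zip(g_rows, x_col, s_col):
        w = 1.0 / s ** 2
        a11 += w * g1 * g1
        a12 += w * g1 * g2
        a22 += w * g2 * g2
        b1 += w * g1 * x
        b2 += w * g2 * x
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("singular normal equations")
    inv11, inv12, inv22 = a22 / det, -a12 / det, a11 / det
    f = (inv11 * b1 + inv12 * b2, inv12 * b1 + inv22 * b2)
    se = (inv11 ** 0.5, inv22 ** 0.5)
    return f, se


def bootstrap_se(g_rows, x_col, s_col, n_boot=200, seed=0):
    """Standard deviation of f_j over refits on rows drawn with replacement."""
    rng = random.Random(seed)
    n = len(x_col)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            f = wls_two_factor([g_rows[i] for i in idx],
                               [x_col[i] for i in idx],
                               [s_col[i] for i in idx])[0]
        except ValueError:
            continue  # skip a degenerate resample
        estimates.append(f)
    return tuple(statistics.stdev([e[k] for e in estimates]) for k in range(2))


# Hypothetical simulated data: 30 samples, known uncertainties of 0.2.
rng = random.Random(1)
true_f = (2.0, 0.5)
G = [(rng.uniform(0.5, 5.0), rng.uniform(0.5, 5.0)) for _ in range(30)]
s = [0.2] * len(G)
x = [g1 * true_f[0] + g2 * true_f[1] + rng.gauss(0.0, 0.2) for g1, g2 in G]

f_hat, se_linear = wls_two_factor(G, x, s)
se_boot = bootstrap_se(G, x, s)
print("estimate:", f_hat)
print("linear-propagation SE:", se_linear, "bootstrap SE:", se_boot)
```

When the simulated noise matches the assumed s, the two standard errors are typically of similar size. If the true noise is larger than the assumed s, the linear-propagation values understate the spread, while the bootstrap values track the actual scatter more closely. Neither captures rotational ambiguity, because G is fixed.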

The methods are applied to simulated data for which X, F, G, and the uncertainties in X are known. Using simulated data, we can compare the F and G uncertainties estimated by the three methods with the actual errors (estimated value minus true value) in F and G. When tested on simulated data with low error levels and no rotational freedom, all three methods agree well. When the error levels are increased, the bootstrap methods produce better uncertainty estimates than linear error propagation. Finally, when rotational freedom is introduced into the simulated data, bootstrapping with rotational forcing reproduces the uncertainties in the computed F and G most accurately.
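One simple way to summarize the comparison between estimated uncertainties and actual errors on simulated data is sketched below. The summary statistics are an assumption for illustration, not necessarily the measures the authors used: standardized errors z = (estimate - true) / sigma, their root-mean-square (near 1 when the uncertainties are well calibrated), and the fraction of elements whose true value lies within plus or minus k sigma (roughly 95% for k = 2 under a normal approximation). The function name `calibration_summary` and the numbers are hypothetical.

```python
import math


def calibration_summary(estimates, truths, sigmas, k=2.0):
    """Compare estimated uncertainties with actual errors on simulated data.

    z = (estimate - true) / sigma for each element. Returns (rms_z, coverage):
    rms_z near 1 suggests the sigmas match the actual errors, and values well
    above 1 suggest the method understates uncertainty; coverage is the
    fraction of elements whose true value lies within +/- k * sigma.
    """
    z = [(e - t) / s for e, t, s in zip(estimates, truths, sigmas)]
    rms_z = math.sqrt(sum(v * v for v in z) / len(z))
    coverage = sum(abs(v) <= k for v in z) / len(z)
    return rms_z, coverage


# Hypothetical computed F entries, their true values, and estimated sigmas.
est = [1.5, 0.5, 3.0, 0.0]
truths_ex = [1.0, 0.5, 2.0, 0.25]
sig = [0.5, 0.25, 0.25, 0.25]
print(calibration_summary(est, truths_ex, sig))
```

Here the third element misses its true value by four estimated sigmas, so rms_z is about 2.12 and the coverage is 0.75. Computing the same summary for each method shows which one's uncertainties best match the actual errors.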

Next, the methods are applied to a real atmospheric data set in which the errors are not known: aerosol data from Phoenix (1995-1998). The computed error limits are compared with source profiles that have been used in previous source apportionment studies for Phoenix. Bootstrapping with rotational forcing is applied using the parameters optimized on the simulated data. The computed confidence intervals mostly cover these reference profile values, although a confidence percentage cannot be quoted.
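A check of whether bootstrap confidence intervals cover a reference profile might look like the sketch below. It assumes percentile intervals built from bootstrap replicates of one factor profile; the interval type, the function names `percentile_interval` and `reference_coverage`, and the data are hypothetical. The resulting fraction is descriptive only and, as noted above, does not amount to a quotable confidence percentage.

```python
import statistics


def percentile_interval(replicates, n=20):
    """Central percentile interval from bootstrap replicates.

    Uses the 1/n and (n-1)/n quantiles, so n=20 gives a 90% interval and
    n=40 a 95% interval.
    """
    cuts = statistics.quantiles(replicates, n=n, method="inclusive")
    return cuts[0], cuts[-1]


def reference_coverage(boot_profiles, reference, n=20):
    """Fraction of species whose reference value lies inside its interval.

    boot_profiles: bootstrap replicates of one factor profile, each a list
    over species; reference: a previously used profile for the same source.
    """
    inside = 0
    for j, ref in enumerate(reference):
        lo, hi = percentile_interval([p[j] for p in boot_profiles], n)
        inside += lo <= ref <= hi
    return inside / len(reference)


# Hypothetical replicates for a two-species profile and a reference profile.
replicates = [[float(i), 2.0 * i] for i in range(101)]
print(percentile_interval([r[0] for r in replicates]))  # (5.0, 95.0)
print(reference_coverage(replicates, [50.0, 300.0]))     # 0.5
```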

Lastly, we show how the uncertainty estimates are affected by the assumed errors in X. For some species, the uncertainty estimates from bootstrapping remain the same even when the assumed errors are doubled.
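That bootstrap estimates can be insensitive to a doubling of the assumed errors is consistent with a simple property of weighted least squares. As a sketch, assume G is held fixed and every uncertainty s_ij of species j is multiplied by the same factor c > 0. This is a simplification: in the full model, rescaling one species also changes its weight relative to the other species, so G can change too. With f_j and x_j the j-th columns of F and X:

```latex
\begin{aligned}
W_j &= \operatorname{diag}\!\left(s_{1j}^{-2}, \dots, s_{nj}^{-2}\right), \qquad
\hat{\mathbf f}_j = \left(G^\top W_j G\right)^{-1} G^\top W_j \mathbf x_j, \qquad
\operatorname{Cov}\!\left(\hat{\mathbf f}_j\right) = \left(G^\top W_j G\right)^{-1} \\
s_{ij} \to c\, s_{ij} \;&\Rightarrow\; W_j \to c^{-2} W_j : \\
\hat{\mathbf f}_j' &= \left(c^{-2} G^\top W_j G\right)^{-1} c^{-2}\, G^\top W_j \mathbf x_j = \hat{\mathbf f}_j, \qquad
\operatorname{Cov}'\!\left(\hat{\mathbf f}_j\right) = \left(c^{-2} G^\top W_j G\right)^{-1} = c^{2} \operatorname{Cov}\!\left(\hat{\mathbf f}_j\right)
\end{aligned}
```

Under these assumptions each bootstrap refit returns the same estimate as before, so the bootstrap spread is unchanged, whereas the linear-propagation standard errors scale by c.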

Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy.

Record Details:

Record Type: DOCUMENT (PRESENTATION/ABSTRACT)
Product Published Date: 02/09/2005
Record Last Revised: 06/21/2006
Record ID: 116156