2005 Progress Report: Development of Advanced Factor Analysis Methods for Carbonaceous PM Source Identification and Apportionment

EPA Grant Number: R831078
Title: Development of Advanced Factor Analysis Methods for Carbonaceous PM Source Identification and Apportionment
Investigators: Hopke, Philip K., Henry, Ronald C., Paatero, P., Spiegelman, C.
Institution: Clarkson University, Texas A&M University, University of Southern California
Current Institution: Clarkson University, Texas A&M University, University of Helsinki, University of Southern California
EPA Project Officer: Chung, Serena
Project Period: October 1, 2003 through September 30, 2006 (Extended to December 31, 2006)
Project Period Covered by this Report: October 1, 2004 through September 30, 2005
Project Amount: $450,000
RFA: Measurement, Modeling, and Analysis Methods for Airborne Carbonaceous Fine Particulate Matter (PM2.5) (2003)
Research Category: Air, Air Quality and Air Toxics, Particulate Matter

Objective:

Because of controls on precursor gases that lead to sulfate and nitrate formation, carbonaceous particles are becoming a larger fraction of the fine particle aerosol. Accurate source identification and apportionment will be important for developing effective control strategies for areas found to be out of attainment of the fine particulate matter (PM2.5) standard. In addition, there is increasing interest in epidemiological studies to relate adverse health effects to apportioned source contributions. Thus, the objective of this research project is to combine the best features of the two advanced factor analysis models, UNMIX and Positive Matrix Factorization (PMF), and to test the effectiveness of this improved factor analysis methodology by analysis of the data developed in the various Supersites, with an emphasis on data from the New York City Supersite and other data from New York State.

Progress Summary:

Previously, we developed a new geometrical view of factor analysis in either the source profile or contribution space (Henry, 2005) and a graphical method to examine the results of the rotation in PMF (Paatero et al., 2005). To assist with the identification of local sources, Kim and Hopke (2004) compared the conditional probability function (CPF) with non-parametric regression (NPR).
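
As an illustration of the CPF idea, the sketch below computes, for each wind sector, the fraction of samples whose source contribution exceeds a chosen threshold. This is the commonly used definition (exceedances in a sector divided by all samples in that sector); the function name, the sector width, and the choice of threshold are assumptions made here for illustration and are not taken from this report.

```python
def conditional_probability_function(wind_dirs_deg, contributions,
                                     threshold, sector_width=30.0):
    """Return {sector_start_deg: CPF} for every wind sector that has data.

    CPF for a sector = (samples in the sector with contribution > threshold)
                       / (all samples in the sector).
    All argument names are hypothetical; the threshold is user-chosen
    (an upper percentile of the contributions is a common choice).
    """
    n_sectors = int(round(360.0 / sector_width))
    total = [0] * n_sectors
    exceed = [0] * n_sectors
    for direction, value in zip(wind_dirs_deg, contributions):
        # Map the wind direction onto a sector index in [0, n_sectors).
        sector = int((direction % 360.0) // sector_width) % n_sectors
        total[sector] += 1
        if value > threshold:
            exceed[sector] += 1
    # Sectors with no observations are omitted rather than reported as zero.
    return {i * sector_width: exceed[i] / total[i]
            for i in range(n_sectors) if total[i] > 0}
```

Sectors with a high CPF point toward directions from which high contributions of a factor tend to arrive. In practice, sectors with very few samples are usually treated with caution.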

Another aspect of the problem of rotational ambiguity is the use of expanded models to resolve factors. This approach uses two sets of modeling equations. We are exploring whether the additional equations begin to provide a measure of identifiability in the solution. We have applied the expanded model to a number of Speciation Trends Network data sets from the Midwestern United States. In general, the expanded model has not provided increased resolution of sources or better specificity in point source identification. We have been disappointed by the limited utility that this model has shown so far, and we continue to explore why we are not getting the results that were anticipated.
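
As background for the discussion of rotational ambiguity, the bilinear factor model underlying receptor methods such as PMF is commonly written as shown below. The notation (x, g, f, e, sigma, T) is assumed here for illustration and is not defined in this report; the additional equations of the expanded model are not reproduced because the report does not state them.

```latex
% Bilinear model: n samples, m species, p sources (notation assumed)
x_{ij} = \sum_{k=1}^{p} g_{ik} f_{kj} + e_{ij},
\qquad
Q = \sum_{i=1}^{n} \sum_{j=1}^{m} \left( \frac{e_{ij}}{\sigma_{ij}} \right)^{2}

% Rotational ambiguity: for a nonsingular p x p matrix T,
X = G F + E = (G T)\,(T^{-1} F) + E
```

Any T for which both GT and T^{-1}F remain non-negative gives the same fit to the data, which is why constraints or additional equations are needed to narrow the set of acceptable solutions.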

We are continuing to work on error estimation methods. Dr. Spiegelman and Dr. Byron Gajewski have developed a new jackknife approach that should accurately evaluate estimation uncertainties for all rotationally invariant pollution sources that are found by receptor models. The main exceptions are those pollution sources that require tracer species to be identified. A manuscript on this approach is in preparation. The method was described during an invited discussion at the Joint Statistical Meetings in Minneapolis, Minnesota, in August 2005.
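
For orientation, the sketch below shows the generic delete-one jackknife estimate of a standard error. It is not the new approach developed by Drs. Spiegelman and Gajewski, whose details are not given in this report; the function and argument names are hypothetical. In a receptor-modeling setting, `statistic` would stand for refitting the model to the reduced data set and extracting one quantity of interest.

```python
import math


def jackknife_standard_error(samples, statistic):
    """Generic delete-one jackknife standard error of statistic(samples).

    `samples` is a list of observations; `statistic` maps a list to a float.
    Both names are illustrative assumptions, not part of the report.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("the jackknife needs at least two samples")
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(samples[:i] + samples[i + 1:])
                     for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    # Standard jackknife variance: (n - 1)/n times the sum of squared deviations.
    variance = (n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out)
    return math.sqrt(variance)
```

For the sample mean, this estimate reduces to the familiar standard error, the sample standard deviation divided by the square root of n, which makes a convenient sanity check.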

Drs. Paatero and Hopke are working with Shelly Eberly of the U.S. Environmental Protection Agency on an alternative bootstrapping method that also includes random pulling of the source profile elements. The method is intended to assess both the measurement error and the extent of rotational ambiguity that can exist in the profiles without unduly distorting the fit to the data. A manuscript describing the initial results is in preparation.

Another problem that arises in data analysis is the presence of missing data. Dr. Henry is working on the imputation of missing values using time series methods. Typical air quality time series consist of consecutive hourly observations of concentrations of ozone, nitrogen oxides, carbon monoxide, and other gaseous pollutants as well as hourly observations of concentrations of airborne particles such as PM10. More often, time series of concentrations of airborne particulate mass and composition are 24-hour averages observed every third or sixth day. These time series may exhibit periodic daily, weekly, and seasonal variations. The power spectrum of the Discrete Fourier Transform (DFT) is a natural choice for quantitative analysis of the periodicities in air quality time series.

An alternative to the Fast Fourier Transform (FFT) for computing the DFT is a least-squares fit of sines and cosines to the data. Before this approach can be practical, some major difficulties must be overcome. The FFT was invented to avoid the computational inefficiency of repeatedly calculating sines and cosines, as the least-squares approach requires. In addition, the standard least-squares equations are not time invariant, which is a serious theoretical and practical problem. With ever-increasing computing power, the computational penalties of the least-squares approach are less and less of a problem. A time-invariant version of the least-squares fit equations was developed by the astrophysicist N. Lomb and used to estimate the power spectrum directly, without explicitly calculating the DFT. In statistics, an estimate of the power spectrum at a finite set of frequencies is known as a periodogram. A paper that discusses the application of the Lomb periodogram to air quality time series is in preparation.
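
The sketch below evaluates the Lomb periodogram in its commonly published form (normalized by the sample variance) for unevenly spaced observations. It is a minimal illustration, not the investigators' implementation; the function name and arguments are assumptions. The time offset `tau` is what makes the result independent of the choice of time origin.

```python
import math


def lomb_periodogram(times, values, angular_freqs):
    """Lomb periodogram of unevenly sampled data at the given angular
    frequencies (radians per time unit, all > 0). Names are illustrative."""
    n = len(times)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered) / (n - 1)
    if variance == 0.0:
        raise ValueError("series is constant; the periodogram is undefined")
    power = []
    for w in angular_freqs:
        # tau is chosen so that the sine and cosine terms are orthogonal,
        # which makes the estimate invariant to a shift of the time origin.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_terms = [math.cos(w * (t - tau)) for t in times]
        sin_terms = [math.sin(w * (t - tau)) for t in times]
        yc = sum(c * k for c, k in zip(centered, cos_terms))
        ys = sum(c * k for c, k in zip(centered, sin_terms))
        cc = sum(k * k for k in cos_terms)
        ss = sum(k * k for k in sin_terms)
        p = 0.0
        if cc > 1e-12:  # skip a degenerate (all-zero) basis function
            p += yc * yc / cc
        if ss > 1e-12:
            p += ys * ys / ss
        power.append(p / (2.0 * variance))
    return power
```

For 24-hour samples collected every third day with some days missing, `times` would hold the sampling days and `angular_freqs` would hold `2 * math.pi / period` for each candidate period. Periods shorter than twice the nominal sampling interval remain subject to aliasing.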

Future Activities:

We will continue to work on an approach to determine the uncertainties in the resolved source profiles and contributions that includes sampling and measurement uncertainties as well as rotational ambiguities. Such an approach will not be valid for highly time-resolved data in which there is substantial serial correlation. For such data, the bootstrap will need to be modified to resample blocks of data that preserve enough of the serial correlation to give a valid measure of the uncertainties. In conjunction with Dr. Spiegelman, we hope to develop a set of empirical rules for choosing a block size large enough to reduce the effect of serial correlation to a negligible level.
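
One standard way to resample blocks is the moving block bootstrap, sketched below under the assumption that the series is a simple list ordered in time. The function name and parameters are hypothetical, and the rules for choosing the block size, which are the subject of the planned work, are not addressed here.

```python
import random


def moving_block_bootstrap(series, block_size, rng=None):
    """Return one bootstrap replicate of `series` built from randomly
    chosen blocks of `block_size` consecutive observations.

    Serial correlation is preserved within each block; it is broken only
    at the joins between blocks. All names are illustrative.
    """
    n = len(series)
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and len(series)")
    rng = rng or random.Random()
    resampled = []
    while len(resampled) < n:
        # Pick a block start so the whole block lies inside the series.
        start = rng.randrange(n - block_size + 1)
        resampled.extend(series[start:start + block_size])
    # Trim the final block so the replicate has the original length.
    return resampled[:n]
```

Each replicate would then be passed through the factor analysis, and the spread of the resulting profiles and contributions across replicates would serve as the uncertainty estimate. With `block_size=1` the procedure reduces to the ordinary bootstrap.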

We will continue to explore various kinds of data to ascertain the utility of expanded models to resolve additional sources and potentially reduce the rotational ambiguity in the resulting solutions.

The main focus of this year’s work will be a study of the St. Louis Supersite data in collaboration with Dr. James J. Schauer of the University of Wisconsin and Dr. Jay Turner of Washington University. At the St. Louis Supersite, daily integrated samples were collected and analyzed for elements and carbonaceous species. Organic carbon (OC) and elemental carbon (EC) were measured by both the ACE-Asia variant of the NIOSH 5040 method and the IMPROVE protocol. In addition, 200 of the roughly 700 samples were analyzed by GC/MS for molecular marker species so that the sources of the primary organic matter can be identified and apportioned using chemical mass balance (CMB) methods. We will be able to compare the results of the PMF analysis of the integrated data using the IMPROVE OC/EC data with the analysis of the same elemental data and the ACE-Asia OC/EC data. For the subset of samples for which the molecular markers are available, we can analyze the data sets with the molecular markers as additional species in the analysis. These results can be compared to the CMB and PMF analyses of only the molecular marker species. Such a set of intercomparisons will help to identify the relative information content of the different OC/EC protocols as well as the ability of the PMF analysis to separate sources relative to what was obtained in the CMB analysis. This comparison will be of particular interest with respect to the separation of diesel from gasoline vehicle emissions.


Journal Articles on this Report: 8 Displayed

  • Henry RC. Duality in multivariate receptor models. Chemometrics and Intelligent Laboratory Systems 2005;77(1-2):59-63. R831078 (2004), R831078 (2005), R831078 (Final)
  • Kim E, Hopke PK. Comparison between conditional probability function and nonparametric regression for fine particle source directions. Atmospheric Environment 2004;38(28):4667-4673. R831078 (2005), R831078 (Final)
  • Kim E, Hopke PK, Qin Y. Estimation of organic carbon blank values and error structures of the speciation trends network data for source apportionment. Journal of the Air & Waste Management Association 2005;55(8):1190-1199. R831078 (2005), R831078 (Final)
  • Kim E, Hopke PK. Identification of fine particle sources in Mid-Atlantic US area. Water, Air, & Soil Pollution 2005;168(1-4):391-421. R831078 (2005), R831078 (Final)
  • Kim E, Hopke PK. Improving source apportionment of fine particles in the eastern United States utilizing temperature-resolved carbon fractions. Journal of the Air & Waste Management Association 2005;55(10):1456-1463. R831078 (2005), R831078 (Final)
  • Ogulei D, Hopke PK, Wallace LA. Analysis of indoor particle size distributions in an occupied townhouse using positive matrix factorization. Indoor Air 2006;16(3):204-215. R831078 (2005), R831078 (Final)
  • Ogulei D, Hopke PK, Zhou L, Pancras JP, Nair N, Ondov JM. Source apportionment of Baltimore aerosol from combined size distribution and chemical composition data. Atmospheric Environment 2006;40(Suppl 2):396-410. R831078 (2005), R831078 (Final)
  • Paatero P, Hopke PK, Begum BA, Biswas SK. A graphical diagnostic method for assessing the rotation in factor analytical models of atmospheric pollution. Atmospheric Environment 2005;39(1):193-201. R831078 (2004), R831078 (2005), R831078 (Final)

Supplemental Keywords:

    PM2.5, receptor models, Positive Matrix Factorization, PMF, UNMIX, advanced factor models, bootstrap, rotational ambiguity, source resolution, source apportionment, RFA, Scientific Discipline, Air, Ecosystem Protection/Environmental Exposure & Risk, particulate matter, Air Quality, air toxics, Environmental Chemistry, Air Pollution Effects, Monitoring/Modeling, Environmental Monitoring, Engineering, Chemistry, & Physics, Environmental Engineering, air quality modeling, health effects, particle size, carbon aerosols, atmospheric dispersion models, particulate organic carbon, atmospheric particulate matter, atmospheric measurements, model-based analysis, chemical characteristics, environmental measurement, PM 2.5, positive matrix factorization, atmospheric particles, aerosol particles, mass spectrometry, motor vehicle emissions, emissions monitoring, air quality models, airborne particulate matter, diesel exhaust, emissions, thermal desorption, air sampling, carbon particles, air quality model, mobile sources, ultrafine particulate matter, particulate matter mass, modeling studies, aerosol analyzers, chemical speciation sampling, measurement methods, particle size measurement, carbonaceous particulate matter
