2003 Progress Report: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.

EPA Grant Number: R828207
Title: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.
Investigators: Chen, Victoria C.P.
Institution: The University of Texas at Arlington
Current Institution: Georgia Institute of Technology
EPA Project Officer: Hahn, Intaek
Project Period: July 1, 2000 through June 30, 2003 (Extended to June 30, 2005)
Project Period Covered by this Report: July 1, 2002 through June 30, 2003
Project Amount: $335,000
RFA: Technology for a Sustainable Environment (1999)
Research Category: Sustainability, Pollution Prevention/Sustainable Development


Objective:

The objective of this research project is to develop a decision-making framework (DMF) for creating more sustainable urban environments. The development of this DMF requires a novel collaboration of current research in sustainability, optimization, and statistics. The objective of the DMF is to explore hypothetical paradigms on various scales, subject to technical and societal constraints, and measure their effect on other internal and external systems. The possibility of DMF prototypes in two important arenas will be investigated:

  1. water quality: comparison of current and emerging technologies in a wastewater treatment system; and
  2. air quality: evaluation of spatial and temporal actions for reducing ground-level ozone pollution.

The DMF will be based on a stochastic dynamic programming (SDP) approach that permits optimization of a system changing over time. Although dynamic programming has proved optimal and has been successful in many applications, it is highly computationally intensive. A critical objective of the proposed research project is to develop computationally practical, high-dimensional, continuous-state SDP solution methods for use within a DMF for sustainable technology. Our methodology will follow the same lines as the orthogonal array (OA)/multivariate adaptive regression splines (MARS) method (Friedman, 1991). The generalized solution method uses statistical experimental designs, through which statistical learning within the SDP is achieved. Prior applications of OA/MARS have demonstrated that memory requirements are well within the capacity of modern technology. However, the computational effort required by the learning process (i.e., MARS), although polynomial in growth, may not be practical for very large problems. The development of statistical learning for this research project is separated into four categories:

  1. restructuring MARS to reduce computation with minimal loss in learning;
  2. investigating other flexible methods of statistical learning such as artificial neural networks (ANNs);
  3. employing smaller statistical experimental designs such as Latin hypercube designs; and
  4. parallelizing the statistical learning process.

A successful DMF for air quality will push the boundaries of SDP research.
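
The design-based SDP idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the transition, the stage cost, the action grid, the knot locations, and all function names are hypothetical. A plain random Latin hypercube stands in for the OA-based designs, and a fixed-knot truncated-linear (hinge) basis fitted by least squares stands in for MARS, which selects its basis functions adaptively.

```python
import numpy as np

KNOTS = np.linspace(0.25, 0.75, 3)  # hypothetical fixed knots (MARS would choose these adaptively)

def latin_hypercube(n, d, rng):
    """Return n points in [0, 1)^d with exactly one point per stratum in each dimension."""
    # Each column is a random permutation of the n strata, jittered within its stratum.
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n

def hinge_basis(x, knots):
    """Intercept plus paired hinge terms max(0, x_j - k) and max(0, k - x_j)."""
    cols = [np.ones(len(x))]
    for j in range(x.shape[1]):
        for k in knots:
            cols.append(np.maximum(0.0, x[:, j] - k))
            cols.append(np.maximum(0.0, k - x[:, j]))
    return np.column_stack(cols)

def solve_sdp(stages=3, n_design=50, dim=2, seed=0):
    """Backward recursion: fit an approximate future value function at each stage."""
    rng = np.random.default_rng(seed)
    actions = np.linspace(0.0, 0.5, 6)              # hypothetical control levels
    shocks = rng.normal(0.0, 0.05, size=(20, dim))  # sampled random disturbances
    design = latin_hypercube(n_design, dim, rng)    # discretization of the state space
    coef = np.zeros(1 + 2 * dim * len(KNOTS))       # value beyond the last stage is zero
    history = []
    for _ in range(stages):
        targets = np.empty(n_design)
        for i, x in enumerate(design):
            best = np.inf
            for u in actions:
                # Toy transition: the action lowers every state variable, noise perturbs it.
                nxt = np.clip(x - u + shocks, 0.0, 1.0)
                future = hinge_basis(nxt, KNOTS) @ coef
                cost = u ** 2 + x.sum()              # toy stage cost
                best = min(best, cost + future.mean())
            targets[i] = best
        # Statistical learning step: least-squares fit of the value function on the design.
        coef, *_ = np.linalg.lstsq(hinge_basis(design, KNOTS), targets, rcond=None)
        history.append(coef)
    return design, history
```

Calling `design, history = solve_sdp()` returns the design points and one coefficient vector per stage, ordered from the last stage backward. The fitting step inside the stage loop is where the computational effort of the learning method accumulates, which is the cost the four categories above aim to reduce.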

Progress Summary:

The general DMF was presented at an invited talk as part of the National Science Foundation Integrative Graduate Education and Research Traineeship at Columbia University. The wastewater treatment DMF has appeared in the operations research literature (Tsai, et al., 2004).

For the statistical learning process of SDP, two literature reviews of statistical methods for computer experiments, including metamodeling, have been accepted for publication (Chen, et al., 2003; Tsai and Chen, to appear, 2004). Of the statistical modeling methods, ANNs are a very competitive alternative to MARS for SDP (Cervellera, et al., in revision, 2004; Cervellera, et al., 2004; Cervellera, et al., to appear, 2004). Variants of MARS developed by Dr. Julia Tsai in her dissertation are employed (Tsai, et al., in review, 2003; Tsai and Chen, to appear, 2004), and pure OA, OA-based Latin hypercube, and number-theoretic experimental designs have been studied for SDP (Cervellera, et al., in revision, 2004; Tsai, et al., 2004; Tsai, et al., in review, 2003; Tsai and Chen, to appear, 2004; Cervellera, et al., 2004; Cervellera, et al., to appear, 2004).

Regarding the DMF prototype for ozone pollution, we are focusing on July 31, 1987, a day within an urban Atlanta episode that remains one of the worst on record. Our work on the DMF modules, particularly the Atmospheric Chemistry Module (ACM), is described in Yang, et al. (2004) and Yang, et al. (in revision, 2004). The Atlanta Urban Airshed Model (UAM, U.S. EPA-450/4-90-007A-E, 1990) is employed within the ACM to construct the relationships that are key to modeling the transition of state variables in the SDP formulation. After significant dimension reduction within the ACM, an SDP solution that maintained ozone at or below the U.S. Environmental Protection Agency’s standard of 0.12 ppm for various initial conditions was obtained in about 55 hours on a dual 2.4-GHz Intel Xeon workstation.


References:

Friedman JH. Multivariate adaptive regression splines. Annals of Statistics 1991;19(1):1-141.

Future Activities:

The DMF prototypes for ozone pollution are nearly completed. We will conduct sensitivity testing of the ozone pollution DMF and submit this work to journals.

Journal Articles on this Report: 4 Displayed

Type: Journal Article
Citation: Cervellera C, Chen VCP, Wen A. Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research 2006;171(3):1139-1151.
Project: R828207 (2002); R828207 (2003); R828207 (Final)
Document Sources: Abstract: Science Direct; Other: Pre-publication paper

Type: Journal Article
Citation: Tsai JCC, Chen VCP, Chen J, Beck MB, Chen J. Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research 2004;132(1-4):207-221 (Special Issue entitled CUSTOM [Center for Uncertain Systems: Tools for Optimization and Management] Conference on Applied Optimization under Uncertainty).
Project: R828207 (2002); R828207 (2003); R828207 (Final)
Document Sources: Abstract: Springer

Type: Journal Article
Citation: Tsai JCC, Chen VCP, Lee EK, Johnson EL. Parallelization of the MARS value function approximation in a decision-making framework for wastewater treatment. Journal of Statistical Computation and Simulation.
Project: R828207 (2002); R828207 (2003)
Document Sources: not available

Type: Journal Article
Citation: Tsai JCC, Chen VCP. Flexible and robust implementations of multivariate adaptive regression splines within a wastewater treatment stochastic dynamic program. Quality and Reliability Engineering International 2005;21(7):689-699.
Project: R828207 (2003); R828207 (Final)
Document Sources: Abstract: Wiley

Supplemental Keywords:

pollution prevention, sustainable development, risk management, clean technologies, modeling, cost benefit, air, sustainable industry, water, applied math and statistics, technology for a sustainable environment, RFA, Scientific Discipline, Air, Water, Sustainable Industry/Business, Applied Math & Statistics, air toxics, cleaner production/pollution prevention, Mathematics, Wastewater, Sustainable Environment, Technology for Sustainable Environment, Economics and Business, tropospheric ozone, computational simulations, cost reduction, cleaner production, waste reduction, stratospheric ozone, statistical research, wastewater reuse, wastewater treatment plants, stochastic dynamic programming, computer generated alternatives, optimization, sustainable urban environment, water quality, industrial innovations, pollution prevention, source reduction

Progress and Final Reports:

  • Original Abstract
  • 2001
  • 2002 Progress Report
  • 2004 Progress Report
  • Final Report