2002 Progress Report: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.

EPA Grant Number: R828207
Title: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.
Investigators: Chen, Victoria C.P.
Institution: The University of Texas at Arlington
Current Institution: Duke University
EPA Project Officer: Karn, Barbara
Project Period: July 1, 2000 through June 30, 2003 (Extended to June 30, 2005)
Project Period Covered by this Report: July 1, 2001 through June 30, 2002
Project Amount: $335,000
RFA: Technology for a Sustainable Environment (1999)
Research Category: Sustainability, Pollution Prevention/Sustainable Development


The objective of this research project is to develop a decision-making framework (DMF) for creating more sustainable urban environments. Developing this DMF requires a novel combination of current research in sustainability, optimization, and statistics. The goal of the DMF is to explore hypothetical paradigms on various scales, subject to technical and societal constraints, and to measure their effects on other internal and external systems. The feasibility of DMF prototypes will be investigated in two important arenas: (1) water quality: comparison of current and emerging technologies in a wastewater treatment system; and (2) air quality: evaluation of spatial and temporal actions for reducing ground-level ozone pollution.

The DMF will be based on a stochastic dynamic programming (SDP) approach that permits optimization of a system changing over time. Although dynamic programming yields provably optimal solutions and has been successful in many applications, it is highly computationally intensive. A critical objective of the research project is to develop computationally practical, high-dimensional, continuous-state SDP solution methods for use within a DMF for sustainable technology. Our methodology will follow the same lines as the orthogonal array/multivariate adaptive regression splines (OA/MARS) method, in which MARS (Friedman, 1991) serves as the learning component. The generalized solution method uses statistical experimental designs, through which statistical learning within the SDP is achieved. Prior applications of OA/MARS have demonstrated that memory requirements are well within the capacity of modern technology. The computational effort required by the learning process (i.e., MARS), however, although polynomial in growth, may not be practical for very large problems. The development of statistical learning for this research is separated into four categories: (1) restructuring MARS to reduce computation with minimal loss in learning; (2) investigating other flexible methods of statistical learning such as artificial neural networks (ANNs); (3) employing smaller statistical experimental designs such as Latin hypercube designs; and (4) parallelizing the statistical learning process. A successful DMF for air quality will push the boundaries of SDP research.
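The design-based approach described above can be illustrated with a minimal sketch. It is not the project's code: the two-variable toy problem, the `transition` and `stage_cost` functions, and all constants are hypothetical, and a quadratic least-squares fit stands in for MARS or an ANN. It shows only the general pattern of evaluating the SDP recursion at experimental design points and fitting a statistical model to approximate the future value function at each stage.

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n, d, rng):
    """n points in [0, 1)^d with exactly one point per stratum in each dimension."""
    # argsort of random numbers gives an independent permutation per column.
    strata = np.argsort(rng.random((n, d)), axis=0)
    return (strata + rng.random((n, d))) / n

def features(x):
    """Quadratic basis; a simple stand-in for MARS basis functions (assumption)."""
    x = np.atleast_2d(x)
    return np.hstack([np.ones((len(x), 1)), x, x**2])

# Hypothetical toy problem: 2 continuous state variables, 1 decision variable.
DECISIONS = np.linspace(-0.2, 0.2, 5)
NOISE = np.array([-0.05, 0.0, 0.05])   # equally likely disturbances
STAGES = 3

def transition(x, u, w):
    """Hypothetical state transition, kept inside [0, 1]."""
    return np.clip(0.9 * x + u + w, 0.0, 1.0)

def stage_cost(x, u):
    """Hypothetical cost: distance from a target state plus a control penalty."""
    return np.sum((x - 0.5) ** 2) + 0.1 * u**2

def solve_sdp(n_design=50):
    design = latin_hypercube(n_design, 2, rng)
    coef = np.zeros(features(design).shape[1])   # terminal value function is zero
    coefs = [coef]                               # coefs[0] is the terminal stage
    for _ in range(STAGES):                      # backward recursion over stages
        targets = np.empty(n_design)
        for i, x in enumerate(design):
            best = np.inf
            for u in DECISIONS:
                # Expected future value under the fitted approximation.
                future = np.mean(
                    [(features(transition(x, u, w)) @ coef)[0] for w in NOISE]
                )
                best = min(best, stage_cost(x, u) + future)
            targets[i] = best
        # "Statistical learning" step: fit the value function at the design points.
        coef, *_ = np.linalg.lstsq(features(design), targets, rcond=None)
        coefs.append(coef)
    return design, coefs

design, coefs = solve_sdp()
```

The cost of the inner loops grows with the number of design points, and the cost of the fitting step grows with the flexibility of the approximator, which is why smaller designs and cheaper learning methods both matter for high-dimensional problems.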

Progress Summary:

The DMF for the 17-level downstream subsystem presented by Chen and Beck (1997) has been completed. The complete liquid and solid treatment DMF has been presented (Tsai, et al., submitted, 2003; Tsai, 2002). In linking the solid line to the liquid line, a conversion from the liquid state variables to the solid state variables was needed, and the technology dependencies had to be modeled. Details on the unique SDP formulation used to handle these issues are given in Tsai, et al., submitted, 2003.

For the statistical learning process of SDP, the major accomplishments are: (1) two literature reviews of statistical methods for computer experiments (Chen, et al., 2003a; Chen, et al., in review, 2003); (2) the parallelization of MARS; (3) an automatic stopping rule for MARS; and (4) a more robust algorithm for MARS. The last three items are described in Chen, 2002b; Tsai, et al., in review, 2002; Tsai, 2002; and Tsai, et al., 2003. Regarding alternatives to MARS, studies of ANNs (Cervellera and Chen, in review, 2002; Cervellera and Chen, in review, 2003) continue to show that ANNs are a genuine competitor to MARS.
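To give a sense of where parallelism can enter MARS, the following sketch scores candidate knots for one forward-selection step concurrently. It is an illustration under stated assumptions and not the parallelization scheme developed in this project: the function names are hypothetical, only hinge pairs attached to an existing basis matrix are considered (full MARS also multiplies candidates by existing basis functions), and threads are used purely for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def hinge_pair(x, knot):
    """The two truncated linear functions MARS adds at a knot."""
    return np.maximum(x - knot, 0.0), np.maximum(knot - x, 0.0)

def rss_with_candidate(basis, X, y, var, knot):
    """Residual sum of squares after adding one hinge pair to the basis."""
    pos, neg = hinge_pair(X[:, var], knot)
    B = np.column_stack([basis, pos, neg])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return float(resid @ resid), var, float(knot)

def best_candidate(basis, X, y, knots_per_var=9, workers=4):
    """Score every (variable, knot) candidate in parallel; return the best."""
    candidates = [
        (v, k)
        for v in range(X.shape[1])
        for k in np.quantile(X[:, v], np.linspace(0.1, 0.9, knots_per_var))
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda c: rss_with_candidate(basis, X, y, *c), candidates)
        )
    return min(results)   # tuples sort by RSS first

# Synthetic data (assumption): the response depends on variable 1 only.
rng = np.random.default_rng(1)
X = rng.random((200, 3))
y = 2.0 * np.maximum(X[:, 1] - 0.5, 0.0) + 0.01 * rng.standard_normal(200)
rss, var, knot = best_candidate(np.ones((200, 1)), X, y)
```

Each candidate fit is independent of the others, so the search divides naturally across workers; the final selection of the minimum is the only step that needs all results.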

Regarding the DMF prototype for ozone pollution, two phases of empirical studies on the relationships between ozone and nitrogen oxides for the Atlanta urban airshed model (UAM, U.S. Environmental Protection Agency-450/4-90-007A-E, 1990) have been conducted. Multiple linear regression metamodels based on the UAM have been constructed to predict maximum ozone concentrations. Through the metamodels, the minimum required number of state variables and decision variables was identified. Because the state variable dimension determines the size of the SDP, the reduction from 176 state variables to 25 substantially reduces the computational burden. To solve this 25-dimensional SDP problem, MARS with the automatic stopping rule (ASR-MARS) and Niederreiter-Xing number-theoretic sequences (Cervellera and Chen, in review, 2003) are employed to reduce the computational requirements. Coding of the SDP is nearly complete, and some of the ongoing work has been documented in Chen, et al., 2003b and Chen, et al., 2003c.
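As a rough illustration of how a regression metamodel can be used to reduce the number of variables, the sketch below fits a multiple linear regression to synthetic simulator output and keeps only the inputs with large t-statistics. The report does not state the selection criterion actually used, so the t-statistic threshold, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def fit_linear_metamodel(X, y):
    """Ordinary least squares with an intercept; returns coefficients and t-statistics."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)         # coefficient covariance
    t_stats = coef / np.sqrt(np.diag(cov))
    return coef, t_stats

def screen_variables(X, y, t_threshold=4.0):
    """Indices of inputs whose |t| exceeds the threshold (intercept excluded)."""
    _, t_stats = fit_linear_metamodel(X, y)
    return np.flatnonzero(np.abs(t_stats[1:]) > t_threshold)

# Synthetic stand-in for simulator runs (assumption): 8 inputs, 2 of which matter.
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.1 * rng.standard_normal(300)
kept = screen_variables(X, y)
```

Only the retained inputs would then be carried forward as state or decision variables, which is what makes the subsequent SDP smaller.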

Julia Tsai, funded from fall 2000 through summer 2002, completed her dissertation at the Georgia Institute of Technology (GIT) in the summer of 2002 (Tsai, et al., 2003). GIT doctoral student Terrence Murphy was funded for the spring semester of 2001 to develop the multiple linear regression models approximating UAM relationships. His work will be used to develop the initial ozone pollution SDP transition function module. University of Texas–Arlington (UTA) doctoral student Zehua Yang began working on the project in the summer of 2002. He is developing the DMF prototype for ozone pollution and is expected to graduate in the spring of 2004.


References:

Friedman JH. Multivariate adaptive regression splines. Annals of Statistics 1991;19(1):1-141.

Chen J, Beck MB. Towards designing sustainable urban wastewater infrastructures: a screening analysis. Water Science and Technology 1997;35(9):99-112.

Future Activities:

We will complete the ozone pollution DMF prototype. The immediate task is to finish debugging the SDP code. Once the SDP solution has been obtained, it will be tested thoroughly via the UAM simulation. In particular, the SDP solution will tell us whether the July 31-August 1, 1987, ozone episode could have been avoided. We also will attempt to use MARS metamodels as transition functions. The complication is that the SDP transition function must be monotonic to maintain the convexity of the future value functions. We also plan to implement a parallel computing version of the ozone pollution SDP using UTA's high-performance computing network (http://hpc.uta.edu/).

Journal Articles on this Report: 4 Displayed

Type: Journal Article
Citation: Cervellera C, Chen VCP, Wen A. Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research 2006;171(3):1139-1151.
Project: R828207 (2002); R828207 (2003); R828207 (Final)
Document Sources: Abstract: Science Direct; Other: Pre-publication paper

Type: Journal Article
Citation: Cervellera C, Wen AH, Chen VCP. Neural network and regression spline value function approximations for stochastic dynamic programming. Technometrics.
Project: R828207 (2002)
Document Sources: not available

Type: Journal Article
Citation: Tsai JCC, Chen VCP, Chen J, Beck MB, Chen J. Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research 2004;132(1-4):207-221 (Special Issue entitled CUSTOM [Center for Uncertain Systems: Tools for Optimization and Management] Conference on Applied Optimization under Uncertainty).
Project: R828207 (2002); R828207 (2003); R828207 (Final)
Document Sources: Abstract: Springer

Type: Journal Article
Citation: Tsai JCC, Chen VCP, Lee EK, Johnson EL. Parallelization of the MARS value function approximation in a decision-making framework for wastewater treatment. Journal of Statistical Computation and Simulation.
Project: R828207 (2002); R828207 (2003)
Document Sources: not available

Supplemental Keywords:

pollution prevention, sustainable development, risk management, clean technologies, modeling, cost benefit, air, sustainable industry, sustainable business, water, applied math and statistics, economics and business, mathematics, sustainable environment, technology for a sustainable environment, TSE, wastewater, air toxics, tropospheric ozone, cleaner production, computational simulations, computer generated alternatives, cost reduction, industrial innovations, optimization, source reduction, statistical research, stochastic dynamic programming, stratospheric ozone, sustainable urban environment, waste reduction, wastewater reuse, wastewater treatment plants, water quality, water treatment

Relevant Websites:

http://hpc.uta.edu/
