Final Report: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.

EPA Grant Number: R828207
Title: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.
Investigators: Chen, Victoria C.P.
Institution: Georgia Institute of Technology
EPA Project Officer: Hahn, Intaek
Project Period: July 1, 2000 through June 30, 2003 (Extended to June 30, 2005)
Project Amount: $335,000
RFA: Technology for a Sustainable Environment (1999)
Research Category: Sustainability, Pollution Prevention/Sustainable Development


This research project addressed the development of a decision-making framework (DMF) for creating more sustainable urban environments. The development of this DMF required a novel collaboration of current research in sustainability, optimization, and statistics. The objective of the DMF was to explore hypothetical paradigms on various scales, subject to technical and societal constraints, and measure their effect on other internal and external systems. In particular, the DMF will be instrumental in evaluating databases of emerging technologies and identifying the most promising directions for sustainable solutions.

The DMF will have the ability to extract key information from environmental deterministic and statistical models that would otherwise be overlooked using current approaches. Decision-makers use models to examine systems and to predict system responses to prescribed stimuli. Those who develop deterministic and statistical models may use the DMF developed in this research to guide them in building an interface between their models and the user. Such an interface will enable users to apply the models more efficiently and effectively when exploring how a system responds to a potential action or set of actions. It is expected that users seek an optimal course of action, either to maximize expected benefits or to minimize expected costs, while maintaining other state conditions.

The specific objectives of the research project were to investigate the possibility of DMF prototypes in two important arenas: (1) water quality: comparison of current and emerging technologies in a wastewater treatment system; and (2) air quality: evaluation of spatial and temporal actions for reducing ground-level ozone pollution. For wastewater treatment, we considered the downstream subsystem presented by Chen and Beck (1997). Because the necessary modules for the included technologies have already been constructed, we are confident that a DMF for this system will be successful. By contrast, to construct an efficient air pollution module, the complex urban airshed encompassing the issue of ozone pollution requires additional exploratory studies through an advanced three-dimensional, photochemical air quality grid model, such as the urban airshed model (UAM, EPA-450/4-90-007A-E, 1990) or the recently released MODELS-3.

The DMF will be based on a stochastic dynamic programming (SDP) approach that permits optimization of a system changing over time. Although dynamic programming is proven optimal and has been successful in many applications, it is highly computationally intensive. Development of a DMF for sustainable technology is not straightforward because most environmental problems involve continuous variables that are subject to uncertainty and require the added dimensionality of space (in addition to time). The most promising high-dimensional continuous-state SDP solution method to date is orthogonal arrays (OA)/multivariate adaptive regression splines (MARS), which has accurately solved higher-dimensional problems than was possible with prior methods. A critical objective of the research was to develop computationally practical, high-dimensional, continuous-state SDP solution methods for use within a DMF for sustainable technology. Our methodology followed the OA/MARS method. The generalized solution method utilizes statistical experimental designs through which statistical learning within the SDP is achieved. Prior applications of OA/MARS have demonstrated that memory requirements are well within the capacity of modern technology. The computational effort required by the learning process (i.e., MARS), although polynomial in growth, may not be practical for very large problems. It should be noted that recent test runs on a 550 MHz Pentium II have demonstrated up to a 15-fold increase in computational speed compared to the original OA/MARS runs on a Sun SPARCstation 10 model 51.
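As an illustration of the backward recursion described above, the following minimal Python sketch solves a toy one-dimensional, four-stage SDP. It is not the OA/MARS code used in the project: the state design points are random rather than an orthogonal array, the value function is fitted with fixed-knot hinge functions by least squares rather than adaptive MARS, and the cost function, transition, control set, and noise distribution are all hypothetical.

```python
import numpy as np

def hinge_features(x, knots):
    """Design matrix [1, x, max(0, x - k) for each knot] (fixed-knot spline basis)."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

def solve_sdp(n_stages=4, n_design=50, seed=0):
    """Backward SDP recursion with a fitted value function at each stage."""
    rng = np.random.default_rng(seed)
    design = np.sort(rng.uniform(0.0, 1.0, n_design))  # state design points (assumed random)
    knots = np.linspace(0.1, 0.9, 9)                   # hypothetical fixed knots
    controls = np.linspace(0.0, 0.5, 6)                # hypothetical reduction levels
    noise = np.array([-0.1, 0.0, 0.1])                 # hypothetical disturbance values
    probs = np.array([0.25, 0.5, 0.25])                # and their probabilities

    def stage_cost(x, u):
        # Hypothetical cost: control effort plus a penalty on the current state.
        return u + 4.0 * x**2

    coef = np.zeros(len(knots) + 2)  # terminal value function is zero
    policies = []
    for _ in range(n_stages):  # step backward from the last stage
        # Next states for every (design point, control, noise) combination.
        nxt = np.clip(
            design[:, None, None] - controls[None, :, None] + noise[None, None, :],
            0.0, 1.0,
        )
        future = (hinge_features(nxt.ravel(), knots) @ coef).reshape(nxt.shape)
        # Expected cost-to-go for each (design point, control) pair.
        q = stage_cost(design[:, None], controls[None, :]) + future @ probs
        values = q.min(axis=1)
        policies.append(controls[q.argmin(axis=1)])
        # "Learning" step: fit the value function to the design-point values.
        coef, *_ = np.linalg.lstsq(hinge_features(design, knots), values, rcond=None)
    policies.reverse()  # policies[0] now belongs to the first stage
    return design, policies, coef

design, policies, coef = solve_sdp()
```

The fitting step inside the loop is where the learning effort is spent; in the method described above, that step is the MARS fit over an OA-based design.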

The development of statistical learning for this research is separated into four categories: (1) restructuring MARS to reduce computation with minimal loss in learning; (2) investigating other flexible methods of statistical learning, such as artificial neural networks (ANNs); (3) employing smaller statistical experimental designs, such as Latin hypercube designs; and (4) parallelizing the statistical learning process. A successful DMF for air quality will push the boundaries of SDP research.

Summary/Accomplishments (Outputs/Outcomes):

SDP Statistical Learning Process

For the statistical learning process of SDP, two literature reviews of statistical methods for computer experiments were completed (Chen, et al., 2003; Tsai and Chen, 2004). One literature review on statistical data mining was completed (Tsai, et al., 2005). All the methods described are potentially applicable to the DMF for designing sustainable systems. Those that we proposed to study included MARS, ANNs, and kriging for statistical modeling, and OAs, Latin hypercubes (LH), OA-based Latin hypercubes (OA-LH), and some number-theoretic methods (NTMs) for experimental design.

Of the statistical modeling methods, kriging and ANNs were studied. The preliminary kriging study did not illustrate enough improvement in accuracy to justify the computational burden (Chen and Welch, 2001). ANNs, however, are a good competitor to MARS for SDP. Although further computational studies are still warranted, either method appears to work well in SDP, and the choice mainly depends on a user's preference. Variants of MARS developed by Dr. Julia Tsai in her dissertation include Parallel-MARS, ASR-MARS with an automatic stopping rule, and Robust-MARS (Tsai, 2002). Of the experimental designs, pure OA and OA-based Latin hypercube designs were studied for SDP. Finally, number-theoretic methods, specifically Niederreiter-Xing and Sobol' sequences, for experimental design were studied for the first time in SDP work.
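For readers unfamiliar with Latin hypercube designs, a minimal Python sketch of a plain (not OA-based) Latin hypercube on the unit cube follows. The function name and arguments are illustrative and not taken from the project code.

```python
import numpy as np

def latin_hypercube(n_points, n_dims, seed=None):
    """Return an (n_points, n_dims) Latin hypercube sample on [0, 1)."""
    rng = np.random.default_rng(seed)
    # One random permutation of the strata 0..n_points-1 per dimension,
    # so every stratum of every dimension is used exactly once.
    strata = np.column_stack([rng.permutation(n_points) for _ in range(n_dims)])
    # Random position of each point inside its stratum.
    jitter = rng.uniform(size=(n_points, n_dims))
    return (strata + jitter) / n_points

design = latin_hypercube(10, 3, seed=0)
```

Each column of `design` places exactly one point in each of the 10 equal-width intervals of [0, 1), which is the defining property of the design.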

Wastewater Treatment

As an initial prototype, a DMF for the 17-level downstream subsystem of Chen and Beck (1997) was completed. First, the liquid line was solved separately (Tsai, 2002; Tsai, et al., 2003; Tsai and Chen, 2004; Tsai, et al., 2005), then the complete liquid and solid treatment DMF was solved (Tsai, 2002; Tsai, et al., 2004). In linking the solid line to the liquid line, a conversion from the liquid state variables to the solid state variables was needed, and the technology dependencies had to be modeled. Details on the unique SDP formulation used to handle these issues are given in Tsai, et al. (2004).

In addition to constraints and penalty costs forcing “cleanliness” of the exiting liquid and solid, six optimization objectives may be explored: minimize economic cost (which includes both capital and operating costs), minimize odor emissions, minimize land area, minimize volume, maximize global desirability, and maximize robustness. Our proposed plan was to study them separately, but our results primarily utilize economic cost because the other measures were not as accurately represented by Chen and Beck (1997). The DMF results will only be as good as the modeling of the technologies involved.

Ozone Pollution

As a complex prototype, a DMF for ozone pollution was completed. We studied an episode in urban Atlanta on July 31-August 1, 1987, which remains one of the worst episodes on record. The key day to control this episode was July 31, and our ozone pollution DMF prototype focuses on this day. The modules of our DMF are illustrated in Figure 1. The entire work is described in Yang (2004), and preliminary modeling is presented in Chen, et al. (2003). Unlike the wastewater treatment application, for which the system dynamics were already modeled, significant work was needed to build the Atmospheric Chemistry Module in Figure 1. Once the Atmospheric Chemistry Module was constructed, the DMF solved for SDP control policies, which were then tested via the UAM (Yang, 2004; Yang, et al., 2005a and 2005b).

The Atlanta urban airshed model simulates hourly emissions and ozone concentrations on a 40x40 grid over the spatial region and emissions from 102 point sources. If we included ozone and NOx at every point source (102) and every grid region (40x40) and each hour (24), the number of state variables would be very large. Thus, to achieve a computationally tractable DMF, a critical component of the Atmospheric Chemistry Module in Figure 1 is dimension reduction, carried out in three phases: initialization, mining, and metamodeling. The relationships constructed by these phases are the key to modeling the transition of states in the SDP.

In the initialization phase, the 40x40 grid used by the UAM was aggregated into a 5x5 grid and the 24 hourly time periods were aggregated into five three-hour time periods covering 4:00 AM to 7:00 PM. The last four time periods corresponded to four stages in the SDP, and the first time period initialized the SDP. We focused on controlling maximum ozone at the four specific Photochemical Assessment Monitoring Stations (PAMS) located in the Atlanta metropolitan area. NOx emissions are considered separately in the DMF for each of the grid squares and point sources during each of the four SDP stages. Several different initial conditions in the first time period (4:00-7:00 AM) were considered during testing of the DMF solutions.
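The spatial aggregation step can be sketched in Python as a block reduction. The sketch assumes the 5x5 grid is formed from uniform 8x8 blocks of the 40x40 grid and that each aggregated square takes the maximum over its block; the report does not state the exact aggregation rule, so both are assumptions, and the function name is illustrative.

```python
import numpy as np

def aggregate_max(field, out_shape=(5, 5)):
    """Reduce a 2-D grid to out_shape by taking the maximum over uniform blocks."""
    rows, cols = field.shape
    r, c = out_shape
    if rows % r or cols % c:
        raise ValueError("grid dimensions must be divisible by the output shape")
    # Axes after reshape: (block row, row within block, block col, col within block).
    blocks = field.reshape(r, rows // r, c, cols // c)
    return blocks.max(axis=(1, 3))

# Hypothetical ozone field on the 40x40 modeling grid (values are placeholders).
ozone = np.arange(1600, dtype=float).reshape(40, 40)
coarse = aggregate_max(ozone)  # shape (5, 5)
```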

Figure 1. A Modular DMF

In the mining phase, data collected from the UAM was used to identify those point sources and grid squares that had the most influence. Different time periods were not explored. We used a 149-point Latin hypercube experimental design to scale NOx emissions in different regions and at different point sources from zero up to the basecase level. These scaled emissions were input into the UAM, and the resulting ozone concentrations across the 40x40 UAM modeling grid were collected. Using our aggregated 5x5 grid, the maximum ozone concentration was determined for each grid square containing a PAMS site. Mining via regression analysis was conducted separately for each PAMS site. Of the 25 grid regions, 16 were statistically significant. Of the original 102 point sources, only 15 were statistically significant.
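The screening idea in the mining phase can be sketched as an ordinary least-squares fit followed by a t-statistic cutoff. This is an illustration only: the report does not specify the regression model or significance test, the simulated response below stands in for UAM output, plain uniform sampling stands in for the Latin hypercube design, and the number of sources, effect sizes, and threshold are hypothetical.

```python
import numpy as np

def screen_sources(X, y, t_threshold=2.0):
    """Fit y ~ 1 + X by least squares; return indices with |t| above the threshold."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])            # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p - 1)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)           # coefficient covariance
    t_stats = beta[1:] / np.sqrt(np.diag(cov)[1:])  # skip the intercept
    return np.flatnonzero(np.abs(t_stats) > t_threshold), t_stats

# Hypothetical experiment: 149 runs, 20 emission sources scaled between
# 0 (fully reduced) and 1 (basecase level).
rng = np.random.default_rng(1)
n_runs, n_sources = 149, 20
scales = rng.uniform(0.0, 1.0, size=(n_runs, n_sources))
true_effect = np.zeros(n_sources)
true_effect[[2, 7, 11]] = [30.0, 20.0, 25.0]        # only three sources matter
ozone = 60.0 + scales @ true_effect + rng.normal(0.0, 2.0, n_runs)

selected, t_stats = screen_sources(scales, ozone)
```

In this toy setup the three influential sources have very large t-statistics, while a few inactive sources may pass the cutoff by chance, which is why a screening threshold is a judgment call rather than a fixed rule.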

In the metamodeling phase, the SDP transitions of the state variables were constructed. Because of the tremendous reduction in dimension, we chose to maintain only the 15 significant point sources identified in the mining phase; however, all 25 grid squares were maintained. As in the mining phase, data was collected through the UAM. Based on the input NOx emissions and the output from the UAM, statistical methods were applied to build the metamodels that represent the ozone SDP transition functions. Through the metamodels, the minimum required number of state variables and decision variables was identified. In particular, the state variable dimension specifies the size of the SDP, so the reduction from 524 to 25 state variables was extremely significant.

To solve this 25-dimensional SDP problem, both Latin hypercube and number-theoretic experimental designs were tested, with the latter performing slightly better. Consequently, the SDP methodology for the ozone pollution DMF utilizes 2000-point Niederreiter-Xing number-theoretic sequences with Dr. Tsai’s ASR-MARS. The SDP code was run on a Dual 2.4-GHz Intel Xeon Workstation and a solution was acquired in about 55 hours. Control strategies for the full range of initial conditions can be obtained from the SDP solution, which is what makes it dynamic. In particular, we tested the basecase initial conditions for July 31 and 50 hypothetical scenarios varying the initial conditions. In all cases, the DMF was able to maintain ozone levels at or below the U.S. Environmental Protection Agency (EPA) standard. Figure 2 shows the SDP control policy for the basecase initial conditions. The amount of NOx emission reduction is seen to vary by location and time, demonstrating the focused nature of the control policy.
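To give a flavor of number-theoretic (low-discrepancy) designs, the Python sketch below generates a Halton sequence. Halton is used here only because it is simple to write down; it is not the Niederreiter-Xing construction used in the project, and the dimension and bases in the example are illustrative.

```python
def van_der_corput(index, base):
    """Radical inverse of a positive integer index in the given base."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def halton(n_points, bases):
    """First n_points of the Halton sequence, one coprime base per dimension."""
    return [[van_der_corput(i, b) for b in bases] for i in range(1, n_points + 1)]

# 2000 points in three dimensions (the SDP above would need 25 dimensions).
points = halton(2000, [2, 3, 5])
```

Unlike a Latin hypercube, such a sequence is deterministic and extensible: the first N points of a longer run are the same as a run of length N.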

For the 50 hypothetical scenarios (on the initial conditions), out of 38 emission variables that were controlled in the DMF, 13 had no change, 10 had some variation, and 15 had large variation. The 13 emission variables with no change were those that were either reduced 100 percent or 0 percent. The fact that 15 emission variables had large variation from scenario to scenario demonstrates the dynamic nature of the problem and the SDP solution. In other words, if the initial conditions of the day are different, then the optimal control strategies will be different. In addition, we compared the optimal emission reductions to the typical “across-the-board” control strategy that dictates the same percent reduction in emissions everywhere and all day. The across-the-board strategy required a 60 percent reduction to achieve the EPA one-hour standard of 125 ppb. Our DMF required a maximum of 55 percent reduction in total emissions during 7:00 AM to 10:00 AM, with lower percent reductions required during the later time periods, and no reductions taken during any other times (i.e., before 7:00 AM and after 7:00 PM). This demonstrates the potential cost effectiveness of targeted control strategies, as well as the potential to solve real-world problems using our DMF.

Figure 2. Atlanta NOx Emission Reductions Plotted Over the 5x5 Grid. Stage 1 is 7:00 AM – 10:00 AM. Stage 2 is 10:00 AM – 1:00 PM. Stage 3 is 1:00 PM – 4:00 PM. Stage 4 is 4:00 PM – 7:00 PM. Note: Point source #k reductions are combined with area reductions in grid square (x, y) as follows: #1-4 with (1, 4); #5, #6, #63, #64 with (3, 3); #21, #23, #25 with (1, 2); #9, #12 with (2, 2); #30 with (2, 3); #37 with (4, 3).


Chen J, Beck MB. Towards designing sustainable urban wastewater infrastructures: a screening analysis. Water Science and Technology 1997;35(9):99-112.

Journal Articles on this Report:

  • Cervellera C, Chen VCP, Wen A. Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research 2006;171(3):1139-1151.
  • Cervellera C, Wen A, Chen VCP. Neural network and regression spline value function approximations for stochastic dynamic programming. Computers & Operations Research 2007;34(1):70-90.
  • Chen VCP. Measuring the goodness of orthogonal array discretizations for stochastic programming and stochastic dynamic programming. SIAM Journal on Optimization 2002;12(2):322-344.
  • Chen VCP, Günther D, Johnson EL. Solving for an optimal airline yield management policy via statistical learning. Journal of the Royal Statistical Society: Series C (Applied Statistics) 2003;52(1):19-30.
  • Chen VCP, Tsui K-L, Barton RR, Meckesheimer M. A review on design, modeling and applications of computer experiments. IIE Transactions 2006;38(4):273-291.
  • Tsai JCC, Chen VCP, Chen J, Beck MB, Chen J. Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research 2004;132(1-4):207-221 (Special Issue entitled CUSTOM [Center for Uncertain Systems: Tools for Optimization and Management] Conference on Applied Optimization under Uncertainty).
  • Tsai JCC, Chen VCP. Flexible and robust implementations of multivariate adaptive regression splines within a wastewater treatment stochastic dynamic program. Quality and Reliability Engineering International 2005;21(7):689-699.
  • Yang Z, Chen VCP, Chang ME, Murphy TE, Tsai JCC. Mining and modeling for a metropolitan Atlanta ozone pollution decision-making framework. IIE Transactions 2007;39(6):607-615.
  • Yang Z, Chen VCP, Chang ME, Sattler ML, Wen A. A decision-making framework for ozone pollution control. Operations Research 2009;57(2):484-498.
Supplemental Keywords:

pollution prevention, sustainable development, risk management, clean technologies, modeling, cost benefit, RFA, Scientific Discipline, Air, Water, Sustainable Industry/Business, Applied Math & Statistics, Wastewater, Sustainable Environment, Mathematics, air toxics, cleaner production/pollution prevention, Technology for Sustainable Environment, Economics and Business, tropospheric ozone, computational simulations, cost reduction, cleaner production, waste reduction, statistical research, stratospheric ozone, wastewater reuse, wastewater treatment plants, stochastic dynamic programming, sustainable urban environment, computer generated alternatives, optimization, water quality, water treatment, industrial innovations
