Final Report: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology
EPA Grant Number: R828207
Title: Computational Requirements of Statistical Learning within a Decision-Making Framework for Sustainable Technology.
Investigators: Chen, Victoria C.P.
Institution: Duke University
EPA Project Officer: Karn, Barbara
Project Period: July 1, 2000 through June 30, 2003 (Extended to June 30, 2005)
Project Amount: $335,000
RFA: Technology for a Sustainable Environment (1999)
Research Category: Sustainability, Pollution Prevention/Sustainable Development
This research project addressed the development of a decision-making framework (DMF) for creating more sustainable urban environments. The development of this DMF required a novel collaboration of current research in sustainability, optimization, and statistics. The objective of the DMF was to explore hypothetical paradigms on various scales, subject to technical and societal constraints, and measure their effect on other internal and external systems. In particular, the DMF will be instrumental in evaluating databases of emerging technologies and identifying the most promising directions for sustainable solutions.
The DMF will have the ability to extract key information from environmental deterministic and statistical models that would otherwise be overlooked using current approaches. Decision-makers use models to examine systems and to predict system responses to prescribed stimulants. Those who develop deterministic and statistical models may use the DMF developed in this research to guide them in building an interface between their models and the user. Such an interface will enable the users to utilize the models more efficiently and effectively when exploring how a system responds to a potential action or set of actions. It is expected that the users have a desired optimum course of action either to maximize expected benefits or minimize expected costs while maintaining other state conditions.
The specific objectives of the research project were to investigate the possibility of DMF prototypes in two important arenas: (1) water quality: comparison of current and emerging technologies in a wastewater treatment system; and (2) air quality: evaluation of spatial and temporal actions for reducing ground-level ozone pollution. For wastewater treatment, we considered the downstream subsystem presented by Chen and Beck (1997). Because the necessary modules for the included technologies have already been constructed, we are confident that a DMF for this system will be successful. By contrast, to construct an efficient air pollution module, the complex urban airshed encompassing the issue of ozone pollution requires additional exploratory studies through an advanced three-dimensional, photochemical air quality grid model, such as the urban airshed model (UAM, EPA-450/4-90-007A-E, 1990) or the recently released MODELS-3.
The DMF will be based on a stochastic dynamic programming (SDP) approach that permits optimization of a system changing over time. Although dynamic programming is proven optimal and has been successful in many applications, it is highly computationally intensive. Development of a DMF for sustainable technology is not straightforward because most environmental problems involve continuous variables that are subject to uncertainty and require the added dimensionality of space (in addition to time). The most promising high-dimensional continuous-state SDP solution method to date is orthogonal arrays (OA)/multivariate adaptive regression splines (MARS), which has accurately solved higher-dimensional problems than was possible with prior methods. A critical objective of the research was to develop computationally practical, high-dimensional, continuous-state SDP solution methods for use within a DMF for sustainable technology. Our methodology followed the OA/MARS method. The generalized solution method utilizes statistical experimental designs through which statistical learning within the SDP is achieved. Prior applications of OA/MARS have demonstrated that memory requirements are well within the capacity of modern technology. The computational effort required by the learning process (i.e., MARS), although polynomial in growth, may not be practical for very large problems. Recent test runs on a 550 MHz Pentium II, however, have demonstrated up to a 15-fold increase in computational speed compared to the original OA/MARS runs on a Sun SPARCstation 10 model 51.
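The backward recursion behind this kind of method can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the project's OA/MARS code: it uses a one-dimensional state, a stratified (one-dimensional Latin hypercube) design in place of an orthogonal array, and piecewise-linear interpolation in place of MARS. The transition, costs, controls, and all numeric values are hypothetical.

```python
import random
from bisect import bisect_right


def stratified_design(n, rng):
    """One random point in each of n equal strata of [0, 1] (a 1-D Latin hypercube)."""
    return [(i + rng.random()) / n for i in range(n)]


def fit_piecewise_linear(xs, ys):
    """Interpolate through sorted (xs, ys); a simple stand-in for MARS."""
    def value(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect_right(xs, x)
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + t * (ys[j] - ys[j - 1])
    return value


# All numbers below are hypothetical and chosen only for illustration.
CONTROLS = [0.0, 0.25, 0.5, 0.75, 1.0]   # candidate reduction fractions
SHOCKS = [-0.1, 0.0, 0.1]                # equally likely disturbances
LIMIT = 0.6                              # state level above which a penalty applies


def transition(x, u, w):
    """Next state: upward drift, offset by control, perturbed by noise, kept in [0, 1]."""
    return min(1.0, max(0.0, x + 0.2 - 0.3 * u + w))


def stage_cost(x, u):
    """Quadratic control cost plus a penalty for exceeding the limit."""
    return u ** 2 + 10.0 * max(0.0, x - LIMIT)


def solve_sdp(n_stages=4, n_design=50, seed=0):
    """Backward induction with a learned approximation of the future value function."""
    rng = random.Random(seed)

    def future(x):                       # terminal cost after the last stage
        return 10.0 * max(0.0, x - LIMIT)

    policies = []
    for _ in range(n_stages):            # last stage first
        xs = stratified_design(n_design, rng)
        values, decisions = [], []
        for x in xs:
            # Expected cost of each control; keep the cheapest (ties go to smaller u).
            best_cost, best_u = min(
                (stage_cost(x, u)
                 + sum(future(transition(x, u, w)) for w in SHOCKS) / len(SHOCKS), u)
                for u in CONTROLS
            )
            values.append(best_cost)
            decisions.append(best_u)
        policies.append(list(zip(xs, decisions)))
        future = fit_piecewise_linear(xs, values)   # the "statistical learning" step
    policies.reverse()                   # index 0 is now the first stage
    return future, policies


value, policies = solve_sdp()
```

Here `value` approximates the first-stage cost-to-go for any initial state, and `policies` lists the chosen control at each design point per stage. In a high-dimensional problem the design and the fitted model carry most of the computational cost, which is why the choice of both matters.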
The development of statistical learning for this research is separated into four categories: (1) restructuring MARS to reduce computation with minimal loss in learning; (2) investigating other flexible methods of statistical learning, such as artificial neural networks (ANNs); (3) employing smaller statistical experimental designs, such as Latin hypercube designs; and (4) parallelizing the statistical learning process. A successful DMF for air quality will push the boundaries of SDP research.
SDP Statistical Learning Process
For the statistical learning process of SDP, two literature reviews of statistical methods for computer experiments were completed (Chen, et al., 2003; Tsai and Chen, 2004). One literature review on statistical data mining was completed (Tsai, et al., 2005). All the methods described are potentially applicable to the DMF for designing sustainable systems. Those that we proposed to study included MARS, ANNs, and kriging for statistical modeling, and OAs, Latin hypercubes (LH), OA-based Latin hypercubes (OA-LH), and some number-theoretic methods (NTMs) for experimental design.
Of the statistical modeling methods, kriging and ANNs were studied. The preliminary kriging study did not illustrate enough improvement in accuracy to justify the computational burden (Chen and Welch, 2001). ANNs, however, are a good competitor to MARS for SDP. Although further computational studies are still warranted, either method appears to work well in SDP, and the choice mainly depends on a user’s preference. Variants of MARS developed by Dr. Julia Tsai in her dissertation include Parallel-MARS, ASR-MARS with an automatic stopping rule, and Robust-MARS (Tsai, 2002). Of the experimental designs, pure OA and OA-based Latin hypercube designs were studied for SDP. Finally, number-theoretic methods, specifically Niederreiter-Xing and Sobol′ sequences, for experimental design were studied for the first time in SDP work.
As an initial prototype, a DMF for the 17-level downstream subsystem of Chen and Beck (1997) was completed. First, the liquid line was solved separately (Tsai, 2002; Tsai, et al., 2003; Tsai and Chen, 2004; Tsai, et al., 2005), then the complete liquid and solid treatment DMF was solved (Tsai, 2002; Tsai, et al., 2004). In linking the solid line to the liquid line, a conversion from the liquid state variables to the solid state variables was needed, and technology dependencies listed had to be modeled. Details on the unique SDP formulation used to handle these issues are given in Tsai, et al. (2004).
In addition to constraints and penalty costs forcing “cleanliness” of the exiting liquid and solid, six optimization objectives may be explored: minimize economic cost (which includes both capital and operating costs), minimize odor emissions, minimize land area, minimize volume, maximize global desirability, and maximize robustness. Our proposed plan was to study them separately, but our results primarily utilize economic cost. The other measures were not as accurately represented by Chen and Beck (1997). The DMF results will only be as good as the modeling of the technologies involved.
As a complex prototype, a DMF for ozone pollution was completed. We studied an episode in urban Atlanta on July 31-August 1, 1987, which remains one of the worst episodes on record. The key day to control this episode was July 31, and our ozone pollution DMF prototype focuses on this day. The modules of our DMF are illustrated in Figure 1. The entire work is described in Yang (2004), and preliminary modeling is presented in Chen, et al. (2003). Unlike the wastewater treatment application, for which the system dynamics were already modeled, significant work was needed to build the Atmospheric Chemistry Module in Figure 1. Once the Atmospheric Chemistry Module was constructed, the DMF solved for SDP control policies, which were then tested via the UAM (Yang, 2004; Yang, et al., 2005a and 2005b).
The Atlanta urban airshed model simulates hourly emissions and ozone concentrations on a 40x40 grid over the spatial region and emissions from 102 point sources. If we included ozone and NOx at every point source (102) and every grid region (40x40) and each hour (24), the number of state variables would be very large. Thus, to achieve a computationally tractable DMF, a critical component of the Atmospheric Chemistry Module in Figure 1 is dimension reduction, conducted in three phases: initialization, mining, and metamodeling. The relationships constructed by these phases are the key to modeling the transition of states in the SDP.
In the initialization phase, the 40x40 grid used by the UAM was aggregated into a 5x5 grid and the 24 hourly time periods were aggregated into five three-hour time periods covering 4:00 AM to 7:00 PM. The last four time periods corresponded to four stages in the SDP, and the first time period initialized the SDP. We focused on controlling maximum ozone at the four specific Photochemical Assessment Monitoring Stations (PAMS) located in the Atlanta metropolitan area. NOx emissions are considered separately in the DMF for each of the grid squares and point sources during each of the four SDP stages. Several different initial conditions in the first time period (4:00-7:00 AM) were considered during testing of the DMF solutions.
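The aggregation in the initialization phase can be illustrated with a short sketch. The report does not state the rule used to combine fine cells, so the block maximum (natural for peak ozone) and the block sum (natural for emissions) shown here are assumptions, as are the function names and the synthetic input values.

```python
def aggregate_grid(fine, blocks=5, reducer=max):
    """Collapse a square fine grid into blocks x blocks cells using `reducer`."""
    n = len(fine)
    if n % blocks != 0 or any(len(row) != n for row in fine):
        raise ValueError("fine grid must be square and divisible into equal blocks")
    size = n // blocks                      # 40 // 5 = 8 fine cells per side
    coarse = []
    for bi in range(blocks):
        row = []
        for bj in range(blocks):
            cells = [fine[i][j]
                     for i in range(bi * size, (bi + 1) * size)
                     for j in range(bj * size, (bj + 1) * size)]
            row.append(reducer(cells))
        coarse.append(row)
    return coarse


def time_periods(start_hour=4, end_hour=19, width=3):
    """Three-hour periods as (start, end) hours on a 24-hour clock."""
    return [(h, h + width) for h in range(start_hour, end_hour, width)]


# Synthetic 40x40 field, used only to exercise the functions.
fine = [[r * 40 + c for c in range(40)] for r in range(40)]
coarse_max = aggregate_grid(fine)                 # e.g. peak ozone per coarse cell
coarse_sum = aggregate_grid(fine, reducer=sum)    # e.g. total emissions per coarse cell
periods = time_periods()                          # first period initializes the SDP
```

With these defaults, each coarse cell covers an 8x8 block of the fine grid, and the five periods run from 4:00 AM to 7:00 PM, with the last four serving as SDP stages.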
Figure 1. A Modular DMF
In the mining phase, data collected from the UAM was used to identify those point sources and grid squares that had the most influence. Different time periods were not explored. We used a 149-point Latin hypercube experimental design to scale NOx emissions in different regions and at different point sources from zero up to the basecase level. These were input into the UAM and resulting ozone concentrations across the 40x40 UAM modeling grid were collected. Using our aggregated 5x5 grid, the maximum ozone concentration was determined for each grid square containing a PAMS site. Mining via regression analysis was conducted separately for each PAMS site. Of the 25 grid regions, 16 were statistically significant. Of the original 102 point sources only 15 were statistically significant.
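A Latin hypercube design of the kind used in the mining phase can be generated as follows. This is a minimal sketch, not the project's code: the three-variable example and the base-case emission values are hypothetical, and the UAM runs and regression screening that would follow are not shown.

```python
import random


def latin_hypercube(n_points, n_dims, rng):
    """n_points in [0, 1)^n_dims with exactly one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_points))
        rng.shuffle(strata)                       # random pairing across dimensions
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_points)]


def scale_to_basecase(design, basecase):
    """Map unit-cube fractions to emission levels between zero and the base case."""
    return [[fraction * level for fraction, level in zip(row, basecase)]
            for row in design]


rng = random.Random(1)
design = latin_hypercube(149, 3, rng)             # 149 runs, 3 illustrative variables
basecase = [120.0, 80.0, 45.0]                    # hypothetical base-case NOx levels
scenarios = scale_to_basecase(design, basecase)   # one row per simulator run
```

Each row of `scenarios` would define one simulator run. Because every variable is spread evenly over its range, the resulting data support a separate regression for each monitoring site.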
In the metamodeling phase, the SDP transitions of the state variables were constructed. We chose to maintain only those 15 significant point sources identified in the mining phase due to the tremendous reduction in dimension; however, all 25 grid squares were maintained. Similar to the mining phase, data was collected through the UAM. Based on the input NOx emissions and the output from the UAM, statistical methods were applied to build the metamodels that represent the ozone SDP transition functions. Through the metamodels, the minimum required number of state variables and decision variables was identified. In particular, the state variable dimension specifies the size of the SDP; thus, the reduction from 524 to 25 state variables was extremely significant.
To solve this 25-dimensional SDP problem, both Latin hypercube and number-theoretic experimental designs were tested, with the latter performing slightly better. Consequently, the SDP methodology for the ozone pollution DMF utilizes 2000-point Niederreiter-Xing number-theoretic sequences with Dr. Tsai’s ASR-MARS. The SDP code was run on a Dual 2.4-GHz Intel Xeon Workstation and a solution was acquired in about 55 hours. Control strategies for the full range of initial conditions can be obtained from the SDP solution, which is what makes it dynamic. In particular, we tested the basecase initial conditions for July 31 and 50 hypothetical scenarios varying the initial conditions. In all cases, the DMF was able to maintain ozone levels at or below the U.S. Environmental Protection Agency (EPA) standard. Figure 2 shows the SDP control policy for the basecase initial conditions. The amount of NOx emission reduction is seen to vary by location and time, demonstrating the focused nature of the control policy.
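To give a sense of what a number-theoretic design looks like, the sketch below generates a 2000-point design in 25 dimensions. It uses the Halton sequence, a simpler low-discrepancy construction, as a stand-in; it does not implement the Niederreiter-Xing sequences used in the project, and the function names are illustrative.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


def halton(n_points, n_dims):
    """Halton points in the unit cube; dimension d uses the d-th prime as its base."""
    bases = first_primes(n_dims)
    return [[radical_inverse(i, b) for b in bases]
            for i in range(1, n_points + 1)]


design = halton(2000, 25)   # deterministic: the same call always gives the same points
```

Unlike a Latin hypercube, such a sequence is deterministic and can be extended point by point. Plain Halton points are known to fill high-dimensional spaces less evenly than more refined constructions, which is one reason a study might prefer sequences such as Niederreiter-Xing or Sobol′.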
For the 50 hypothetical scenarios (on the initial conditions), out of 38 emission variables that were controlled in the DMF, 13 had no change, 10 had some variation, and 15 had large variation. The 13 emission variables with no change were those that were either reduced 100 percent or 0 percent. The fact that 15 emission variables had large variation from scenario to scenario demonstrates the dynamic nature of the problem and the SDP solution. In other words, if the initial conditions of the day are different, then the optimal control strategies will be different. In addition, we compared the optimal emission reductions to the typical “across-the-board” control strategy that dictates the same percent reduction in emissions everywhere and all day. The across-the-board strategy required 60 percent reduction to achieve the EPA one-hour standard of 125 ppb. Our DMF required a maximum of 55 percent reduction in total emissions during 7:00 a.m. - 10:00 a.m., with lower percent reductions required during the later time periods, and no reductions taken during any other times (i.e., before 7:00 a.m. and after 7:00 p.m.). This demonstrates the potential cost effectiveness of targeted control strategies, as well as the potential to solve real world problems using our DMF.
Figure 2. Atlanta NOx Emission Reductions Plotted Over the 5x5 Grid. Stage 1 is 7:00 AM – 10:00 AM. Stage 2 is 10:00 AM – 1:00 PM. Stage 3 is 1:00 PM – 4:00 PM. Stage 4 is 4:00 PM – 7:00 PM. Note: Point source #k reductions are combined with area reductions in grid square (x, y) as follows: #1-4 with (1, 4); #5, #6, #63, #64 with (3, 3); #21, #23, #25 with (1, 2); #9, #12 with (2, 2); #30 with (2, 3); #37 with (4, 3).
Chen J, Beck MB. Towards designing sustainable urban wastewater infrastructures: a screening analysis. Water Science and Technology 1997;35(9):99-112.
Journal Articles on this Report:
Cervellera C, Chen VCP, Wen A. Optimization of a large-scale water reservoir network by stochastic dynamic programming with efficient state space discretization. European Journal of Operational Research 2006;171(3):1139-1151.
Tsai JCC, Chen VCP, Chen J, Beck MB, Chen J. Stochastic dynamic programming formulation for a wastewater treatment decision-making framework. Annals of Operations Research 2004;132(1-4):207-221 (Special Issue entitled CUSTOM [Center for Uncertain Systems: Tools for Optimization and Management] Conference on Applied Optimization under Uncertainty).
Tsai JCC, Chen VCP. Flexible and robust implementations of multivariate adaptive regression splines within a wastewater treatment stochastic dynamic program. Quality and Reliability Engineering International 2005;21(7):689-699.
Supplemental Keywords: pollution prevention, sustainable development, risk management, clean technologies, modeling, cost benefit, RFA, Scientific Discipline, Air, Water, Sustainable Industry/Business, Applied Math & Statistics, air toxics, cleaner production/pollution prevention, Mathematics, Wastewater, Sustainable Environment, Technology for Sustainable Environment, Economics and Business, tropospheric ozone, computational simulations, cost reduction, cleaner production, waste reduction, stratospheric ozone, statistical research, wastewater reuse, wastewater treatment plants, stochastic dynamic programming, computer generated alternatives, optimization, sustainable urban environment, water quality, industrial innovations, pollution prevention, source reduction