Grantee Research Project Results
Final Report: Advancement of Environmental Decision Support Systems Through High Performance Computing Communication
EPA Grant Number: R825196
Title: Advancement of Environmental Decision Support Systems Through High Performance Computing Communication
Investigators: Brill, E. Downey; Ranjithan, S. Ranji; Fine, Steven S.; Baugh, John W.; Loughlin, Daniel
Institution: North Carolina State University
EPA Project Officer: Aja, Hayley
Project Period: October 1, 1996 through September 30, 1999 (Extended to September 30, 2000)
Project Amount: $599,932
RFA: High Performance Computing (1996)
Research Category: Human Health, Aquatic Ecosystems, Environmental Statistics
Objective:
We envision the role of an environmental management decision support system (DSS) as facilitating an iterative decision-making process in which a decision-maker incrementally learns about an environmental problem during the solution process. As knowledge is gained, previous assumptions may be challenged, and, through a number of iterations, one or more designs can be incrementally developed and refined. In addition to providing a flexible design framework, a DSS simplifies the design process by insulating the user from tedious tasks such as accessing data, setting up modeling studies, and performing quality assurance checks. Tools may also be provided to support the user in important, user-driven activities such as developing, testing, and comparing alternative control strategies. A potentially important component of such a system is optimization. Optimization can be used to generate benchmark cost-effective alternatives that meet user-defined goals and objectives. These alternatives may be good starting places for further analysis and may provide the user with valuable insights about how to solve the problem efficiently.

While this vision of an environmental management DSS is readily articulated, implementations that fit this description are rare. One reason is the complexity of many problems faced in environmental management. These problems often can be characterized as having large amounts of data, much of which may be variable or uncertain. Also, pollutant behavior in the environment may be exceedingly complex and often counterintuitive (e.g., NOx scavenging in ozone formation). Although we have tools to understand pollutant behavior, such as process models and Monte Carlo simulation techniques, these approaches are generally very compute-intensive, often precluding a comprehensive examination of decision alternatives.
Complex, nonlinear pollutant behavior also introduces an additional drawback: traditional optimization procedures that might be used to generate benchmark solutions are generally ill-equipped to solve such problems. Nontraditional optimization approaches such as genetic algorithms (GAs) and simulated annealing (SA) are able to consider nonlinear pollutant behavior, but do so by incorporating a process model into an iterative optimization process. As a result, the process model must be evaluated thousands of times during the course of an optimization run. The computational intensity of these approaches has limited their practicality and has kept their use from becoming widespread.
One of the goals of our ongoing research program was to explore how HPCC capabilities such as distributed computing can make single and multiple objective optimization, uncertainty analysis, and alternative generation practical for complex environmental problems. We also were interested in how HPCC capabilities can be integrated into DSSs and in developing extensible decision support frameworks that can be applied readily to a variety of environmental and non-environmental problems. The specific focus of this STAR grant was to address the following objectives: (1) improving and developing optimization algorithms for multiobjective analysis, uncertainty analysis-based design, and alternative generation, (2) solving complex environmental problems using optimization in a distributed computing environment, and (3) identifying and prototyping potential features and architectures for environmental decision support.
Summary/Accomplishments (Outputs/Outcomes):
The accomplishments attained through the course of this project are divided into the following categories: (1) improving optimization algorithms for supporting environmental decision making, (2) improving the practicality of using optimization for complex problems through the use of distributed computing, (3) decreasing optimization time through conjunctive-use approaches, and (4) identifying, prototyping, and evaluating architectures and features in environmental decision support systems.

1. Algorithmic Improvements for Supporting Environmental Decision Support
We identified and evaluated optimization approaches for addressing the complex, nonlinear problems often encountered in environmental management. Genetic algorithms (GAs), simulated annealing (SA), and particle swarm optimization (PSO) were compared based on the following criteria: (1) proficiency in global search, (2) the ability to readily distribute the computational load over a number of processors or networked computers, and (3) the ability to handle complex problems with as many as hundreds or thousands of decision variables. We found that GAs best met these criteria and focused primarily on GA-based optimization.
Of particular interest was identifying or developing variations to standard GA implementations so that they could be used in the decision-making processes of: (1) multiobjective optimization, which characterizes tradeoffs among design objectives; (2) uncertainty analysis-based optimization, or chance-constrained optimization, which identifies optimal solutions that perform well with respect to uncertainties; and (3) alternative generation. GA implementations for multiobjective optimization and chance-constrained optimization had been explored in the literature. Our intuition was that published multiobjective approaches could be improved considerably, and we implemented our own multiobjective GA to address this. Also, chance-constrained GAs reported in the literature required several orders of magnitude more computations than standard GAs. We explored different implementations and found that combining Latin Hypercube sampling with small sample sizes of 10-20 realizations, drawn anew each generation, reduced the computational burden considerably with little or no loss in solution quality. Lastly, there previously had been no published research on using GAs to generate very different alternatives. We created and tested a new GA approach for this purpose.
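The Latin Hypercube sampling idea above can be sketched as follows. This is an illustrative reconstruction, not the project's code; the function names, the unit-cube parameterization, and the `simulate` callback are our own assumptions.

```python
import random

def latin_hypercube(n_samples, n_vars, rng=random.Random(0)):
    """Generate a Latin Hypercube sample on [0, 1)^n_vars.

    Each variable's range is split into n_samples equal strata; one
    point is drawn per stratum and the stratum order is shuffled per
    variable, so even 10-20 samples cover the uncertainty space evenly.
    """
    columns = []
    for _ in range(n_vars):
        # one point per stratum, then shuffle stratum order
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    # transpose the per-variable columns into sample points
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def chance_constrained_fitness(solution, simulate, n_samples=15):
    """Estimate a candidate's reliability (hypothetical API).

    `simulate(solution, u)` is assumed to return True when the design
    meets its constraint under uncertainty realization `u`.
    """
    points = latin_hypercube(n_samples, n_vars=len(solution))
    met = sum(simulate(solution, u) for u in points)
    return met / n_samples  # fraction of sampled scenarios satisfied
```

Because each stratum contributes exactly one point, a small sample already spans the whole range of each uncertain variable, which is why sample sizes of 10-20 can suffice where naive Monte Carlo needs far more.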
2. Improving Optimization Performance Using Distributed Computing
To demonstrate the use of GAs for a complex environmental problem, we applied the approach to the control of tropospheric ozone in the Charlotte, NC, region. Ozone, a constituent of photochemical smog, is not released directly into the atmosphere, but is instead formed through complex photochemical reactions between nitrogen oxides (NOx) and volatile organic compounds (VOCs). To reduce ozone levels, control technologies can be applied to VOC and NOx sources, including electric generation utilities, vehicles, and various industrial boilers and other processes. Generally, reductions in emissions lead to lower ozone concentrations downwind. This relationship is nonlinear, however, and depending on the relative VOC and NOx concentrations, reductions can also lead to increases in ozone.
GAs are able to consider nonlinear ozone behavior by integrating a photochemical air quality model directly into the optimization search process. In our application, we used a simple photochemical trajectory box model, the Empirical Kinetic Modeling Approach (EKMA), to predict ozone formation. One run of this model requires less than 2 seconds on a low-end UNIX workstation. Over the course of a typical GA run, however, the model was executed as many as 20,000 times, requiring a total of nearly 11 hours per GA run. While this application successfully demonstrated the ability of GAs to consider complex, nonlinear chemistry, it also highlighted their computational requirements. We were also interested in using a regulatory-scale model, the Urban Airshed Model (UAM), within the GA framework. One run of UAM, however, required nearly 20 minutes. Assuming 20,000 runs were required in a GA run, the total GA duration would be over 275 days! Clearly, this was impractical. To address this computational intensity, we explored: (1) implementing the GA in a distributed fashion on a multiprocessor computer or on a network of computers, (2) conjunctively using simple and complex air quality models in the GA, (3) conjunctively using GAs and local search algorithms, and (4) using adaptive GA search operators.
GAs are inherently parallel algorithms and, as such, are readily modified for parallel execution on a multiprocessor computer or across a computer network. GAs search by modeling the evolution of a population of potential solutions, and in any particular generation of a generational GA, the evaluations of these solutions are independent of each other. As a result, each solution can be evaluated on a different processor or even a different computer. The generational GA must synchronize itself after all of a generation's evaluations are complete, after which operators such as mutation and elitism can be performed. Depending on the population size, the number and homogeneity of available processors, communication costs, and similar factors, parallelizing a GA via this approach can reduce overall computational time by up to a factor of the number of processors available. For example, if 20 processors were available and the computational load were distributed evenly over them, it would be possible to reduce the overall GA-UAM runtime from 275 days to less than 2 weeks.
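The synchronized per-generation evaluation described above can be sketched in a few lines; this is a minimal illustration, not the project's implementation. A thread pool stands in for the processors or networked workers, and for a CPU-bound process model a process pool or remote servers would take its place.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_generation(population, fitness, max_workers=4):
    """Evaluate one GA generation's individuals in parallel.

    Evaluations within a generation are independent, so each can run on
    its own worker. The executor's map() acts as the synchronization
    barrier: it returns only after every evaluation finishes, at which
    point selection, mutation, and elitism can safely be applied.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, population))
```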
To test the viability of this approach, we explored several distributed GA implementations, including a parallelized generational GA, a steady-state GA, and an asynchronous generational GA. We had concerns about premature convergence in steady-state GAs, so our focus shifted to the other two approaches. Both approaches parallelized the GA by placing each individual that needed to be evaluated in a pool of tasks residing on the computer housing the main GA code, known as the client. Computers that are available for computations, the servers, periodically contact the pool of tasks to determine whether any tasks need to be evaluated. If a task is waiting in the pool, a server grabs it, completes the calculations, and returns the result to the GA. This pool-of-tasks architecture performs automatic load balancing among the servers, since faster servers are able to query the pool of tasks more often. The asynchronous GA differed from the synchronous GA by using a data-flow graph that represents the flow of data through the GA. For any calculation that needs to be performed, the data-flow graph identifies the critical data dependencies for that calculation; as soon as these dependencies are resolved, the calculation is performed. This allows some function evaluations in subsequent generations to be performed before the current generation's evaluation has completed, reducing the idle time of workstations in the distributed network. Testing of the data-flow GA has shown the approach to reduce runtime by as much as 20 percent on a heterogeneous network of computers. The data-flow GA is a component of the Vitri distributed computation prototype that was developed under this project. Vitri is discussed later in this summary.
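The pool-of-tasks pattern can be illustrated with a small sketch; it is not Vitri's code, and the worker pool here is a stand-in for the networked servers. Each free worker pulls the next waiting task, so faster workers naturally complete more tasks, which is the automatic load balancing described above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def pool_of_tasks(tasks, evaluate, n_servers=3):
    """Client-side pool-of-tasks sketch.

    The client queues every evaluation; each of `n_servers` workers
    pulls work as it becomes free, so faster workers take on more
    tasks (automatic load balancing). Results are gathered as they
    complete, not in submission order, mirroring the asynchronous
    return of results from heterogeneous servers.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=n_servers) as servers:
        futures = {servers.submit(evaluate, t): t for t in tasks}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    # reassemble in the client's original task order
    return [results[t] for t in tasks]
```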
3. Reducing Optimization Time Through Conjunctive-Use Approaches
The runtime of regulatory-scale models may be hours or even days, potentially making both their use in designing and evaluating management strategies and their integration into iterative optimization approaches impractical. In many cases, however, it may be possible to make these activities more practical if the complex model is used in conjunction with a simple model, such as a simplified version of the model or a surrogate model. Provided the simple model produces results of adequate quality, it may be used to eliminate evaluations of unpromising management approaches and to provide good, cost-effective starting strategies for making more detailed analyses using the complex models.
We have explored a variety of conjunctive-use approaches. To increase the correlation between simple and complex model predictions, we used GA optimization to train a simple model to approximate the results of a complex model over a variety of conditions. The simple model was then used to characterize the control response surface, identifying potentially cost-effective control strategies and eliminating others. We also combined this approach with modeling to generate alternatives (MGA) to create a cost-effective initial population for a GA that used the complex model.
We identified, although did not implement, a number of other potential approaches to conjunctive model use in the context of GA optimization. These include:
- Pre-emptive use of the simple model: If the runtime of the simple model is very short relative to that of the complex model, and if the simple model is adept at identifying bad solutions, it may be advantageous to evaluate all solutions with the simple model and then eliminate solutions that do not need to be evaluated further with the complex model.
- Substitution of the simple model: The simple model could be used as a substitute for the complex model for some fraction of the evaluations. The choice of model for any particular evaluation may be made at random, or it may be advantageous to use the simple or complex model for an entire generation.
- Staged use: The simple model could be used in the initial generations of the GA to eliminate bad regions of the search space. The complex model would then be used to refine the search.
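The pre-emptive variant above can be sketched as follows; the function names, the numeric cutoff, and the two model callbacks are purely illustrative, not from the project's code.

```python
def preemptive_screen(candidates, simple_model, complex_model, cutoff):
    """Pre-emptive conjunctive use of a cheap and an expensive model.

    Every candidate is scored by the cheap `simple_model`; only those
    at or above `cutoff` earn an evaluation by the expensive
    `complex_model`. The expensive-model budget is thus spent only on
    promising strategies.
    """
    survivors = [c for c in candidates if simple_model(c) >= cutoff]
    return {c: complex_model(c) for c in survivors}
```

The staged variant differs only in when the switch happens: the cutoff-based screen runs every generation early on, then the complex model takes over entirely for the refinement phase.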
As an alternative to using a trained simple model, we explored a multivariate regression scheme in which regression was used to model the source-receptor behavior for an urban-scale ozone scenario. With this approach, we were able to obtain good source-receptor estimates, which were then used in a mixed integer linear programming model. Solutions obtained from this model rivaled those of most configurations of the GA and were produced using considerably fewer total evaluations. While it is difficult to extrapolate these promising regression results to all meteorological episodes and problem domains, they suggest that, for many situations, linear regression coefficients may be adequate to model complex nonlinear environmental phenomena and that use of these coefficients in a linear or mixed integer linear programming model may be a cost-effective approach.
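A fit of linear source-receptor coefficients can be sketched with ordinary least squares via the normal equations; this is an illustrative stand-in for the project's multivariate regression, and the variable names and no-intercept assumption are our own.

```python
def fit_linear_surrogate(X, y):
    """Fit linear source-receptor coefficients by ordinary least squares.

    X is a list of emission-change vectors (one row per model run) and
    y the corresponding modeled ozone responses. Solves the normal
    equations (X^T X) b = X^T y by Gaussian elimination with partial
    pivoting. Assumes X^T X is well-conditioned and omits an intercept.
    """
    n = len(X[0])
    # build X^T X and X^T y
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    # forward elimination with partial pivoting
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, n))) / A[i][i]
    return coeffs
```

The fitted coefficients would then enter a linear or mixed integer linear program as the constraint linking emission reductions to predicted ozone change.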
The conjunctive modeling approaches discussed above involve conjunctive use of environmental models. Optimization algorithms can also be used conjunctively to speed computations. Approaches such as GAs are good at global search but are not efficient for local search; local search methods such as conjugate gradient approaches, in contrast, are efficient locally but are easily trapped in local optima. We explored several approaches in which GAs and local search methods are used together to exploit the advantages of each. We found a post-GA local search to be an effective approach for speeding computations, provided a more stringent GA stopping criterion is used. We also explored, with mixed results, using a local search operator based on the Nelder-Mead simplex within a GA.
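The post-GA polishing step can be illustrated with a simple stochastic hill climb; this is an illustrative substitute for the conjugate-gradient or Nelder-Mead methods named above, with hypothetical names and step sizes.

```python
import random

def hill_climb(solution, fitness, step=0.1, iters=200, rng=random.Random(1)):
    """Local search applied to the best solution found by a GA.

    Perturbs one coordinate at a time by a small random amount and
    keeps only improving moves (fitness is maximized). A GA with a
    loose stopping criterion hands its best individual to this routine
    for refinement, mirroring the post-GA local search described above.
    """
    best, best_fit = list(solution), fitness(solution)
    for _ in range(iters):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] += rng.uniform(-step, step)
        f = fitness(cand)
        if f > best_fit:  # accept only improvements
            best, best_fit = cand, f
    return best, best_fit
```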
We also explored making the crossover and mutation parameters functions of population diversity. The crossover rate is initially very high as the GA compares and combines solutions within the initial population; mutation is less important at this stage and occurs at a rate near zero. As population diversity diminishes, so does the crossover rate, since combining very similar solutions yields little benefit; the mutation rate, in contrast, increases with decreasing diversity. Initial trials using these adaptive operators suggest that the total number of evaluations required by the GA is reduced by roughly 20 to 40 percent for a trial problem.
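The adaptive-operator idea can be sketched as below; the diversity measure (mean pairwise Hamming distance on bit strings) and the rate ceilings are our own assumptions, chosen to illustrate the trend the report describes.

```python
def adaptive_rates(population, p_cross_max=0.9, p_mut_max=0.3):
    """Scale crossover and mutation rates with population diversity.

    Diversity is the mean pairwise Hamming distance between bit-string
    individuals, normalized to [0, 1]. A diverse population gets a high
    crossover rate and near-zero mutation; as the population converges,
    crossover falls and mutation rises to reintroduce variety.
    """
    n, length = len(population), len(population[0])
    total, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(population[i], population[j]))
            pairs += 1
    diversity = total / (pairs * length) if pairs else 0.0
    p_cross = p_cross_max * diversity        # falls as the population converges
    p_mut = p_mut_max * (1.0 - diversity)    # rises as the population converges
    return p_cross, p_mut
```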
4. Decision Support System Feature Prototyping and Evaluation
We have developed a number of prototypes to demonstrate the use of decision support tools in facilitating environmental decision-making. The Strategy Development Tool (SDT) is a prototype whose development began under a previous project and was continued under this project. The goal of the SDT was to prototype tools for air quality management strategy development and evaluation. Major capabilities of the system include: (1) inventory quality assurance, visualization, and analysis, (2) interactive strategy design, (3) strategy testing, and (4) strategy comparison. These capabilities allow regulators to analyze inventories and compare the costs and environmental impacts of command-and-control policies, incentive-based control strategies (charges and trading), and optimization-based least-cost investigations. The SDT is currently being used in a research project with the State of North Carolina's Air Quality Planning group. Also, MCNC is using the SDT in several emissions projects. Feedback from actual and potential users from MCNC, the State of North Carolina, and the U.S. Environmental Protection Agency (EPA) has provided valuable insights and ideas for many new features. Ideally, the SDT will serve as a working prototype for the development of a production-quality environmental DSS that will assist state and federal regulators in future policymaking.
The Vitri distributed optimization prototype was developed to demonstrate how the computations of iterative decision support approaches (e.g., GAs and Monte Carlo simulations) could be distributed over a network of heterogeneous computers. This framework is combined with a GA-based optimizer to form a decision support system for engineering design and analysis that is generic and fault tolerant and that performs automatic load balancing. Vitri was designed using the object-oriented framework paradigm, providing many "hot spots": hook points in the program where a user can add new functionality by attaching application-specific code. These hot spots help make the system flexible and reusable. We have applied Vitri to a variety of engineering design problems, including the development of cost-effective control strategies for tropospheric ozone, reliable water distribution systems, and coupled piping systems for nuclear structures. Eventually, we may merge these two prototypes or use their feature sets and capabilities as roadmaps for future DSSs.
Journal Articles on this Report: 2 Displayed (of 21 total project publications)

Loughlin DH, Ranjithan SR, Baugh JW, Brill ED. Application of genetic algorithms for the design of ozone control strategies. Journal of the Air & Waste Management Association 2000;50(6):1050-1063.

Loughlin DH, Ranjithan SR, Brill ED, Baugh JW. Genetic algorithm approaches for addressing unmodeled objectives in optimization problems. Engineering Optimization 2001;33(5):549-569.
Supplemental Keywords:
air quality management, ozone management, joint-cognitive system, RFA, Scientific Discipline, Ecosystem Protection/Environmental Exposure & Risk, computing technology, Ecology and Ecosystems, ecosystem modeling, decision support systems, environmental decision making, HPCC, computer science, data management, ecosystem simulation, data analysis, information technology, parallel computing

Progress and Final Reports:
Original Abstract

The perspectives, information, and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts, and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.