The DIVERSE Toolkit, a Layman's Essay

Most of today's computationally intensive applications, also called high performance computing (HPC) applications, consist of one computer program that may be broken into many processes (or threads). A typical HPC application program is parallelized with HPC compiler tools or application programming interfaces (APIs) such as the Message Passing Interface (MPI). The processes run in a predetermined, synchronous, and reliable fashion. For the application to succeed, none of the processes can fail. Moreover, all inter-process communication (IPC) must be reliable, and all processes must synchronize at predetermined intervals.

We have implemented a different HPC application paradigm at Virginia Tech: a real-time interactive HPC application that consists of many programs that can run on many computers. The paradigm tolerates faults and requires neither reliable IPC methods nor regular process synchronization. As a result, each program, and the system as a whole, performs better. This paradigm is general and can be used to model and simulate real-time, interactive, person-in-the-loop dynamical systems. If the dynamical models are based on physics, then observations made from the simulated system can be applied in the real world. Applying this methodology with state-of-the-art physics-based models, we have built a Crane and Ship Simulator in the CAVE at Virginia Tech (VT-CAVE). The simulator operator interacts with a highly realistic virtual environment, complete with high-fidelity 270-degree 3-D scene visualization, ambient sound, base motion, a physical control console, and a chair.

The software system of the crane and ship simulator is structured as a cluster of cooperating programs. They are coupled to each other by data structures that hold the current physical state of the system in inter-process shared memory, which the programs can read from and write to. This design facilitates a modular structure for the system. All the programs in the simulator run asynchronously with respect to one another. Some programs simulate physical dynamical models whose cyclic solvers are synchronized to real time; others use the current physical state information from the shared memory. Consequently, explicit process synchronization is replaced, indirectly, by sampling the physical state values that are being modeled. If the coupling between two processes is strong, the shared data is queued and the reading process interpolates values for the time (or other state variable) at which it is reading, so that continuity in the coupling is better preserved. See http://thor.sv.vt.edu/crane/design_report.html for more details of the software system of the crane and ship simulator.
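
The queue-and-interpolate idea can be illustrated with a small sketch. It is not DTK code: the class name SampleQueue, its methods, and the choice of linear interpolation with clamping at the ends are all assumptions made for illustration. A writer appends time-stamped samples, and a reader asks for the value at whatever time it happens to need.

```python
from bisect import bisect_left
from collections import deque


class SampleQueue:
    """Hypothetical bounded queue of (time, value) samples from one writer."""

    def __init__(self, maxlen=64):
        # When the queue is full, the oldest sample is discarded automatically.
        self._samples = deque(maxlen=maxlen)

    def write(self, t, value):
        # Assumption: the writer produces samples in increasing time order.
        if self._samples and t <= self._samples[-1][0]:
            raise ValueError("samples must be written in increasing time order")
        self._samples.append((t, value))

    def read(self, t):
        """Return the value at time t, linearly interpolated between samples."""
        if not self._samples:
            raise LookupError("no samples written yet")
        times = [s[0] for s in self._samples]
        i = bisect_left(times, t)
        if i == 0:
            return self._samples[0][1]    # at or before the oldest sample: clamp
        if i == len(times):
            return self._samples[-1][1]   # after the newest sample: clamp
        (t0, v0), (t1, v1) = self._samples[i - 1], self._samples[i]
        w = (t - t0) / (t1 - t0)          # fraction of the way from t0 to t1
        return v0 + w * (v1 - v0)


# A reader sampling between two written states.
queue = SampleQueue()
queue.write(0.0, 10.0)
queue.write(2.0, 14.0)
print(queue.read(0.5))  # prints 11.0
```

The reader never waits for the writer; it only samples whatever states are already queued, which is the sense in which sampling stands in for synchronization.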

The development of the crane and ship simulator is driving the need for innovative HPC software tools like the Virginia Tech DIVERSE toolkit (DTK) and its sister project, the Virginia Tech DIVERSE graphics interface. DTK is the "glue" that ties together all of the programs in the simulator. DTK consists of a server program, a C++ client API, and small utility programs. The DTK server manages the inter-process shared memory, gives other programs an interface to serial hardware devices via shared memory, and provides a seamless interface to IP networked shared memory. The DTK C++ client API is used by all programs in the simulator. The DTK server is central to the system in that it is the only program that must be running at all times for the simulator to be operational. All of the other programs (modules) can be developed and run independently of most other modules. Modules can be emulated, started, and stopped without corrupting other running modules. When the system is running in a steady state, the primary IPC mechanism is inter-process shared memory. DTK extends inter-process shared memory to remote shared memory over the Internet without additional coding, so that the system may be distributed across many computers when necessary.
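
To make the shared-memory coupling concrete, here is a minimal sketch that uses Python's standard multiprocessing.shared_memory module in place of DTK. It does not show the DTK server or the DTK C++ client API. The function names and the two-double record layout (time, cylinder length) are hypothetical. One role creates a named segment, and any other module can attach to it by name, copy out the current state, and detach again.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical record layout: two little-endian doubles (time, length).
STATE = struct.Struct("<dd")


def create_segment(name=None):
    """Create a segment big enough for one state record (the owner's role).

    With name=None the operating system picks a unique name, which is then
    available as the .name attribute of the returned object.
    """
    return shared_memory.SharedMemory(name=name, create=True, size=STATE.size)


def write_state(shm, t, length):
    """Overwrite the record in place; readers always see the latest write."""
    STATE.pack_into(shm.buf, 0, t, length)


def read_state(name):
    """Attach by name, copy the current record out, and detach (a module's role)."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return STATE.unpack_from(shm.buf, 0)
    finally:
        shm.close()  # detaching does not destroy the segment


if __name__ == "__main__":
    segment = create_segment()
    try:
        write_state(segment, 1.5, 3.25)
        print(read_state(segment.name))
    finally:
        segment.close()
        segment.unlink()  # only the owner removes the segment
```

Because a reader only attaches, copies, and detaches, it can be started or stopped at any time without disturbing the segment. This sketch ignores the possibility of a reader seeing a half-written record, which a real system would have to address.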

The largest difference between this programming paradigm and one using MPI is that the programs distributed in this system typically run asynchronously. The network IPC methods in DTK are usually unreliable (UDP/IP). Reliable TCP/IP is used, however, for changing the system configuration, such as adding an observer to the system, and for communicating discrete events, such as turning on a light. UDP/IP is used for transferring most of the network information. For example, a model of a hydraulic cylinder expands and contracts continuously in response to an operator's input, and the changes in the cylinder's length are fed to the network at a regular rate. If a cylinder-length packet is lost, there is no need to resend it because, in most cases, the next cylinder-length packet will arrive before a replacement packet could. This method is analogous to the way real systems interact. For example, if you blink, the light that your eyes did not receive is not sent again; instead, the latest light information reaches your eyes when you reopen them.
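
The "never resend, just use the newest value" behavior can be sketched without any networking. The packet layout (a sequence number followed by a length), the encode function, and the LatestValueReceiver class below are hypothetical and are not DTK's wire format. In a real module the bytes would arrive from a UDP socket; here they are fed in directly so that the loss and reordering are easy to see.

```python
import struct

# Hypothetical datagram layout, network byte order:
# sequence number (unsigned 32-bit) followed by cylinder length (double).
PACKET = struct.Struct("!Id")


def encode(seq, length):
    """Build one datagram payload."""
    return PACKET.pack(seq, length)


class LatestValueReceiver:
    """Keeps only the newest sample; lost datagrams are never requested again."""

    def __init__(self):
        self.seq = None
        self.length = None

    def on_datagram(self, data):
        """Accept the datagram if it is newer than what is held; report the result."""
        seq, length = PACKET.unpack(data)
        if self.seq is not None and seq <= self.seq:
            return False  # stale or duplicate: a newer value is already held
        self.seq, self.length = seq, length
        return True


# Datagram 3 arrives late, after 4, and is ignored; nothing is retransmitted.
receiver = LatestValueReceiver()
for seq, length in [(1, 1.00), (2, 1.02), (4, 1.06), (3, 1.04), (5, 1.08)]:
    receiver.on_datagram(encode(seq, length))
print(receiver.seq, receiver.length)  # prints 5 1.08
```

The sketch does not handle sequence-number wraparound, and it suits only continuously updated state such as the cylinder length. Discrete events such as turning on a light need reliable delivery, which is why the essay assigns them to TCP/IP.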

Today's computers are on the threshold of perfecting physically based real-time interactive models, and Virginia Tech is poised with a unique blend of expertise to advance this technology.


Find similar ideas at http://www.mpirt.org/.
this page: http://thor.sv.vt.edu/DTK/essay.html