An Overview of the Distributed Parallel Storage Server (DPSS)
Brian L. Tierney, William E. Johnston, Jason Lee, Gary Hoo, Mary Thompson
Data Intensive Distributed Computing Group
Lawrence Berkeley National Laboratory, Berkeley, CA 94720
1.0 Introduction¹
We have developed and deployed a Distributed-Parallel Storage System (DPSS) in several high-speed ATM WAN testbeds to support a variety of data-intensive applications.
Architecturally, the DPSS is a network-striped disk array, but it is unique in that it allows client applications complete freedom to determine optimal data layout, replication and/or coding redundancy strategy, security policy, and dynamic reconfiguration.
The DPSS was first developed for real-time recording of, and access to, large, image-like, read-mostly data sets in the DARPA-funded MAGIC testbed. In MAGIC (http://www.magic.net), the DPSS is distributed across several sites separated by more than 1000 km of high-speed network that uses IP over ATM as the network protocol, and it is used to store very high resolution images of several geographic areas.
Both the architecture and the implementation are intended to provide easy and low-cost scalability. This approach has yielded a data source that scales economically to very high speeds and also supports systems research on networks and distributed computing.
2.0 DPSS Architecture
The DPSS is a collection of disk servers that operate in parallel over a wide area network to provide logical block-level access to large data sets. To achieve high performance, we exploit many levels of parallelism, including that available at the level of the disks, controllers, processors/memory banks, servers, and the network (see Figure 1: DPSS Architecture).
The implementation is based on user-level software that runs on UNIX workstations. The DPSS is essentially a "block" server that can supply data to applications located anywhere in the network in which it is deployed. Multiple low-cost, medium-speed disk servers use the network to aggregate their data streams. Data blocks are declustered (dispersed so that as many system elements as possible can operate simultaneously to satisfy a given request) across both disks and servers. This strategy allows a large collection of disks to seek in parallel, and all servers to send the resulting data to the application in parallel.
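The overview does not specify the placement function, and the DPSS leaves data layout to the client, so the following is only a minimal sketch assuming simple round-robin declustering. The names `decluster` and `PhysicalBlock` and the host names are hypothetical. Consecutive logical blocks are spread first across servers and then across the disks of each server, so any run of (servers × disks) consecutive blocks touches every disk exactly once and all of them can seek in parallel.

```python
from typing import NamedTuple, Sequence


class PhysicalBlock(NamedTuple):
    """Physical address of one block: server name, disk number, and disk block."""
    server: str
    disk: int
    disk_block: int


def decluster(logical_block: int, servers: Sequence[str], disks_per_server: int) -> PhysicalBlock:
    """Map a logical block number to a physical address by round-robin striping.

    Hypothetical layout: consecutive blocks go to different servers first,
    then to different disks on each server, so any len(servers) * disks_per_server
    consecutive blocks land on every (server, disk) pair exactly once.
    """
    if logical_block < 0:
        raise ValueError("logical block numbers must be non-negative")
    server_index = logical_block % len(servers)
    # Number of blocks that precede this one on the same server.
    per_server_index = logical_block // len(servers)
    disk = per_server_index % disks_per_server
    disk_block = per_server_index // disks_per_server
    return PhysicalBlock(servers[server_index], disk, disk_block)


if __name__ == "__main__":
    servers = ["server-a", "server-b", "server-c"]  # hypothetical host names
    for block in range(6):
        print(block, decluster(block, servers, disks_per_server=4))
```

A client that prefers a different layout (for example, one that replicates hot blocks) would simply substitute its own mapping function.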
At the application level, the DPSS is a semi-persistent cache of named data-objects, and at the storage level it is a logical block server. The overall data flow involves "third-party" transfers from the storage servers directly to the data-consuming application (a model used by most high-performance storage systems). Thus, the application requests data, these requests are translated to physical block addresses (server name, disk number, and disk block), and the servers deliver data directly to the application.
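The translation step can be sketched as a lookup followed by a grouping by server, so that each server receives only the blocks it holds and can send them straight to the application. This is a minimal sketch under stated assumptions: the lookup table, its keys, and the function name `plan_request` are hypothetical, and the overview does not say where or how the DPSS stores this mapping.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class PhysicalBlock(NamedTuple):
    """Physical address of one block: server name, disk number, and disk block."""
    server: str
    disk: int
    disk_block: int


# Hypothetical translation table: (data-object name, logical block) -> physical address.
BlockTable = Dict[Tuple[str, int], PhysicalBlock]


def plan_request(table: BlockTable, name: str, blocks: List[int]) -> Dict[str, List[PhysicalBlock]]:
    """Translate requested logical blocks and group them by the server that holds them.

    Each server is then sent only its own list, and all servers can read and
    deliver their blocks directly to the application at the same time
    (a third-party transfer). Raises KeyError if a block is not in the table.
    """
    per_server: Dict[str, List[PhysicalBlock]] = defaultdict(list)
    for block in blocks:
        physical = table[(name, block)]
        per_server[physical.server].append(physical)
    return dict(per_server)
```

Because the per-server lists are independent, a client can issue them concurrently; the aggregate rate then depends on how many servers the request spans.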
Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation in a wide-area ATM network-based Internet environment.
3.0 DPSS Performance
A DPSS server consisting of a typical UNIX workstation with 4 to 6 typical SCSI disks and a high-speed network interface is capable of 60 to 120 Mbits/sec. Therefore, by using several DPSS servers one may construct a data pipe fast enough to satisfy almost any application. For example, a six-server configuration should be able to supply about 500 Mbits/sec.
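The scaling estimate is simple arithmetic: if server streams run in parallel and nothing else is the bottleneck, their rates add. Six servers at 60 to 120 Mbits/sec each give an ideal aggregate of 360 to 720 Mbits/sec, which brackets the 500 Mbits/sec figure. The sketch below assumes perfectly linear scaling, which ignores network and client limits; the function names are illustrative only.

```python
import math


def aggregate_rate(num_servers: int, per_server_mbits: float) -> float:
    """Ideal aggregate rate in Mbits/sec: parallel server streams simply add.

    Assumes the network and the client can absorb the combined stream.
    """
    return num_servers * per_server_mbits


def servers_needed(target_mbits: float, per_server_mbits: float) -> int:
    """Smallest number of servers whose ideal aggregate rate meets the target."""
    return math.ceil(target_mbits / per_server_mbits)


if __name__ == "__main__":
    low, high = aggregate_rate(6, 60), aggregate_rate(6, 120)
    print(f"six servers: {low:.0f} to {high:.0f} Mbits/sec")
    print("servers for 500 Mbits/sec at 90 Mbits/sec each:", servers_needed(500, 90))
```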
4.0 Sample Applications
Current DPSS client applications include on-line storage and viewing utilities for both very large images and video sequences:
- A health care imaging application that uses an ATM network and the DPSS to facilitate collection, storage, analysis, and delivery of medical x-ray fluoroscopy video. Video sequences are collected from the angiography imaging systems and sent through an ATM network to storage and analysis systems, as well as directly to the clinic sites. Thus, data can be collected and stored for later use, data can be delivered live from the imaging device to remote clinics in real time, or all of these data flows can occur simultaneously. (see http://www-itg.lbl.gov/Kaiser)
- A terrain visualization application called "TerraVision". TerraVision, developed at SRI International for the MAGIC project, uses the DPSS to let a user explore or navigate, in real time, a "real" landscape represented in 3D using ortho-corrected, one-meter resolution aerial images and digital elevation models. The data for the visualization reside on DPSS servers distributed across the network. The distributed nature of the DPSS servers, the high speed of the network, and the algorithms TerraVision uses to prefetch and display only those parts of the data required for a given viewpoint allow the user to roam about a terrain comprising tens to hundreds of gigabytes of data at arbitrary speeds and from arbitrary vantage points. (see http://www.ai.sri.com/~magic/terravision.html)
- Large Image browsing: The EROS Data Center is using the DPSS for browsing and analyzing very large (tens of gigabytes) satellite and aerial images.
- Particle Detector Data processing: We will soon be using the DPSS as a cache for processing very large amounts (several terabytes) of High Energy Physics data from particle detector systems.
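This overview does not describe TerraVision's prefetching algorithm. As a purely hypothetical illustration of "fetching only what a viewpoint needs", assume the terrain imagery is cut into square tiles, each of which could be stored as a DPSS data object. A viewer would then request only the tiles intersecting the current view, plus a margin of neighbors to prefetch before the viewpoint moves onto them. The tiling scheme and the function `tiles_for_view` are assumptions, not TerraVision's method.

```python
import math
from typing import List, Tuple


def tiles_for_view(view: Tuple[float, float, float, float], tile_size: float,
                   grid_cols: int, grid_rows: int, margin: int = 1) -> List[Tuple[int, int]]:
    """Return (column, row) indices of the tiles covering a view window.

    Hypothetical scheme: view is (x_min, y_min, x_max, y_max) in ground units,
    treated as half-open on the max edges; margin adds a ring of neighboring
    tiles to prefetch. Indices are clipped to the tile grid.
    """
    x_min, y_min, x_max, y_max = view
    first_col = max(0, math.floor(x_min / tile_size) - margin)
    last_col = min(grid_cols - 1, math.ceil(x_max / tile_size) - 1 + margin)
    first_row = max(0, math.floor(y_min / tile_size) - margin)
    last_row = min(grid_rows - 1, math.ceil(y_max / tile_size) - 1 + margin)
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]
```

A larger margin hides more network latency when the user moves quickly, at the cost of fetching tiles that may never be displayed.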
5.0 Large Data-Object Architecture
We are developing a strategy for using high-speed networks as enablers for storage systems whose components are distributed around wide area networks. The high-level goal is to dramatically increase the location independence of access to "large data-objects". These objects, typically the result of a single operational cycle of an instrument and ranging in size from tens of MBytes to tens of GBytes, are the staple of modern analytical systems.
Our approach is an architecture that uses a collection of highly distributed services to provide flexibility in managing storage resources, reliability of access, and high performance, all in an open environment where the use-conditions for resources and stored information are guaranteed through the use of a strong, but decentralized, security architecture. The basic elements of a distributed large data-object architecture include: data collection, on-line storage, processing elements, data management, data access interfaces, tertiary storage management, and transparent security that provides access control for all of the system's components.
At the center of this architecture is a high-speed network cache, which is used both for initial data collection and to provide subsequent high-speed access by applications. The DPSS is used for this cache. For more information see http://www-itg.lbl.gov/DPSS/papers/NARA-data-objects.ps.
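The replacement policy of the DPSS cache is not described in this overview. Purely as an illustration of a semi-persistent cache of named data-objects sitting in front of tertiary storage, the sketch below assumes least-recently-used eviction bounded by total bytes. The class `DataObjectCache` and its behavior are hypothetical, not the DPSS implementation.

```python
from collections import OrderedDict


class DataObjectCache:
    """Hypothetical LRU cache of named data objects, bounded by total size in bytes."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        # name -> size in bytes, ordered from least to most recently used
        self._sizes: "OrderedDict[str, int]" = OrderedDict()

    def __contains__(self, name: str) -> bool:
        return name in self._sizes

    def access(self, name: str, size_bytes: int) -> bool:
        """Record a use of `name`; return True on a hit, False on a miss.

        On a miss the object is assumed to be loaded (for example from an
        instrument or from tertiary storage), and least recently used objects
        are evicted until it fits.
        """
        if name in self._sizes:
            self._sizes.move_to_end(name)
            return True
        if size_bytes > self.capacity_bytes:
            raise ValueError(f"{name} is larger than the whole cache")
        while self.used_bytes + size_bytes > self.capacity_bytes:
            _, evicted_size = self._sizes.popitem(last=False)
            self.used_bytes -= evicted_size
        self._sizes[name] = size_bytes
        self.used_bytes += size_bytes
        return False
```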
6.0 DPSS as a Network Measurement Tool
As developers of high-speed network-based distributed services, we often observe unexpectedly low network throughput and/or high latency. The reason for the poor performance is frequently not obvious. The DPSS is instrumented to monitor the flow of data blocks throughout the network, which has proven to be very useful in finding network problems. For more information see http://www-itg.lbl.gov/DPSS/papers/ISS.net.measurement-tool.ps.
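One way to use such per-block instrumentation is to timestamp each block at several points along its path and average the time spent between consecutive points; the stage with the largest average points toward the bottleneck. This is a minimal sketch under stated assumptions: the instrumentation point names are hypothetical, and it assumes clocks on clients and servers are synchronized closely enough for cross-host differences to be meaningful.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical instrumentation points, in the order a block passes through them.
STAGES = ["request_sent", "server_received", "disk_read_done", "client_received"]


def stage_latencies(events: List[Tuple[int, str, float]]) -> Dict[str, float]:
    """Average time spent between consecutive instrumentation points.

    events holds (block_id, point_name, timestamp) records, with timestamps in
    any consistent unit, as might be collected from clients and servers.
    Blocks missing a point simply do not contribute to that interval.
    """
    per_block: Dict[int, Dict[str, float]] = defaultdict(dict)
    for block_id, point, timestamp in events:
        per_block[block_id][point] = timestamp
    intervals: Dict[str, List[float]] = defaultdict(list)
    for stamps in per_block.values():
        for start, end in zip(STAGES, STAGES[1:]):
            if start in stamps and end in stamps:
                intervals[f"{start} -> {end}"].append(stamps[end] - stamps[start])
    return {stage: mean(values) for stage, values in intervals.items()}
```

If, for example, the interval from disk read to client receipt dominates while disk reads are fast, the network path (rather than the storage servers) is the likely culprit.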
¹ This work is supported by the DARPA Information Technology Office (http://www.ito.darpa.mil/ResearchAreas.html) and Director, Office of Energy Research, Office of Computation and Technology Research, Mathematical, Information, and Computational Sciences (MICS) Division, of the U. S. Department of Energy under Contract No. DE-AC03-76SF00098.
bltierney@lbl.gov
Copyright © 1996, Lawrence Berkeley National Lab. All rights reserved.