Scalable Distributed Data Structures
Witold Litwin, University Paris 9
Thomas Schwarz, Santa Clara University
Commodity computers interconnected through high-speed networks are
becoming basic hardware. Such configurations are called
multicomputers, clusters, superservers, and, more recently,
grid-computing and P2P systems. They include dozens, hundreds, or
even thousands of clients and servers. Their cumulative resources are
impressive: dozens of GBytes of distributed RAM and TBytes of disks,
accessible for GMips of highly parallel and distributed processing.
They offer potentially unbeatable performance and price/performance
ratios, opening up new perspectives for applications. Major hardware
and software makers present multicomputers as the next step of their
business strategy. The technology is also seen as the next step for
the Internet, which should evolve from today's data-providing utility
into a computing utility. In the US, it is the technology recommended
at the highest governmental level to keep the lead over the rest of
the world in the next millennium. One also foresees very large
multicomputers, grid computing, and P2P computing as new tools for
academia, e.g., a 10,000-node multicomputer for Stanford University
within 10 years.
Multicomputers need new system software that takes full advantage of
the distributed RAM and of parallel processing on multiple CPUs. One
especially needs new data structures that span multiple servers and
scale to as many sites as needed, transparently for the application.
Such structures should be able to reside in distributed RAM for
processing, providing access performance inaccessible to disk files.
They should be accessible to multiple autonomous clients, including
mobile ones. They should not require any centralized data-access
computation or directory, to avoid hot spots. Finally, they should be
highly available, meaning they should survive failures of storage
nodes.
One solution towards this goal is a new class of data structures
called Scalable Distributed Data Structures (SDDSs). First proposed
in 1993, they have become well known to the research community. SDDSs
gave rise to an important research effort, materialized in several
algorithms, papers, and implementations. It was shown that SDDS files
can scale to thousands of sites and terabytes in distributed RAM,
with constant access performance and key-search times under a
millisecond. Multi-key searches requiring an hour or so in a
traditional file, e.g., a k-d file, should succeed in less than a
second in an SDDS file. All these properties should be of prime
importance for applications, especially in the DBMS design arena.
They open new perspectives for VLDB design, for multimedia databases,
for real-time and high-availability databases, for decision-support
systems, and for high-performance computing in general.
The tutorial will introduce attendees to the SDDS technology. We
will start with the general principles and an overview of the major
research projects, especially SDDS-2000 at U. Paris 9. We will
discuss design, performance, and application issues of the main
families of SDDS algorithms known today. These are specifically:
- Hash-partitioned SDDSs. These structures offer scalable hash
  partitioning, generalizing the static hash partitioning popular in
  the current generation of parallel DBMSs. They extend the more
  traditional dynamic hash data structures, especially the popular
  linear hashing and dynamic hashing, to network multicomputers.
  These are typically networks of Wintel computers, or specific ones
  like the 1,024-node Windows NT cluster at NCSA, as well as switched
  multicomputers, e.g., the Parsytec with 128 PowerPCs or the IBM
  SP-2 supercomputer with up to 512 processors. Versions of LH* are
  now patented by HP and IBM.
- Range-partitioned SDDSs. These extend the traditional ordered data
  structures, B-trees or binary trees, as well as the static range
  partitioning also popular in the current generation of parallel
  DBMSs, to multicomputers. Access performance for large files is
  greatly improved compared to more traditional implementations.
- Multi-key SDDSs. These extend the traditional k-d data structures
  to multicomputers. Performance of multi-attribute search improves
  by orders of magnitude for larger files, compared to traditional
  implementations.
- High-availability SDDSs. These new structures have no known
  counterpart among the traditional data structures. They
  transparently survive failures of n >= 1 data servers. This
  capability is of basic importance in the Web environment, which
  requires 24/7 serviceability; a failure can otherwise be expensive,
  e.g., $4B in lost market value and operational losses of $25-30M.
  Some high-availability SDDSs apply the principles of record
  mirroring or of record striping, enhanced to support file
  scalability. Others use a new principle of record grouping. This
  technology provides much better storage efficiency than mirroring
  and, unlike striping, also allows for parallel scans and function
  shipping, since the records remain entire. The high availability
  results from various methods of parity calculus. These include the
  so-called grouping functions (patent pending by IBM) and a specific
  version of the Reed-Solomon erasure-correcting codes. This version
  introduces the so-called logarithmic parity matrix in the Galois
  field GF(2^16), with one column and one row of 1's, which makes it,
  to our best knowledge, the fastest RS matrix known. We will present
  in particular performance measurements on 100 Mb/s and 1 Gb/s
  Ethernet supporting typical Wintel PCs as SDDS clients and servers.
- Scalable distributed relational tables. This topic covers the SDDS
  schemes one may apply to a relational table so as to make it scale
  over more and more nodes, transparently for the application. We
  will present the prototype termed SD-SQL Server, running on top of
  SQL Server. It manages (transparently) the scalable distributed
  tables, using a distributed partitioned view as the SDDS client
  view. It is the only DBMS with this feature we are aware of at
  present.
We will in particular address the implementation issues of SDDSs. We
will focus on the Wintel prototype termed SDDS-2004 (U. Paris 9).
This system is available for download at the CERIA Web site,
http://ceria.dauphine.fr/. It was installed and experimented with in
several places. We will describe the experimental SDDS protocol and
the software architecture of SDDS-2004. We will also address the
original technique of algebraic signatures that is built in for
faster update processing and disk backup. This technique is, to the
best of our knowledge, the only one known to attain zero probability
of collision (false positives). We will also present experimental
performance measurements. These show in particular access
performance reaching 30 µs per key search in distributed RAM, where
SDDS-2004 keeps the (possibly very large) data for search operations.
This access time is about 300 times faster than access to a local
disk, and will probably remain forever out of reach of mechanical
devices.
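To illustrate the algebraic-signature idea: the signature of a string is a sum of its symbols weighted by consecutive powers of a primitive element of a Galois field, so it is sensitive to both symbol values and their positions. The sketch below works in GF(2^8) for brevity, whereas the text above mentions GF(2^16); all names are ours:

```python
# GF(2^8) log/antilog tables for the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D), for which 2 is a primitive element.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):      # doubled table avoids a modulo in gf_mul
    EXP[_i] = EXP[_i - 255]

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def alg_signature(data: bytes) -> int:
    """One-symbol algebraic signature: XOR of data[j] * alpha**j in GF(2^8)."""
    sig = 0
    for j, byte in enumerate(data):
        sig ^= gf_mul(byte, EXP[j % 255])  # EXP[j % 255] == alpha**j
    return sig
```

Unlike a plain XOR checksum, the position-dependent weights make the signature change whenever any single symbol changes (within the field's period), and they distinguish reorderings of the same symbols.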
We will also discuss the relationship of the design principles of
SDDSs to recent ideas in unstructured and structured P2P systems such
as Freenet, Chord, and Pastry. We will show practical implications
for information, knowledge, and data management. Finally, we will
discuss the new perspectives that SDDSs open up for applications,
especially DBMSs and Web servers supporting the discussed
applications, including related problems of concurrency control,
transaction management, and query optimization.
Audience
The tutorial is intended for researchers and practitioners designing
basic software, middleware, new data management systems, and advanced
applications, who are interested in the revolutionary possibilities
that multicomputers bring to their fields.
About the Instructors
Dr. Witold A. Litwin has been Professor of Computer Science at the
University Paris 9 (Dauphine) since 1990. He is also Director of the
Centre d'Etudes et de Recherches en Informatique Appliquée (CERIA) of
U. Paris 9. His research areas are multidatabase systems,
interoperability, data structures, and scalable distributed
multicomputer data structures. The multidatabase system design
techniques and linear hash data structures he proposed in the 1980s
are considered among the most important practical contributions to
these domains. They are routinely taught in database courses and are
present in major database systems and popular applications: Netscape
browsers & servers, Unify DBMS, LH-Server, and Microsoft's IIS
(Internet Information Server), FrontPage, and Exchange. Dr. Litwin
has been an invited lecturer and scientist at several universities.
In the US, he taught database courses, among others, at UC Berkeley
in 1992-94, at Santa Clara University in 1991, and at Stanford
University in 1990-91. His course on multidatabase systems was
broadcast on the Stanford Educational TV Network (SITN). He was a
visiting scientist at prominent US research centers, including the
IBM Almaden Research Center in 1997 and 1998, part-time at
Hewlett-Packard Palo Alto Laboratories between 1990 and 1994, and the
University of Maryland in 1989. Between 1980 and 1989, Dr. Litwin was
Research Director at the Institut National de Recherche en
Informatique et Automatique (INRIA, France) and an expert for the ISO
Committee on Open Systems. Dr. Litwin has written one hundred fifty
research papers, edited or contributed to eleven books, and served as
a Program Committee member of fifty international database
conferences. He is an ACM Fellow and a member of IEEE. His
publication list is at http://ceria.dauphine.fr/witold.html.
Prof. Thomas Schwarz is currently Associate Professor in the
Department of Computer Engineering at Santa Clara University, Santa
Clara, CA. He holds a Ph.D. in Computer Science and Engineering from
UC San Diego (1994) and a Dr. rer. nat. in non-associative algebra
from FernUniversität Hagen (1984). His research addresses, among
other goals, signature calculus and erasure-correcting codes,
especially for high-availability scalable distributed data
structures. Prof. Schwarz has (co)authored numerous publications in
these areas.