Distributed Systems: Concepts and Design
Edition 2, 1994
Presentation Points

  1. Chapter 1: Characterization of Distributed Systems
  2. Chapter 2: Design Goals
  3. Chapter 3: Networking and Internetworking
  4. Chapter 4: Interprocess Communication
  5. Chapter 5: Remote procedure calling
  6. Chapter 6: Distributed Operating Systems
  7. Chapter 7: File Service: a Model
  8. Chapter 8: File Service: Case Studies
  9. Chapter 9: Name Services
  10. Additional References

Chapter 1: Characterization of Distributed Systems

Objectives

To place distributed systems in a realistic context through examples of their application, a summary of their history and a survey of their essential characteristics. To become aware of the inherently distributed nature of many key applications and be able to identify benefits and drawbacks of distribution in real-world applications. To gain a good understanding of the meanings of the terms resource sharing, openness, concurrency, scalability, fault tolerance and transparency as they apply to distributed systems.

Points to emphasize

The wide scope of distributed applications, as illustrated by the application examples of Section 1.2, and the corresponding need for distributed systems that are general-purpose.

The benefits of distribution and the need to incorporate the key characteristics of Section 1.3 in order to achieve the benefits (e.g. through a discussion of distributed UNIX implementations).

We take a resource-based approach to the definition of many distributed system design problems throughout the book, so the material on resource-sharing, resource managers and object managers in Section 1.3 is important. This also provides an opportunity to link the study of distributed systems to programming methodology through abstract data types and object-orientation.

Transparency in general and the eight forms of transparency defined on pages 20-21. Note that these transparencies are discussed in relation to Sun NFS on pages 223-225.

Possible difficulties

The biggest difficulty is probably the `bottom-up', hardware-oriented approach that many students will bring from previous courses on computer networks. The chapter is determinedly `top-down'; it aims to present a view of distributed systems as the result of a major research, software design and engineering enterprise that has produced solutions to many problems previously considered intractable.

Section 1.3 is inevitably rather abstract and needs illustration, see below.

Section 1.4 on historical development has little conceptual content apart from the cyclic model of the development of technical ideas presented Figure 1.9. Once that has been covered, the section is probably best left for private study.

Teaching hints

The top-down design ethos is perhaps best conveyed by emphasising that although distributed systems are now pervasive, there are many unresolved design problems, and by showing that some problems cannot easily be solved by bottom-up methods, e.g. openness, scalable naming schemes, fault-tolerant software design.

Since many of the concepts of Section 1.3 are likely to be new to students, they should be illustrated profusely, using the examples from the chapter or better, from local systems that provide suitable illustrations (positive or negative) of the key characteristics.


Chapter 2: Design Goals

Objectives

Like Chapter 1, this is a context-setting chapter. It introduces the key problems of distributed system design and discusses the nature of their solutions. It does not attempt to describe design solutions - these will emerge in later chapters. The objectives are to understand the issues involved in designing general-purpose solutions to the problems of naming, communication, software structure, workload allocation and consistency maintenance in distributed systems and to become aware of the constraints imposed on system designers by key design goals (Figure 2.1) and users' requirements (Section 2.3). Much of the vocabulary for the book will also be acquired from this chapter.

Points to emphasize

We suggest a design-oriented approach to the chapter, aiming to achieve the realization that the current state-of-the-art in distributed systems is the result of myriad design decisions, not an immutable physical reality.

Thus we would emphasize that the design of computer systems (and distributed systems in particular) involves trade-offs. The design objectives listed in Figure 2.1 are the commodities to be traded. In order to trade them, we must first understand how they are achieved.

For example, communication is not cost-free. Communication times have an important effect on the performance of distributed systems. We must understand that there are many factors affecting the cost of communication beyond simple network performance. Once this has been appreciated, we can consider the trade-offs to be made between communication performance and reliability, scalability or security.

Trade-offs should not be the concern of the users. Users should be offered guarantees with respect to the design goals of Figure 2.1 and the user requirements of Section 2.3. The guarantees must be verifiable based on the design characteristics of the system. For example, in Chapter 8 we describe the guarantees of consistency offered by various file services in the face of multiple concurrent updates to a file's data.

Possible difficulties

The notion of design against specific goals may be unfamiliar. Some students will have difficulty in placing themselves in the position of designers, considering alternative designs and the trade-offs that they represent.

We use the notion of shared resource in this chapter, so students should make sure that they have understood its definition in Chapter 1. Other terminology, such as process and name space may cause difficulties and should be revised if necessary.

Teaching hints

There are many possible approaches to this chapter. Here we outline a design-oriented approach. This involves taking specific examples of distributed system design problems, considering each of the design issues defined in Section 2.2 and discussing possible trade-offs between the design goals of Figure 2.1 (and possibly some of the user requirements of Section 2.3). Our solution to Exercise 2.1 illustrates the application of this approach to the design of NFS.

Other exercises and examples of design problems in distributed systems or applications can be introduced; for example, the design of simple text-based conferencing system, a multimedia conferencing system or a network information service (similar to World Wide Web) could be considered. A design project such as those proposed in the project work section of this Guide will give students practical experience with designing against specific goals and trading-off goals against each other.


Chapter 3: Networking and Internetworking

Objectives

The chapter does not assume any prior knowledge of computer networks, but the objectives and pace of study will differ considerably for students who have taken an earlier networks course and those who have not.

For students who have not previously studied computer networks: To provide an understanding of the basic concepts and principles of local and wide-area computer networking and their extension to internetworking. The intention is to achieve comprehension of the principles of operation of the major network technologies at a level of detail sufficient to assess the impact of networks on distributed system performance and reliability and to evaluate and compare different network technologies for use in distributed systems.

For students who have previously studied computer networks: To recapitulate and update their knowledge of the principles of operation of the major network technologies, including internetwork technologies. To relate that knowledge to the requirements of distributed systems. To consider the special requirements of distributed systems for simple, efficient protocol stacks.

Points to emphasize

Latency of communication is at least as important for distributed system performance as data transfer rate.

The performance of distributed systems is determined by the speed and latency of end-to-end communication (sending process to receiving process) and not just the network performance. The software delays in the sending and receiving machines nearly always swamp the network transmission delays in determining the speed of communication, except for very heavily loaded networks or very high bandwidth applications.

Networks are highly reliable, whereas client and server computers and their software often aren't, so error detection and recovery is best performed end-to-end at the highest feasible level.

The OSI Reference Model and the ISO protocol standards that are based on it are a useful reference point, but not much used in practice. The TCP/IP and UDP suites are currently dominant, even for distributed systems implemented solely over local area networks. Since these are internetwork protocols, it is important to emphasize the layering of IP (the network layer protocol of the Internet) over various network-layer protocols for different physical networks.

To a considerable extent, the choice of underlying protocols is unimportant once good RPC and multicast communication services are available. But the FLIP case study in Section 3.5 and the description of the Firefly RPC system in Section 18.8 show how a root-and-branch approach to the design of a protocol stack optimized for distributed system requirements can yield optimal performance.

Possible difficulties

Even students who have previously studied computer networks may not be very familiar with internetworking concepts and principles.

The conceptualization of the basic communication models - store-and-forward packet routing, broadcast with collisions (CSMA/CD) and without (the various ring technologies) - seems to cause some difficulty to students without previous networking knowledge. This leads to difficulty in relating the effect of a particular communication model on the potential performance of distributed systems - for example, routing delays in store-and-forward networks are variable and can be quite large. These distinctions should be emphasised and related to the case studies - for example, routing delays are still variable in ATM networks, but much smaller.

Teaching hints

Devote plenty of time to the internetwork case studies. Comer [1991] is an excellent source of additional material and exercises on internetworking.

Chapter 4: Interprocess Communication

Objectives

To study the building blocks for lightweight interprocess communication protocols for a distributed system. To describe Request-Reply protocols (RPC and language integration are left until Chapter 5). To introduce the main options for group communication (the implementation is left until Chapter 11 and Section 18.9).

Points to emphasis

The material in Chapter 3 is relevant for networking, where the emphasis is on a single interconnection between a pair of components. Chapter 4 is concerned with distributed systems in which there are many components and interconnections and the emphasis is on the logical relationship between components.

Message passing is the best building block for lightweight interprocess communication protocols because it carries the minimum possible overheads for the resulting protocols.

System reconfigurability requires location independent identifiers for message destinations.

Request-reply protocols are designed to support client-server communication in which a client in an active role asks a passive server to perform an operation and return the result. This relationship determines the protocol as follows: (i) the client needs one primitive (DoOperation) and the server needs two (GetRequest and SendReply); (ii) since clients normally wait for replies DoOperation is synchronous; (iii) a server must be able to receive GetRequest messages while performing operations. To deal with failure, requests are re-transmitted. To ensure that an operation is performed at most once, duplicate requests are filtered and replies are saved for re-transmission.

Communication between the members of a group of processes is the basis of replicated services. Atomic multicast ensures that all replicas will receive the same message. It is not simple to implement because the originator may fail. Totally-ordered multicast ensures that all replicas receive messages in the same order. Recipients `hold back' messages so as to deliver them in the correct order. Protocols entail considerable latency, large numbers of messages and storage costs. Causally ordered multicast is less expensive, because `hold-back' is not needed.

Possible difficulties

Students will already know about network ports from networking courses (or from Section 3.3). They may not perceive that process ports are different. Unfortunately, the Unix case study may reinforce such misconceptions. When students consider the design of a Request-Reply protocol, they are often unsure whether messages are queued at server ports by the underlying communication service (e.g. UDP)

Although the request-reply protocol appears to be straightforward, students do not always appreciate the effort required by the protocol to ensure that an operation is performed at most once. This is important for Chapter 5.

In our experience, students find it hard to understand the important options for multicast - in particular, they do not appear to register the forward reference to causal ordering.

Teaching hints

The approach taken is to consider the roles of processes and the interactions between them in client-server communication and group multicast. The students could be asked to go back to Section 2.2. To widen the discussion, other patterns of communication such as producer consumer or peer groups of servers could be discussed. Revise location transparency (see Section 1.3) before discussing location independent destination identifiers.The details of the naming aspects of identifiers are deferred until Section 6.4.

Before discussing the request-reply protocol, consider all the effects of client or server failing independently. Assume that client-server communication is synchronous (like a procedure call) and avoid the issue of calls not requiring replies until Section 5.5.

For totally-ordered multicast, emphasise the fact that messages from different originators may arrive at common destinations in different orders. Use this to motivate the need for globally ordered message identifiers.


Chapter 5: Remote procedure calling

Objectives

To study the integration of synchronous RPC into a programming language and to study its implementation by means of two case studies. To introduce asynchronous RPC. The performance of RPC is left to Section 6.5.

Points to emphasize

An RPC system runs over a Request-Reply protocol and can be integrated relatively transparently with a programming language. The semantics of parameter passing is adjusted to suit the separation of client and server. Programmers must avoid passing addresses and using global variables and must be aware that new sorts of errors can arise due to distribution.

Call semantics cannot be `exactly once'. The semantics achieved depends on the approach to dealing with failures.

Services are encapsulated and their clients interact with them only via interfaces. Servers are designed, implemented and installed before clients can use them. Client programs access a service by means of RPCs to operations in its interface.

A `service interface' defines the signatures of the procedures in a service. It forms the basis for generating `client stub procedures' and marshalling procedures which together enable RPCs to have access transparency. It can also be used to generate server stub procedures and dispatcher.

Late binding of a client to a service is essential. This is achieved with the help of a binder which can also provide location transparency. Incremental development of services is supported by allowing for the selection of a versions. A binder should be relocatable.

The two case studies illustrate approaches to the design of an RPC system. It is worth discussing the different approaches to an interface definition language and a binder. These studies also provide an opportunity to discuss issues of access and location transparency.

Synchronous RPC may not provide adequate performance for some applications. Asynchronous RPC and buffered calls enhance the throughput. If the two are integrated in single communication mechanism (e.g. the call stream), servers need not be aware of whether their clients are making synchronous or asynchronous calls.

Possible difficulties

In our experience, students find it hard to appreciate the variety of call semantics and how to achieve each one.

The section on asynchronous RPC is rather specialised and could be omitted without having any effect on the understanding of later chapters.

Teaching hints

The approach taken is to consider RPC as a transparent form of local procedure call. If necessary revise the following topics which should have been covered in earlier courses on programming and operating systems. (i) the semantics of local procedure call and parameter passing; (ii) the use of modules with interfaces; (iii) the use of the standard input/output library in UNIX as an application programming interface to the system calls; (iv) the way that application programs handle the -1 error returns of UNIX system calls.

Access and location transparency could be revised (from Section 1.3). The students could be asked to consider the measures taken to deal with failures in the Request-Reply protocol from Section 4.3.

Practical work involving either (i) the use of an RPC system (e.g. ANSA or Sun RPC) or (ii) the implementation of a simple RPC system over UDP would help to reinforce this chapter. (See Project Work).


Chapter 6: Distributed Operating Systems

Objectives

To reinforce the notion of a distributed system as a collection of resources managed by kernels and servers, and to identify the infrastructure requirements necessary for interworking between and within computers. To explain the rationale for microkernels and compare them to monolithic kernels. To understand the need for multi-threaded processes. To identify invocation mechanisms and appreciate their importance and costs. To examine required memory management features, including virtual memory. To prepare the ground for discussion of services and distributed shared memory described later in the book.

Points to emphasise

Very few distributed systems are controlled by a homogeneous operating system. There is not always a clear distinction between the operating system and the applications that utilise it. The material in this chapter deals with an infrastructure for process management, memory management and invocation, on which further services can be based.

The relative advantages and disadvantages of microkernels and monolithic kernels should be given lively and critical discussion, including discussion of which features covered in the chapter, such as multi-threading, can and should be bolted on to conventional monolithic kernels. Microkernels have yet to be proven in large-scale commercial use.

Threads are actually familiar `processes' from operating systems courses, which usually consider processes that can share memory. Threads are not a fundamentally new concept, but they are important for building efficient client-server systems.

Invocation is more than just communication. The need to handle multimedia data, in particular, shows that distributed operating systems have to provide more than just RPC. The cost of local invocations is significant. Communication is the key to heterogeneous interworking.

Large, sparse address spaces and copy-on-write memory sharing are needed in all modern operating systems, and are not connected with distribution per se. Virtual memory management has had to be re-thought for distributed systems; this has given us an opportunity to implement distributed shared memory.

Possible difficulties

This chapter assumes a reasonable grasp of first courses in computer architecture and (centralized) operating systems.

Students seem to be able to appreciate the need for threads in servers, but examples of multi-threaded applications will help them appreciate the need for threads in clients.

The material requires illustration wherever possible with practical problems in server design. Chapters 7, 8 and 9 will reinforce the issues for file and name services.

It is possible to omit Section 6.6. on virtual memory, but it is needed for Chapter 17 on distributed shared memory and the case studies in Chapter 18.

Teaching hints

Review the material on software structure in Section 2.2.

Link the material to current developments in commercial operating system development, for example Microsoft's Windows NT and IBM's Workplace OS.

One or more of the case studies in Chapter 18 can be used to illustrate the material.


Chapter 7: File Service: a Model

Objectives

The design of distributed file systems has been extensively researched since the earliest days of distributed system development (see Figures 1.10 and 1.11). Several successful products have emerged, with which many students will be familiar.

These products are largely based on a standard set of concepts and techniques. The purpose of this chapter is to lay down a framework for the discussion of distributed file systems that is somewhat abstracted from the products, enabling students more easily to understand their design and to compare and evaluate them. We achieve this by presenting a `basic model' that embodies most of the concepts found in currently available distributed file service products.

After studying this chapter, students should have sufficient understanding of distributed file service design and implementation to design and construct a practical file service. A secondary objective is to consider in detail the design of a particular distributed service, applying the principles learned in all preceding chapters.

Points to emphasize

The key role of file servers means that failure transparency is important and that the performance of file servers under load must be good.

File service components

The role of each of the components - flat file service, directory service and client module; the division of responsibilities between the components results in an exposed interface - the flat file service - that is hidden in conventional operating systems. The useful separation of concerns, openness and modularity achieved by the division of responsibilities. A forward reference can be made to the similarity that this structure bears to NFS (where the directory service can be seen as integrated with the client module, see Chapter 8)

Design issues

The need for unique file identifiers; file attributes; the benefits of statelessness.

Interfaces

The absence of an Open operation; the idempotent nature of all of the interfaces; comparison of the flat file service with Unix system calls; method for interpreting a multi-part pathname.

Implementation techniques

File groups; the significance of file groups is that they provide a basis for the distribution of groups of files between different servers and their replication (described in the Coda case study in Chapter 8).

Capabilities; space leaks; construction of UFIDs and their role in access control; file location methods; server cache; client cache; the practical importance of caching at both the server and the clients and the problems of consistency maintenance raised by client caching.

Possible difficulties

The most difficult portion of the chapter is probably the material on the construction and use of capabilities for access control, pp. 212-215. Capabilities are introduced here because the model relies on their use to implement file protection but they are of much wider applicability; they are an important tool for open system design and are used extensively in modern microkernel-based systems. Their use in Chorus and Amoeba is discussed in Chapter 18. If students have difficulty with capabilities here, they should be recapitulated when Chapter 18 is studied.

Teaching hints

The service interfaces defined in this chapter are quite straightforward. They are amongst the first service interfaces that the student will have encountered (there is one earlier example in Chapter 5 (Figure 5.3) and several others in later chapters); we recommend a review of these file service interfaces, comparing them with the Unix file system primitives (see page 206) and emphasizing their support for the design goal of stateless servers.

The model presented in this chapter is sufficiently detailed for it to be used as the basis for a coursework exercise on the implementation of a file service.


Chapter 8: File Service: Case Studies

Objectives

To show that the designs of NFS and the other file services discussed in this chapter are in accordance with the principles of distributed system design studied in earlier chapters and with the basic model for file servers presented in Chapter 7, thus grounding that material firmly in examples of real systems.

To provide a sufficiently detailed description of the file services covered to enable students to understand and evaluate their practical behaviour obtain an understanding of the reasons for the differences between their designs and for their success, commercial or otherwise, based on their design goals.

To give an early appreciation (through the study of Coda in Section 8.4) of the issues involved and the solutions available for the replication of data between servers. Replication will be covered more generally in Chapter 11.

Points to emphasize

The three systems studied in this chapter provide examples of the fact that designs for services can differ radically even when the application programming interface is almost identical. It is for this reason that the design of distributed systems remains an important and interesting activity.

Meaning of one-copy file semantics and the problems that this introduces for caching and replication.

NFS

Remote mounting - what it is, and what it means for location transparency; statelessness of the NFS interface - benefits and drawbacks; software components and architecture - why it is implemented in the kernel; VFS - its role; path name translation; mount service; caching - implications of updates and cache consistency; performance. See also [Pawlowski et al. 1994] for a description of recent revisions and performance improvements, and [Duchamp 1994] for a discussion of a method for optimal pathname lookup.

AFS

Whole-file serving and caching - compare with ftp; observations of UNIX file usage - especially average and maximum file sizes and locality of reference to files; software structure - roles of Vice and Venus, user-level processes; Vice as a flat file service, with directory structure implemented in Venus - FID structure; implementation of file system calls in AFS.

Callback promises - comparison of AFS-2 with AFS-1; Vice service interface; update semantics - for AFS-1 and AFS-2, and semantics while file is open, potential lost updates; Unix kernel changes; read-only replicas; performance.

Coda

Lessons learned from AFS - requirements for scalability, fault-tolerance, detached working; disconnected operation - user-generated list of required files; VSGs and AVSGs - broadcast of updates to AVSG on close; optimistic replication strategy - use of CVV, with example; update semantics; use of multicast RPC to replicate updates; cache coherence - dealing with changes to the AVSG and lost callbacks; disconnected operation; performance.

Possible difficulties and teaching hints

Coda version vectors usually cause difficulty on first reading. They are a special case of the vector timestamps found in Chapter 11.

Note that this chapter is by no means exhaustive in its coverage of research on file service design; many other designs have been developed and reported, some with novel approaches and solutions to problems that go beyond what can be achieved in a straightforward emulation of the Unix application programming interface. Had space and time allowed, we might have included a discussion of log-based file server design [Rosenblum and Ousterhout 1982] and of the recently-reported file services designed to support multimedia applications by the inclusion of Quality of Service specifications for time-based data [Anderson et al. 1992, Lo 1994].


Chapter 9: Name Services

Objectives

To give the reader an appreciation of the importance of naming as an issue in distributed systems. To understand the issues relevant to the design and implementation of a name service, including: the name space; the resolution mechanism; the division and replication of naming data between servers; and attribute caching. To understand the key features of the DNS, GNS and X.500 name services.

Points to emphasise

Naming is intimately related to network (access and location) transparency and migration transparency.

We need a separate name service in addition to the naming schemes used in the context of specific services, for the reasons given in Section 9.1.

There are separate concerns to be addressed: designing the name space; meeting administrative requirements to partition the name space; and creating an implementation that scales.

A system such as the DNS that resolves names on a world-wide scale represents a considerable engineering achievement - as does the system of routing daemons that resolves IP addresses. Students should be given a feel for the engineering problems and strong motivation for getting naming right.

The SNS is only a paper design; encourage students to criticize it and think up alternatives.

Caching and replication are key to name service implementation.

Possible difficulties

Students may be confused because a name service appears to be just a database service. Point out the ways in which a name service differs from a conventional database: the resolution mechanism, the physical scale of its use, and the slow-changing nature of the data.

Students sometimes find confusing the models of navigation outlined in Sections 9.2 and 9.3 and discussed in the DNS case study. They should be encouraged to think of reasons to use one method rather than another.

Teaching hints

Review the brief discussion of naming in Sections 2.2 and 6.4. Review the way names arise in interprocess communication and remote procedure call. Encourage the students to think of examples of how naming is important in other areas of Computer Science, for example program compilation and linking.

It is advisable to cover at least DNS as a case study, since this will be familiar to most students.

Encourage use of nslookup to query local DNS servers, and querying of X.500 servers.

Ask students to investigate and analyse the Uniform Resource Locators used in the World Wide Web.


Additional References

These recent publications expand on various topics discussed in the book.

[Anderson et al. 1992] Anderson, D.P., Osawa, Y. and Govindan, R., A File System for Continuous Media, ACM Trans. on Computer Systems, Vol. 10, No. 4, pp.311-337, November 1992.
A pioneering work on file service design for multimedia applications with time-based data.

[Duchamp 1994] Duchamp, D., Optimistic Lookup of Whole NFS Paths in a Single Operation, Summer USENIX Proceedings, Boston, MA, June 1994, pp 161-169.
An interesting discussion of a modified NFS that performs whole pathname lookups optimistically.

[Lo 1994] Lo, S.L., A Modular and Extensible Network Storage Architecture, Technical Report TR326, Cambridge University Computer Laboratory, January 1994, (ftp:ftp.cl.cam.ac.uk/reports/TR326-sll-network.storage.architecture.ps.gz ).
Another valuable work on file service design for multimedia applications with time-based data.

[Pawlowski et al. 1994] Pawlowski, B., Juscak, C., Staubach, P., Smith, C., Lebel, D., and Hitz, D., NFS Version 3, Design and Implementation, Summer USENIX Proceedings, Boston, MA, June 1994, pp 137-152.
Improvements to NFS to overcome the `write-through bottleneck' and several other problems.

[Rosenblum and Ousterhout 1992] Rosenblum, M. and Ousterhout, J.K., The design and Implementation of a Log-Structured File System, ACM Trans. on Computer Systems, Vol. 10, No. 1, pp.26-52, February 1992.
Standard reference on log-based file server design.