The benefits of distribution and the need to incorporate the key characteristics of Section 1.3 in order to achieve the benefits (e.g. through a discussion of distributed UNIX implementations).
We take a resource-based approach to the definition of many distributed system design problems throughout the book, so the material on resource-sharing, resource managers and object managers in Section 1.3 is important. This also provides an opportunity to link the study of distributed systems to programming methodology through abstract data types and object-orientation.
Transparency in general and the eight forms of transparency defined on pages 20-21. Note that these transparencies are discussed in relation to Sun NFS on pages 223-225.
Section 1.3 is inevitably rather abstract and needs illustration, see below.
Section 1.4 on historical development has little conceptual content apart from the cyclic model of the development of technical ideas presented Figure 1.9. Once that has been covered, the section is probably best left for private study.
Since many of the concepts of Section 1.3 are likely to be new to students, they should be illustrated profusely, using the examples from the chapter or better, from local systems that provide suitable illustrations (positive or negative) of the key characteristics.
Thus we would emphasize that the design of computer systems (and distributed systems in particular) involves trade-offs. The design objectives listed in Figure 2.1 are the commodities to be traded. In order to trade them, we must first understand how they are achieved.
For example, communication is not cost-free. Communication times have an important effect on the performance of distributed systems. We must understand that there are many factors affecting the cost of communication beyond simple network performance. Once this has been appreciated, we can consider the trade-offs to be made between communication performance and reliability, scalability or security.
Trade-offs should not be the concern of the users. Users should be offered guarantees with respect to the design goals of Figure 2.1 and the user requirements of Section 2.3. The guarantees must be verifiable based on the design characteristics of the system. For example, in Chapter 8 we describe the guarantees of consistency offered by various file services in the face of multiple concurrent updates to a file's data.
We use the notion of shared resource in this chapter, so students should make sure that they have understood its definition in Chapter 1. Other terminology, such as process and name space may cause difficulties and should be revised if necessary.
Other exercises and examples of design problems in distributed systems or applications can be introduced; for example, the design of simple text-based conferencing system, a multimedia conferencing system or a network information service (similar to World Wide Web) could be considered. A design project such as those proposed in the project work section of this Guide will give students practical experience with designing against specific goals and trading-off goals against each other.
For students who have not previously studied computer networks: To provide an understanding of the basic concepts and principles of local and wide-area computer networking and their extension to internetworking. The intention is to achieve comprehension of the principles of operation of the major network technologies at a level of detail sufficient to assess the impact of networks on distributed system performance and reliability and to evaluate and compare different network technologies for use in distributed systems.
For students who have previously studied computer networks: To recapitulate and update their knowledge of the principles of operation of the major network technologies, including internetwork technologies. To relate that knowledge to the requirements of distributed systems. To consider the special requirements of distributed systems for simple, efficient protocol stacks.
The performance of distributed systems is determined by the speed and latency of end-to-end communication (sending process to receiving process) and not just the network performance. The software delays in the sending and receiving machines nearly always swamp the network transmission delays in determining the speed of communication, except for very heavily loaded networks or very high bandwidth applications.
Networks are highly reliable, whereas client and server computers and their software often aren't, so error detection and recovery is best performed end-to-end at the highest feasible level.
The OSI Reference Model and the ISO protocol standards that are based on it are a useful reference point, but not much used in practice. The TCP/IP and UDP suites are currently dominant, even for distributed systems implemented solely over local area networks. Since these are internetwork protocols, it is important to emphasize the layering of IP (the network layer protocol of the Internet) over various network-layer protocols for different physical networks.
To a considerable extent, the choice of underlying protocols is unimportant once good RPC and multicast communication services are available. But the FLIP case study in Section 3.5 and the description of the Firefly RPC system in Section 18.8 show how a root-and-branch approach to the design of a protocol stack optimized for distributed system requirements can yield optimal performance.
The conceptualization of the basic communication models - store-and-forward packet routing, broadcast with collisions (CSMA/CD) and without (the various ring technologies) - seems to cause some difficulty to students without previous networking knowledge. This leads to difficulty in relating the effect of a particular communication model on the potential performance of distributed systems - for example, routing delays in store-and-forward networks are variable and can be quite large. These distinctions should be emphasised and related to the case studies - for example, routing delays are still variable in ATM networks, but much smaller.
Message passing is the best building block for lightweight interprocess communication protocols because it carries the minimum possible overheads for the resulting protocols.
System reconfigurability requires location independent identifiers for message destinations.
Request-reply protocols are designed to support client-server communication in which a client in an active role asks a passive server to perform an operation and return the result. This relationship determines the protocol as follows: (i) the client needs one primitive (DoOperation) and the server needs two (GetRequest and SendReply); (ii) since clients normally wait for replies DoOperation is synchronous; (iii) a server must be able to receive GetRequest messages while performing operations. To deal with failure, requests are re-transmitted. To ensure that an operation is performed at most once, duplicate requests are filtered and replies are saved for re-transmission.
Communication between the members of a group of processes is the basis of replicated services. Atomic multicast ensures that all replicas will receive the same message. It is not simple to implement because the originator may fail. Totally-ordered multicast ensures that all replicas receive messages in the same order. Recipients `hold back' messages so as to deliver them in the correct order. Protocols entail considerable latency, large numbers of messages and storage costs. Causally ordered multicast is less expensive, because `hold-back' is not needed.
Although the request-reply protocol appears to be straightforward, students do not always appreciate the effort required by the protocol to ensure that an operation is performed at most once. This is important for Chapter 5.
In our experience, students find it hard to understand the important options for multicast - in particular, they do not appear to register the forward reference to causal ordering.
Before discussing the request-reply protocol, consider all the effects of client or server failing independently. Assume that client-server communication is synchronous (like a procedure call) and avoid the issue of calls not requiring replies until Section 5.5.
For totally-ordered multicast, emphasise the fact that messages from different originators may arrive at common destinations in different orders. Use this to motivate the need for globally ordered message identifiers.
Call semantics cannot be `exactly once'. The semantics achieved depends on the approach to dealing with failures.
Services are encapsulated and their clients interact with them only via interfaces. Servers are designed, implemented and installed before clients can use them. Client programs access a service by means of RPCs to operations in its interface.
A `service interface' defines the signatures of the procedures in a service. It forms the basis for generating `client stub procedures' and marshalling procedures which together enable RPCs to have access transparency. It can also be used to generate server stub procedures and dispatcher.
Late binding of a client to a service is essential. This is achieved with the help of a binder which can also provide location transparency. Incremental development of services is supported by allowing for the selection of a versions. A binder should be relocatable.
The two case studies illustrate approaches to the design of an RPC system. It is worth discussing the different approaches to an interface definition language and a binder. These studies also provide an opportunity to discuss issues of access and location transparency.
Synchronous RPC may not provide adequate performance for some applications. Asynchronous RPC and buffered calls enhance the throughput. If the two are integrated in single communication mechanism (e.g. the call stream), servers need not be aware of whether their clients are making synchronous or asynchronous calls.
The section on asynchronous RPC is rather specialised and could be omitted without having any effect on the understanding of later chapters.
Access and location transparency could be revised (from Section 1.3). The students could be asked to consider the measures taken to deal with failures in the Request-Reply protocol from Section 4.3.
Practical work involving either (i) the use of an RPC system (e.g. ANSA or Sun RPC) or (ii) the implementation of a simple RPC system over UDP would help to reinforce this chapter. (See Project Work).
The relative advantages and disadvantages of microkernels and monolithic kernels should be given lively and critical discussion, including discussion of which features covered in the chapter, such as multi-threading, can and should be bolted on to conventional monolithic kernels. Microkernels have yet to be proven in large-scale commercial use.
Threads are actually familiar `processes' from operating systems courses, which usually consider processes that can share memory. Threads are not a fundamentally new concept, but they are important for building efficient client-server systems.
Invocation is more than just communication. The need to handle multimedia data, in particular, shows that distributed operating systems have to provide more than just RPC. The cost of local invocations is significant. Communication is the key to heterogeneous interworking.
Large, sparse address spaces and copy-on-write memory sharing are needed in all modern operating systems, and are not connected with distribution per se. Virtual memory management has had to be re-thought for distributed systems; this has given us an opportunity to implement distributed shared memory.
Students seem to be able to appreciate the need for threads in servers, but examples of multi-threaded applications will help them appreciate the need for threads in clients.
The material requires illustration wherever possible with practical problems in server design. Chapters 7, 8 and 9 will reinforce the issues for file and name services.
It is possible to omit Section 6.6. on virtual memory, but it is needed for Chapter 17 on distributed shared memory and the case studies in Chapter 18.
Link the material to current developments in commercial operating system development, for example Microsoft's Windows NT and IBM's Workplace OS.
One or more of the case studies in Chapter 18 can be used to illustrate the material.
These products are largely based on a standard set of concepts and techniques. The purpose of this chapter is to lay down a framework for the discussion of distributed file systems that is somewhat abstracted from the products, enabling students more easily to understand their design and to compare and evaluate them. We achieve this by presenting a `basic model' that embodies most of the concepts found in currently available distributed file service products.
After studying this chapter, students should have sufficient understanding of distributed file service design and implementation to design and construct a practical file service. A secondary objective is to consider in detail the design of a particular distributed service, applying the principles learned in all preceding chapters.
File service components
The role of each of the components - flat file service, directory service and client module; the division of responsibilities between the components results in an exposed interface - the flat file service - that is hidden in conventional operating systems. The useful separation of concerns, openness and modularity achieved by the division of responsibilities. A forward reference can be made to the similarity that this structure bears to NFS (where the directory service can be seen as integrated with the client module, see Chapter 8)
Design issues
The need for unique file identifiers; file attributes; the benefits of statelessness.
Interfaces
The absence of an Open operation; the idempotent nature of all of the interfaces; comparison of the flat file service with Unix system calls; method for interpreting a multi-part pathname.
Implementation techniques
File groups; the significance of file groups is that they provide a basis for the distribution of groups of files between different servers and their replication (described in the Coda case study in Chapter 8).
Capabilities; space leaks; construction of UFIDs and their role in access control; file location methods; server cache; client cache; the practical importance of caching at both the server and the clients and the problems of consistency maintenance raised by client caching.
The model presented in this chapter is sufficiently detailed for it to be used as the basis for a coursework exercise on the implementation of a file service.
To provide a sufficiently detailed description of the file services covered to enable students to understand and evaluate their practical behaviour obtain an understanding of the reasons for the differences between their designs and for their success, commercial or otherwise, based on their design goals.
To give an early appreciation (through the study of Coda in Section 8.4) of the issues involved and the solutions available for the replication of data between servers. Replication will be covered more generally in Chapter 11.
Meaning of one-copy file semantics and the problems that this introduces for caching and replication.
Callback promises - comparison of AFS-2 with AFS-1; Vice service interface; update semantics - for AFS-1 and AFS-2, and semantics while file is open, potential lost updates; Unix kernel changes; read-only replicas; performance.
Note that this chapter is by no means exhaustive in its coverage of research on file service design; many other designs have been developed and reported, some with novel approaches and solutions to problems that go beyond what can be achieved in a straightforward emulation of the Unix application programming interface. Had space and time allowed, we might have included a discussion of log-based file server design [Rosenblum and Ousterhout 1982] and of the recently-reported file services designed to support multimedia applications by the inclusion of Quality of Service specifications for time-based data [Anderson et al. 1992, Lo 1994].
We need a separate name service in addition to the naming schemes used in the context of specific services, for the reasons given in Section 9.1.
There are separate concerns to be addressed: designing the name space; meeting administrative requirements to partition the name space; and creating an implementation that scales.
A system such as the DNS that resolves names on a world-wide scale represents a considerable engineering achievement - as does the system of routing daemons that resolves IP addresses. Students should be given a feel for the engineering problems and strong motivation for getting naming right.
The SNS is only a paper design; encourage students to criticize it and think up alternatives.
Caching and replication are key to name service implementation.
Students sometimes find confusing the models of navigation outlined in Sections 9.2 and 9.3 and discussed in the DNS case study. They should be encouraged to think of reasons to use one method rather than another.
It is advisable to cover at least DNS as a case study, since this will be familiar to most students.
Encourage use of nslookup to query local DNS servers, and querying of X.500 servers.
Ask students to investigate and analyse the Uniform Resource Locators used in the World Wide Web.
These recent publications expand on various topics discussed in the book.
[Anderson et al. 1992] Anderson, D.P., Osawa, Y. and Govindan, R., A File
System for Continuous Media, ACM Trans. on Computer Systems, Vol. 10, No. 4,
pp.311-337, November 1992.
A pioneering work on file service design for
multimedia applications with time-based data.
[Duchamp 1994] Duchamp, D., Optimistic Lookup of Whole NFS Paths in a Single
Operation, Summer USENIX Proceedings, Boston, MA, June 1994, pp 161-169.
An interesting discussion of a modified NFS that performs whole pathname lookups
optimistically.
[Lo 1994] Lo, S.L., A Modular and Extensible Network Storage Architecture,
Technical Report TR326, Cambridge University Computer Laboratory, January 1994,
(ftp:ftp.cl.cam.ac.uk/reports/TR326-sll-network.storage.architecture.ps.gz ).
Another valuable work on file service design for multimedia applications with
time-based data.
[Pawlowski et al. 1994] Pawlowski, B., Juscak, C., Staubach, P., Smith, C., Lebel, D., and Hitz, D., NFS Version 3, Design and Implementation, Summer USENIX Proceedings, Boston, MA, June 1994, pp 137-152.
Improvements to NFS to
overcome the `write-through bottleneck' and several other problems.
[Rosenblum and Ousterhout 1992]
Rosenblum, M. and Ousterhout, J.K., The design and Implementation of a
Log-Structured File System, ACM Trans. on Computer Systems, Vol. 10, No. 1,
pp.26-52, February 1992.
Standard reference on log-based file server design.