Briefly describe the motivations and benefits of web services, computer science homework help


Description

Attached is the book

  • Describe the issues of data representation that arise from interprocess communication between disparate platforms (4)
  • Discuss the importance of network transparency as it relates to distributed systems
  • Briefly describe the motivations and benefits of web services (9)
  • Explain the difference between a private and a public cloud solution
  • What five distinctions differentiate routing overlays from IP routing in peer-to-peer networks?

Unformatted Attachment Preview

DISTRIBUTED SYSTEMS: Concepts and Design, Fifth Edition

George Coulouris, Cambridge University
Jean Dollimore, formerly of Queen Mary, University of London
Tim Kindberg, matter 2 media
Gordon Blair, Lancaster University

Copyright © 2012, 2005, 2001, 1994, 1988 Pearson Education, Inc., publishing as Addison-Wesley. Typesetting and layout by the authors using FrameMaker.
Library of Congress Cataloging-in-Publication Data available upon request. ISBN-10: 0-13-214301-1; ISBN-13: 978-0-13-214301-1.

CONTENTS

Preface
1 Characterization of Distributed Systems – Introduction; Examples of distributed systems; Trends in distributed systems; Focus on resource sharing; Challenges; Case study: The World Wide Web; Summary
2 System Models – Introduction; Physical models; Architectural models; Fundamental models; Summary
3 Networking and Internetworking – Introduction; Types of network; Network principles; Internet protocols; Case studies: Ethernet, WiFi and Bluetooth; Summary
4 Interprocess Communication – Introduction; The API for the Internet protocols; External data representation and marshalling; Multicast communication; Network virtualization: overlay networks; Case study: MPI; Summary
5 Remote Invocation – Introduction; Request-reply protocols; Remote procedure call; Remote method invocation; Case study: Java RMI; Summary
6 Indirect Communication – Introduction; Group communication; Publish-subscribe systems; Message queues; Shared memory approaches; Summary
7 Operating System Support – Introduction; The operating system layer; Protection; Processes and threads; Communication and invocation; Operating system architecture; Virtualization at the operating system level; Summary
8 Distributed Objects and Components – Introduction; Distributed objects; Case study: CORBA; From objects to components; Case studies: Enterprise JavaBeans and Fractal; Summary
9 Web Services – Introduction; Web services; Service descriptions and IDL for web services; A directory service for use with web services; XML security; Coordination of web services; Applications of web services; Summary
10 Peer-to-Peer Systems – Introduction; Napster and its legacy; Peer-to-peer middleware; Routing overlays; Overlay case studies: Pastry, Tapestry; Application case studies: Squirrel, OceanStore, Ivy; Summary
11 Security – Introduction; Overview of security techniques; Cryptographic algorithms; Digital signatures; Cryptography pragmatics; Case studies: Needham–Schroeder, Kerberos, TLS, 802.11 WiFi; Summary
12 Distributed File Systems – Introduction; File service architecture; Case study: Sun Network File System; Case study: The Andrew File System; Enhancements and further developments; Summary
13 Name Services – Introduction; Name services and the Domain Name System; Directory services; Case study: The Global Name Service; Case study: The X.500 Directory Service; Summary
14 Time and Global States – Introduction; Clocks, events and process states; Synchronizing physical clocks; Logical time and logical clocks; Global states; Distributed debugging; Summary
15 Coordination and Agreement – Introduction; Distributed mutual exclusion; Elections; Coordination and agreement in group communication; Consensus and related problems; Summary
16 Transactions and Concurrency Control – Introduction; Transactions; Nested transactions; Locks; Optimistic concurrency control; Timestamp ordering; Comparison of methods for concurrency control; Summary
17 Distributed Transactions – Introduction; Flat and nested distributed transactions; Atomic commit protocols; Concurrency control in distributed transactions; Distributed deadlocks; Transaction recovery; Summary
18 Replication – Introduction; System model and the role of group communication; Fault-tolerant services; Case studies of highly available services: the gossip architecture, Bayou and Coda; Transactions with replicated data; Summary
19 Mobile and Ubiquitous Computing – Introduction; Association; Interoperation; Sensing and context awareness; Security and privacy; Adaptation; Case study: Cooltown; Summary
20 Distributed Multimedia Systems – Introduction; Characteristics of multimedia data; Quality of service management; Resource management; Stream adaptation; Case studies: Tiger, BitTorrent and End System Multicast; Summary
21 Designing Distributed Systems: Google Case Study – Introduction; Introducing the case study: Google; Overall architecture and design philosophy; Underlying communication paradigms; Data storage and coordination services; Distributed computation services; Summary
References
Index

PREFACE

This fifth edition of our textbook appears at a time when the Internet and the Web continue to grow and have an impact on every aspect of our society. For example, the introductory chapter of the book notes their impact on application areas as diverse as finance and commerce, arts and entertainment and the emergence of the information society more generally. It also highlights the very demanding requirements of application domains such as web search and multiplayer online games. From a distributed systems perspective, these developments are placing substantial new demands on the underlying system infrastructure in terms of the range of applications and the workloads and system sizes supported by many modern systems. Important trends include the increasing diversity and ubiquity of networking technologies (including the increasing importance of wireless networks), the inherent integration of mobile and ubiquitous computing elements into the distributed systems infrastructure

New to the fifth edition

New chapters:
  • Indirect Communication: covering group communication, publish-subscribe and case studies on JavaSpaces, JMS, WebSphere and Message Queues.
  • Distributed Objects and Components: covering component-based middleware and case studies on Enterprise JavaBeans, Fractal and CORBA.
  • Designing Distributed Systems: devoted to a major new case study on the Google infrastructure.

Topics added to other chapters: cloud computing, network virtualization, operating system virtualization, message passing interface, unstructured peer-to-peer, tuple spaces, loose coupling in relation to web services.
Other new case studies: Skype, Gnutella, TOTA, L2imbo, BitTorrent, End System Multicast. See the table on page XV for further details of the changes. XI XII PREFACE (leading to radically different physical architectures), the need to support multimedia services and the emergence of the cloud computing paradigm, which challenges our perspective of distributed systems services. The book aims to provide an understanding of the principles on which the Internet and other distributed systems are based; their architecture, algorithms and design; and how they meet the demands of contemporary distributed applications. We begin with a set of seven chapters that together cover the building blocks for a study of distributed systems. The first two chapters provide a conceptual overview of the subject, outlining the characteristics of distributed systems and the challenges that must be addressed in their design: scalability, heterogeneity, security and failure handling being the most significant. These chapters also develop abstract models for understanding process interaction, failure and security. They are followed by other foundational chapters devoted to the study of networking, interprocess communication, remote invocation, indirect communication and operating system support. The next set of chapters covers the important topic of middleware, examining different approaches to supporting distributed applications including distributed objects and components, web services and alternative peer-to-peer solutions. We then cover the well-established topics of security, distributed file systems and distributed naming before moving on to important data-related aspects including distributed transactions and data replication. Algorithms associated with all these topics are covered as they arise and also in separate chapters devoted to timing, coordination and agreement. The book culminates in chapters that address the emerging areas of mobile and ubiquitous computing and distributed multimedia systems before presenting a substantial case study focusing on the design and implementation of the distributed systems infrastructure that supports Google both in terms of core search functionality and the increasing range of additional services offered by Google (for example, Gmail and Google Earth). This last chapter has an important role in illustrating how all the architectural concepts, algorithms and technologies introduced in the book can come together in a coherent overall design for a given application domain. Purposes and readership The book is intended for use in undergraduate and introductory postgraduate courses. It can equally be used for self-study. We take a top-down approach, addressing the issues to be resolved in the design of distributed systems and describing successful approaches in the form of abstract models, algorithms and detailed case studies of widely used systems. We cover the field in sufficient depth and breadth to enable readers to go on to study most research papers in the literature on distributed systems. We aim to make the subject accessible to students who have a basic knowledge of object-oriented programming, operating systems and elementary computer architecture. The book includes coverage of those aspects of computer networks relevant to distributed systems, including the underlying technologies for the Internet and for wide area, local area and wireless networks. Algorithms and interfaces are presented throughout the book in Java or, in a few cases, ANSI C. 
For brevity and clarity of presentation, a form of pseudo-code derived from Java/C is also used. PREFACE XIII Foundations Distributed algorithms 1 Characterization of Distributed Systems 2 System Models 3 Networking and Internetworking 4 Interprocess Communication 5 Remote Invocation 6 Indirect Communication 7 Operating System Support 14 Time and Global States 15 Coordination and Agreement Shared data Middleware 8 Dist. Objects and Components 9 Web Services 10 Peer-to-Peer Systems 16 Transactions and Concurrency Control 17 Distributed Transactions 18 Replication New challenges System services 11 Security 12 Distributed File Systems 13 Name Services 19 Mobile and Ubiquitous Computing 20 Distributed Multimedia Systems Substantial case study 21 Designing Distributed Systems: Google Case Study Organization of the book The diagram shows the book’s chapters under seven main topic areas. It is intended to provide a guide to the book’s structure and to indicate recommended navigation routes for instructors wishing to provide, or readers wishing to achieve, understanding of the various subfields of distributed system design. References The existence of the World Wide Web has changed the way in which a book such as this can be linked to source material, including research papers, technical specifications and standards. Many of the source documents are now available on the Web; some are available only there. For reasons of brevity and readability, we employ a special form of reference to web material that loosely resembles a URL: references such as [www.omg.org] and [www.rsasecurity.com I] refer to documentation that is available XIV PREFACE only on the Web. They can be looked up in the reference list at the end of the book, but the full URLs are given only in an online version of the reference list at the book’s web site, www.cdk5.net/refs where they take the form of clickable links. Both versions of the reference list include a more detailed explanation of this scheme. Changes relative to the fourth edition Before embarking on the writing of this new edition, we carried out a survey of teachers who used the fourth edition. From the results, we identified the new material required and a number of changes to be made. In addition, we recognized the increasing diversity of distributed systems, particularly in terms of the range of architectural approaches available to distributed systems developers today. This required significant changes to the book, especially in the earlier (foundational) chapters. Overall, this led to our writing three entirely new chapters, making substantial changes to a number of other chapters and making numerous insertions throughout the book to fold in new material. Many of the chapters have been changed to reflect new information that has become available about the systems described. These changes are summarized in the table below. To help teachers who have used the fourth edition, wherever possible we have preserved the structure adopted from the previous edition. Where material has been removed, we have placed this on our companion web site together with material removed from previous editions. This includes the case studies on ATM, interprocess communication in UNIX, CORBA (a shortened version of which remains in Chapter 8), the Jini distributed events specification and Grid middleware (featuring OGSA and the Globus toolkit), as well as the chapter on distributed shared memory (a brief summary of which is now included in Chapter 6). 
Some of the chapters in the book, such as the new chapter on indirect communication (Chapter 6), cover a lot of material. Teachers may elect to cover the broad spectrum before choosing two or three techniques to examine in more detail (for example, group communication, given its foundational role, and publish-subscribe or message queues, given their prevalence in commercial distributed systems). The chapter ordering has been changed to accommodate the new material and to reflect changes in the relative importance of certain topics. For a full understanding of some topics readers may find it necessary to follow a forward reference. For example, there is material in Chapter 9 on XML security techniques that will make better sense once the sections that it references in Chapter 11 Security have been absorbed. Acknowledgements We are very grateful to the following teachers who participated in our survey: Guohong Cao, Jose Fortes, Bahram Khalili, George Blank, Jinsong Ouyang, JoAnne Holliday, George K. Thiruvathukal, Joel Wein, Tao Xie and Xiaobo Zhou. We would like to thank the following people who reviewed the new chapters or provided other substantial help: Rob Allen, Roberto Baldoni, John Bates, Tom Berson, Lynne Blair, Geoff Coulson, Paul Grace, Andrew Herbert, David Hutchison, Laurent Mathy, Rajiv Ramdhany, Richard Sharp, Jean-Bernard Stefani, Rip Sohan, Francois PREFACE XV New chapters: 6 Indirect Communication Includes events and notification from 4th edition. 8 Distributed Objects and Components Includes a precised version of the CORBA case study from the 4th edition. 21 Designing Distributed Systems Includes a major new case study on Google Chapters which have undergone substantial changes: 1 Characterization of DS Significant restructuring of material New Section 1.2: Examples of distributed systems Section 1.3.4: Cloud computing introduced 2 System Models Significant restructuring of material New Section 2.2: Physical models Section 2.3: Major rewrite to reflect new book content and associated architectural perspectives 4 Interprocess Communication Several updates Client-server communication moved to Chapter 5 New Section 4.5: Network virtualization (includes case study on Skype) New Section 4.6: Case study on MPI Case study on IPC in UNIX removed 5 Remote Invocation Significant restructuring of material Client-server communication moved to here Progression introduced from client-server communication through RPC to RMI Events and notification moved to Chapter 6 Chapters to which new material has been added/removed, but without structural changes: 3 Networking and Internetworking Several updates Section 3.5: material on ATM removed 7 Operating System Support New Section 7.7: OS virtualization 9 Web Services Section 9.2: Discussion added on loose coupling 10 Peer-to-Peer Systems New Section 10.5.3: Unstructured peer-to-peer (including a new case study on Gnutella) 15 Coordination and Agreement Material on group communication moved to Ch. 6 18 Replication Material on group communication moved to Ch. 6 19 Mobile and Ubiquitous Computing Section 19.3.1: New material on tuple spaces (TOTA and L2imbo) 20 Distributed Multimedia Systems Section 20.6: New case studies added on BitTorrent and End System Multicast The remaining chapters have received only minor modifications. XVI PREFACE Taiani, Peter Triantafillou, Gareth Tyson and the late Sir Maurice Wilkes. 
We would also like to thank the staff at Google who provided insights into the design rationale for Google Infrastructure, namely: Mike Burrows, Tushar Chandra, Walfredo Cirne, Jeff Dean, Sanjay Ghemawat, Andrea Kirmse and John Reumann. Our copy editor, Rachel Head also provided outstanding support. Web site As before, we continue to maintain a web site with a wide range of material designed to assist teachers and readers. This web site can be accessed via the URL: www.cdk5.net The web site includes: Instructor’s Guide: We provide supporting material for teachers comprising: • complete artwork of the book available as PowerPoint files; • chapter-by-chapter teaching hints; • solutions to the exercises, protected by a password available only to teachers. Reference list: The list of references that can be found at the end of the book is replicated at the web site. The web version of the reference list includes active links for material that is available online. Errata list: A list of known errors in the book is maintained, with corrections. The errors will be corrected when new impressions are printed and a separate errata list will be provided for each impression. (Readers are encouraged to report any apparent errors they encounter to the email address below.) Supplementary material: We maintain a set of supplementary material for each chapter. This consists of source code for the programs in the book and relevant reading material that was present in previous editions of the book but was removed for reasons of space. References to this supplementary material appear in the book with links such as www.cdk5.net/ipc (the URL for supplementary material relating to the Interprocess Communication chapter). Two entire chapters from the 4th edition are not present in this one; they can be accessed at the URLs: CORBA Case Study www.cdk5.net/corba Distributed Shared Memory www.cdk5.net/dsm George Coulouris Jean Dollimore Tim Kindberg Gordon Blair London, Bristol and Lancaster, 2011 authors@cdk5.net 1 CHARACTERIZATION OF DISTRIBUTED SYSTEMS 1.1 1.2 1.3 1.4 1.5 1.6 1.7 Introduction Examples of distributed systems Trends in distributed systems Focus on resource sharing Challenges Case study: The World Wide Web Summary A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages. This definition leads to the following especially significant characteristics of distributed systems: concurrency of components, lack of a global clock and independent failures of components. We look at several examples of modern distributed applications, including web search, multiplayer online games and financial trading systems, and also examine the key underlying trends driving distributed systems today: the pervasive nature of modern networking, the emergence of mobile and ubiquitous computing, the increasing importance of distributed multimedia systems, and the trend towards viewing distributed systems as a utility. The chapter then highlights resource sharing as a main motivation for constructing distributed systems. Resources may be managed by servers and accessed by clients or they may be encapsulated as objects and accessed by other client objects. 
The challenges arising from the construction of distributed systems are the heterogeneity of their components, openness (which allows components to be added or replaced), security, scalability – the ability to work well when the load or the number of users increases – failure handling, concurrency of components, transparency and providing quality of service. Finally, the Web is discussed as an example of a large-scale distributed system and its main features are introduced. 1 2 CHAPTER 1 CHARACTERIZATION OF DISTRIBUTED SYSTEMS 1.1 Introduction Networks of computers are everywhere. The Internet is one, as are the many networks of which it is composed. Mobile phone networks, corporate networks, factory networks, campus networks, home networks, in-car networks – all of these, both separately and in combination, share the essential characteristics that make them relevant subjects for study under the heading distributed systems. In this book we aim to explain the characteristics of networked computers that impact system designers and implementors and to present the main concepts and techniques that have been developed to help in the tasks of designing and implementing systems that are based on them. We define a distributed system as one in which hardware or software components located at networked computers communicate and coordinate their actions only by passing messages. This simple definition covers the entire range of systems in which networked computers can usefully be deployed. Computers that are connected by a network may be spatially separated by any distance. They may be on separate continents, in the same building or in the same room. Our definition of distributed systems has the following significant consequences: Concurrency: In a network of computers, concurrent program execution is the norm. I can do my work on my computer while you do your work on yours, sharing resources such as web pages or files when necessary. The capacity of the system to handle shared resources can be increased by adding more resources (for example. computers) to the network. We will describe ways in which this extra capacity can be usefully deployed at many points in this book. The coordination of concurrently executing programs that share resources is also an important and recurring topic. No global clock: When programs need to cooperate they coordinate their actions by exchanging messages. Close coordination often depends on a shared idea of the time at which the programs’ actions occur. But it turns out that there are limits to the accuracy with which the computers in a network can synchronize their clocks – there is no single global notion of the correct time. This is a direct consequence of the fact that the only communication is by sending messages through a network. Examples of these timing problems and solutions to them will be described in Chapter 14. Independent failures: All computer systems can fail, and it is the responsibility of system designers to plan for the consequences of possible failures. Distributed systems can fail in new ways. Faults in the network result in the isolation of the computers that are connected to it, but that doesn’t mean that they stop running. In fact, the programs on them may not be able to detect whether the network has failed or has become unusually slow. Similarly, the failure of a computer, or the unexpected termination of a program somewhere in the system (a crash), is not immediately made known to the other components with which it communicates. 
Each component of the system can fail independently, leaving the others still running. The consequences of this characteristic of distributed systems will be a recurring theme throughout the book. The prime motivation for constructing and using distributed systems stems from a desire to share resources. The term ‘resource’ is a rather abstract one, but it best characterizes the range of things that can usefully be shared in a networked computer system. It SECTION 1.2 EXAMPLES OF DISTRIBUTED SYSTEMS 3 extends from hardware components such as disks and printers to software-defined entities such as files, databases and data objects of all kinds. It includes the stream of video frames that emerges from a digital video camera and the audio connection that a mobile phone call represents. The purpose of this chapter is to convey a clear view of the nature of distributed systems and the challenges that must be addressed in order to ensure that they are successful. Section 1.2 gives some illustrative examples of distributed systems, with Section 1.3 covering the key underlying trends driving recent developments. Section 1.4 focuses on the design of resource-sharing systems, while Section 1.5 describes the key challenges faced by the designers of distributed systems: heterogeneity, openness, security, scalability, failure handling, concurrency, transparency and quality of service. Section 1.6 presents a detailed case study of one very well known distributed system, the World Wide Web, illustrating how its design supports resource sharing. 1.2 Examples of distributed systems The goal of this section is to provide motivational examples of contemporary distributed systems illustrating both the pervasive role of distributed systems and the great diversity of the associated applications. As mentioned in the introduction, networks are everywhere and underpin many everyday services that we now take for granted: the Internet and the associated World Wide Web, web search, online gaming, email, social networks, eCommerce, etc. To illustrate this point further, consider Figure 1.1, which describes a selected range of key commercial or social application sectors highlighting some of the associated established or emerging uses of distributed systems technology. As can be seen, distributed systems encompass many of the most significant technological developments of recent years and hence an understanding of the underlying technology is absolutely central to a knowledge of modern computing. The figure also provides an initial insight into the wide range of applications in use today, from relatively localized systems (as found, for example, in a car or aircraft) to globalscale systems involving millions of nodes, from data-centric services to processorintensive tasks, from systems built from very small and relatively primitive sensors to those incorporating powerful computational elements, from embedded systems to ones that support a sophisticated interactive user experience, and so on. We now look at more specific examples of distributed systems to further illustrate the diversity and indeed complexity of distributed systems provision today. 1.2.1 Web search Web search has emerged as a major growth industry in the last decade, with recent figures indicating that the global number of searches has risen to over 10 billion per calendar month. 
The task of a web search engine is to index the entire contents of the World Wide Web, encompassing a wide range of information styles including web pages, multimedia sources and (scanned) books. This is a very complex task, as current estimates state that the Web consists of over 63 billion pages and one trillion unique web 4 CHAPTER 1 CHARACTERIZATION OF DISTRIBUTED SYSTEMS Figure 1.1 Selected application domains and associated networked applications Finance and commerce The growth of eCommerce as exemplified by companies such as Amazon and eBay, and underlying payments technologies such as PayPal; the associated emergence of online banking and trading and also complex information dissemination systems for financial markets. The information society The growth of the World Wide Web as a repository of information and knowledge; the development of web search engines such as Google and Yahoo to search this vast repository; the emergence of digital libraries and the large-scale digitization of legacy information sources such as books (for example, Google Books); the increasing significance of user-generated content through sites such as YouTube, Wikipedia and Flickr; the emergence of social networking through services such as Facebook and MySpace. Creative industries and entertainment The emergence of online gaming as a novel and highly interactive form of entertainment; the availability of music and film in the home through networked media centres and more widely in the Internet via downloadable or streaming content; the role of user-generated content (as mentioned above) as a new form of creativity, for example via services such as YouTube; the creation of new forms of art and entertainment enabled by emergent (including networked) technologies. Healthcare The growth of health informatics as a discipline with its emphasis on online electronic patient records and related issues of privacy; the increasing role of telemedicine in supporting remote diagnosis or more advanced services such as remote surgery (including collaborative working between healthcare teams); the increasing application of networking and embedded systems technology in assisted living, for example for monitoring the elderly in their own homes. Education The emergence of e-learning through for example web-based tools such as virtual learning environments; associated support for distance learning; support for collaborative or community-based learning. Transport and logistics The use of location technologies such as GPS in route finding systems and more general traffic management systems; the modern car itself as an example of a complex distributed system (also applies to other forms of transport such as aircraft); the development of web-based map services such as MapQuest, Google Maps and Google Earth. Science The emergence of the Grid as a fundamental technology for eScience, including the use of complex networks of computers to support the storage, analysis and processing of (often very large quantities of) scientific data; the associated use of the Grid as an enabling technology for worldwide collaboration between groups of scientists. Environmental management The use of (networked) sensor technology to both monitor and manage the natural environment, for example to provide early warning of natural disasters such as earthquakes, floods or tsunamis and to coordinate emergency response; the collation and analysis of global environmental parameters to better understand complex natural phenomena such as climate change. 
SECTION 1.2 EXAMPLES OF DISTRIBUTED SYSTEMS 5 addresses. Given that most search engines analyze the entire web content and then carry out sophisticated processing on this enormous database, this task itself represents a major challenge for distributed systems design. Google, the market leader in web search technology, has put significant effort into the design of a sophisticated distributed system infrastructure to support search (and indeed other Google applications and services such as Google Earth). This represents one of the largest and most complex distributed systems installations in the history of computing and hence demands close examination. Highlights of this infrastructure include: • an underlying physical infrastructure consisting of very large numbers of networked computers located at data centres all around the world; • a distributed file system designed to support very large files and heavily optimized for the style of usage required by search and other Google applications (especially reading from files at high and sustained rates); • an associated structured distributed storage system that offers fast access to very large datasets; • a lock service that offers distributed system functions such as distributed locking and agreement; • a programming model that supports the management of very large parallel and distributed computations across the underlying physical infrastructure. Further details on Google’s distributed systems services and underlying communications support can be found in Chapter 21, a compelling case study of a modern distributed system in action. 1.2.2 Massively multiplayer online games (MMOGs) Massively multiplayer online games offer an immersive experience whereby very large numbers of users interact through the Internet with a persistent virtual world. Leading examples of such games include Sony’s EverQuest II and EVE Online from the Finnish company CCP Games. Such worlds have increased significantly in sophistication and now include, complex playing arenas (for example EVE, Online consists of a universe with over 5,000 star systems) and multifarious social and economic systems. The number of players is also rising, with systems able to support over 50,000 simultaneous online players (and the total number of players perhaps ten times this figure). The engineering of MMOGs represents a major challenge for distributed systems technologies, particularly because of the need for fast response times to preserve the user experience of the game. Other challenges include the real-time propagation of events to the many players and maintaining a consistent view of the shared world. This therefore provides an excellent example of the challenges facing modern distributed systems designers. A number of solutions have been proposed for the design of massively multiplayer online games: • Perhaps surprisingly, the largest online game, EVE Online, utilises a client-server architecture where a single copy of the state of the world is maintained on a 6 CHAPTER 1 CHARACTERIZATION OF DISTRIBUTED SYSTEMS centralized server and accessed by client programs running on players’ consoles or other devices. To support large numbers of clients, the server is a complex entity in its own right consisting of a cluster architecture featuring hundreds of computer nodes (this client-server approach is discussed in more detail in Section 1.4 and cluster approaches are discussed in Section 1.3.4). 
The centralized architecture helps significantly in terms of the management of the virtual world and the single copy also eases consistency concerns. The goal is then to ensure fast response through optimizing network protocols and ensuring a rapid response to incoming events. To support this, the load is partitioned by allocating individual ‘star systems’ to particular computers within the cluster, with highly loaded star systems having their own dedicated computer and others sharing a computer. Incoming events are directed to the right computers within the cluster by keeping track of movement of players between star systems. • Other MMOGs adopt more distributed architectures where the universe is partitioned across a (potentially very large) number of servers that may also be geographically distributed. Users are then dynamically allocated a particular server based on current usage patterns and also the network delays to the server (based on geographical proximity for example). This style of architecture, which is adopted by EverQuest, is naturally extensible by adding new servers. • Most commercial systems adopt one of the two models presented above, but researchers are also now looking at more radical architectures that are not based on client-server principles but rather adopt completely decentralized approaches based on peer-to-peer technology where every participant contributes resources (storage and processing) to accommodate the game. Further consideration of peerto-peer solutions is deferred until Chapters 2 and 10). 1.2.3 Financial trading As a final example, we look at distributed systems support for financial trading markets. The financial industry has long been at the cutting edge of distributed systems technology with its need, in particular, for real-time access to a wide range of information sources (for example, current share prices and trends, economic and political developments). The industry employs automated monitoring and trading applications (see below). Note that the emphasis in such systems is on the communication and processing of items of interest, known as events in distributed systems, with the need also to deliver events reliably and in a timely manner to potentially very large numbers of clients who have a stated interest in such information items. Examples of such events include a drop in a share price, the release of the latest unemployment figures, and so on. This requires a very different style of underlying architecture from the styles mentioned above (for example client-server), and such systems typically employ what are known as distributed event-based systems. We present an illustration of a typical use of such systems below and return to this important topic in more depth in Chapter 6. Figure 1.2 illustrates a typical financial trading system. This shows a series of event feeds coming into a given financial institution. Such event feeds share the SECTION 1.2 EXAMPLES OF DISTRIBUTED SYSTEMS 7 Figure 1.2 An example financial trading system Trading strategies FIX Gateway FIX Adapter Complex Event Processing Engine Reuters Adapter FIX events Reuters Gateway Reuters events following characteristics. Firstly, the sources are typically in a variety of formats, such as Reuters market data events and FIX events (events following the specific format of the Financial Information eXchange protocol), and indeed from different event technologies, thus illustrating the problem of heterogeneity as encountered in most distributed systems (see also Section 1.5.1). 
The figure shows the use of adapters which translate heterogeneous formats into a common internal format. Secondly, the trading system must deal with a variety of event streams, all arriving at rapid rates, and often requiring real-time processing to detect patterns that indicate trading opportunities. This used to be a manual process but competitive pressures have led to increasing automation in terms of what is known as Complex Event Processing (CEP), which offers a way of composing event occurrences together into logical, temporal or spatial patterns. This approach is primarily used to develop customized algorithmic trading strategies covering both buying and selling of stocks and shares, in particular looking for patterns that indicate a trading opportunity and then automatically responding by placing and managing orders. As an example, consider the following script: WHEN MSFT price moves outside 2% of MSFT Moving Average FOLLOWED-BY ( MyBasket moves up by 0.5% AND HPQ’s price moves up by 5% OR MSFT’s price moves down by 2% ) ) ALL WITHIN any 2 minute time period THEN BUY MSFT SELL HPQ 8 CHAPTER 1 CHARACTERIZATION OF DISTRIBUTED SYSTEMS This script is based on the functionality provided by Apama [www.progress.com], a commercial product in the financial world originally developed out of research carried out at the University of Cambridge. The script detects a complex temporal sequence based on the share prices of Microsoft, HP and a basket of other share prices, resulting in decisions to buy or sell particular shares. This style of technology is increasingly being used in other areas of financial systems including the monitoring of trading activity to manage risk (in particular, tracking exposure), to ensure compliance with regulations and to monitor for patterns of activity that might indicate fraudulent transactions. In such systems, events are typically intercepted and passed through what is equivalent to a compliance and risk firewall before being processed (see also the discussion of firewalls in Section 1.3.1 below). 1.3 Trends in distributed systems Distributed systems are undergoing a period of significant change and this can be traced back to a number of influential trends: • the emergence of pervasive networking technology; • the emergence of ubiquitous computing coupled with the desire to support user mobility in distributed systems; • the increasing demand for multimedia services; • the view of distributed systems as a utility. 1.3.1 Pervasive networking and the modern Internet The modern Internet is a vast interconnected collection of computer networks of many different types, with the range of types increasing all the time and now including, for example, a wide range of wireless communication technologies such as WiFi, WiMAX, Bluetooth (see Chapter 3) and third-generation mobile phone networks. The net result is that networking has become a pervasive resource and devices can be connected (if desired) at any time and in any place. Figure 1.3 illustrates a typical portion of the Internet. Programs running on the computers connected to it interact by passing messages, employing a common means of communication. The design and construction of the Internet communication mechanisms (the Internet protocols) is a major technical achievement, enabling a program running anywhere to address messages to programs anywhere else and abstracting over the myriad of technologies mentioned above. The Internet is also a very large distributed system. 
It enables users, wherever they are, to make use of services such as the World Wide Web, email and file transfer. (Indeed, the Web is sometimes incorrectly equated with the Internet.) The set of services is open-ended – it can be extended by the addition of server computers and new types of service.

Figure 1.3 A typical portion of the Internet [diagram: intranets and ISPs joined by backbone and satellite links, with desktop computers, servers and network links labelled]

The figure shows a collection of intranets – subnetworks operated by companies and other organizations and typically protected by firewalls. The role of a firewall is to protect an intranet by preventing unauthorized messages from leaving or entering. A firewall is implemented by filtering incoming and outgoing messages. Filtering might be done by source or destination, or a firewall might allow only those messages related to email and web access to pass into or out of the intranet that it protects.

Internet Service Providers (ISPs) are companies that provide broadband links and other types of connection to individual users and small organizations, enabling them to access services anywhere in the Internet as well as providing local services such as email and web hosting. The intranets are linked together by backbones. A backbone is a network link with a high transmission capacity, employing satellite connections, fibre optic cables and other high-bandwidth circuits.

Note that some organizations may not wish to connect their internal networks to the Internet at all. For example, police and other security and law enforcement agencies are likely to have at least some internal intranets that are isolated from the outside world (the most effective firewall possible – the absence of any physical connections to the Internet). Firewalls can also be problematic in distributed systems by impeding legitimate access to services when resource sharing between internal and external users is required. Hence, firewalls must often be complemented by more fine-grained mechanisms and policies, as discussed in Chapter 11.

The implementation of the Internet and the services that it supports has entailed the development of practical solutions to many distributed system issues (including most of those defined in Section 1.5). We shall highlight those solutions throughout the book, pointing out their scope and their limitations where appropriate.

1.3.2 Mobile and ubiquitous computing

Technological advances in device miniaturization and wireless networking have led increasingly to the integration of small and portable computing devices into distributed systems. These devices include:

  • Laptop computers.
  • Handheld devices, including mobile phones, smart phones, GPS-enabled devices, pagers, personal digital assistants (PDAs), video cameras and digital cameras.
  • Wearable devices, such as smart watches with functionality similar to a PDA.
  • Devices embedded in appliances such as washing machines, hi-fi systems, cars and refrigerators.

The portability of many of these devices, together with their ability to connect conveniently to networks in different places, makes mobile computing possible. Mobile computing is the performance of computing tasks while the user is on the move, or visiting places other than their usual environment.
In mobile computing, users who are away from their ‘home’ intranet (the intranet at work, or their residence) are still provided with access to resources via the devices they carry with them. They can continue to access the Internet; they can continue to access resources in their home intranet; and there is increasing provision for users to utilize resources such as printers or even sales points that are conveniently nearby as they move around. The latter is also known as location-aware or context-aware computing. Mobility introduces a number of challenges for distributed systems, including the need to deal with variable connectivity and indeed disconnection, and the need to maintain operation in the face of device mobility (see the discussion on mobility transparency in Section 1.5.7). Ubiquitous computing is the harnessing of many small, cheap computational devices that are present in users’ physical environments, including the home, office and even natural settings. The term ‘ubiquitous’ is intended to suggest that small computing devices will eventually become so pervasive in everyday objects that they are scarcely noticed. That is, their computational behaviour will be transparently and intimately tied up with their physical function. The presence of computers everywhere only becomes useful when they can communicate with one another. For example, it may be convenient for users to control their washing machine or their entertainment system from their phone or a ‘universal remote control’ device in the home. Equally, the washing machine could notify the user via a smart badge or phone when the washing is done. Ubiquitous and mobile computing overlap, since the mobile user can in principle benefit from computers that are everywhere. But they are distinct, in general. Ubiquitous computing could benefit users while they remain in a single environment such as the home or a hospital. Similarly, mobile computing has advantages even if it involves only conventional, discrete computers and devices such as laptops and printers. Figure 1.4 shows a user who is visiting a host organization. The figure shows the user’s home intranet and the host intranet at the site that the user is visiting. Both intranets are connected to the rest of the Internet. The user has access to three forms of wireless connection. Their laptop has a means of connecting to the host’s wireless LAN. This network provides coverage of a SECTION 1.3 TRENDS IN DISTRIBUTED SYSTEMS Figure 1.4 11 Portable and handheld devices in a distributed system Internet Host intranet Home intranet Wireless LAN Printer Camera Mobile phone Laptop GPS satellite signal 3G phone network Host site few hundred metres (a floor of a building, say). It connects to the rest of the host intranet via a gateway or access point. The user also has a mobile (cellular) telephone, which is connected to the Internet. The phone gives access to the Web and other Internet services, constrained only by what can be presented on its small display, and may also provide location information via built-in GPS functionality. Finally, the user carries a digital camera, which can communicate over a personal area wireless network (with range up to about 10m) with a device such as a printer. With a suitable system infrastructure, the user can perform some simple tasks in the host site using the devices they carry. 
While journeying to the host site, the user can fetch the latest stock prices from a web server using the mobile phone and can also use the built-in GPS and route finding software to get directions to the site location. During the meeting with their hosts, the user can show them a recent photograph by sending it from the digital camera directly to a suitably enabled (local) printer or projector in the meeting room (discovered using a location service). This requires only the wireless link between the camera and printer or projector. And they can in principle send a document from their laptop to the same printer, utilizing the wireless LAN and wired Ethernet links to the printer. This scenario demonstrates the need to support spontaneous interoperation, whereby associations between devices are routinely created and destroyed – for example by locating and using the host’s devices, such as printers. The main challenge applying to such situations is to make interoperation fast and convenient (that is, spontaneous) even though the user is in an environment they may never have visited before. That means enabling the visitor’s device to communicate on the host network, and associating the device with suitable local services – a process called service discovery. Mobile and ubiquitous computing represent lively areas of research, and the various dimensions mentioned above are discussed in depth in Chapter 19. 12 CHAPTER 1 CHARACTERIZATION OF DISTRIBUTED SYSTEMS 1.3.3 Distributed multimedia systems Another important trend is the requirement to support multimedia services in distributed systems. Multimedia support can usefully be defined as the ability to support a range of media types in an integrated manner. One can expect a distributed system to support the storage, transmission and presentation of what are often referred to as discrete media types, such as pictures or text messages. A distributed multimedia system should be able to perform the same functions for continuous media types such as audio and video; that is, it should be able to store and locate audio or video files, to transmit them across the network (possibly in real time as the streams emerge from a video camera), to support the presentation of the media types to the user and optionally also to share the media types across a group of users. The crucial characteristic of continuous media types is that they include a temporal dimension, and indeed, the integrity of the media type is fundamentally dependent on preserving real-time relationships between elements of a media type. For example, in a video presentation it is necessary to preserve a given throughput in terms of frames per second and, for real-time streams, a given maximum delay or latency for the delivery of frames (this is one example of quality of service, discussed in more detail in Section 1.5.8). The benefits of distributed multimedia computing are considerable in that a wide range of new (multimedia) services and applications can be provided on the desktop, including access to live or pre-recorded television broadcasts, access to film libraries offering video-on-demand services, access to music libraries, the provision of audio and video conferencing facilities and integrated telephony features including IP telephony or related technologies such as Skype, a peer-to-peer alternative to IP telephony (the distributed system infrastructure underpinning Skype is discussed in Section 4.5.2). 
Note that this technology is revolutionary in challenging manufacturers to rethink many consumer devices. For example, what is the core home entertainment device of the future – the computer, the television, or the games console? Webcasting is an application of distributed multimedia technology. Webcasting is the ability to broadcast continuous media, typically audio or video, over the Internet. It is now commonplace for major sporting or music events to be broadcast in this way, often attracting large numbers of viewers (for example, the Live8 concert in 2005 attracted around 170,000 simultaneous users at its peak). Distributed multimedia applications such as webcasting place considerable demands on the underlying distributed infrastructure in terms of: • providing support for an (extensible) range of encoding and encryption formats, such as the MPEG series of standards (including for example the popular MP3 standard otherwise known as MPEG-1, Audio Layer 3) and HDTV; • providing a range of mechanisms to ensure that the desired quality of service can be met; • providing associated resource management strategies, including appropriate scheduling policies to support the desired quality of service; • providing adaptation strategies to deal with the inevitable situation in open systems where quality of service cannot be met or sustained. Further discussion of such mechanisms can be found in Chapter 20. SECTION 1.3 TRENDS IN DISTRIBUTED SYSTEMS 1.3.4 13 Distributed computing as a utility With the increasing maturity of distributed systems infrastructure, a number of companies are promoting the view of distributed resources as a commodity or utility, drawing the analogy between distributed resources and other utilities such as water or electricity. With this model, resources are provided by appropriate service suppliers and effectively rented rather than owned by the end user. This model applies to both physical resources and more logical services: • Physical resources such as storage and processing can be made available to networked computers, removing the need to own such resources on their own. At one end of the spectrum, a user may opt for a remote storage facility for file storage requirements (for example, for multimedia data such as photographs, music or video) and/or for backups. Similarly, this approach would enable a user to rent one or more computational nodes, either to meet their basic computing needs or indeed to perform distributed computation. At the other end of the spectrum, users can access sophisticated data centres (networked facilities offering access to repositories of often large volumes of data to users or organizations) or indeed computational infrastructure using the sort of services now provided by companies such as Amazon and Google. Operating system virtualization is a key enabling technology for this approach, implying that users may actually be provided with services by a virtual rather than a physical node. This offers greater flexibility to the service supplier in terms of resource management (operating system virtualization is discussed in more detail in Chapter 7). • Software services (as defined in Section 1.4) can also be made available across the global Internet using this approach. Indeed, many companies now offer a comprehensive range of services for effective rental, including services such as email and distributed calendars. Google, for example, bundles a range of business services under the banner Google Apps [www.google.com I]. 
This development is enabled by agreed standards for software services, for example as provided by web services (see Chapter 9).

The term cloud computing is used to capture this vision of computing as a utility. A cloud is defined as a set of Internet-based application, storage and computing services sufficient to support most users’ needs, thus enabling them to largely or totally dispense with local data storage and application software (see Figure 1.5). The term also promotes a view of everything as a service, from physical or virtual infrastructure through to software, often paid for on a per-usage basis rather than purchased. Note that cloud computing reduces requirements on users’ devices, allowing very simple desktop or portable devices to access a potentially wide range of resources and services.

[Figure 1.5 Cloud computing: clients use the Internet to reach application services, storage services and computational services]

Clouds are generally implemented on cluster computers to provide the necessary scale and performance required by such services. A cluster computer is a set of interconnected computers that cooperate closely to provide a single, integrated high-performance computing capability. Building on projects such as the NOW (Network of Workstations) Project at Berkeley [Anderson et al. 1995, now.cs.berkeley.edu] and Beowulf at NASA [www.beowulf.org], the trend is towards utilizing commodity hardware both for the computers and for the interconnecting networks. Most clusters consist of commodity PCs running a standard (sometimes cut-down) version of an operating system such as Linux, interconnected by a local area network. Companies such as HP, Sun and IBM offer blade solutions. Blade servers are minimal computational elements containing for example processing and (main memory) storage capabilities. A blade system consists of a potentially large number of blade servers contained within a blade enclosure. Other elements such as power, cooling, persistent storage (disks), networking and displays are provided either by the enclosure or through virtualized solutions (discussed in Chapter 7). Through this solution, individual blade servers can be much smaller and also cheaper to produce than commodity PCs.

The overall goal of cluster computers is to provide a range of cloud services, including high-performance computing capabilities, mass storage (for example through data centres), and richer application services such as web search (Google, for example, relies on a massive cluster computer architecture to implement its search engine and other services, as discussed in Chapter 21).

Grid computing (as discussed in Chapter 9, Section 9.7.2) can also be viewed as a form of cloud computing. The terms are largely synonymous and at times ill-defined, but Grid computing can generally be viewed as a precursor to the more general paradigm of cloud computing with a bias towards support for scientific applications.

1.4 Focus on resource sharing

Users are so accustomed to the benefits of resource sharing that they may easily overlook their significance. We routinely share hardware resources such as printers, data resources such as files, and resources with more specific functionality such as search engines.

Looked at from the point of view of hardware provision, we share equipment such as printers and disks to reduce costs.
But of far greater significance to users is the sharing of the higher-level resources that play a part in their applications and in their everyday work and social activities. For example, users are concerned with sharing data in the form of a shared database or a set of web pages – not the disks and processors on which they are implemented. Similarly, users think in terms of shared resources such as a search engine or a currency converter, without regard for the server or servers that provide these.

In practice, patterns of resource sharing vary widely in their scope and in how closely users work together. At one extreme, a search engine on the Web provides a facility to users throughout the world, users who need never come into contact with one another directly. At the other extreme, in computer-supported cooperative working (CSCW), a group of users who cooperate directly share resources such as documents in a small, closed group. The pattern of sharing and the geographic distribution of particular users determines what mechanisms the system must supply to coordinate users’ actions.

We use the term service for a distinct part of a computer system that manages a collection of related resources and presents their functionality to users and applications. For example, we access shared files through a file service; we send documents to printers through a printing service; we buy goods through an electronic payment service. The only access we have to the service is via the set of operations that it exports. For example, a file service provides read, write and delete operations on files.

The fact that services restrict resource access to a well-defined set of operations is in part standard software engineering practice. But it also reflects the physical organization of distributed systems. Resources in a distributed system are physically encapsulated within computers and can only be accessed from other computers by means of communication. For effective sharing, each resource must be managed by a program that offers a communication interface enabling the resource to be accessed and updated reliably and consistently.

The term server is probably familiar to most readers. It refers to a running program (a process) on a networked computer that accepts requests from programs running on other computers to perform a service and responds appropriately. The requesting processes are referred to as clients, and the overall approach is known as client-server computing. In this approach, requests are sent in messages from clients to a server and replies are sent in messages from the server to the clients. When the client sends a request for an operation to be carried out, we say that the client invokes an operation upon the server. A complete interaction between a client and a server, from the point when the client sends its request to when it receives the server’s response, is called a remote invocation.

The same process may be both a client and a server, since servers sometimes invoke operations on other servers. The terms ‘client’ and ‘server’ apply only to the roles played in a single request. Clients are active (making requests) and servers are passive (only waking up when they receive requests); servers run continuously, whereas clients last only as long as the applications of which they form a part. Note that while by default the terms ‘client’ and ‘server’ refer to processes rather than the computers that they execute upon, in everyday parlance those terms also refer to the computers themselves.
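As a much simplified illustration of this client-server interaction, the following Java sketch shows a server process that answers one request per connection and a client that performs a single remote invocation on it. It uses plain TCP sockets rather than any particular middleware; the class name and port number are assumptions made for the example.

```java
import java.io.*;
import java.net.*;

// Minimal request-reply sketch: the server echoes an upper-cased reply to each
// client request; the client sends one request message and waits for the reply.
public class RequestReplyDemo {

    // Server role: accept a connection, read one request line, send one reply.
    static void runServer(int port) throws IOException {
        try (ServerSocket listener = new ServerSocket(port)) {
            while (true) {
                try (Socket conn = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream()));
                     PrintWriter out = new PrintWriter(conn.getOutputStream(), true)) {
                    String request = in.readLine();                     // client's request message
                    out.println(request == null ? "" : request.toUpperCase()); // server's reply
                }
            }
        }
    }

    // Client role: send a request message and wait for the reply (a remote invocation).
    static String invoke(String host, int port, String request) throws IOException {
        try (Socket conn = new Socket(host, port);
             PrintWriter out = new PrintWriter(conn.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(conn.getInputStream()))) {
            out.println(request);
            return in.readLine();
        }
    }
}
```

A real service would handle many concurrent clients and a richer message format; request-reply protocols are examined in detail in Chapter 5.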
Another distinction, which we shall discuss in Chapter 5, is that in a distributed system written in an object-oriented language, resources may be encapsulated as objects and accessed by client objects, in which case we speak of a client object invoking a method upon a server object.

Many, but certainly not all, distributed systems can be constructed entirely in the form of interacting clients and servers. The World Wide Web, email and networked printers all fit this model. We discuss alternatives to client-server systems in Chapter 2.

An executing web browser is an example of a client. The web browser communicates with a web server, to request web pages from it. We consider the Web and its associated client-server architecture in more detail in Section 1.6.

1.5 Challenges

The examples in Section 1.2 are intended to illustrate the scope of distributed systems and to suggest the issues that arise in their design. In many of them, significant challenges were encountered and overcome. As the scope and scale of distributed systems and applications is extended, the same and other challenges are likely to be encountered. In this section we describe the main challenges.

1.5.1 Heterogeneity

The Internet enables users to access services and run applications over a heterogeneous collection of computers and networks. Heterogeneity (that is, variety and difference) applies to all of the following:

• networks;
• computer hardware;
• operating systems;
• programming languages;
• implementations by different developers.

Although the Internet consists of many different sorts of network (illustrated in Figure 1.3), their differences are masked by the fact that all of the computers attached to them use the Internet protocols to communicate with one another. For example, a computer attached to an Ethernet has an implementation of the Internet protocols over the Ethernet, whereas a computer on a different sort of network will need an implementation of the Internet protocols for that network. Chapter 3 explains how the Internet protocols are implemented over a variety of different networks.

Data types such as integers may be represented in different ways on different sorts of hardware – for example, there are two alternatives for the byte ordering of integers. These differences in representation must be dealt with if messages are to be exchanged between programs running on different hardware.

Although the operating systems of all computers on the Internet need to include an implementation of the Internet protocols, they do not necessarily all provide the same application programming interface to these protocols. For example, the calls for exchanging messages in UNIX are different from the calls in Windows.

Different programming languages use different representations for characters and data structures such as arrays and records. These differences must be addressed if programs written in different languages are to be able to communicate with one another.

Programs written by different developers cannot communicate with one another unless they use common standards, for example, for network communication and the representation of primitive data items and data structures in messages. For this to happen, standards need to be agreed and adopted – as have the Internet protocols.
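The byte-ordering issue mentioned above can be made concrete with a short, purely illustrative Java example: the same integer has different byte layouts under the two orderings, which is why communicating programs must agree on an external data representation (such as a standard network byte order) before exchanging messages.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative only: the same 32-bit integer has different byte layouts on
// big-endian and little-endian hardware, so messages must use an agreed
// (external) representation such as network byte order (big-endian).
public class ByteOrderDemo {
    public static void main(String[] args) {
        int value = 1;

        byte[] big = ByteBuffer.allocate(4)
                .order(ByteOrder.BIG_ENDIAN).putInt(value).array();
        byte[] little = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();

        System.out.println("big-endian    " + Arrays.toString(big));     // [0, 0, 0, 1]
        System.out.println("little-endian " + Arrays.toString(little));  // [1, 0, 0, 0]

        // A receiver that assumes the wrong ordering recovers the wrong value:
        int misread = ByteBuffer.wrap(big).order(ByteOrder.LITTLE_ENDIAN).getInt();
        System.out.println("big-endian bytes read as little-endian: " + misread); // 16777216
    }
}
```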
Middleware • The term middleware applies to a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, operating systems and programming languages. The Common Object Request Broker (CORBA), which is described in Chapters 4, 5 and 8, is an example. Some middleware, such as Java Remote Method Invocation (RMI) (see Chapter 5), supports only a single programming language. Most middleware is implemented over the Internet protocols, which themselves mask the differences of the underlying networks, but all middleware deals with the differences in operating systems and hardware – how this is done is the main topic of Chapter 4.

In addition to solving the problems of heterogeneity, middleware provides a uniform computational model for use by the programmers of servers and distributed applications. Possible models include remote object invocation, remote event notification, remote SQL access and distributed transaction processing. For example, CORBA provides remote object invocation, which allows an object in a program running on one computer to invoke a method of an object in a program running on another computer. Its implementation hides the fact that messages are passed over a network in order to send the invocation request and its reply.

Heterogeneity and mobile code • The term mobile code is used to refer to program code that can be transferred from one computer to another and run at the destination – Java applets are an example. Code suitable for running on one computer is not necessarily suitable for running on another because executable programs are normally specific both to the instruction set and to the host operating system.

The virtual machine approach provides a way of making code executable on a variety of host computers: the compiler for a particular language generates code for a virtual machine instead of a particular hardware order code. For example, the Java compiler produces code for a Java virtual machine, which executes it by interpretation. The Java virtual machine needs to be implemented once for each type of computer to enable Java programs to run.

Today, the most commonly used form of mobile code is the inclusion of JavaScript programs in some web pages loaded into client browsers. This extension of Web technology is discussed further in Section 1.6.

1.5.2 Openness

The openness of a computer system is the characteristic that determines whether the system can be extended and reimplemented in various ways. The openness of distributed systems is determined primarily by the degree to which new resource-sharing services can be added and be made available for use by a variety of client programs.

Openness cannot be achieved unless the specification and documentation of the key software interfaces of the components of a system are made available to software developers. In a word, the key interfaces are published. This process is akin to the standardization of interfaces, but it often bypasses official standardization procedures, which are usually cumbersome and slow-moving. However, the publication of interfaces is only the starting point for adding and extending services in a distributed system. The challenge to designers is to tackle the complexity of distributed systems consisting of many components engineered by different people.
The designers of the Internet protocols introduced a series of documents called ‘Requests For Comments’, or RFCs, each of which is known by a number. The specifications of the Internet communication protocols were published in this series in the early 1980s, followed by specifications for applications that run over them, such as file transfer, email and telnet by the mid-1980s. This practice has continued and forms the basis of the technical documentation of the Internet. This series includes discussions as well as the specifications of protocols. Copies can be obtained from [www.ietf.org]. Thus the publication of the original Internet communication protocols has enabled a variety of Internet systems and applications including the Web to be built. RFCs are not the only means of publication. For example, the World Wide Web Consortium (W3C) develops and publishes standards related to the working of the Web [www.w3.org].

Systems that are designed to support resource sharing in this way are termed open distributed systems to emphasize the fact that they are extensible. They may be extended at the hardware level by the addition of computers to the network and at the software level by the introduction of new services and the reimplementation of old ones, enabling application programs to share resources. A further benefit that is often cited for open systems is their independence from individual vendors.

To summarize:

• Open systems are characterized by the fact that their key interfaces are published.

• Open distributed systems are based on the provision of a uniform communication mechanism and published interfaces for access to shared resources.

• Open distributed systems can be constructed from heterogeneous hardware and software, possibly from different vendors. But the conformance of each component to the published standard must be carefully tested and verified if the system is to work correctly.

1.5.3 Security

Many of the information resources that are made available and maintained in distributed systems have a high intrinsic value to their users. Their security is therefore of considerable importance. Security for information resources has three components: confidentiality (protection against disclosure to unauthorized individuals), integrity (protection against alteration or corruption), and availability (protection against interference with the means to access the resources).

Section 1.1 pointed out that although the Internet allows a program in one computer to communicate with a program in another computer irrespective of its location, security risks are associated with allowing free access to all of the resources in an intranet. Although a firewall can be used to form a barrier around an intranet, restricting the traffic that can enter and leave, this does not deal with ensuring the appropriate use of resources by users within an intranet, or with the appropriate use of resources in the Internet, that are not protected by firewalls.

In a distributed system, clients send requests to access data managed by servers, which involves sending information in messages over a network. For example:

1. A doctor might request access to hospital patient data or send additions to that data.

2. In electronic commerce and banking, users send their credit card numbers across the Internet.

In both examples, the challenge is to send sensitive information in a message over a network in a secure manner.
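As a minimal sketch of the first of these challenges, concealing the contents of a message, the following Java fragment encrypts a sensitive item before it is sent, using the standard javax.crypto API. It assumes that sender and receiver already share a secret key (key distribution is itself a central topic of Chapter 11); the class name and card number are purely illustrative.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Illustrative sketch: encrypting a sensitive message (e.g. a credit card
// number) before it travels over the network, assuming the two parties
// already share the secret key.
public class ConfidentialMessage {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();

        byte[] iv = new byte[12];                 // fresh nonce for each message
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(
                "card=4111111111111111".getBytes(StandardCharsets.UTF_8));

        // iv and ciphertext are what would actually be placed in the message...

        Cipher decipher = Cipher.getInstance("AES/GCM/NoPadding");
        decipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        String recovered = new String(decipher.doFinal(ciphertext), StandardCharsets.UTF_8);
        System.out.println(recovered);            // the receiver recovers the plaintext
    }
}
```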
But security is not just a matter of concealing the contents of messages – it also involves knowing for sure the identity of the user or other agent on whose behalf a message was sent. In the first example, the server needs to know that the user is really a doctor, and in the second example, the user needs to be sure of the identity of the shop or bank with which they are dealing. The second challenge here is to identify a remote user or other agent correctly. Both of these challenges can be met by the use of encryption techniques developed for this purpose. They are used widely in the Internet and are discussed in Chapter 11.

However, the following two security challenges have not yet been fully met:

Denial of service attacks: Another security problem is that a user may wish to disrupt a service for some reason. This can be achieved by bombarding the service with such a large number of pointless requests that the serious users are unable to use it. This is called a denial of service attack. There have been several denial of service attacks on well-known web services. Currently such attacks are countered by attempting to catch and punish the perpetrators after the event, but that is not a general solution to the problem. Countermeasures based on improvements in the management of networks are under development, and these will be touched on in Chapter 3.

Security of mobile code: Mobile code needs to be handled with care. Consider someone who receives an executable program as an electronic mail attachment: the possible effects of running the program are unpredictable; for example, it may seem to display an interesting picture but in reality it may access local resources, or perhaps be part of a denial of service attack. Some measures for securing mobile code are outlined in Chapter 11.

1.5.4 Scalability

Distributed systems operate effectively and efficiently at many different scales, ranging from a small intranet to the Internet. A system is described as scalable if it will remain effective when there is a significant increase in the number of resources and the number of users. The number of computers and servers in the Internet has increased dramatically. Figure 1.6 shows the increasing number of computers and web servers during the 12-year history of the Web up to 2005 [zakon.org]. It is interesting to note the significant growth in both computers and web servers in this period, but also that the relative percentage is flattening out – a trend that is explained by the growth of fixed and mobile personal computing. One web server may also increasingly be hosted on multiple computers.

The design of scalable distributed systems presents the following challenges:

Controlling the cost of physical resources: As the demand for a resource grows, it should be possible to extend the system, at reasonable cost, to meet it. For example, the frequency with which files are accessed in an intranet is likely to grow as the number of users and computers increases. It must be possible to add server computers to avoid the performance bottleneck that would arise if a single file server had to handle all file access requests. In general, for a system with n users to be scalable, the quantity of physical resources required to support them should be at most O(n) – that is, proportional to n. For example, if a single file server can support 20 users, then two such servers should be able to support 40 users.
Although that sounds an obvious goal, it is not necessarily easy to achieve in practice, as we show in Chapter 12.

Controlling the performance loss: Consider the management of a set of data whose size is proportional to the number of users or resources in the system – for example, the table with the correspondence between the domain names of computers and their Internet addresses held by the Domain Name System, which is used mainly to look up DNS names such as www.amazon.com. Algorithms that use hierarchic structures scale better than those that use linear structures. But even with hierarchic structures an increase in size will result in some loss in performance: the time taken to access hierarchically structured data is O(log n), where n is the size of the set of data. For a system to be scalable, the maximum performance loss should be no worse than this.

Preventing software resources running out: An example of lack of scalability is shown by the numbers used as Internet (IP) addresses (computer addresses in the Internet). In the late 1970s, it was decided to use 32 bits for this purpose, but as will be explained in Chapter 3, the supply of available Internet addresses is running out. For this reason, a new version of the protocol with 128-bit Internet addresses is being adopted, and this will require modifications to many software components. To be fair to the early designers of the Internet, there is no correct solution to this problem. It is difficult to predict the demand that will be put on a system years ahead. Moreover, overcompensating for future growth may be worse than adapting to a change when we are forced to – larger Internet addresses will occupy extra space in messages and in computer storage.

Figure 1.6 Growth of the Internet (computers and web servers)

  Date          Computers       Web servers    Percentage
  1993, July    1,776,000       130            0.008
  1995, July    6,642,000       23,500         0.4
  1997, July    19,540,000      1,203,096      6
  1999, July    56,218,000      6,598,697      12
  2001, July    125,888,197     31,299,592     25
  2003, July    ~200,000,000    42,298,371     21
  2005, July    353,284,187     67,571,581     19

Avoiding performance bottlenecks: In general, algorithms should be decentralized to avoid having performance bottlenecks. We illustrate this point with reference to the predecessor of the Domain Name System, in which the name table was kept in a single master file that could be downloaded to any computers that needed it. That was fine when there were only a few hundred computers in the Internet, but it soon became a serious performance and administrative bottleneck. The Domain Name System removed this bottleneck by partitioning the name table between servers located throughout the Internet and administered locally – see Chapters 3 and 13.

Some shared resources are accessed very frequently; for example, many users may access the same web page, causing a decline in performance. We shall see in Chapter 2 that caching and replication may be used to improve the performance of resources that are very heavily used.

Ideally, the system and application software should not need to change when the scale of the system increases, but this is difficult to achieve. The issue of scale is a dominant theme in the development of distributed systems. The techniques that have been successful are discussed extensively in this book.
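Before turning to those techniques, the toy sketch below illustrates the bottleneck-avoidance idea just described, namely partitioning the name table among servers that are administered locally. The real Domain Name System is hierarchical and far more elaborate; all class names, host names and addresses here are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of avoiding a central bottleneck: instead of one master
// name table, lookups are delegated to the server responsible for the
// relevant domain. (Real DNS is hierarchical and far more elaborate.)
public class PartitionedNameService {

    // One partition of the name table, administered locally.
    static class DomainServer {
        private final Map<String, String> table = new HashMap<>();
        void register(String name, String address) { table.put(name, address); }
        String lookup(String name) { return table.get(name); }
    }

    private final Map<String, DomainServer> serversByDomain = new HashMap<>();

    void addDomain(String domain, DomainServer server) {
        serversByDomain.put(domain, server);
    }

    // Route the query to the server that owns the name's domain suffix.
    String lookup(String fullName) {
        String domain = fullName.substring(fullName.indexOf('.') + 1);
        DomainServer server = serversByDomain.get(domain);
        return server == null ? null : server.lookup(fullName);
    }

    public static void main(String[] args) {
        PartitionedNameService ns = new PartitionedNameService();
        DomainServer example = new DomainServer();
        example.register("www.example.org", "10.0.0.5");
        ns.addDomain("example.org", example);
        System.out.println(ns.lookup("www.example.org"));
    }
}
```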
The techniques that have proved successful include the use of replicated data (Chapter 18), the associated technique of caching (Chapters 2 and 12) and the deployment of multiple servers to handle commonly performed tasks, enabling several similar tasks to be performed concurrently.

1.5.5 Failure handling

Computer systems sometimes fail. When faults occur in hardware or software, programs may produce incorrect results or may stop before they have completed the intended computation. We shall discuss and classify a range of possible failure types that can occur in the processes and networks that comprise a distributed system in Chapter 2.

Failures in a distributed system are partial – that is, some components fail while others continue to function. Therefore the handling of failures is particularly difficult. The following techniques for dealing with failures are discussed throughout the book:

Detecting failures: Some failures can be detected. For example, checksums can be used to detect corrupted data in a message or a file. Chapter 2 explains that it is difficult or even impossible to detect some other failures, such as a remote crashed server in the Internet. The challenge is to manage in the presence of failures that cannot be detected but may be suspected.

Masking failures: Some failures that have been detected can be hidden or made less severe. Two examples of hiding failures:

1. Messages can be retransmitted when they fail to arrive.

2. File data can be written to a pair of disks so that if one is corrupted, the other may still be correct.

Just dropping a message that is corrupted is an example of making a fault less severe – it could be retransmitted. The reader will probably realize that the techniques described for hiding failures are not guaranteed to work in the worst cases; for example, the data on the second disk may be corrupted too, or the message may not get through in a reasonable time however often it is retransmitted.

Tolerating failures: Most of the services in the Internet do exhibit failures – it would not be practical for them to attempt to detect and hide all of the failures that might occur in such a large network with so many components. Their clients can be designed to tolerate failures, which generally involves the users tolerating them as well. For example, when a web browser cannot contact a web server, it does not make the user wait for ever while it keeps on trying – it informs the user about the problem, leaving them free to try again later. Services that tolerate failures are discussed in the paragraph on redundancy below.

Recovery from failures: Recovery involves the design of software so that the state of permanent data can be recovered or ‘rolled back’ after a server has crashed. In general, the computations performed by some programs will be incomplete when a fault occurs, and the permanent data that they update (files and other material stored in permanent storage) may not be in a consistent state. Recovery is described in Chapter 17.

Redundancy: Services can be made to tolerate failures by the use of redundant components. Consider the following examples:

1. There should always be at least two different routes between any two routers in the Internet.

2. In the Domain Name System, every name table is replicated in at least two different servers.
3. A database may be replicated in several servers to ensure that the data remains accessible after the failure of any single server; the servers can be designed to detect faults in their peers; when a fault is detected in one server, clients are redirected to the remaining servers.

The design of effective techniques for keeping replicas of rapidly changing data up-to-date without excessive loss of performance is a challenge. Approaches are discussed in Chapter 18.

Distributed systems provide a high degree of availability in the face of hardware faults. The availability of a system is a measure of the proportion of time that it is available for use. When one of the components in a distributed system fails, only the work that was using the failed component is affected. A user may move to another computer if the one that they were using fails; a server process can be started on another computer.

1.5.6 Concurrency

Both services and applications provide resources that can be shared by clients in a distributed system. There is therefore a possibility that several clients will attempt to access a shared resource at the same time. For example, a data structure that records bids for an auction may be accessed very frequently when it gets close to the deadline time.

The process that manages a shared resource could take one client request at a time. But that approach limits throughput. Therefore services and applications generally allow multiple client requests to be processed concurrently. To make this more concrete, suppose that each resource is encapsulated as an object and that invocations are executed in concurrent threads. In this case it is possible that several threads may be executing concurrently within an object, in which case their operations on the object may conflict with one another and produce inconsistent results. For example, if two concurrent bids at an auction are ‘Smith: $122’ and ‘Jones: $111’, and the corresponding operations are interleaved without any control, then they might get stored as ‘Smith: $111’ and ‘Jones: $122’.

The moral of this story is that any object that represents a shared resource in a distributed system must be responsible for ensuring that it operates correctly in a concurrent environment. This applies not only to servers but also to objects in applications. Therefore any programmer who takes an implementation of an object that was not intended for use in a distributed system must do whatever is necessary to make it safe in a concurrent environment. For an object to be safe in a concurrent environment, its operations must be synchronized in such a way that its data remains consistent. This can be achieved by standard techniques such as semaphores, which are used in most operating systems. This topic and its extension to collections of distributed shared objects are discussed in Chapters 7 and 17.

1.5.7 Transparency

Transparency is defined as the concealment from the user and the application programmer of the separation of components in a distributed system, so that the system is perceived as a whole rather than as a collection of independent components. The implications of transparency are a major influence on the design of the system software. The ANSA Reference Manual [ANSA 1989] and the International Organization for Standardization’s Reference Model for Open Distributed Processing (RM-ODP) [ISO 1992] identify eight forms of transparency.
We have paraphrased the original ANSA definitions, replacing their migration transparency with our own mobility transparency, whose scope is broader:

Access transparency enables local and remote resources to be accessed using identical operations.

Location transparency enables resources to be accessed without knowledge of their physical or network location (for example, which building or IP address).

Concurrency transparency enables several processes to operate concurrently using shared resources without interference between them.

Replication transparency enables multiple instances of resources to be used to increase reliability and performance without knowledge of the replicas by users or application programmers.

Failure transparency enables the concealment of faults, allowing users and application programs to complete their tasks despite the failure of hardware or software components.

Mobility transparency allows the movement of resources and clients within a system without affecting the operation of users or programs.

Performance transparency allows the system to be reconfigured to improve performance as loads vary.

Scaling transparency allows the system and applications to expand in scale without change to the system structure or the application algorithms.

The two most important transparencies are access and location transparency; their presence or absence most strongly affects the utilization of distributed resources. They are sometimes referred to together as network transparency.

As an illustration of access transparency, consider a graphical user interface with folders, which is the same whether the files inside the folder are local or remote. Another example is an API for files that uses the same operations to access both local and remote files (see Chapter 12). As an example of a lack of access transparency, consider a distributed system that does not allow you to access files on a remote computer unless you make use of the ftp program to do so.

Web resource names or URLs are location-transparent because the part of the URL that identifies a web server domain name refers to a computer name in a domain, rather than to an Internet address. However, URLs are not mobility-transparent, because someone’s personal web page cannot move to their new place of work in a different domain – all of the links in other pages will still point to the original page.

In general, identifiers such as URLs that include the domain names of computers prevent replication transparency. Although the DNS allows a domain name to refer to several computers, it picks just one of them when it looks up a name. Since a replication scheme generally needs to be able to access all of the participating computers, it would need to access each of the DNS entries by name.

As an illustration of the presence of network transparency, consider the use of an electronic mail address such as Fred.Flintstone@stoneit.com. The address consists of a user’s name and a domain name. Sending mail to such a user does not involve knowing their physical or network location. Nor does the procedure to send an email message depend upon the location of the recipient. Thus electronic mail within the Internet provides both location and access transparency (that is, network transparency).

Failure transparency can also be illustrated in the context of electronic mail, which is eventually delivered, even when servers or communication links fail.
The faults are masked by attempting to retransmit messages until they are successfully delivered, even if it takes several days. Middleware generally converts the failures of networks and processes into programming-level exceptions (see Chapter 5 for an explanation).

To illustrate mobility transparency, consider the case of mobile phones. Suppose that both caller and callee are travelling by train in different parts of a country, moving from one environment (cell) to another. We regard the caller’s phone as the client and the callee’s phone as a resource. The two phone users making the call are unaware of the mobility of the phones (the client and the resource) between cells.

Transparency hides and renders anonymous the resources that are not of direct relevance to the task in hand for users and application programmers. For example, it is generally desirable for similar hardware resources to be allocated interchangeably to perform a task – the identity of a processor used to execute a process is generally hidden from the user and remains anonymous. As pointed out in Section 1.3.2, this may not always be what is required: for example, a traveller who attaches a laptop computer to the local network in each office visited should make use of local services such as the send mail service, using different servers at each location. Even within a building, it is normal to arrange for a document to be printed at a particular, named printer: usually one that is near to the user.

1.5.8 Quality of service

Once users are provided with the functionality that they require of a service, such as the file service in a distributed system, we can go on to ask about the quality of the service provided. The main nonfunctional properties of systems that affect the quality of the service experienced by clients and users are reliability, security and performance. Adaptability to meet changing system configurations and resource availability has been recognized as a further important aspect of service quality.

Reliability and security issues are critical in the design of most computer systems. The performance aspect of quality of service was originally defined in terms of responsiveness and computational throughput, but it has been redefined in terms of ability to meet timeliness guarantees, as discussed in the following paragraphs.

Some applications, including multimedia applications, handle time-critical data – streams of data that are required to be processed or transferred from one process to another at a fixed rate. For example, a movie service might consist of a client program that is retrieving a film from a video server and presenting it on the user’s screen. For a satisfactory result the successive frames of video need to be displayed to the user within some specified time limits.

In fact, the abbreviation QoS has effectively been commandeered to refer to the ability of systems to meet such deadlines. Its achievement depends upon the availability of the necessary computing and network resources at the appropriate times. This implies a requirement for the system to provide guaranteed computing and communication resources that are sufficient to enable applications to complete each task on time (for example, the task of displaying a frame of video). The networks commonly used today have high performance – for example, BBC iPlayer generally performs acceptably – but when networks are heavily loaded their performance can deteriorate, and no guarantees are provided.
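As a toy sketch of what such a guarantee might involve, the following Java class performs a simple admission check of the kind a QoS-aware resource manager might apply: a new stream is accepted only if its requested bandwidth still fits within the capacity that can be guaranteed. The class name and figures are invented for illustration.

```java
// Toy admission control: a resource manager accepts a stream only if the
// requested bandwidth still fits within the capacity it can guarantee.
// Figures and class names are purely illustrative.
public class BandwidthManager {
    private final long capacityKbps;   // total capacity the manager can guarantee
    private long reservedKbps = 0;     // bandwidth already promised to streams

    public BandwidthManager(long capacityKbps) {
        this.capacityKbps = capacityKbps;
    }

    // Returns true and records the reservation if the request can be met;
    // otherwise the request is rejected, leaving existing guarantees intact.
    public synchronized boolean reserve(long requestedKbps) {
        if (reservedKbps + requestedKbps > capacityKbps) {
            return false;              // cannot guarantee the required quality
        }
        reservedKbps += requestedKbps;
        return true;
    }

    public synchronized void release(long kbps) {
        reservedKbps = Math.max(0, reservedKbps - kbps);
    }

    public static void main(String[] args) {
        BandwidthManager manager = new BandwidthManager(10_000); // 10 Mbit/s
        System.out.println(manager.reserve(4_000));  // true  - admitted
        System.out.println(manager.reserve(8_000));  // false - would overcommit
    }
}
```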
QoS applies to operating systems as well as networks. Each critical resource must be reserved by the applications that require QoS, and there must be resource managers that provide guarantees. Reservation requests that cannot be met are rejected. These issues will be addressed further in Chapter 20.

1.6 Case study: The World Wide Web

The World Wide Web [www.w3.org I, Berners-Lee 1991] is an evolving system for publishing and accessing resources and services across the Internet. Through commonly available web browsers, users retrieve and view documents of many types, listen to audio streams and view video streams, and interact with an unlimited set of services.

The Web began life at the European centre for nuclear research (CERN), Switzerland, in 1989 as a vehicle for exchanging documents between a community of physicists connected by the Internet [Berners-Lee 1999]. A key feature of the Web is that it provides a hypertext structure among the documents that it stores, reflecting the users’ requirement to organize their knowledge. This means that documents contain links (or hyperlinks) – references to other documents and resources that are also stored in the Web.

It is fundamental to the user’s experience of the Web that when they encounter a given image or piece of text within a document, this will frequently be accompanied by links to related documents and other resources. The structure of links can be arbitrarily complex and the set of resources that can be added is unlimited – the ‘web’ of links is indeed world-wide. Bush [1945] conceived of hypertextual structures over 50 years ago; it was with the development of the Internet that this idea could be manifested on a world-wide scale.

The Web is an open system: it can be extended and implemented in new ways without disturbing its existing functionality (see Section 1.5.2). First, its operation is based on communication standards and document or content standards that are freely published and widely implemented. For example, there are many types of browser, each in many cases implemented on several platforms; and there are many implementations of web servers. Any conformant browser can retrieve resources from any conformant server. So users have access to browsers on the majority of the devices that they use, from mobile phones to desktop computers.

Second, the Web is open with respect to the types of resource that can be published and shared on it. At its simplest, a resource on the Web is a web page or some other type of content that can be presented to the user, such as media files and documents in Portable Document Format. If somebody invents, say, a new image-storage format, then images in this format can immediately be published on the Web. Users require a means of viewing images in this new format, but browsers are designed to accommodate new content-presentation functionality in the form of ‘helper’ applications and ‘plug-ins’. The Web has moved beyond these simple data resources to encompass services, such as electronic purchasing of goods. It has evolved without changing its basic architecture.
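To make the browser-server interaction concrete, the following Java sketch retrieves a resource from a web server using the standard HTTP support in the java.net package; the URL is merely an example.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative sketch: any conformant client can retrieve a resource from
// any conformant web server by following the standard rules of interaction.
public class FetchPage {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.example.org/index.html");   // example URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        System.out.println("Status: " + conn.getResponseCode());
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);                          // the HTML of the page
            }
        }
    }
}
```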
The Web is based on three main standard technological components:

• the HyperText Markup Language (HTML), a language for specifying the contents and layout of pages as they are displayed by web browsers;

• Uniform Resource Locators (URLs), also known as Uniform Resource Identifiers (URIs), which identify documents and other resources stored as part of the Web;

• a client-server system architecture, with standard rules for interaction (...

Explanation & Answer

hi, here is the assignment ready, kindly go through and let me know in case of anything. Otherwise bye for now

1) Describe the issues of data representation arising from inter-process communication on disparate platforms (4)

i. The API for the Internet protocols – An application programming interface (API) is a set of protocols and definitions for building application software. A good API provides all the necessary building blocks for constructing a program. The Internet transmission protocols provide two protocols from which applications may be built: UDP and TCP. UDP provides a simple message-passing facility that suffers from omission failures but carries no built-in performance penalty; TCP, on the other hand, guarantees message delivery.
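The two protocols can be sketched briefly in Java (the host, port and message are illustrative, and a process must be listening on that port for the calls to succeed): UDP sends a single self-contained datagram with no delivery guarantee, while TCP first establishes a connection and then provides a reliable byte stream.

```java
import java.io.OutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative only: sending the same message with UDP (connectionless, no
// delivery guarantee) and with TCP (connection-based, reliable delivery).
public class UdpVsTcp {
    public static void main(String[] args) throws Exception {
        byte[] msg = "hello".getBytes(StandardCharsets.UTF_8);

        // UDP: a single datagram; it may be lost, and the sender is not told.
        try (DatagramSocket udp = new DatagramSocket()) {
            udp.send(new DatagramPacket(msg, msg.length,
                    InetAddress.getByName("localhost"), 9876));
        }

        // TCP: set up a connection first; bytes written are delivered in order,
        // or the sender eventually learns of the failure.
        try (Socket tcp = new Socket("localhost", 9876);
             OutputStream out = tcp.getOutputStream()) {
            out.write(msg);
        }
    }
}
```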

ii. External data representation...

