5
INPUT/OUTPUT
In addition to providing abstractions such as processes, address spaces, and files, an operating system also
controls all the computer’s I/O (Input/Output) devices. It must issue commands to the devices, catch
interrupts, and handle errors. It should also provide an interface between the devices and the rest of the
system that is simple and easy to use. To the extent possible, the interface should be the same for all
devices (device independence). The I/O code represents a significant fraction of the total operating
system. How the operating system manages I/O is the subject of this chapter.
This chapter is organized as follows. We will look first at some of the principles of I/O hardware and
then at I/O software in general. I/O software can be structured in layers, with each having a well-defined
task. We will look at these layers to see what they do and how they fit together.
Next, we will look at several I/O devices in detail: disks, clocks, keyboards, and displays. For each device
we will look at its hardware and software. Finally, we will consider power management.
5.1 PRINCIPLES OF I/O HARDWARE
Different people look at I/O hardware in different ways. Electrical engineers look at it in terms of chips,
wires, power supplies, motors, and all the other physical components that comprise the hardware.
Programmers look at the interface
presented to the software—the commands the hardware accepts, the functions it carries out, and the errors
that can be reported back. In this book we are concerned with programming I/O devices, not designing,
building, or maintaining them, so our interest is in how the hardware is programmed, not how it works
inside. Nevertheless, the programming of many I/O devices is often intimately connected with their
internal operation. In the next three sections we will provide a little general background on I/O hardware
as it relates to programming. It may be regarded as a review and expansion of the introductory material in
Sec. 1.3.
5.1.1 I/O Devices
I/O devices can be roughly divided into two categories: block devices and character devices. A block
device is one that stores information in fixed-size blocks, each one with its own address. Common block
sizes range from 512 to 65,536 bytes. All transfers are in units of one or more entire (consecutive) blocks.
The essential property of a block device is that it is possible to read or write each block independently of
all the other ones. Hard disks, Blu-ray discs, and USB sticks are common block devices.
If you look very closely, the boundary between devices that are block addressable and those that are not
is not well defined. Everyone agrees that a disk is a block addressable device because no matter where the
arm currently is, it is always possible to seek to another cylinder and then wait for the required block to
rotate under the head. Now consider an old-fashioned tape drive still used, sometimes, for making disk
backups (because tapes are cheap). Tapes contain a sequence of blocks. If the tape drive is given a
command to read block N, it can always rewind the tape and go forward until it comes to block N. This
operation is analogous to a disk doing a seek, except that it takes much longer. Also, it may or may not be
possible to rewrite one block in the middle of a tape. Even if it were possible to use tapes as random
access block devices, that is stretching the point somewhat: they are normally not used that way.
The other type of I/O device is the character device. A character device delivers or accepts a stream of
characters, without regard to any block structure. It is not addressable and does not have any seek
operation. Printers, network interfaces, mice (for pointing), rats (for psychology lab experiments), and
most other devices that are not disk-like can be seen as character devices.
This classification scheme is not perfect. Some devices do not fit in. Clocks, for example, are not block
addressable. Nor do they generate or accept character streams. All they do is cause interrupts at well-defined intervals. Memory-mapped screens do not fit the model well either. Nor do touch screens, for that
matter. Still, the model of block and character devices is general enough that it can be used as a basis for
making some of the operating system software dealing with I/O device independent. The file system, for
example, deals just with abstract block devices and leaves the device-dependent part to lower-level
software.
I/O devices cover a huge range in speeds, which puts considerable pressure on the software to perform
well over many orders of magnitude in data rates. Figure 5-1 shows the data rates of some common
devices. Most of these devices tend to get faster as time goes on.
Device                      Data rate
Keyboard                    10 bytes/sec
Mouse                       100 bytes/sec
56K modem                   7 KB/sec
Scanner at 300 dpi          1 MB/sec
Digital camcorder           3.5 MB/sec
4x Blu-ray disc             18 MB/sec
802.11n Wireless            37.5 MB/sec
USB 2.0                     60 MB/sec
FireWire 800                100 MB/sec
Gigabit Ethernet            125 MB/sec
SATA 3 disk drive           600 MB/sec
USB 3.0                     625 MB/sec
SCSI Ultra 5 bus            640 MB/sec
Single-lane PCIe 3.0 bus    985 MB/sec
Thunderbolt 2 bus           2.5 GB/sec
SONET OC-768 network        5 GB/sec
Figure 5-1. Some typical device, network, and bus data rates.
5.1.2 Device Controllers
I/O units often consist of a mechanical component and an electronic component. It is possible to
separate the two portions to provide a more modular and general design. The electronic component is
called the device controller or adapter. On personal computers, it often takes the form of a chip on the
parentboard or a printed circuit card that can be inserted into a (PCIe) expansion slot. The mechanical
component is the device itself. This arrangement is shown in Fig. 1-6.
The controller card usually has a connector on it, into which a cable leading to the device itself can be
plugged. Many controllers can handle two, four, or even eight identical devices. If the interface between
the controller and device is a standard interface, either an official ANSI, IEEE, or ISO standard or a de
facto one, then companies can make controllers or devices that fit that interface. Many companies, for
example, make disk drives that match the SATA, SCSI, USB, Thunderbolt, or FireWire (IEEE 1394)
interfaces.
The interface between the controller and the device is often a very low-level one. A disk, for example,
might be formatted with 2,000,000 sectors of 512 bytes per track. What actually comes off the drive,
however, is a serial bit stream, starting with a preamble, then the 4096 bits in a sector, and finally a
checksum, or ECC (Error-Correcting Code). The preamble is written when the disk is formatted and
contains the cylinder and sector number, the sector size, and similar data, as well as synchronization
information.
The controller’s job is to convert the serial bit stream into a block of bytes and perform any error
correction necessary. The block of bytes is typically first assembled, bit by bit, in a buffer inside the
controller. After its checksum has been verified and the block has been declared to be error free, it can
then be copied to main memory.
The controller for an LCD display monitor also works as a bit serial device at an equally low level. It
reads bytes containing the characters to be displayed from memory and generates the signals to modify
the polarization of the backlight for the corresponding pixels in order to write them on screen. If it were
not for the display controller, the operating system programmer would have to explicitly program the
electric fields of all pixels. With the controller, the operating system initializes the controller with a few
parameters, such as the number of characters or pixels per line and number of lines per screen, and lets
the controller take care of actually driving the electric fields.
In a very short time, LCD screens have completely replaced the old CRT (Cathode Ray Tube) monitors.
CRT monitors fire a beam of electrons onto a fluorescent screen. Using magnetic fields, the system is
able to bend the beam and draw pixels on the screen. Compared to LCD screens, CRT monitors were
bulky, power hungry, and fragile. Moreover, the resolution on today ́s (Retina) LCD screens is so good
that the human eye is unable to distinguish individual pixels. It is hard to imagine today that laptops in the
past came with a small CRT screen that made them more than 20 cm deep with a nice work-out weight of
around 12 kilos.
5.1.3 Memory-Mapped I/O
Each controller has a few registers that are used for communicating with the CPU. By writing into these
registers, the operating system can command the device to deliver data, accept data, switch itself on or
off, or otherwise perform some action. By reading from these registers, the operating system can learn
what the device’s state is, whether it is prepared to accept a new command, and so on.
In addition to the control registers, many devices have a data buffer that the operating system can read
and write. For example, a common way for computers to display pixels on the screen is to have a video
RAM, which is basically just a data buffer, available for programs or the operating system to write into.
The issue thus arises of how the CPU communicates with the control registers and also with the device
data buffers. Two alternatives exist. In the first approach,
each control register is assigned an I/O port number, an 8- or 16-bit integer. The set of all the I/O ports
forms the I/O port space, which is protected so that ordinary user programs cannot access it (only the
operating system can). Using a special I/O instruction such as

IN REG,PORT

the CPU can read in control register PORT and store the result in CPU register REG. Similarly, using

OUT PORT,REG

the CPU can write the contents of REG to a control register. Most early computers, including nearly all
mainframes, such as the IBM 360 and all of its successors, worked this way.
In this scheme, the address spaces for memory and I/O are different, as shown in Fig. 5-2(a). The
instructions
IN R0,4
and
MOV R0,4
are completely different in this design. The former reads the contents of I/O port 4 and puts it in R0
whereas the latter reads the contents of memory word 4 and puts it in R0. The 4s in these examples refer
to different and unrelated address spaces.
Figure 5-2. (a) Separate I/O and memory space. (b) Memory-mapped I/O. (c) Hybrid.
The second approach, introduced with the PDP-11, is to map all the control registers into the memory
space, as shown in Fig. 5-2(b). Each control register is assigned a unique memory address to which no
memory is assigned. This system is called memory-mapped I/O. In most systems, the assigned addresses
are at or near the top of the address space. A hybrid scheme, with memory-mapped I/O data buffers and
separate I/O ports for the control registers, is shown in Fig. 5-2(c).
The x86 uses this architecture, with addresses 640K to 1M − 1 being reserved for device data buffers in
IBM PC compatibles, in addition to I/O ports 0 to 64K − 1.
How do these schemes actually work in practice? In all cases, when the CPU wants to read a word, either
from memory or from an I/O port, it puts the address it needs on the bus’ address lines and then asserts a
READ signal on a bus’ control line. A second signal line is used to tell whether I/O space or memory space
is needed. If it is memory space, the memory responds to the request. If it is I/O space, the I/O device
responds to the request. If there is only memory space [as in Fig. 5-2(b)], every memory module and
every I/O device compares the address lines to the range of addresses that it services. If the address falls
in its range, it responds to the request. Since no address is ever assigned to both memory and an I/O
device, there is no ambiguity and no conflict.
These two schemes for addressing the controllers have different strengths and weaknesses. Let us start
with the advantages of memory-mapped I/O. First of all, if special I/O instructions are needed to read and
write the device control registers, access to them requires the use of assembly code since there is no way
to execute an IN or OUT instruction in C or C++. Calling such a procedure adds overhead to controlling
I/O. In contrast, with memory-mapped I/O, device control registers are just variables in memory and can
be addressed in C the same way as any other variables. Thus with memory-mapped I/O, an I/O device
driver can be written entirely in C. Without memory-mapped I/O, some assembly code is needed.
Second, with memory-mapped I/O, no special protection mechanism is needed to keep user processes
from performing I/O. All the operating system has to do is refrain from putting that portion of the address
space containing the control registers in any user’s virtual address space. Better yet, if each device has
its control registers on a different page of the address space, the operating system can give a user control
over specific devices but not others by simply including the desired pages in its page table. Such a scheme
can allow different device drivers to be placed in different address spaces, not only reducing kernel size
but also keeping one driver from interfering with others.
Third, with memory-mapped I/O, every instruction that can reference memory can also reference control
registers. For example, if there is an instruction, TEST, that tests a memory word for 0, it can also be used
to test a control register for 0, which might be the signal that the device is idle and can accept a new
command. The assembly language code might look like this:
LOOP: TEST PORT 4      // check if port 4 is 0
      BEQ READY        // if it is 0, go to ready
      BRANCH LOOP      // otherwise, continue testing
READY:
If memory-mapped I/O is not present, the control register must first be read into the CPU, then tested,
requiring two instructions instead of just one. In the case of
the loop given above, a fourth instruction has to be added, slightly slowing down the responsiveness of
detecting an idle device.
In computer design, practically everything involves trade-offs, and that is the case here, too. Memory-mapped I/O also has its disadvantages. First, most computers nowadays have some form of caching of
memory words. Caching a device control register would be disastrous. Consider the assembly-code loop
given above in the presence of caching. The first reference to PORT 4 would cause it to be cached.
Subsequent references would just take the value from the cache and not even ask the device. Then when
the device finally became ready, the software would have no way of finding out. Instead, the loop would
go on forever.
To prevent this situation with memory-mapped I/O, the hardware has to be able to selectively disable
caching, for example, on a per-page basis. This feature adds extra complexity to both the hardware and
the operating system, which has to manage the selective caching.
Second, if there is only one address space, then all memory modules and all I/O devices must examine all
memory references to see which ones to respond to. If the computer has a single bus, as in Fig. 5-3(a),
having everyone look at every address is straightforward.
Figure 5-3. (a) A single-bus architecture. (b) A dual-bus memory architecture.
However, the trend in modern personal computers is to have a dedicated high-speed memory bus, as
shown in Fig. 5-3(b). The bus is tailored to optimize memory performance, with no compromises for the
sake of slow I/O devices. x86 systems can have multiple buses (memory, PCIe, SCSI, and USB), as
shown in Fig. 1-12.
The trouble with having a separate memory bus on memory-mapped machines is that the I/O devices have
no way of seeing memory addresses as they go by on the memory bus, so they have no way of responding
to them. Again, special measures have to be taken to make memory-mapped I/O work on a system with
multiple
buses. One possibility is to first send all memory references to the memory. If the memory fails to
respond, then the CPU tries the other buses. This design can be made to work but requires additional
hardware complexity.
A second possible design is to put a snooping device on the memory bus to pass all addresses presented to
potentially interested I/O devices. The problem here is that I/O devices may not be able to process
requests at the speed the memory can.
A third possible design, and one that would well match the design sketched in Fig. 1-12, is to filter
addresses in the memory controller. In that case, the memory controller chip contains range registers that
are preloaded at boot time. For example, 640K to 1M − 1 could be marked as a nonmemory range.
Addresses that fall within one of the ranges marked as nonmemory are forwarded to devices instead of
to memory. The disadvantage of this scheme is the need for figuring out at boot time which memory
addresses are not really memory addresses. Thus each scheme has arguments for and against it, so
compromises and trade-offs are inevitable.
5.1.4 Direct Memory Access
No matter whether a CPU does or does not have memory-mapped I/O, it needs to address the device
controllers to exchange data with them. The CPU can request data from an I/O controller one byte at a
time, but doing so wastes the CPU’s time, so a different scheme, called DMA (Direct Memory Access)
is often used. To simplify the explanation, we assume that the CPU accesses all devices and memory via a
single system bus that connects the CPU, the memory, and the I/O devices, as shown in Fig. 5-4. We
already know that the real organization in modern systems is more complicated, but all the principles are
the same. The operating system can use only DMA if the hardware has a DMA controller, which most
systems do. Sometimes this controller is integrated into disk controllers and other controllers, but such a
design requires a separate DMA controller for each device. More commonly, a single DMA controller is
available (e.g., on the parentboard) for regulating transfers to multiple devices, often concurrently.
No matter where it is physically located, the DMA controller has access to the system bus independent of
the CPU, as shown in Fig. 5-4. It contains several registers that can be written and read by the CPU.
These include a memory address register, a byte count register, and one or more control registers. The
control registers specify the I/O port to use, the direction of the transfer (reading from the I/O device or
writing to the I/O device), the transfer unit (byte at a time or word at a time), and the number of bytes to
transfer in one burst.
To explain how DMA works, let us first look at how disk reads occur when DMA is not used. First the
disk controller reads the block (one or more sectors) from the drive serially, bit by bit, until the entire
block is in the controller’s internal buffer. Next, it computes the checksum to verify that no read errors
have occurred.
Figure 5-4. Operation of a DMA transfer.
Then the controller causes an interrupt. When the operating system starts running, it can read the disk
block from the controller’s buffer a byte or a word at a time by executing a loop, with each iteration
reading one byte or word from a controller device register and storing it in main memory.
When DMA is used, the procedure is different. First the CPU programs the DMA controller by setting its
registers so it knows what to transfer where (step 1 in Fig. 5-4). It also issues a command to the disk
controller telling it to read data from the disk into its internal buffer and verify the checksum. When valid
data are in the disk controller’s buffer, DMA can begin.
The DMA controller initiates the transfer by issuing a read request over the bus to the disk controller (step
2). This read request looks like any other read request, and the disk controller does not know (or care)
whether it came from the CPU or from a DMA controller. Typically, the memory address to write to is on
the bus’ address lines, so when the disk controller fetches the next word from its internal buffer, it knows
where to write it. The write to memory is another standard bus cycle (step 3). When the write is complete,
the disk controller sends an acknowledgement signal to the DMA controller, also over the bus (step 4).
The DMA controller then increments the memory address to use and decrements the byte count. If the
byte count is still greater than 0, steps 2 through 4 are repeated until the count reaches 0. At that time, the
DMA controller interrupts the CPU to let it know that the transfer is now complete. When the operating
system starts up, it does not have to copy the disk block to memory; it is already there.
DMA controllers vary considerably in their sophistication. The simplest ones handle one transfer at a
time, as described above. More complex ones can be programmed to handle multiple transfers at the
same time. Such controllers have multiple sets of registers internally, one for each channel. The CPU
starts by loading each set of registers with the relevant parameters for its transfer. Each transfer must
use a different device controller. After each word is transferred (steps 2 through 4) in Fig. 5-4, the DMA
controller decides which device to service next. It may be set up to use a round-robin algorithm, or it may
have a priority scheme designed to favor some devices over others. Multiple requests to different device
controllers may be pending at the same time, provided that there is an unambiguous way to tell the acknowledgements apart. Often a different acknowledgement line on the bus is used for each DMA channel
for this reason.
Many buses can operate in two modes: word-at-a-time mode and block mode. Some DMA controllers can
also operate in either mode. In the former mode, the operation is as described above: the DMA controller
requests the transfer of one word and gets it. If the CPU also wants the bus, it has to wait. The mechanism
is called cycle stealing because the device controller sneaks in and steals an occasional bus cycle from
the CPU once in a while, delaying it slightly. In block mode, the DMA controller tells the device to
acquire the bus, issue a series of transfers, then release the bus. This form of operation is called burst
mode. It is more efficient than cycle stealing because acquiring the bus takes time and multiple words
can be transferred for the price of one bus acquisition. The down side to burst mode is that it can block the
CPU and other devices for a substantial period if a long burst is being transferred.
In the model we have been discussing, sometimes called fly-by mode, the DMA controller tells the
device controller to transfer the data directly to main memory. An alternative mode that some DMA
controllers use is to have the device controller send the word to the DMA controller, which then issues a
 
second bus request to write the word to wherever it is supposed to go. This scheme requires an extra bus
cycle per word transferred, but is more flexible in that it can also perform device-to-device copies and
even memory-to-memory copies (by first issuing a read to memory and then issuing a write to memory at
a different address).
Most DMA controllers use physical memory addresses for their transfers. Using physical addresses
requires the operating system to convert the virtual address of the intended memory buffer into a
physical address and write this physical address into the DMA controller’s address register. An alternative
scheme used in a few DMA controllers is to write virtual addresses into the DMA controller instead.
Then the DMA controller must use the MMU to have the virtual-to-physical translation done. Only in the
case that the MMU is part of the memory (possible, but rare), rather than part of the CPU, can virtual
addresses be put on the bus.
We mentioned earlier that the disk first reads data into its internal buffer before DMA can start. You may
be wondering why the controller does not just store the bytes in main memory as soon as it gets them
from the disk. In other words, why does it need an internal buffer? There are two reasons. First, by doing
internal buffering, the disk controller can verify the checksum before starting a transfer. If the checksum
is incorrect, an error is signaled and no transfer is done.
The second reason is that once a disk transfer has started, the bits keep arriving from the disk at a constant
rate, whether the controller is ready for them or not. If
the controller tried to write data directly to memory, it would have to go over the system bus for each
word transferred. If the bus were busy due to some other device using it (e.g., in burst mode), the
controller would have to wait. If the next disk word arrived before the previous one had been stored, the
controller would have to store it somewhere. If the bus were very busy, the controller might end up
storing quite a few words and having a lot of administration to do as well. When the block is buffered
internally, the bus is not needed until the DMA begins, so the design of the controller is much simpler
because the DMA transfer to memory is not time critical. (Some older controllers did, in fact, go directly
to memory with only a small amount of internal buffering, but when the bus was very busy, a transfer
might have had to be terminated with an overrun error.)
Not all computers use DMA. The argument against it is that the main CPU is often far faster than the
DMA controller and can do the job much faster (when the limiting factor is not the speed of the I/O
device). If there is no other work for it to do, having the (fast) CPU wait for the (slow) DMA controller to
finish is pointless. Also, getting rid of the DMA controller and having the CPU do all the work in
software saves money, important on low-end (embedded) computers.
5.1.5 Interrupts Revisited
We briefly introduced interrupts in Sec. 1.3.4, but there is more to be said. In a typical personal computer
system, the interrupt structure is as shown in Fig. 5-5. At the hardware level, interrupts work as follows.
When an I/O device has finished the work given to it, it causes an interrupt (assuming that interrupts have
been enabled by the operating system). It does this by asserting a signal on a bus line that it has been
assigned. This signal is detected by the interrupt controller chip on the parentboard, which then decides
what to do.
Figure 5-5. How an interrupt happens. The connections between the devices and the controller actually use interrupt lines on the
bus rather than dedicated wires.
If no other interrupts are pending, the interrupt controller handles the interrupt immediately. However, if
another interrupt is in progress, or another device has made a simultaneous request on a higher-priority
interrupt request line on the bus,
the device is just ignored for the moment. In this case it continues to assert an interrupt signal on the bus
until it is serviced by the CPU.
To handle the interrupt, the controller puts a number on the address lines specifying which device wants
attention and asserts a signal to interrupt the CPU.
The interrupt signal causes the CPU to stop what it is doing and start doing something else. The number
on the address lines is used as an index into a table called the interrupt vector to fetch a new program
counter. This program counter points to the start of the corresponding interrupt-service procedure.
Typically traps and interrupts use the same mechanism from this point on, often sharing the same
interrupt vector. The location of the interrupt vector can be hardwired into the machine or it can be
anywhere in memory, with a CPU register (loaded by the operating system) pointing to its origin.
Shortly after it starts running, the interrupt-service procedure acknowledges the interrupt by writing a
certain value to one of the interrupt controller’s I/O ports. This acknowledgement tells the controller that
it is free to issue another interrupt. By having the CPU delay this acknowledgement until it is ready to
handle the next interrupt, race conditions involving multiple (almost simultaneous) interrupts can be
avoided. As an aside, some (older) computers do not have a centralized interrupt controller, so each
device controller requests its own interrupts.
The hardware always saves certain information before starting the service procedure. Which information
is saved and where it is saved varies greatly from CPU to CPU. As a bare minimum, the program counter
must be saved, so the interrupted process can be restarted. At the other extreme, all the visible registers
and a large number of internal registers may be saved as well.
One issue is where to save this information. One option is to put it in internal registers that the operating
system can read out as needed. A problem with this approach is that then the interrupt controller cannot
be acknowledged until all potentially relevant information has been read out, lest a second interrupt
overwrite the internal registers saving the state. This strategy leads to long dead times when interrupts
are disabled and possibly to lost interrupts and lost data.
Consequently, most CPUs save the information on the stack. However, this approach, too, has problems.
To start with: whose stack? If the current stack is used, it may well be a user process stack. The stack
pointer may not even be legal, which would cause a fatal error when the hardware tried to write some
words at the address pointed to. Also, it might point to the end of a page. After several memory writes,
the page boundary might be exceeded and a page fault generated. Having a page fault occur during the
hardware interrupt processing creates a bigger problem: where to save the state to handle the page fault?
If the kernel stack is used, there is a much better chance of the stack pointer being legal and pointing to a
pinned page. However, switching into kernel mode may require changing MMU contexts and will
probably invalidate most or all of the cache and TLB. Reloading all of these, statically or dynamically,
will increase the time to process an interrupt and thus waste CPU time.
Precise and Imprecise Interrupts
Another problem is caused by the fact that most modern CPUs are heavily pipelined and often superscalar
(internally parallel). In older systems, after each instruction was finished executing, the microprogram or
hardware checked to see if there was an interrupt pending. If so, the program counter and PSW were
pushed onto the stack and the interrupt sequence begun. After the interrupt handler ran, the reverse
process took place, with the old PSW and program counter popped from the stack and the previous
process continued.
This model makes the implicit assumption that if an interrupt occurs just after some instruction, all the
instructions up to and including that instruction have been executed completely, and no instructions after
it have executed at all. On older machines, this assumption was always valid. On modern ones it may not be.
For starters, consider the pipeline model of Fig. 1-7(a). What happens if an interrupt occurs while the pipeline is full (the usual case)? Many instructions are in various stages of execution. When the interrupt occurs, the value of the program counter may not reflect the correct boundary between executed instructions and nonexecuted instructions. In fact, many instructions may have been partially executed, with different instructions being more or less complete. In this situation, the program counter most likely reflects the address of the next instruction to be fetched and pushed into the pipeline rather than the address of the instruction that was just processed by the execution unit.
On a superscalar machine, such as that of Fig. 1-7(b), things are even worse. Instructions may be decomposed into micro-operations and the micro-operations may execute out of order, depending on the availability of internal resources such as functional units and registers. At the time of an interrupt, some instructions started long ago may not have finished and others started more recently may be almost done. At the point when an interrupt is signaled, there may be many instructions in various states of completeness, with little relation between them and the program counter.
An interrupt that leaves the machine in a well-defined state is called a precise interrupt (Walker and
Cragon, 1995). Such an interrupt has four properties:
1. The PC (Program Counter) is saved in a known place.

2. All instructions before the one pointed to by the PC have completed.

3. No instruction beyond the one pointed to by the PC has finished.

4. The execution state of the instruction pointed to by the PC is known.
Note that there is no prohibition on instructions beyond the one pointed to by the PC from starting. It is
just that any changes they make to registers or memory must be undone before the interrupt happens. It is
permitted that the instruction pointed to has been executed. It is also permitted that it has not been
executed.
350 INPUT/OUTPUT CHAP. 5
However, it must be clear which case applies. Often, if the interrupt is an I/O interrupt, the instruction will not yet have started. However, if the interrupt is really a trap or page fault, then the PC generally points to the instruction that caused the fault so it can be restarted later. The situation of Fig. 5-6(a) illustrates a precise interrupt. All instructions up to the program counter (316) have completed and none of those beyond it have started (or have been rolled back to undo their effects).
Figure 5-6. (a) A precise interrupt: the saved PC points to 316; the instructions at addresses 300 through 312 are fully executed and those at 316 through 332 are not executed. (b) An imprecise interrupt: instructions near the saved PC are in varying states of completion (from the bottom up: fully executed, 80% executed, 60%, 20%, 35%, 40%, 10% executed, and not executed).
An interrupt that does not meet these requirements is called an imprecise interrupt and makes life most unpleasant for the operating system writer, who now has to figure out what has happened and what still has to happen. Fig. 5-6(b) illustrates an imprecise interrupt, where different instructions near the program counter are in different stages of completion, with older ones not necessarily more complete than younger ones. Machines with imprecise interrupts usually vomit a large amount of internal state onto the stack to give the operating system the possibility of figuring out what was going on. The code necessary to restart the machine is typically exceedingly complicated. Also, saving a large amount of information to memory on every interrupt makes interrupts slow and recovery even worse. This leads to the ironic situation of having very fast superscalar CPUs sometimes being unsuitable for real-time work due to slow interrupts.
Some computers are designed so that some kinds of interrupts and traps are precise and others are not. For example, having I/O interrupts be precise but traps due to fatal programming errors be imprecise is not so bad, since no attempt need be made to restart a running process after it has divided by zero. Some machines have a bit that can be set to force all interrupts to be precise. The downside of setting this bit is that it forces the CPU to carefully log everything it is doing and maintain shadow copies of registers so it can generate a precise interrupt at any instant. All this overhead has a major impact on performance. Some superscalar machines, such as the x86 family, have precise interrupts to allow old software to work correctly. The price paid for backward compatibility with precise interrupts is extremely complex interrupt logic within the CPU to make sure that when the interrupt controller signals that it wants to cause an interrupt, all instructions up to some point are allowed to finish and none beyond that
point are allowed to have any noticeable effect on the machine state. Here the price is paid not in time, but in chip area and in complexity of the design. If precise interrupts were not required for backward compatibility purposes, this chip area would be available for larger on-chip caches, making the CPU faster. On the other hand, imprecise interrupts make the operating system far more complicated and slower, so it is hard to tell which approach is really better.
5.2 PRINCIPLES OF I/O SOFTWARE
Let us now turn away from the I/O hardware and look at the I/O software. First we will look at its goals
and then at the different ways I/O can be done from the point of view of the operating system.
5.2.1 Goals of the I/O Software
A key concept in the design of I/O software is known as device independence. What it means is that we
should be able to write programs that can access any I/O device without having to specify the device in
advance. For example, a program that reads a file as input should be able to read a file on a hard disk, a
DVD, or on a USB stick without having to be modified for each different device. Similarly, one should be
able to type a command such as
sort <input >output
and have it work with input coming from any kind of disk or the keyboard and the output going to any
kind of disk or the screen. It is up to the operating system to take care of the problems caused by the fact
that these devices really are different and require very different command sequences to read or write.
Closely related to device independence is the goal of uniform naming. The name of a file or a device
should simply be a string or an integer and not depend on the device in any way. In UNIX, all disks can be integrated in the file-system hierarchy in arbitrary ways, so the user need not be aware of which name corresponds to which device. For example, a USB stick can be mounted on top of the directory
/usr/ast/backup so that copying a file to /usr/ast/backup/monday copies the file to the USB stick. In this
way, all files and devices are addressed the same way: by a path name.
Another important issue for I/O software is error handling. In general, errors should be handled as close
to the hardware as possible. If the controller discovers a read error, it should try to correct the error itself
if it can. If it cannot, then the device driver should handle it, perhaps by just trying to read the block
again. Many errors are transient, such as read errors caused by specks of dust on the read head, and will
frequently go away if the operation is repeated. Only if the lower layers
are not able to deal with the problem should the upper layers be told about it. In many cases, error
recovery can be done transparently at a low level without the upper levels even knowing about the error.
Still another important issue is that of synchronous (blocking) vs. asynchronous (interrupt-driven) transfers. Most physical I/O is asynchronous—the CPU starts the transfer and goes off to do something else until the interrupt arrives. User programs are much easier to write if the I/O operations are blocking—after a read system call the program is automatically suspended until the data are available in the buffer. It is up to the operating system to make operations that are actually interrupt-driven look blocking to the user programs. However, some very high-performance applications need to control all the details of the I/O, so some operating systems make asynchronous I/O available to them.
Another issue for the I/O software is buffering. Often data that come off a device cannot be stored directly in their final destination. For example, when a packet comes in off the network, the operating system does not know where to put it until it has stored the packet somewhere and examined it. Also, some devices have severe real-time constraints (for example, digital audio devices), so the data must be put into an output buffer in advance to decouple the rate at which the buffer is filled from the rate at which it is emptied, in order to avoid buffer underruns. Buffering involves considerable copying and often has a major impact on I/O performance.
The final concept that we will mention here is sharable vs. dedicated devices. Some I/O devices, such as
disks, can be used by many users at the same time. No problems are caused by multiple users having open
files on the same disk at the same time. Other devices, such as printers, have to be dedicated to a single
user until that user is finished. Then another user can have the printer. Having two or more users writing
characters intermixed at random to the same page will definitely not work. Introducing dedicated (unshared) devices also introduces a variety of problems, such as deadlocks. Again, the operating system must be able to handle both shared and dedicated devices in a way that avoids problems.
5.2.2 Programmed I/O
There are three fundamentally different ways that I/O can be performed. In this section we will look at the first one (programmed I/O). In the next two sections we will examine the others (interrupt-driven I/O and I/O using DMA). The simplest form of I/O is to have the CPU do all the work. This method is called programmed I/O.
It is simplest to illustrate how programmed I/O works by means of an example. Consider a user process that wants to print the eight-character string ‘‘ABCDEFGH’’ on the printer via a serial interface. Displays on small embedded systems sometimes work this way. The software first assembles the string in a buffer in user space, as shown in Fig. 5-7(a).
Figure 5-7. Steps in printing a string. (a) The string ‘‘ABCDEFGH’’ assembled in a buffer in user space and copied to kernel space. (b) The first character printed, with ‘‘B’’ marked as the next character. (c) The first two characters printed.
The user process then acquires the printer for writing by making a system call to open it. If the printer is
currently in use by another process, this call will fail and return an error code or will block until the
printer is available, depending on the operating system and the parameters of the call. Once it has the
printer, the user process makes a system call telling the operating system to print the string on the printer.
The operating system then (usually) copies the buffer with the string to an array, say, p, in kernel space, where it is more easily accessed (because the kernel may have to change the memory map to get at user space). It then checks to see if the printer is currently available. If not, it waits until it is. As soon as the printer is available, the operating system copies the first character to the printer’s data register, in this example using memory-mapped I/O. This action activates the printer. The character may not appear yet because some printers buffer a line or a page before printing anything. In Fig. 5-7(b), however, we see that the first character has been printed and that the system has marked the ‘‘B’’ as the next character to be printed.
As soon as it has copied the first character to the printer, the operating system checks to see if the printer
is ready to accept another one. Generally, the printer has a second register, which gives its status. The act
of writing to the data register causes the status to become not ready. When the printer controller has
processed the current character, it indicates its availability by setting some bit in its status register or putting some value in it.
At this point the operating system waits for the printer to become ready again. When that happens, it
prints the next character, as shown in Fig. 5-7(c). This loop continues until the entire string has been
printed. Then control returns to the user process.
The actions followed by the operating system are briefly summarized in Fig. 5-8. First the data are copied
to the kernel. Then the operating system enters a
tight loop, outputting the characters one at a time. The essential aspect of programmed I/O, clearly illustrated in this figure, is that after outputting a character, the CPU continuously polls the device to see if it is ready to accept another one. This behavior is often called polling or busy waiting.
copy_from_user(buffer, p, count);          /* p is the kernel buffer */
for (i = 0; i < count; i++) {              /* loop on every character */
    while (*printer_status_reg != READY);  /* loop until ready */
    *printer_data_register = p[i];         /* output one character */
}
return_to_user();

Figure 5-8. Writing a string to the printer using programmed I/O.
Programmed I/O is simple but has the disadvantage of tying up the CPU full time until all the I/O is done.
If the time to ‘‘print’’ a character is very short (because all the printer is doing is copying the new
character to an internal buffer), then busy waiting is fine. Also, in an embedded system, where the CPU
has nothing else to do, busy waiting is fine. However, in more complex systems, where the CPU has other
work to do, busy waiting is inefficient. A better I/O method is needed.
5.2.3 Interrupt-Driven I/O
Now let us consider the case of printing on a printer that does not buffer characters but prints each one as it arrives. If the printer can print, say, 100 characters/sec, each character takes 10 msec to print. This means that after every character is written to the printer’s data register, the CPU will sit in an idle loop for 10 msec waiting to be allowed to output the next character. This is more than enough time to do a context switch and run some other process for the 10 msec that would otherwise be wasted.
The way to allow the CPU to do something else while waiting for the printer to become ready is to use
interrupts. When the system call to print the string is made, the buffer is copied to kernel space, as we
showed earlier, and the first character is copied to the printer as soon as it is willing to accept a character.
At that point the CPU calls the scheduler and some other process is run. The process that asked for the
string to be printed is blocked until the entire string has printed. The work done on the system call is
shown in Fig. 5-9(a).
When the printer has printed the character and is prepared to accept the next one, it generates an interrupt.
This interrupt stops the current process and saves its state. Then the printer interrupt-service procedure is
run. A crude version of this code is shown in Fig. 5-9(b). If there are no more characters to print, the
interrupt handler takes some action to unblock the user. Otherwise, it outputs the next character, acknowledges the interrupt, and returns to the process that was running just before the interrupt, which continues from where it left off.
copy_from_user(buffer, p, count);
enable_interrupts();
while (*printer_status_reg != READY);
*printer_data_register = p[0];
scheduler();

(a)

if (count == 0) {
    unblock_user();
} else {
    *printer_data_register = p[i];
    count = count - 1;
    i = i + 1;
}
acknowledge_interrupt();
return_from_interrupt();

(b)

Figure 5-9. Writing a string to the printer using interrupt-driven I/O. (a) Code executed at the time the print system call is made. (b) Interrupt-service procedure for the printer.
5.2.4 I/O Using DMA
An obvious disadvantage of interrupt-driven I/O is that an interrupt occurs on every character. Interrupts
take time, so this scheme wastes a certain amount of CPU time. A solution is to use DMA. Here the idea
is to let the DMA controller feed the characters to the printer one at a time, without the CPU being
bothered. In essence, DMA is programmed I/O, only with the DMA controller doing all the work, instead
of the main CPU. This strategy requires special hardware (the DMA controller) but frees up the CPU
during the I/O to do other work. An outline of the code is given in Fig. 5-10.
copy_from_user(buffer, p, count);
set_up_DMA_controller();
scheduler();

(a)

acknowledge_interrupt();
unblock_user();
return_from_interrupt();

(b)

Figure 5-10. Printing a string using DMA. (a) Code executed when the print system call is made. (b) Interrupt-service procedure.
The big win with DMA is reducing the number of interrupts from one per character to one per buffer
printed. If there are many characters and interrupts are slow, this can be a major improvement. On the
other hand, the DMA controller is usually much slower than the main CPU. If the DMA controller is not
capable of driving the device at full speed, or the CPU usually has nothing to do anyway while waiting for
the DMA interrupt, then interrupt-driven I/O or even programmed I/O may be better. Most of the time, though, DMA is worth it.
5.3 I/O SOFTWARE LAYERS
I/O software is typically organized in four layers, as shown in Fig. 5-11. Each layer has a well-defined function to perform and a well-defined interface to the adjacent layers. The functionality and interfaces differ from system to system, so the discussion that follows, which examines all the layers starting at the bottom, is not specific to one machine.
User-level I/O software
Device-independent operating system software
Device drivers
Interrupt handlers
Hardware
Figure 5-11. Layers of the I/O software system.
5.3.1 Interrupt Handlers
While programmed I/O is occasionally useful, for most I/O, interrupts are an unpleasant fact of life and cannot be avoided. They should be hidden away, deep in the bowels of the operating system, so that as little of the operating system as possible knows about them. The best way to hide them is to have the driver starting an I/O operation block until the I/O has completed and the interrupt occurs. The driver can block itself, for example, by doing a down on a semaphore, a wait on a condition variable, a receive on a message, or something similar.
When the interrupt happens, the interrupt procedure does whatever it has to in order to handle the interrupt. Then it can unblock the driver that was waiting for it. In some cases it will just do an up on a semaphore. In others it will do a signal on a condition variable in a monitor. In still others, it will send a message to the blocked driver. In all cases the net effect of the interrupt will be that a driver that was previously blocked will now be able to run. This model works best if drivers are structured as kernel processes, with their own states, stacks, and program counters.
Of course, reality is not quite so simple. Processing an interrupt is not just a matter of taking the interrupt,
doing an up on some semaphore, and then executing an IRET instruction to return from the interrupt to the
previous process. There is a great deal more work involved for the operating system. We will now give an outline of this work as a series of steps that must be performed in software after the hardware interrupt has completed. It should be noted that the details are highly
system dependent, so some of the steps listed below may not be needed on a particular machine, and steps not listed may be required. Also, the steps that do occur may be in a different order on some machines.

1. Save any registers (including the PSW) that have not already been saved by the interrupt hardware.

2. Set up a context for the interrupt-service procedure. Doing this may involve setting up the TLB, MMU, and a page table.

3. Set up a stack for the interrupt-service procedure.

4. Acknowledge the interrupt controller. If there is no centralized interrupt controller, reenable interrupts.

5. Copy the registers from where they were saved (possibly some stack) to the process table.

6. Run the interrupt-service procedure. It will extract information from the interrupting device controller’s registers.

7. Choose which process to run next. If the interrupt has caused some high-priority process that was blocked to become ready, it may be chosen to run now.

8. Set up the MMU context for the process to run next. Some TLB setup may also be needed.

9. Load the new process’ registers, including its PSW.

10. Start running the new process.
As can be seen, interrupt processing is far from trivial. It also takes a considerable number of CPU
instructions, especially on machines in which virtual memory is present and page tables have to be set up
or the state of the MMU stored (e.g., the R and M bits). On some machines the TLB and CPU cache may
also have to be managed when switching between user and kernel modes, which takes additional machine
cycles.
5.3.2 Device Drivers
Earlier in this chapter we looked at what device controllers do. We saw that each controller has some device registers used to give it commands, to read out its status, or both. The
number of device registers and the nature of the commands vary radically from device to device. For
example, a mouse driver has to accept information from the mouse telling it how far it has moved and
which buttons are currently depressed. In contrast, a disk driver may
have to know all about sectors, tracks, cylinders, heads, arm motion, motor drives, head settling times,
and all the other mechanics of making the disk work properly. Obviously, these drivers will be very
different.
Consequently, each I/O device attached to a computer needs some device-specific code for controlling it. This code, called the device driver, is generally written by the device’s manufacturer and delivered along with the device. Since each operating system needs its own drivers, device manufacturers commonly supply drivers for several popular operating systems.
Each device driver normally handles one device type, or at most, one class of closely related devices. For example, a SCSI disk driver can usually handle multiple SCSI disks of different sizes and different speeds, and perhaps a SCSI Blu-ray disk as well. On the other hand, a mouse and joystick are so different that different drivers are usually required. However, there is no technical restriction on having one device driver control multiple unrelated devices. It is just not a good idea in most cases.
Sometimes, though, wildly different devices are based on the same underlying technology. The best-known example is probably USB, a serial bus technology that is not called ‘‘universal’’ for nothing. USB devices include disks, memory sticks, cameras, mice, keyboards, mini-fans, wireless network cards, robots, credit card readers, rechargeable shavers, paper shredders, bar code scanners, disco balls, and portable thermometers. They all use USB and yet they all do very different things. The trick is that USB drivers are typically stacked, like a TCP/IP stack in networks. At the bottom, typically in hardware, we find the USB link layer (serial I/O) that handles hardware stuff like signaling and decoding a stream of signals to USB packets. It is used by higher layers that deal with the data packets and the common functionality for USB that is shared by most devices. On top of that, finally, we find the higher-layer APIs such as the interfaces for mass storage, cameras, etc. Thus, we still have separate device drivers, even though they share part of the protocol stack.
In order to access the device’s hardware, actually meaning the controller’s registers, the device driver normally has to be part of the operating system kernel, at least with current architectures. Actually, it is possible to construct drivers that run in user space, with system calls for reading and writing the device registers. This design isolates the kernel from the drivers and the drivers from each other, eliminating a major source of system crashes—buggy drivers that interfere with the kernel in one way or another. For building highly reliable systems, this is definitely the way to go. An example of a system in which the device drivers run as user processes is MINIX 3 (www.minix3.org). However, since most other desktop operating systems expect drivers to run in the kernel, that is the model we will consider here.
Since the designers of every operating system know that pieces of code (drivers) written by outsiders will be installed in it, it needs to have an architecture that allows such installation. This means having a well-defined model of what a driver
does and how it interacts with the rest of the operating system. Device drivers are normally positioned
below the rest of the operating system, as is illustrated in Fig. 5-12.
Figure 5-12. Logical positioning of device drivers. User processes run in user space above the rest of the operating system; the printer, camcorder, and CD-ROM drivers sit in kernel space below it, each talking to its own controller, which in turn connects to the device. In reality all communication between drivers and device controllers goes over the bus.
Operating systems usually classify drivers into one of a small number of categories. The most common categories are the block devices, such as disks, which contain multiple data blocks that can be addressed independently, and the character devices, such as keyboards and printers, which generate or accept a stream of characters.
Most operating systems define a standard interface that all block drivers must support and a second
standard interface that all character drivers must support. These interfaces consist of a number of
procedures that the rest of the operating system can call to get the driver to do work for it. Typical
procedures are those to read a block (block device) or write a character string (character device).
In some systems, the operating system is a single binary program that contains all of the drivers it will
need compiled into it. This scheme was the norm for years
with UNIX systems because they were run by computer centers and I/O devices rarely changed. If a new device was added, the system administrator simply recompiled the kernel with the new driver to build a new binary.
With the advent of personal computers, with their myriad I/O devices, this model no longer worked. Few
users are capable of recompiling or relinking the kernel, even if they have the source code or object
modules, which is not always the case. Instead, operating systems, starting with MS-DOS, went over to a
model in which drivers were dynamically loaded into the system during execution. Different systems handle loading drivers in different ways.
A device driver has several functions. The most obvious one is to accept abstract read and write requests
from the device-independent software above it and see that they are carried out. But there are also a few other functions they must perform. For example, the driver must initialize the device, if needed. It may also need to manage its power requirements and log events.
Many device drivers have a similar general structure. A typical driver starts out by checking the input parameters to see if they are valid. If not, an error is returned. If they are valid, a translation from abstract to concrete terms may be needed. For a disk driver, this may mean converting a linear block number into the head, track, sector, and cylinder numbers for the disk’s geometry.
Next the driver may check if the device is currently in use. If it is, the request will be queued for later processing. If the device is idle, the hardware status will be examined to see if the request can be handled now. It may be necessary to switch the device on or start a motor before transfers can be begun. Once the device is on and ready to go, the actual control can begin.
Controlling the device means issuing a sequence of commands to it. The driver is the place where the command sequence is determined, depending on what has to be done. After the driver knows which commands it is going to issue, it starts writing them into the controller’s device registers. After each command is written to the controller, it may be necessary to check to see if the controller accepted the command and is prepared to accept the next one. This sequence continues until all the commands have been issued. Some controllers can be given a linked list of commands (in memory) and told to read and process them all by itself without further help from the operating system.
After the commands have been issued, one of two situations will apply. In many cases the device driver must wait until the controller does some work for it, so it blocks itself until the interrupt comes in to unblock it. In other cases, however, the operation finishes without delay, so the driver need not block. As an example of the latter situation, scrolling the screen requires just writing a few bytes into the controller’s registers. No mechanical motion is needed, so the entire operation can be completed in nanoseconds.
In the former case, the blocked driver will be awakened by the interrupt. In the latter case, it will never go to sleep. Either way, after the operation has been completed, the driver must check for errors. If everything is all right, the driver may have some data to pass to the device-independent software (e.g., a block just read). Finally, it returns some status information for error reporting back to its caller. If any other requests are queued, one of them can now be selected and started. If nothing is queued, the driver blocks waiting for the next request.
This simple model is only a rough approximation to reality. Many factors make the code much more complicated. For one thing, an I/O device may complete while a driver is running, interrupting the driver. The interrupt may cause a device driver to run. In fact, it may cause the current driver to run. For example, while the network driver is processing an incoming packet, another packet may arrive. Consequently, drivers have to be reentrant, meaning that a running driver has to expect that it will be called a second time before the first call has completed.
In a hot-pluggable system, devices can be added or removed while the computer is running. As a result, while a driver is busy reading from some device, the system may inform it that the user has suddenly removed that device from the system. Not only must the current I/O transfer be aborted without damaging any kernel data structures, but any pending requests for the now-vanished device must also be gracefully removed from the system and their callers given the bad news. Furthermore, the unexpected addition of new devices may cause the kernel to juggle resources (e.g., interrupt request lines), taking old ones away from the driver and giving it new ones in their place.
Drivers are not allowed to make system calls, but they often need to interact with the rest of the kernel.
Usually, calls to certain kernel procedures are permitted. For example, there are usually calls to allocate
and deallocate hardwired pages of memory for use as buffers. Other useful calls are needed to manage the
MMU, timers, the DMA controller, the interrupt controller, and so on.
5.3.3 Device-Independent I/O Software
Although some of the I/O software is device specific, other parts of it are device independent. The exact boundary between the drivers and the device-independent software is system (and device) dependent, because some functions that could be done in a device-independent way may actually be done in the drivers, for efficiency or other reasons. The functions shown in Fig. 5-13 are typically done in the device-independent software.
Uniform interfacing for device drivers
Buffering
Error reporting
Allocating and releasing dedicated devices
Providing a device-independent block size
Figure 5-13. Functions of the device-independent I/O software.
The basic function of the device-independent software is to perform the I/O functions that are common to
all devices and to provide a uniform interface to the user-level software. We will now look at the above
issues in more detail.
Uniform Interfacing for Device Drivers
A major issue in an operating system is how to make all I/O devices and drivers look more or less the same. If disks, printers, keyboards, and so on, are all interfaced in different ways, every time a new device comes along, the operating system must be modified for the new device. Having to hack on the operating system for each new device is not a good idea.
One aspect of this issue is the interface between the device drivers and the rest of the operating system. In Fig. 5-14(a) we illustrate a situation in which each device driver has a different interface to the operating system. What this means is that the driver functions available for the system to call differ from driver to driver. It might also mean that the kernel functions that the driver needs also differ from driver to driver.
Taken together, it means that interfacing each new driver requires a lot of new programming effort.
Figure 5-14. (a) Without a standard driver interface. (b) With a standard driver interface.
In contrast, in Fig. 5-14(b), we show a different design in which all drivers have the same interface. Now it becomes much easier to plug in a new driver, provided it conforms to the driver interface. It also means that driver writers know what is expected of them. In practice, not all devices are absolutely identical, but usually there are only a small number of device types and even these are generally almost the same.
The way this works is as follows. For each class of devices, such as disks or printers, the operating system
defines a set of functions that the driver must supply. For a disk these would naturally include read and
write, but also turning the power
on and off, formatting, and other disky things. Often the driver holds a table with pointers into itself for
these functions. When the driver is loaded, the operating system records the address of this table of
function pointers, so when it needs to call one of the functions, it can make an indirect call via this table.
This table of function pointers defines the interface between the driver and the rest of the operat- ing
system. All devices of a given class (disks, printers, etc.) must obey it.
Another aspect of having a uniform interface is how I/O devices are named. The device-independent
software takes care of mapping symbolic device names onto the proper driver. For example, in UNIX a
device name, such as /dev/disk0, uniquely specifies the i-node for a special file, and this i-node contains
the major device number, which is used to locate the appropriate driver. The i-node also contains the
minor device number, which is passed as a parameter to the driver in order to specify the unit to be read
or written. All devices have major and minor numbers, and all drivers are accessed by using the major
device number to select the driver.
Closely related to naming is protection. How does the system prevent users from accessing devices that
they are not entitled to access? In both UNIX and Windows, devices appear in the file system as named
objects, which means that the usual protection rules for files also apply to I/O devices. The system
administrator can then set the proper permissions for each device.
Buffering
Buffering is also an issue, both for block and character devices, for a variety of reasons. To see one of them, consider a process that wants to read data from an ADSL (Asymmetric Digital Subscriber Line) modem, something many people use at home to connect to the Internet. One possible strategy for dealing with the incoming characters is to have the user process do a read system call and block waiting for one character. Each arriving character causes an interrupt. The interrupt-service procedure hands the character to the user process and unblocks it. After putting the character somewhere, the process reads another character and blocks again. This model is indicated in Fig. 5-15(a).
The trouble with this way of doing business is that the user process has to be started up for every
incoming character. Allowing a process to run many times for short runs is inefficient, so this design is
not a good one.
An improvement is shown in Fig. 5-15(b). Here the user process provides an n-character buffer in user space and does a read of n characters. The interrupt-service procedure puts incoming characters in this buffer until it is completely full. Only then does it wake up the user process. This scheme is far more efficient than the previous one, but it has a drawback: what happens if the buffer is paged out when a character arrives? The buffer could be locked in memory, but if many processes start locking pages in memory willy-nilly, the pool of available pages will shrink and performance will degrade.
Figure 5-15. (a) Unbuffered input. (b) Buffering in user space. (c) Buffering in the kernel followed by copying to user space. (d)
Double buffering in the kernel.
Yet another approach is to create a buffer inside the kernel and have the interrupt handler put the characters there, as shown in Fig. 5-15(c). When this buffer is full, the page with the user buffer is brought in, if needed, and the buffer copied there in one operation. This scheme is far more efficient.
However, even this improved scheme suffers from a problem: What happens to characters that arrive
while the page with the user buffer is being brought in from the disk? Since the buffer is full, there is no
place to put them. A way out is to have a second kernel buffer. After the first buffer fills up, but before it
has been emptied, the second one is used, as shown in Fig. 5-15(d). When the second buffer fills up, it is
available to be copied to the user (assuming the user has asked for it). While the second buffer is being
copied to user space, the first one can be used for new characters. In this way, the two buffers take turns:
while one is being copied to user space, the other is accumulating new input. A buffering scheme like this
is called double buffering.
Another common form of buffering is the circular buffer. It consists of a region of memory and two pointers. One pointer points to the next free word, where new data can be placed. The other pointer points
to the first word of data in the buffer that has not been removed yet. In many situations, the hardware
advances the first pointer as it adds new data (e.g., just arriving from the network) and the operating
system advances the second pointer as it removes and processes data. Both pointers wrap around, going
back to the bottom when they hit the top.
Buffering is also important on output. Consider, for example, how output is done to the modem without buffering using the model of Fig. 5-15(b). The user process executes a write system call to output n characters. The system has two choices at this point. It can block the user until all the characters have been written, but this could take a very long time over a slow telephone line. It could also release the user immediately and do the I/O while the user computes some more,
but this leads to an even worse problem: how does the user process know that the output has been
completed and it can reuse the buffer? The system could generate a signal or software interrupt, but that
style of programming is difficult and prone to race conditions. A much better solution is for the kernel to
copy the data to a kernel buffer, analogous to Fig. 5-15(c) (but the other way), and unblock the caller
immediately. Now it does not matter when the actual I/O has been completed. The user is free to reuse the
buffer the instant it is unblocked.
Buffering is a widely used technique, but it has a downside as well. If data get buffered too many times,
performance suffers. Consider, for example, the network of Fig. 5-16. Here a user does a system call to
write to the network. The kernel copies the packet to a kernel buffer to allow the user to proceed
immediately (step 1). At this point the user program can reuse the buffer.
Figure 5-16. Networking may involve many copies of a packet.
When the driver is called, it copies the packet to the controller for output (step 2). The reason it does not
output to the wire directly from kernel memory is that once a packet transmission has been started, it must
continue at a uniform speed. The driver cannot guarantee that it can get to memory at a uniform speed
because DMA channels and other I/O devices may be stealing many cycles. Failing to get a word on time
would ruin the packet. By buffering the packet inside the controller, this problem is avoided.
After the packet has been copied to the controller’s internal buffer, it is copied out onto the network (step
3). Bits arrive at the receiver shortly after being sent, so just after the last bit has been sent, that bit arrives
at the receiver, where the packet has been buffered in the controller. Next the packet is copied to the receiver's kernel buffer (step 4). Finally, it is copied to the receiving process' buffer (step 5). Usually,
the receiver then sends back an acknowledgement. When the sender gets the acknowledgement, it is free
to send the next packet. However, it should be clear that all this copying is going to slow down the
transmission rate considerably because all the steps must happen sequentially.
Error Reporting
Errors are far more common in the context of I/O than in other contexts. When they occur, the operating
system must handle them as best it can. Many errors are device specific and must be handled by the
appropriate driver, but the framework for error handling is device independent.
One class of I/O errors is programming errors. These occur when a process asks for something impossible, such as writing to an input device (keyboard, scanner, mouse, etc.) or reading from an output device (printer, plotter, etc.). Other errors include providing an invalid buffer address or other parameter, specifying an invalid device (e.g., disk 3 when the system has only two disks), and so on.
The action to take on these errors is straightforward: just report back an error code to the caller.
Another class of errors is the class of actual I/O errors, for example, trying to write a disk block that has
been damaged or trying to read from a camcorder that has been switched off. In these circumstances, it is
up to the driver to determine what to do. If the driver does not know what to do, it may pass the problem
back up to device-independent software.
What this software does depends on the environment and the nature of the error. If it is a simple read error
and there is an interactive user available, it may display a dialog box asking the user what to do. The
options may include retrying a certain number of times, ignoring the error, or killing the calling process.
If there is no user available, probably the only real option is to have the system call fail with an error
code.
However, some errors cannot be handled this way. For example, a critical data structure, such as the root
directory or free block list, may have been destroyed. In this case, the system may have to display an error
message and terminate. There is not much else it can do.
Allocating and Releasing Dedicated Devices
Some devices, such as printers, can be used only by a single process at any given moment. It is up to the
operating system to examine requests for device usage and accept or reject them, depending on whether
the requested device is available or not. A simple way to handle these requests is to require processes to
perform opens on the special files for devices directly. If the device is unavailable, the open fails. Closing
such a dedicated device then releases it.
An alternative approach is to have special mechanisms for requesting and releasing dedicated devices. An
attempt to acquire a device that is not available blocks the caller instead of failing. Blocked processes are
put on a queue. Sooner or later, the requested device becomes available and the first process on the queue
is allowed to acquire it and continue execution.
Device-Independent Block Size
Different disks may have different sector sizes. It is up to the device-independent software to hide this fact and provide a uniform block size to higher layers, for example, by treating several sectors as a single logical block. In this way, the higher layers deal only with abstract devices that all use the same logical block size, independent of the physical sector size. Similarly, some character devices deliver their data one byte at a time (e.g., mice), while others deliver theirs in larger units (e.g., Ethernet interfaces). These differences may also be hidden.
5.3.4 User-Space I/O Software
Although most of the I/O software is within the operating system, a small portion of it consists of libraries linked together with user programs, and even whole programs running outside the kernel. System
calls, including the I/O system calls, are normally made by library procedures. When a C program
contains the call
count = write(fd, buffer, nbytes);
the library procedure write might be linked with the program and contained in the binary program present
in memory at run time. In other systems, libraries can be loaded during program execution. Either way,
the collection of all these library procedures is clearly part of the I/O system.
While these procedures do little more than put their parameters in the appropriate place for the system call, other I/O procedures actually do real work. In particular, formatting of input and output is done by library procedures. One example from C is printf, which takes a format string and possibly some variables as input, builds an ASCII string, and then calls write to output the string. As an example of printf, consider the statement

printf("The square of %3d is %6d\n", i, i*i);

It formats a string consisting of the 14-character string ‘‘The square of ’’ followed by the value of i as a 3-character string, then the 4-character string ‘‘ is ’’, then i*i as 6 characters, and finally a line feed.
An example of a similar procedure for input is scanf, which reads input and stores it into variables
described in a format string using the same syntax as printf. The standard I/O library contains a number of
procedures that involve I/O and all run as part of user programs.
Not all user-level I/O software consists of library procedures. Another important category is the spooling system. Spooling is a way of dealing with dedicated I/O devices in a multiprogramming system. Consider a typical spooled device: a printer. Although it would be technically easy to let any user process open the character special file for the printer, suppose a process opened it and then did nothing for hours. No other process could print anything.
Instead what is done is to create a special process, called a daemon, and a special directory, called a spooling directory. To print a file, a process first generates the entire file to be printed and puts it in the spooling directory. It is up to the daemon, which is the only process having permission to use the printer’s special file, to print the files in the directory. By protecting the special file against direct use by users, the problem of having someone keeping it open unnecessarily long is eliminated.
Spooling is used not only for printers. It is also used in other I/O situations. For example, file transfer over a network often uses a network daemon. To send a file somewhere, a user puts it in a network spooling directory. Later on, the network daemon takes it out and transmits it. One particular use of spooled file transmission is the USENET News system (now part of Google Groups). This network consists of millions of machines around the world communicating using the Internet. Thousands of news groups exist on many topics. To post a news message, the user invokes a news program, which accepts the message to be posted and then deposits it in a spooling directory for transmission to other machines later. The entire news system runs outside the operating system.
Figure 5-17 summarizes the I/O system, showing all the layers and the principal functions of each layer. Starting at the bottom, the layers are the hardware, interrupt handlers, device drivers, device-independent software, and finally the user processes.
Layer                          I/O functions
User processes                 Make I/O call; format I/O; spooling
Device-independent software    Naming, protection, blocking, buffering, allocation
Device drivers                 Set up device registers; check status
Interrupt handlers             Wake up driver when I/O completed
Hardware                       Perform I/O operation

(An I/O request flows down through the layers; the I/O reply flows back up.)
Figure 5-17. Layers of the I/O system and the main functions of each layer.
The arrows in Fig. 5-17 show the flow of control. When a user program tries to read a block from a file,
for example, the operating system is invoked to carry out the call. The device-independent software looks
for it, say, in the buffer cache. If the needed block is not there, it calls the device driver to issue the
request to the hardware to go get it from the disk. The process is then blocked until the disk operation has been completed and the data are safely available in the caller’s buffer.
When the disk is finished, the hardware generates an interrupt. The interrupt handler is run to discover
what has happened, that is, which device wants attention right now. It then extracts the status from the
device and wakes up the sleeping process to finish off the I/O request and let the user process continue.
5.4 DISKS
Now we will begin studying some real I/O devices. We will begin with disks, which are conceptually
simple, yet very important. After that we will examine clocks, keyboards, and displays.
5.4.1 Disk Hardware
Disks come in a variety of types. The most common ones are the magnetic hard disks. They are
characterized by the fact that reads and writes are equally fast, which makes them suitable as secondary
memory (paging, file systems, etc.). Arrays of these disks are sometimes used to provide highly reliable
storage. For distribution of programs, data, and movies, optical disks (DVDs and Blu-ray) are also
important. Finally, solid-state disks are increasingly popular as they are fast and do not contain moving
parts. In the following sections we will discuss magnetic disks as an example of the hardware and then describe the software for disk devices in general.
Magnetic Disks
Magnetic disks are organized into cylinders, each one containing as many tracks as there are heads
stacked vertically. The tracks are divided into sectors, with the number of sectors around the
circumference typically being 8 to 32 on floppy disks, and up to several hundred on hard disks. The
number of heads varies from 1 to about 16.
Older disks have little electronics and just deliver a simple serial bit stream. On these disks, the controller
does most of the work. On other disks, in particular, IDE (Integrated Drive Electronics) and SATA
(Serial ATA) disks, the disk drive itself contains a microcontroller that does considerable work and
allows the real controller to issue a set of higher-level commands. The controller often does track caching,
bad-block remapping, and much more.
A device feature that has important implications for the disk driver is the possibility of a controller doing seeks on two or more drives at the same time. These are known as overlapped seeks. While the controller
and software are waiting for a seek to complete on one drive, the controller can initiate a seek on another
drive. Many controllers can also read or write on one drive while seeking on one or more other drives, but
a floppy disk controller cannot read or write on two drives at the
same time. (Reading or writing requires the controller to move bits on a microsecond time scale, so one transfer uses up most of its computing power.) The situation is different for hard disks with integrated controllers, and in a system with more than one of these hard drives they can operate simultaneously, at least to the extent of transferring between the disk and the controller’s buffer memory. Only one transfer between the controller and the main memory is possible at once, however. The ability to perform two or more operations at the same time can reduce the average access time considerably.
Figure 5-18 compares parameters of the standard storage medium for the original IBM PC with parameters of a disk made three decades later to show how much disks changed in that time. It is interesting to note that not all parameters have improved as much. Average seek time is almost 9 times better than it was, transfer rate is 16,000 times better, while capacity is up by a factor of 800,000. This pattern has to do with relatively gradual improvements in the moving parts, but much higher bit densities on the recording surfaces.
Parameter                        IBM 360-KB floppy disk    WD 3000 HLFS hard disk
Number of cylinders              40                        36,481
Tracks per cylinder              2                         255
Sectors per track                9                         63 (avg)
Sectors per disk                 720                       586,072,368
Bytes per sector                 512                       512
Disk capacity                    360 KB                    300 GB
Seek time (adjacent cylinders)   6 msec                    0.7 msec
Seek time (average case)         77 msec                   4.2 msec
Rotation time                    200 msec                  6 msec
Time to transfer 1 sector        22 msec                   1.4 μsec

Figure 5-18. Disk parameters for the original IBM PC 360-KB floppy disk and a Western Digital WD 3000 HLFS (‘‘Velociraptor’’) hard disk.
One thing to be aware of in looking at the specifications of modern hard disks is that the geometry
specified, and used by the driver software, is almost always different from the physical format. On old
disks, the number of sectors per track was the same for all cylinders. Modern disks are divided into zones
with more sectors on the outer zones than the inner ones. Fig. 5-19(a) illustrates a tiny disk with two
zones. The outer zone has 32 sectors per track; the inner one has 16 sectors per track. A real disk, such as
the WD 3000 HLFS, typically has 16 or more zones, with the number of sectors increasing by about 4%
per zone as one goes out from the innermost to the outermost zone.
To hide the details of how many sectors each track has, most modern disks have a virtual geometry that is
presented to the operating system. The software is instructed to act as though there are x cylinders, y
heads, and z sectors per track.
Figure 5-19. (a) Physical geometry of a disk with two zones. (b) A possible vir- tual geometry for this disk.
The controller then remaps a request for (x, y, z) onto the real cylinder, head, and sector. A possible virtual geometry for the physical disk of Fig. 5-19(a) is shown in Fig. 5-19(b). In both cases the disk has 192 sectors; only the published arrangement is different from the real one.
For PCs, the maximum values for these three parameters are often (65535, 16, and 63), due to the need to be backward compatible with the limitations of the original IBM PC. On this machine, 16-, 4-, and 6-bit fields were used to specify these numbers, with cylinders and sectors numbered starting at 1 and heads numbered starting at 0. With these parameters and 512 bytes per sector, the largest possible disk is 31.5 GB. To get around this limit, all modern disks now support a system called logical block addressing, in which disk sectors are just numbered consecutively starting at 0, without regard to the disk geometry.
RAID
CPU performance has been increasing exponentially over the past decade, roughly doubling every 18
months. Not so with disk performance. In the 1970s, average seek times on minicomputer disks were 50
to 100 msec. Now seek times are still a few msec. In most technical industries (say, automobiles or
aviation), a factor of 5 to 10 performance improvement in two decades would be major news (imagine
300-MPG cars), but in the computer industry it is an embarrassment. Thus the gap between CPU
performance and (hard) disk performance has become much larger over time. Can anything be done to
help?
Yes! As we have seen, parallel processing is increasingly being used to speed up CPU performance. It has occurred to various people over the years that parallel I/O might be a good idea, too. In their 1988 paper, Patterson et al. suggested six specific disk organizations that could be used to improve disk performance, reliability, or both (Patterson et al., 1988). These ideas were quickly adopted by industry and have led to a new class of I/O device called a RAID. Patterson et al. defined RAID as Redundant Array of Inexpensive Disks, but industry redefined the I to be ‘‘Independent’’ rather than ‘‘Inexpensive’’ (maybe so they could charge more?). Since a villain was also needed (as in RISC vs. CISC, also due to Patterson), the bad guy here was the SLED (Single Large Expensive Disk).
The fundamental idea behind a RAID is to install a box full of disks next to the computer, typically a large server, replace the disk controller card with a RAID controller, copy the data over to the RAID, and then continue normal operation. In other words, a RAID should look like a SLED to the operating system but have better performance and better reliability. In the past, RAIDs consisted almost exclusively of a RAID SCSI controller plus a box of SCSI disks, because the performance was good and modern SCSI supports up to 15 disks on a single controller. Nowadays, many manufacturers also offer (less expensive) RAIDs based on SATA. In this way, no software changes are required to use the RAID, a big selling point for many system administrators.
In addition to appearing like a single disk to the software, all RAIDs have the property that the data are distributed over the drives, to allow parallel operation. Several different schemes for doing this were defined by Patterson et al. Nowadays, most manufacturers refer to the seven standard configurations as RAID level 0 through RAID level 6. In addition, there are a few other minor levels that we will not discuss. The term ‘‘level’’ is something of a misnomer since no hierarchy is involved; there are simply seven different organizations possible.
RAID level 0 is illustrated in Fig. 5-20(a). It consists of viewing the virtual single disk simulated by the RAID as being divided up into strips of k sectors each, with sectors 0 to k − 1 being strip 0, sectors k to 2k − 1 strip 1, and so on. For k = 1, each strip is a sector; for k = 2 a strip is two sectors, etc. The RAID level 0 organization writes consecutive strips over the drives in round-robin fashion, as depicted in Fig. 5-20(a) for a RAID with four disk drives.
Distributing data over multiple drives like this is called striping. For example, if the software issues a command to read a data block consisting of four consecutive strips starting at a strip boundary, the RAID controller will break this command up into four separate commands, one for each of the four disks, and have them operate in parallel. Thus we have parallel I/O without the software knowing about it.
RAID level 0 works best with large requests, the bigger the better. If a request is larger than the number of drives times the strip size, some drives will get multiple requests, so that when they finish the first request they start the second one. It is up to the controller to split the request up and feed the proper commands to the proper disks in the right sequence and then assemble the results in memory correctly. Performance is excellent and the implementation is straightforward.
RAID level 0 works worst with operating systems that habitually ask for data one sector at a time. The
results will be correct, but there is no parallelism and hence no performance gain. Another disadvantage
of this organization is that the reliability is potentially worse than having a SLED. If a RAID consists of
four disks, each with a mean time to failure of 20,000 hours, about once every 5000 hours a drive will fail
and all the data will be completely lost. A SLED with a mean time to failure of 20,000 hours would be
four times more reliable. Because no redundancy is present in this design, it is not really a true RAID.
The next option, RAID level 1, shown in Fig. 5-20(b), is a true RAID. It duplicates all the disks, so there are four primary disks and four backup disks. On a write, every strip is written twice. On a read, either copy can be used, distributing the load over more drives. Consequently, write performance is no better than for a single drive, but read performance can be up to twice as good. Fault tolerance is excellent: if a drive crashes, the copy is simply used instead. Recovery consists of simply installing a new drive and copying the entire backup drive to it.
Unlike levels 0 and 1, which work with strips of sectors, RAID level 2 works on a word basis, possibly even a byte basis. Imagine splitting each byte of the single virtual disk into a pair of 4-bit nibbles, then adding a Hamming code to each one to form a 7-bit word, of which bits 1, 2, and 4 were parity bits.
Further imagine that the seven drives of Fig. 5-20(c) were synchronized in terms of arm position and
rotational position. Then it would be possible to write the 7-bit Hamming coded word over the seven
drives, one bit per drive.
The Thinking Machines CM-2 computer used this scheme, taking 32-bit data words and adding 6 parity
bits to form a 38-bit Hamming word, plus an extra bit for word parity, and spread each word over 39 disk
drives. The total throughput was immense, because in one sector time it could write 32 sectors worth of
data. Also, losing one drive did not cause problems, because loss of a drive amounted to losing 1 bit in
each 39-bit word read, something the Hamming code could handle on the fly.
On the down side, this scheme requires all the drives to be rotationally synchronized, and it only makes sense with a substantial number of drives (even with 32 data drives and 6 parity drives, the overhead is 19%). It also asks a lot of the controller, since it must do a Hamming checksum every bit time.
RAID level 3 is a simplified version of RAID level 2. It is illustrated in Fig. 5-20(d). Here a single parity
bit is computed for each data word and written to a parity drive. As in RAID level 2, the drives must be
exactly synchronized, since individual data words are spread over multiple drives.
At first thought, it might appear that a single parity bit gives only error detection, not error correction. For the case of random undetected errors, this observation is true. However, for the case of a drive crashing, it provides full 1-bit error correction, since the position of the bad bit is known. In the event that a drive crashes, the controller just pretends that all its bits are 0s. If a word then has a parity error, the bit from the dead drive must have been a 1, so it is corrected. Although both RAID levels 2 and 3 offer very high data rates, the number of separate I/O requests per second they can handle is no better than for a single drive.

Figure 5-20. RAID levels 0 through 6. Backup and parity drives are shown shaded.
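The erasure argument above is easy to demonstrate: when the failed drive's position is known, a single parity bit per word pins down the missing bit exactly. The drive count here is an arbitrary choice for illustration.

```python
# Sketch: with one parity bit per word, the bit from a known-dead drive
# is recoverable. Four hypothetical data drives plus one parity drive,
# one bit per drive.

def make_parity(bits):
    """XOR of a list of bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

data = [1, 0, 1, 1]            # one bit per data drive
parity = make_parity(data)     # stored on the parity drive

dead = 2                       # drive 2 has crashed; its bit is unknown
survivors = [b for i, b in enumerate(data) if i != dead]

# The missing bit is whatever makes the word's parity come out right again.
recovered = parity ^ make_parity(survivors)
assert recovered == data[dead]
```

With random errors this does not work, because then nothing tells the controller *which* bit is wrong, only that one is.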
RAID levels 4 and 5 work with strips again, not individual words with parity, and do not require
synchronized drives. RAID level 4 [see Fig. 5-20(e)] is like RAID level 0, with a strip-for-strip parity
written onto an extra drive. For example, if each strip is k bytes long, all the strips are EXCLUSIVE ORed together, resulting in a parity strip k bytes long. If a drive crashes, the lost bytes can be
recomputed from the parity drive by reading the entire set of drives.
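The strip-for-strip parity just described can be sketched directly; the strip contents and the three-drive count below are arbitrary illustrations.

```python
# Sketch of RAID level 4 strip parity: the parity strip is the byte-wise
# XOR of all the data strips, and any one lost strip can be rebuilt from
# the remaining strips plus the parity strip.
from functools import reduce

def xor_strips(strips):
    """Byte-wise XOR of equal-length strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

strips = [b"AAAA", b"BBBB", b"CCCC"]   # k-byte strips on three data drives
parity = xor_strips(strips)            # written to the parity drive

# Drive 1 crashes: XOR the survivors with the parity strip to rebuild it.
rebuilt = xor_strips([strips[0], strips[2], parity])
assert rebuilt == strips[1]
```

Rebuilding works because XOR is its own inverse: parity XOR all surviving strips cancels everything except the lost strip.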
This design protects against the loss of a drive but performs poorly for small updates. If one sector is
changed, it is necessary to read all the drives in order to recalculate the parity, which must then be
rewritten. Alternatively, it can read the old user data and the old parity data and recompute the new parity
from them. Even with this optimization, a small update requires two reads and two writes.
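The small-update optimization follows from the same XOR algebra: the new parity is the old parity XOR the old data XOR the new data, so only the target drive and the parity drive are touched. This is a sketch under the assumption of byte-sized strip elements; the helper names are illustrative.

```python
# Sketch of the small-update rule: new_parity = old_parity ^ old_data ^
# new_data, requiring two reads (old data, old parity) and two writes.

def update_parity(old_parity, old_data, new_data):
    """Recompute the parity strip after one data strip changes."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

def full_parity(strips):
    """Full recomputation over a three-drive stripe, for comparison."""
    return bytes(a ^ b ^ c for a, b, c in zip(*strips))

strips = [b"AAAA", b"BBBB", b"CCCC"]
old_parity = full_parity(strips)
new_strip = b"DDDD"

# The shortcut agrees with re-reading the whole stripe.
assert update_parity(old_parity, strips[1], new_strip) == \
       full_parity([strips[0], new_strip, strips[2]])
```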
As a consequence of the heavy load on the parity drive, it may become a bottleneck. This bottleneck is eliminated in RAID level 5 by distributing the parity bits uniformly over all the drives, round-robin
fashion, as shown in Fig. 5-20(f). However, in the event of a drive crash, reconstructing the contents of
the failed drive is a complex process.
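The round-robin placement can be made concrete with a small formula. The left-symmetric rotation below is one common convention (used, for example, by software RAID implementations), not the only possible layout.

```python
# Sketch of RAID level 5 parity rotation: for stripe i on n drives, the
# parity block moves one drive to the left each stripe, so no single
# drive carries all the parity traffic. Left-symmetric layout assumed.

def parity_drive(stripe, n_drives):
    """Drive index holding the parity block for a given stripe."""
    return (n_drives - 1 - stripe % n_drives) % n_drives

# With 5 drives, parity lands on drives 4, 3, 2, 1, 0, then wraps to 4.
assert [parity_drive(i, 5) for i in range(6)] == [4, 3, 2, 1, 0, 4]
```

During reconstruction, the controller must apply this mapping per stripe to decide which surviving blocks are data and which are parity, which is part of why rebuilding a RAID 5 array is more involved than rebuilding a RAID 4 array.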
RAID level 6 is similar to RAID level 5, except that an additional parity block is used. In other words, the data is striped across the disks with two parity blocks instead of one. As a result, writes are a bit more expensive because of the parity calculations, but reads incur no performance penalty. It does offer more reliability (imagine what happens if RAID level 5 encounters a bad block just when it is rebuilding its array).
5.4.2 Disk Formatting
A hard disk consists of a stack of aluminum, alloy, or glass platters typically 3.5 inches in diameter (or 2.5 inches on notebook computers). On each platter is deposited a thin magnetizable metal oxide. After manufacturing, there is no information whatsoever on the disk.
Before the disk can be used, each platter must receive a low-level format done by software. The format consists of a series of concentric tracks, each containing some number of sectors, with short gaps between the sectors. The format of a sector is shown in Fig. 5-21.
Figure 5-21. A disk sector (preamble, data, ECC).
The preamble starts with a certain bit pattern that allows the hardware to recognize the start of the sector. It also contains the...