Username: Reginald HobbsBook: Computer Systems: An Integrated Approach to Architecture and Operating Systems. No part of any
book may be reproduced or transmitted in any form by any means without the publisher's prior written permission. Use (other than
pursuant to the qualified fair use privilege) in violation of the law or these Terms of Service is prohibited. Violators will be prosecuted to
the full extent of the law.
Chapter 5 • Processor Performance and Pipelined Processor Design
the IBM 360 series, and the IBM 370 series of high-end computer systems. Such systems, dubbed mainframes perhaps because of their being packaged in a large metal
chassis, were targeted primarily toward business applications. Gene Amdahl, who was
the chief architect of the IBM 360 series and whose name is immortalized in Amdahl's
law, discovered the principles of pipelining in the WISC computer, which are documented in his Ph.D. dissertation at the University of Wisconsin-Madison in 1952.[17]
Seymour Cray, a pioneer in high-performance computers, founded Cray Research,
which ushered in the age of vector supercomputers with the Cray series, starting with the
Cray-1. Prior to founding Cray Research, Seymour Cray was the chief architect at
Control Data Corporation, which was the leader in high-performance computing in the
1960s, with offerings such as the CDC 6600 (widely considered the first commercial
supercomputer) and, shortly thereafter, the CDC 7600.
Simultaneous with the development of high-end computers, there was much interest in the development of minicomputers, with DEC's PDP series leading the way with the
PDP-8, followed by the PDP-11, and, later on, the VAX series. Such computers initially
were targeted toward the scientific and engineering communities. Low cost was the primary focus, as opposed to high performance; hence, these processors were designed
without employing pipelining techniques.
As we observed in Chapter 3, the advent of the "killer micros" in the 1980s, coupled with pioneering research in compiler technology and RISC architectures, paved
the way for instruction pipelines to become the implementation norm for processor design, except for very low-end embedded processors. Today, even game consoles in the
hands of toddlers use processors that employ principles of pipelining.
Just as a point of clarification of terminology, supercomputers were targeted toward
solving challenging computational problems that arise in science and engineering. These
grand challenge problems inspired the development of a research program by DARPA[18] to
stimulate breakthrough computing technologies. Meanwhile, mainframes were targeted
toward technical applications, for business, finance, and other entities. These days, it is
normal to call such high-end computers servers. Also known as clusters, servers comprise a collection of computers interconnected by a high-speed network. Servers cater to
both scientific applications (e.g., IBM's BlueGene massively parallel architecture) and
technical applications (e.g., the IBM z series). The processor building blocks used in
such servers are quite similar and adhere to the principles of pipelining that we discussed
in this chapter.
Exercises
1. Answer True or False, with justification: For a given workload and a given ISA, reducing the CPI (clocks per instruction) of all the instructions will always improve
the performance of the processor.
17. Source: http://en.wikipedia.org/wiki/Wisconsin_Integrally_Synchronized_Computer.
18. DARPA stands for a federal government entity, the Defense Advanced Research Projects Agency.
2. An architecture has three types of instructions that have the following CPI:
Type    CPI
A       2
B       5
C       3
An architect determines that she can reduce the CPI for B by some clever architectural trick, with no change to the CPIs of the other two instruction types. However,
she determines that this change will increase the clock cycle time by 15%. What is
the maximum permissible CPI of B (rounded up to the nearest integer) that will
make this change worthwhile? Assume that all the workloads which execute on
this processor use 40% of A, 10% of B, and 50% of C types of instructions.
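One way to set up Exercise 2 is to compare relative execution times, taking execution time as proportional to average CPI times clock cycle time for a fixed instruction count. The Python sketch below is illustrative only: the names `average_cpi` and `relative_time` are hypothetical, and the cycle time is normalized so that the original design has a cycle time of 1.0.

```python
def average_cpi(mix, cpi):
    """Weighted-average CPI: sum of (instruction frequency * CPI) per type."""
    return sum(mix[t] * cpi[t] for t in mix)


def relative_time(mix, cpi, cycle_time=1.0):
    """Execution time per instruction, in units of the original cycle time."""
    return average_cpi(mix, cpi) * cycle_time


# Instruction mix and CPIs as given in the exercise.
mix = {"A": 0.40, "B": 0.10, "C": 0.50}
old_cpi = {"A": 2, "B": 5, "C": 3}

baseline = relative_time(mix, old_cpi)

# Try each candidate CPI for B with the 15% longer clock cycle.
results = {}
for b in range(1, 6):
    new_cpi = dict(old_cpi, B=b)
    results[b] = relative_time(mix, new_cpi, cycle_time=1.15)
    verdict = "faster" if results[b] < baseline else "not faster"
    print(f"CPI of B = {b}: relative time {results[b]:.3f} ({verdict})")
```

The change is worthwhile only for those candidate values whose relative time falls below the baseline.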
3. What would be the execution time for a program containing 2,000,000 instructions if the processor clock were running at 8 MHz and each instruction took four
clock cycles?
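The relationship behind Exercise 3 is execution time = instruction count x CPI / clock rate. A minimal Python sketch follows, assuming the clock rate in the exercise reads as 8 MHz; the function name `execution_time_seconds` is hypothetical.

```python
def execution_time_seconds(instruction_count, cpi, clock_hz):
    """Execution time = (instructions * cycles per instruction) / (cycles per second)."""
    return instruction_count * cpi / clock_hz


# Values from the exercise; the 8 MHz clock rate is an assumed reading.
t = execution_time_seconds(instruction_count=2_000_000, cpi=4, clock_hz=8_000_000)
print(f"Execution time: {t} s")
```

Keeping the clock rate as a parameter makes it easy to rerun the calculation with a different frequency.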
4. A smart architect reimplements a given instruction-set architecture, halving the
CPI for 50% of the instructions, while increasing the clock cycle time of the
processor by 10%. How much faster is the new implementation, compared with
the original? Assume that all instructions are equally likely to be used in
determining the execution time of any program.
5. A certain change is being considered in the nonpipelined (multicycle) MIPS CPU
regarding the implementation of the ALU instructions. This change will enable
you to perform an arithmetic operation and write the result into the register file,
all in one clock cycle. However, doing so will increase the clock cycle time of the
CPU. Specifically, the original CPU operates on a 500 MHz clock, but the new
design will execute only on a 400 MHz clock.
Will this change improve or degrade performance? How many times faster (or
slower) will the new design be, compared with the original design? Assume that
instructions are executed with the following frequency:
Instruction   Frequency
LW            25%
SW            15%
ALU           45%
BEQ           10%
JMP           5%
The CPI of the instructions in the original design is as follows:
Instruction   CPI
LW            5
SW            4
ALU           4
BEQ           3
JMP           3
6. Here are the CPIs of various instruction classes:
Class     CPI
R-type    2
I-type    10
J-type    3
S-type    4
And the instruction frequency for two different implementations of the same
program is as follows:
Class   Implementation 1   Implementation 2
R       3                  10
I       3                  1
J       5                  2
S       2                  3
Which implementation will execute faster and why?
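A comparison like the one in Exercise 6 can be set up by summing, over the instruction classes, the class frequency times the class CPI. The Python sketch below is illustrative and rests on stated assumptions: the table entries are treated as relative instruction counts, the row whose label is missing in the table is taken to be the I class, both implementations run at the same clock rate, and the name `total_cycles` is hypothetical.

```python
def total_cycles(counts, cpi):
    """Total clock cycles = sum over classes of (instruction count * CPI)."""
    return sum(counts[c] * cpi[c] for c in counts)


# CPI per instruction class, from the exercise.
cpi = {"R": 2, "I": 10, "J": 3, "S": 4}

# Relative instruction counts per implementation.
# Assumption: the row with the missing label is the I class.
impl1 = {"R": 3, "I": 3, "J": 5, "S": 2}
impl2 = {"R": 10, "I": 1, "J": 2, "S": 3}

cycles1 = total_cycles(impl1, cpi)
cycles2 = total_cycles(impl2, cpi)
print(f"Implementation 1: {cycles1} cycles")
print(f"Implementation 2: {cycles2} cycles")
```

With equal clock rates, the implementation with fewer total cycles finishes first, even if it executes more instructions.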
7. What is the difference between static and dynamic instruction frequency?
8. Given these instructions and their CPI, answer the question that follows:
Instruction   CPI
ADD           2
SHIFT         3
Others        2 (average for all instructions including ADD and SHIFT)
ADD/SHIFT     3
If the sequence ADD followed by SHIFT appears in 20% of the dynamic frequency
of a program, what is the percentage improvement in the execution time of the
program with all {ADD, SHIFT} replaced by the new instruction?
9. Compare and contrast structural, data, and control hazards. How are the potential
negative effects on pipeline performance mitigated?
10. How can a read-after-write (RAW) hazard be minimized or eliminated?
11. What is a branch target buffer and how is it used?
12. Why is a second ALU needed in the execute stage of the pipeline for a five-stage
pipeline implementation of the LC-2200?
13. In a processor with a five-stage pipeline, as shown in the picture that follows
(with buffers between the stages), explain the problem posed by a branch
instruction. Present a solution.
[Figure: five-stage pipeline with a pipeline register between adjacent stages; only the MEM stage label is recoverable.]
14. Regardless of whether we use a conservative approach or a branch prediction
("branch not taken"), explain why there is always a two-cycle delay if the branch
is taken (i.e., two NOPs injected into the pipeline) before normal execution can
resume in the five-stage pipeline presented in Section 5.13.3.
15. With reference to Figure 5.6(a), identify and explain the role of the datapath elements that deal with the BEQ instruction. Explain in detail what exactly happens,
cycle by cycle, with respect to this datapath during the passage of a BEQ instruction. Assume a conservative approach to handling the control hazard. Your answer
should include both cases: branch taken and branch not taken.
16. A smart engineer decides to reduce the two-cycle "branch taken" penalty in the
five-stage pipeline down to one. Her idea is to directly use the branch target
address computed in the EX cycle to fetch the instruction. (Note that the approach presented in Section 5.13.3 requires the target address to be saved in
PC first.)
a. Show the modification to the datapath in Figure 5.6(a) to implement this
idea. [Hint: You have to simultaneously feed the target address to the PC and
the instruction memory if the branch is taken.]
b. Although this reduces the bubbles in the pipeline to one for branch taken,
it may not be a good idea. Why? [Hint: Consider cycle-time effects.]
started incorporating ideas from the UNIX operating system, Apple's Mac OS X, also
intended for PCs (Apple's Macintosh line), was a direct descendant of the UNIX operating system.
When Apple adopted the GUI (graphical user interface) in the Mac computers,
Microsoft followed suit, offering Windows on top of MS-DOS as a way for users to interact with the PC. Initial offerings of Windows were wrappers on top of MS-DOS, until
1995, when Microsoft introduced Windows 95, which integrated the Windows GUI
with the operating system. We have gone through several iterations of Windows operating systems since Windows 95, including Windows NT (NT stands for new technology),
Windows NT 4.0, Windows 2000, Windows XP, Windows Vista, and Windows Version
7 (as of 2009). However, the basic concepts at the level of the core abstractions in the
operating system have not changed much in these iterations. If anything, these concepts
have matured to the point where they are indistinguishable from those found in the
UNIX operating system. Similarly, UNIX-based systems have also absorbed and incorporated the GUI ideas that had their roots in the PC world.
Exercises
1. Compare and contrast process and program.
2. What items are considered to comprise the state of a process?
3. Which metric is the most user-centric in a timesharing environment?
4. Consider a preemptive priority processor scheduler. There are three processes, P1,
P2, and P3, in the job mix, that have the following characteristics:
Process   Arrival Time   Priority   Activity
P1        0 sec          1          8 sec CPU burst followed by
                                    4 sec I/O burst followed by
                                    6 sec CPU burst and quit
P2        2 sec          3          64 sec CPU burst and quit
P3        4 sec          2          2 sec CPU burst followed by
                                    2 sec I/O burst followed by
                                    2 sec CPU burst followed by
                                    2 sec I/O burst followed by
                                    2 sec CPU burst followed by
                                    2 sec I/O burst followed by
                                    2 sec CPU burst and quit
What is the turnaround time for each of P1, P2, and P3?
What is the average waiting time for this job mix?
5. What are the deficiencies of FCFS CPU scheduling?