# Advanced Computer Architecture


Asked: Apr 13th, 2016

**Question description**

**QUESTION 1:**

**Problem 1:** Loop Optimization (3*10=30). For each of the following code
fragments, first list all the data dependencies (5), and then rearrange the code
to reduce the dependencies and identify regions of parallelism (5).

    for i=1:n
        for j=1:m
            A(i,j)=A(j-1,i+1)+S
        end
    end

    for i=1:n
        for j=1:m
            A(i+1,j+1)=A(i,j)+A(i+1,j)
        end
    end

    for i=1:n
        for j=1:m
            A(i,j)=A(i-1,j)+A(i+1,j)+A(i,j+1)+A(i,j-1)
        end
    end
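As a way to check a hand-derived dependence list, the loop-carried dependences of such a nest can be enumerated by brute force over a small iteration space. The following is a minimal Python sketch, not part of the original assignment: the helper name `find_dependences` and its `write`/`reads` arguments are hypothetical, and it assumes each array element is written by at most one iteration (which holds for the three fragments above).

```python
def find_dependences(n, m, write, reads):
    """Enumerate loop-carried dependences of a 2-deep nest by brute force.

    write(i, j) returns the array element written in iteration (i, j);
    reads(i, j) returns the list of elements read in that iteration.
    Assumes every element is written by at most one iteration.
    Returns the sets of distance vectors for flow (RAW) and anti (WAR)
    dependences.
    """
    # Execution order: i is the outer loop, j the inner loop (1-based).
    order = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]

    # Map each written element to the iteration that writes it.
    writer = {write(i, j): (i, j) for (i, j) in order}

    deps = {"flow": set(), "anti": set()}
    for it in order:
        for elem in reads(*it):
            w = writer.get(elem)
            if w is None or w == it:
                continue  # never written, or read and written in the same iteration
            if w < it:
                # Written in an earlier iteration, read now: flow dependence.
                deps["flow"].add((it[0] - w[0], it[1] - w[1]))
            else:
                # Read now, overwritten in a later iteration: anti dependence.
                deps["anti"].add((w[0] - it[0], w[1] - it[1]))
    return deps


# Second fragment: A(i+1,j+1) = A(i,j) + A(i+1,j)
second_loop = find_dependences(
    4, 4,
    write=lambda i, j: (i + 1, j + 1),
    reads=lambda i, j: [(i, j), (i + 1, j)],
)
print(sorted(second_loop["flow"]))  # [(0, 1), (1, 1)]
print(sorted(second_loop["anti"]))  # []
```

Tuple comparison in Python is lexicographic, which matches the execution order of an `i`-outer, `j`-inner nest. Distance vectors found this way only describe the small iteration space that was enumerated, so they are a sanity check rather than a proof.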

**Problem 2:** Branch Prediction (20)

I am attaching the diagram below.

The figure represents a 2-bit predictor, which predicts whether a branch will be taken or not depending on its state. An (m, n) predictor uses the outcomes of the last m branches to choose among n-bit predictors for predicting the next branch. Assume we are using a (1,2) predictor, so we have two 2-bit predictors: Predictor 1 is used when the previous branch was executed (taken), and Predictor 2 is used when the previous branch was not executed. Based on this, what will be the prediction accuracy for each of the following code fragments? Assume that in the first step we start from the state "predict taken". Show all steps (2*10).

    a=T;
    b=F;
    for i=1:5
        if(a==T) {a=T;}
        {b=!b;}
        if(a==b) {a=F; b=T;}
    end

    a=T;
    b=F;
    for i=1:5
        if(a==T) {a=F;}
        if(b==T) {b=F;}
        if(a==b) {a=T; b=F;}
    end
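The (1,2) scheme described in Problem 2 can be sketched as a small simulator that consumes a stream of branch outcomes. This is a rough Python sketch under stated assumptions, since the referenced figure is not reproduced here: it assumes the common 2-bit saturating counter (states 0 to 3, predict taken when the state is 2 or 3), that "predict taken" means the strongest taken state, and that the branch before the first one counts as taken. The function name and its parameters are hypothetical; adjust `init_state` and `init_prev_taken` if the figure defines them differently.

```python
def simulate_1_2_predictor(outcomes, init_state=3, init_prev_taken=True):
    """Return the prediction accuracy of a (1,2) predictor on `outcomes`.

    outcomes: list of booleans, True meaning the branch was taken.
    Assumed state machine: 2-bit saturating counter, 0..3,
    predicting "taken" when the counter is 2 or 3.
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")

    # One counter per value of the 1-bit history:
    # counters[True]  is Predictor 1 (previous branch taken),
    # counters[False] is Predictor 2 (previous branch not taken).
    counters = {True: init_state, False: init_state}
    prev_taken = init_prev_taken
    correct = 0

    for taken in outcomes:
        state = counters[prev_taken]
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Update only the counter that made this prediction.
        counters[prev_taken] = min(3, state + 1) if taken else max(0, state - 1)
        prev_taken = taken

    return correct / len(outcomes)


# Hypothetical outcome stream, not derived from the fragments above.
print(simulate_1_2_predictor([True, False, True, False]))  # 0.5
```

To apply it to one of the fragments, the sequence of taken/not-taken outcomes for every `if` and loop branch has to be traced by hand first and passed in as `outcomes`.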

**Problem 3:** Parallel Algorithm Design (50 points). Design and
implement a parallel algorithm for finding the (i) maximum, (ii) standard
deviation, and (iii) mode of a given set of positive numbers. Submit the code
and a table/graph listing how the running time changes for 2, 4, 8, 16, and 32
processors for set sizes of 5K, 10K, and 20K. (3*13=39)

Show the steps for applying the parallel algorithms to this sequence: 8, 24, 4, 32, 128, 64, 12, 56, 48, 4 (11)
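One possible decomposition for Problem 3 is a map-and-reduce: each processor summarizes its own chunk, and the partial summaries are then merged. The sketch below is an illustration in Python, not a reference solution. All function names are hypothetical, it computes the population standard deviation (the problem does not say whether population or sample is intended), and it breaks mode ties by choosing the smallest value. It uses a thread pool only to show the structure; in CPython threads do not give a real speedup for this kind of work, so a timing study would need processes or another runtime.

```python
import functools
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(data, p):
    """Split data into at most p contiguous chunks of nearly equal size."""
    size = math.ceil(len(data) / p)
    return [data[k:k + size] for k in range(0, len(data), size)]


def local_summary(chunk):
    """Work of one processor: max, count, sum, sum of squares, histogram."""
    return (max(chunk), len(chunk), sum(chunk),
            sum(x * x for x in chunk), Counter(chunk))


def merge(a, b):
    """Combine two partial summaries; associative, so any merge order works."""
    return (max(a[0], b[0]), a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def parallel_stats(data, p):
    """Return (maximum, population standard deviation, mode) of data."""
    if not data:
        raise ValueError("data must not be empty")
    chunks = split(data, p)
    with ThreadPoolExecutor(max_workers=p) as pool:
        partials = list(pool.map(local_summary, chunks))

    # Sequential merge for brevity; a tree reduction would take O(log p) steps.
    maximum, n, total, total_sq, hist = functools.reduce(merge, partials)

    mean = total / n
    variance = max(0.0, total_sq / n - mean * mean)  # guard against rounding
    top = max(hist.values())
    mode = min(value for value, count in hist.items() if count == top)
    return maximum, math.sqrt(variance), mode


sequence = [8, 24, 4, 32, 128, 64, 12, 56, 48, 4]
maximum, std, mode = parallel_stats(sequence, p=4)
print(maximum, mode)  # 128 4
```

With `p=4` the ten values are split into chunks of sizes 3, 3, 3, and 1. The standard deviation needs only the count, sum, and sum of squares from each chunk, which is why it fits the same reduction as the maximum. The mode is more expensive because the per-chunk histograms must be merged before the most frequent value is known.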