CS152 Computer Architecture and Engineering

Homework #6

Spring 2002, Prof. Bob Brodersen


Please include the TIME or TA NAME of the DISCUSSION section that you attend as well as your NAME and STUDENT ID. Homeworks and labs will be handed back in discussion sections.

 


Homework #6: Due Tuesday April 16, 2002 in class

1) Do problems 6.30 and 6.31 from P&H

2) Assume we have a processor with the following pipeline latencies:                          

Instruction producing result    Instruction using result    Latency (clock cycles)
Floating point ALU op.          Floating point ALU op.      3
Floating point ALU op.          Store double                2
Load double                     Floating point ALU op.      1
Load double                     Store double                0
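
One way to read the latency table is sketched below. It assumes the usual textbook convention, which the problem does not state explicitly: the latency is the number of clock cycles that must separate a producing instruction from a dependent consuming instruction, and each independent instruction placed between them hides one of those cycles. The names LATENCY and stall_cycles are illustrative only.

```python
# Latency table from problem 2: (producer, consumer) -> latency in clock cycles.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store_double"): 2,
    ("load_double", "fp_alu"): 1,
    ("load_double", "store_double"): 0,
}


def stall_cycles(producer, consumer, intervening):
    """Stalls for a dependent pair with `intervening` independent instructions between them.

    Assumes a single-issue, in-order pipeline in which each intervening
    instruction occupies exactly one cycle.
    """
    return max(0, LATENCY[(producer, consumer)] - intervening)


if __name__ == "__main__":
    # Back-to-back dependent FP ALU operations.
    print(stall_cycles("fp_alu", "fp_alu", 0))
    # A load feeding an FP ALU operation with one unrelated instruction in between.
    print(stall_cycles("load_double", "fp_alu", 1))
```

Under this reading, scheduling amounts to placing enough independent instructions between each dependent pair to bring the result of stall_cycles down to zero.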

Now assume we have the following code:

daxpy:
                ld    $f2, 0($r1)
                multd $f4, $f2, $f0
                ld    $f6, 0($r2)
                addd  $f6, $f4, $f6
                sd    0($r2), $f6
                addiu $r1, $r1, 8
                addiu $r2, $r2, 8
                addiu $r3, $r3, -1
                bne   $r3, $0, daxpy
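
For reference, the arithmetic performed by the loop body can be sketched in a high-level language. The register roles are assumptions inferred from the code rather than stated in the problem: $f0 holds the scalar a, $r1 and $r2 point to the double-precision vectors x and y, and $r3 counts the remaining elements. The function name daxpy_reference is illustrative.

```python
def daxpy_reference(a, x, y):
    """Update y in place so that y[i] = a * x[i] + y[i], one element per loop iteration."""
    for i in range(len(x)):
        xi = x[i]          # ld    $f2, 0($r1)
        prod = xi * a      # multd $f4, $f2, $f0
        yi = y[i]          # ld    $f6, 0($r2)
        y[i] = prod + yi   # addd  $f6, $f4, $f6 followed by sd 0($r2), $f6
    return y


if __name__ == "__main__":
    print(daxpy_reference(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Each iteration touches a different element, so iterations are independent of one another. That independence is what makes the unrolling in part (c) and the software pipelining in part (d) legal.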

(a)     Assume one branch delay slot.  How many stalls per iteration do we have?  How many cycles per iteration?

 

(b)     Rewrite the code by rearranging the instructions in this loop to minimize stalls.  How many stalls do we have per iteration now?  How many cycles per iteration?

 

(c)     Now unroll the loop as many times as necessary to schedule it without stalls.  How many times must you unroll the loop?  How many cycles per iteration?

 

(d)     Now software pipeline the loop.  Omit startup and ending code.  Suppose the latency of our instructions goes up now.  What is the maximum latency we can have between two floating point ALU operations without stalling the software pipelined version of the code?

 

3) Suppose we are running the same daxpy loop as in problem 2, but this time we are running it on a processor using Tomasulo’s algorithm.  Let us assume the following execution times:

                Functional unit type    Cycles    # of FUs    # of reservation stations
                Integer                 1         1           5
                FP adder                4         1           3
                FP multiplier           15        1           2

Complete the following table for the first three iterations of the loop.  The first two instructions have been completed for you.

                Instruction           Issue    Execute    Memory written    CDB
                ld   $f0, 0($r1)      1        2          3                 4
                addd $f4, $f0, $f2    1        5                            8