Consider the following code segment: li $s0,0 li $s1,0 li $t0, 16 loop: lw $s2, vals($s0) add $s1, $s1, $s2 addi $s0, $s0, 4 blt $s0, $t0, loop Assume this code is to be executed on a pipelined CPU that supports data forwarding and has a fixed latency for all branche. Determine how many cycles are required to execute the scheduled code(frompart a). Assume each instruction requires one clock cycle except for branches (which require three). This code contains aloop with a fixed iteration count of four.Rewrite the code and “unroll” the loop. In other words, remove the branch instruction and replicate the body of the loop four times. To maximize performance, you may, if you choose, adjust the way the array is indexed to eliminate the need for counter in $s0. After unrolling the loop, determine the new cycle count and calculate the speedup over the original loop from part b.
Consider the following code segment:
li $s0,0
li $s1,0
li $t0, 16
loop: lw $s2, vals($s0)
add $s1, $s1, $s2
addi $s0, $s0, 4
blt $s0, $t0, loop
Assume this code is to be executed on a pipelined CPU that supports data forwarding and has a fixed latency for all branche.
-
Determine how many cycles are required to execute the scheduled code(frompart a). Assume each instruction requires one clock cycle except for branches (which require three).
-
This code contains aloop with a fixed iteration count of four.Rewrite the code and “unroll” the loop. In other words, remove the branch instruction and replicate the body of the loop four times. To maximize performance, you may, if you choose, adjust the way the array is indexed to eliminate the need for counter in $s0.
-
After unrolling the loop, determine the new cycle count and calculate the speedup over the original loop from part b.
Trending now
This is a popular solution!
Step by step
Solved in 2 steps