hw4_cse490-590-su2021_sol

CSE 490/590 Summer 2021 Homework 4

1. For the code sequence shown below:

    loop: l.d    $f12, 0($f5)
          add.d  $f6, $f6, $f12
          daddui $f5, $f5, -8
          bne    $f5, $f9, loop   // $f9 holds the address of the last value to be operated on

a) Show loop unrolling so that there are four copies of the loop body. Assume $f5 - $f9 (that is, the size of the array) is initially a multiple of 32, which means that the number of loop iterations is a multiple of 4. Eliminate any obviously redundant computations and do not reuse any of the registers.

    loop: l.d    $f12, 0($f5)
          add.d  $f7, $f7, $f12
          l.d    $f13, -8($f5)
          add.d  $f8, $f8, $f13
          l.d    $f14, -16($f5)
          add.d  $f10, $f10, $f14
          l.d    $f15, -24($f5)
          add.d  $f11, $f11, $f15
          daddui $f5, $f5, -32
          bne    $f5, $f9, loop
          add.d  $f16, $f7, $f8      // after the loop: combine the four partial sums
          add.d  $f17, $f10, $f11
          add.d  $f18, $f16, $f17

or

    loop: l.d    $f12, 0($f5)
          add.d  $f6, $f6, $f12
          l.d    $f13, -8($f5)
          add.d  $f6, $f6, $f13
          l.d    $f14, -16($f5)
          add.d  $f6, $f6, $f14
          l.d    $f15, -24($f5)
          add.d  $f6, $f6, $f15
          daddui $f5, $f5, -32
          bne    $f5, $f9, loop

b) Compute the number of cycles needed for 4 iterations.

     1. l.d    $f12, 0($f5)
     2. stall
     3. add.d  $f7, $f7, $f12
     4. l.d    $f13, -8($f5)
     5. stall
     6. add.d  $f8, $f8, $f13
     7. l.d    $f14, -16($f5)
     8. stall
     9. add.d  $f10, $f10, $f14
    10. l.d    $f15, -24($f5)
    11. stall
    12. add.d  $f11, $f11, $f15
    13. daddui $f5, $f5, -32
    14. stall
    15. bne    $f5, $f9, loop
    16. add.d  $f16, $f7, $f8
    17. add.d  $f17, $f10, $f11
    18. stall
    19. stall
    20. stall
    21. add.d  $f18, $f16, $f17

or

     1. l.d    $f12, 0($f5)
     2. stall
     3. add.d  $f6, $f6, $f12
     4. stall
     5. l.d    $f13, -8($f5)
     6. stall
     7. add.d  $f6, $f6, $f13
     8. stall
     9. l.d    $f14, -16($f5)
    10. stall
    11. add.d  $f6, $f6, $f14
    12. stall
    13. l.d    $f15, -24($f5)
    14. stall
    15. add.d  $f6, $f6, $f15
    16. daddui $f5, $f5, -32
    17. stall
    18. bne    $f5, $f9, loop

2. For the code sequence shown below:

    L.D   F0,0(R1)
    ADD.D F4,F0,F2
    S.D   F4,0(R1)
    L.D   F0,-8(R1)
    ADD.D F4,F0,F2
    S.D   F4,-8(R1)

Rename the registers as needed and schedule the sequence to minimize the stalls.

Renaming

    L.D   F0,0(R1)
    stall
    ADD.D F4,F0,F2
    stall
    stall
    S.D   F4,0(R1)
    L.D   F5,-8(R1)
    stall
    ADD.D F6,F5,F2
    stall
    stall
    S.D   F6,-8(R1)

Scheduling

    L.D   F0,0(R1)
    L.D   F5,-8(R1)
    ADD.D F4,F0,F2
    ADD.D F6,F5,F2
    stall
    S.D   F4,0(R1)
    S.D   F6,-8(R1)

3. For the given code sequence below, executed on a 2-issue processor:

    I1: LW  r2, 0(r1)
    I2: LW  r3, 4(r1)
    I3: LW  r4, 8(r1)
    I4: LW  r4, 12(r1)
    I5: ADD r6, r4, r5
    I6: ADD r7, r2, r3
    I7: ADD r8, r7, r6
    I8: LW  r9, 4(r8)

a) Draw a pipeline diagram. [Consider data forwarding.] You can also follow the datapath design in lecture.

b) Calculate IPC.
IPC = 8/10 = 0.8 (8 instructions in 10 cycles)

4. Consider the following code sequence.

    I1: lw  $s4, 0($s1)
    I2: or  $s2, $s4, $s1
    I3: and $s6, $s5, $s3

Highlight the hazard and discuss how an out-of-order processor will help when lw $s4, 0($s1) encounters a cache miss.

    I1: lw  $s4, 0($s1)      // writes $s4
    I2: or  $s2, $s4, $s1    // reads $s4: RAW hazard with I1
    I3: and $s6, $s5, $s3

I3 does not depend on I1 or I2, so an out-of-order processor can execute it while I1 is waiting on the cache miss. I3 then waits at the write-back stage until the data is loaded into $s4 by I1 and eventually forwarded to I2.

5. Show loop unrolling so that there are four copies of the loop body for the MIPS code:

    Loop: L.D    F0,0(R1)
          ADD.D  F4,F0,F2
          S.D    F4,0(R1)
          DADDUI R1,R1,#-8
          BNE    R1,R2,Loop

Assume R1 - R2 (that is, the size of the array) is initially a multiple of 32, which means that the number of loop iterations is a multiple of 4. Eliminate any obviously redundant computations and do not reuse any of the registers.

Here is the result after merging the DADDUI instructions and dropping the unnecessary BNE operations that are duplicated during unrolling. Note that R2 must now be set so that 32(R2) is the starting address of the last four elements.

    Loop: L.D    F0,0(R1)
          ADD.D  F4,F0,F2
          S.D    F4,0(R1)       // drop DADDUI & BNE
          L.D    F6,-8(R1)
          ADD.D  F8,F6,F2
          S.D    F8,-8(R1)      // drop DADDUI & BNE
          L.D    F10,-16(R1)
          ADD.D  F12,F10,F2
          S.D    F12,-16(R1)    // drop DADDUI & BNE
          L.D    F14,-24(R1)
          ADD.D  F16,F14,F2
          S.D    F16,-24(R1)
          DADDUI R1,R1,#-32
          BNE    R1,R2,Loop

Observations

We have eliminated three branches and three decrements of R1. The addresses on the loads and stores have been compensated to allow the DADDUI instructions on R1 to be merged. This optimization may seem trivial, but it is not; it requires symbolic substitution and simplification.
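The claim that the compensated offsets make the merged DADDUI safe can be checked with a small model. The following is a minimal Python sketch and not part of the original solution: memory is modeled as a dict from byte address to value, and the function names `rolled` and `unrolled`, the base address 1000, and the constant 2.5 standing in for F2 are illustrative assumptions.

```python
def rolled(mem, r1, r2, f2):
    """Original loop: one element per trip, R1 decremented by 8 each time."""
    mem = dict(mem)                    # work on a copy of the memory image
    while True:
        mem[r1] = mem[r1] + f2         # L.D F0,0(R1); ADD.D F4,F0,F2; S.D F4,0(R1)
        r1 -= 8                        # DADDUI R1,R1,#-8
        if r1 == r2:                   # BNE R1,R2,Loop falls through
            return mem


def unrolled(mem, r1, r2, f2):
    """Four copies of the body with offsets 0, -8, -16, -24 and one merged decrement."""
    mem = dict(mem)
    while True:
        for offset in (0, -8, -16, -24):
            mem[r1 + offset] = mem[r1 + offset] + f2
        r1 -= 32                       # DADDUI R1,R1,#-32 (the three dropped decrements merged)
        if r1 == r2:                   # the single remaining BNE
            return mem


# Hypothetical layout: 8 doubles at addresses R2+8 .. R2+64, so R1 - R2 = 64.
R2 = 1000
R1 = R2 + 64
memory = {addr: float(addr) for addr in range(R2 + 8, R1 + 8, 8)}

assert rolled(memory, R1, R2, 2.5) == unrolled(memory, R1, R2, 2.5)
```

In this model both versions exit on the same R2, because R1 - R2 is a multiple of 32 and the unrolled loop's last trip starts at 32(R2).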
Symbolic substitution and simplification will rearrange expressions so as to allow constants to be collapsed, allowing an expression such as ((i + 1) + 1) to be rewritten as (i + (1 + 1)) and then simplified to (i + 2).

Without scheduling, every operation in the unrolled loop is followed by a dependent operation and thus will cause a stall. This loop will run in 27 clock cycles (13 stalls and 14 instructions): 1 stall after each L.D, 2 after each ADD.D, and 1 after the DADDUI (4 x 1 + 4 x 2 + 1 = 13 stalls), plus 14 instruction issue cycles.

6. Show the unrolled loop from question 5 after it has been scheduled for the pipeline with the latencies from the figure below.

    Loop: L.D    F0,0(R1)
          L.D    F6,-8(R1)
          L.D    F10,-16(R1)
          L.D    F14,-24(R1)
          ADD.D  F4,F0,F2
          ADD.D  F8,F6,F2
          ADD.D  F12,F10,F2
          ADD.D  F16,F14,F2
          S.D    F4,0(R1)
          S.D    F8,-8(R1)
          DADDUI R1,R1,#-32
          S.D    F12,16(R1)
          S.D    F16,8(R1)
          BNE    R1,R2,Loop

Observation

The execution time of the unrolled loop has dropped to a total of 14 clock cycles (no stalls). Four iterations take 14 cycles, so each iteration takes 3.5 clock cycles.

7. For both in-order execution and out-of-order execution (IF ID E W) of the following instructions, when should each instruction finish? For example, if I1 reaches W at t4, your answer should be I1 = t4.

Assumptions:
   i. We only consider four stages: IF, ID, E, W. There is no Mem stage.
  ii. We have only one adder and one multiplier.
 iii. Number of cycles: ADD: 1 cycle, MUL: 4 cycles.
  iv. Data forwarding is considered.

INSTRUCTIONS:

    I1: MUL R3, R1, R2
    I2: ADD R5, R3, R4
    I3: ADD R7, R2, R6
    I4: ADD R10, R8, R9
    I5: MUL R11, R7, R10
    I6: ADD R5, R5, R11

INSTRUCTIONS:
    I1: MUL R3, R1, R2
    I2: ADD R5, R3, R4
    I3: ADD R7, R2, R6
    I4: ADD R10, R8, R9
    I5: MUL R11, R7, R10
    I6: ADD R5, R5, R11

8. In the following instruction sequence, find the hazards. Rename the registers to eliminate the anti and output dependences.

    div.s  r1, r2, r3
    mult.s r4, r1, r5
    add.s  r1, r3, r6
    sub.s  r3, r1, r4
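One way to approach question 8 is to enumerate the dependences mechanically. The following is a minimal Python sketch and not the original solution: it assumes the first operand of each instruction is the destination, and the names `Instr`, `find_dependences`, `rename`, and the fresh registers t0 to t3 are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# Assumed operand order: destination first, then the two sources.
Instr = namedtuple("Instr", "op dest srcs")

PROGRAM = [
    Instr("div.s", "r1", ("r2", "r3")),
    Instr("mult.s", "r4", ("r1", "r5")),
    Instr("add.s", "r1", ("r3", "r6")),
    Instr("sub.s", "r3", ("r1", "r4")),
]


def written_between(program, i, j, reg):
    """True if some instruction strictly between i and j writes reg."""
    return any(ins.dest == reg for ins in program[i + 1:j])


def find_dependences(program):
    """Return (kind, i, j, reg) for each pair i < j that shares a register."""
    deps = []
    for j, later in enumerate(program):
        for i in range(j):
            earlier = program[i]
            # RAW (true dependence): later reads what earlier wrote.
            if earlier.dest in later.srcs and not written_between(program, i, j, earlier.dest):
                deps.append(("RAW", i, j, earlier.dest))
            # WAR (anti dependence): later overwrites what earlier reads.
            if later.dest in earlier.srcs and not written_between(program, i, j, later.dest):
                deps.append(("WAR", i, j, later.dest))
            # WAW (output dependence): both write the same register.
            if later.dest == earlier.dest and not written_between(program, i, j, later.dest):
                deps.append(("WAW", i, j, later.dest))
    return deps


def rename(program):
    """Give every destination a fresh name and redirect later reads to it."""
    current = {}                       # architectural register -> latest fresh name
    renamed = []
    for n, ins in enumerate(program):
        srcs = tuple(current.get(s, s) for s in ins.srcs)
        fresh = f"t{n}"                # hypothetical fresh register name
        current[ins.dest] = fresh
        renamed.append(Instr(ins.op, fresh, srcs))
    return renamed


for dep in find_dependences(PROGRAM):
    print(dep)
for ins in rename(PROGRAM):
    print(ins.op, ins.dest, *ins.srcs)
```

Under these assumptions the sketch reports true dependences on r1 and r4, anti dependences on r1 and r3, and one output dependence on r1. After renaming, only the true dependences remain.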