ECE668 Quiz 4 (Includes calculation)

School

Florida Polytechnic University

Course

5741

Subject

Computer Science

Date

Feb 20, 2024

Type

docx

Pages

22

Uploaded by abonyamin1

E&C-ENG568_E&C-ENG668_144587_FA22 - Computer Architecture, Fall 2022 | Moodle

Started on: Monday, October 17, 2022, 7:42 PM
State: Finished
Completed on: Monday, October 17, 2022, 8:32 PM
Time taken: 49 mins 52 secs
Points: 70.00/100.00
Grade: 7.00 out of 10.00 (70%)

Question 1: Correct, 10.00 points out of 10.00

Which of the following statements are false for a 5-stage MIPS pipeline with stages IF, ID, EX, MEM and WB?

Select one or more:
a. Store operations are only active (doing something useful) during the IF, ID, EX and WB stages.
b. ALU operations are only active (doing something useful) during the IF, ID, EX and WB stages.
c. Load operations are only active (doing something useful) during the IF, ID, EX and WB stages.
d. Branch operations are only active (doing something useful) during the IF, ID, EX and WB stages.

Your answer is correct.

ALU operations do not require the memory access step (MEM) that Loads and Stores do. Load operations are active during all 5 stages. Store operations do not require the write-back step (WB) that Loads and ALU operations do, because they do not move data into the register file; the 'storing' of data is done in the MEM step. Stores are also active in the EX stage, as are Loads, because the memory address is calculated there. This calculation is needed because of the use of relative (base plus offset) addressing. Branch operations require neither the memory access step (MEM) nor the write-back step (WB): after the EX stage the branch condition is known and the target has been calculated, so nothing further is required.

The correct answers are:
- Store operations are only active (doing something useful) during the IF, ID, EX and WB stages.
- Load operations are only active (doing something useful) during the IF, ID, EX and WB stages.
- Branch operations are only active (doing something useful) during the IF, ID, EX and WB stages.
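The stage usage described in the feedback for Question 1 can be written down as a small lookup table. The sketch below is only an illustration; the names `ACTIVE_STAGES` and `CLAIMED` are made up for this example, and the stage sets are taken directly from the explanation above.

```python
# Stages in which each instruction class does useful work,
# as stated in the quiz feedback (classic 5-stage MIPS pipeline).
ACTIVE_STAGES = {
    "ALU":    {"IF", "ID", "EX", "WB"},         # no memory access
    "Load":   {"IF", "ID", "EX", "MEM", "WB"},  # active in all five stages
    "Store":  {"IF", "ID", "EX", "MEM"},        # no register write-back
    "Branch": {"IF", "ID", "EX"},               # resolved after EX
}

# Every answer option claims the same stage set for its instruction class.
CLAIMED = {"IF", "ID", "EX", "WB"}

# A statement is false when the claimed set differs from the real one.
false_statements = sorted(
    op for op, stages in ACTIVE_STAGES.items() if stages != CLAIMED
)
print(false_statements)  # ['Branch', 'Load', 'Store']
```

Only the ALU statement matches the claimed stage set, which is why options a, c and d are the false ones.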
Question 2: Correct, 10.00 points out of 10.00

Suppose a program that executes 500 instructions is run using a 7-stage pipeline. In running the program, 142 stall cycles are inserted due to data dependencies. Assuming that the clock period of the unpipelined processor is 10% less than that of the pipelined processor, calculate the speedup due to pipelining. Round your answer to two decimal places.

Answer: 4.91

We first calculate the pipeline stall CPI = 142 / 500 = 0.284. Next,

Speedup = (Pipeline depth / (1 + Pipeline stall CPI)) * (CycleTime_unpipelined / CycleTime_pipelined)
        = (7 / (1 + 142/500)) * 0.900
        = 4.91

The correct answer is: 4.91
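As a quick check of the arithmetic in Question 2, the speedup formula can be evaluated directly. This is a minimal sketch; the function name `pipeline_speedup` and its parameter names are chosen for this example and do not come from the quiz.

```python
def pipeline_speedup(depth, stall_cycles, instructions, cycle_time_ratio):
    """Speedup = depth / (1 + stall CPI) * (unpipelined / pipelined cycle time)."""
    stall_cpi = stall_cycles / instructions
    return depth / (1 + stall_cpi) * cycle_time_ratio


# Question 2: 7 stages, 142 stalls over 500 instructions, and an unpipelined
# clock period that is 10% shorter than the pipelined one (ratio 0.9).
speedup_q2 = pipeline_speedup(
    depth=7, stall_cycles=142, instructions=500, cycle_time_ratio=0.9
)
print(f"{speedup_q2:.2f}")  # prints 4.91
```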
Question 3: Incorrect, 0.00 points out of 10.00

Suppose you have two programs to execute. Program 1 consists of 28% (Load + Store) instructions; Program 2 consists of only 3% (Load + Store) instructions. You have two machine choices for each program. Machine 1 contains a 2 GHz, 20-stage pipeline with a dual-ported memory, and Machine 2 contains a 2.15 GHz, 20-stage pipeline but has a single-ported memory. Assume that the clock rates above are for the pipelined case and that the clock rate for the unpipelined case is 2.8 GHz for both machines. Which machine is the best choice for each program? (Hint: Calculate the speedup for each machine with Program 1 and 2 and compare.)

Select one:
a. The best machine for Program 1 is Machine 2 and the best machine for Program 2 is also Machine 2.
b. The best machine for Program 1 is Machine 1 and the best machine for Program 2 is Machine 2.
c. The best machine for Program 1 is Machine 1 and the best machine for Program 2 is also Machine 1.
d. The best machine for Program 1 is Machine 2 and the best machine for Program 2 is Machine 1.

Your answer is incorrect.
First calculate the speedup for each machine with Program 1 and then do the same for Program 2, recognizing that the single-ported memory will cost 1 stall cycle for each instruction following a load or store. The dual-ported memory will not incur this penalty. The machine with the higher speedup for each program is the best choice.

Program 1:
Speedup_Machine1 = Pipe depth / (1 + 0.28*0) * (Clock_pipe / Clock_unpipe) = 20 * (2 GHz / 2.8 GHz) = 14.29
Speedup_Machine2 = Pipe depth / (1 + 0.28*1) * (Clock_pipe / Clock_unpipe) = 15.625 * (2.15 GHz / 2.8 GHz) = 12.00
Machine 1 is the best choice for Program 1.

Program 2:
Speedup_Machine1 = Pipe depth / (1 + 0.03*0) * (Clock_pipe / Clock_unpipe) = 20 * (2 GHz / 2.8 GHz) = 14.29
Speedup_Machine2 = Pipe depth / (1 + 0.03*1) * (Clock_pipe / Clock_unpipe) = 19.42 * (2.15 GHz / 2.8 GHz) = 14.91
Machine 2 is the best choice for Program 2.

The correct answer is: The best machine for Program 1 is Machine 1 and the best machine for Program 2 is Machine 2.
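The comparison in Question 3 can be reproduced with a few lines of Python. This is a sketch under the quiz's own assumptions (one stall cycle per load or store on the single-ported machine, none on the dual-ported one); the names `speedup`, `machines`, `programs` and `best` are invented for this example.

```python
def speedup(depth, mem_fraction, stall_per_mem_op, clock_pipe_ghz, clock_unpipe_ghz):
    """Pipe depth / (1 + stall CPI) * (pipelined clock rate / unpipelined clock rate)."""
    stall_cpi = mem_fraction * stall_per_mem_op
    return depth / (1 + stall_cpi) * (clock_pipe_ghz / clock_unpipe_ghz)


# Pipelined clock rate and stall cycles per load/store for each machine.
machines = {
    "Machine 1": {"clock": 2.0, "stall": 0},   # dual-ported memory
    "Machine 2": {"clock": 2.15, "stall": 1},  # single-ported memory
}
# Fraction of (Load + Store) instructions in each program.
programs = {"Program 1": 0.28, "Program 2": 0.03}

best = {}
for prog, frac in programs.items():
    results = {
        name: speedup(20, frac, m["stall"], m["clock"], 2.8)
        for name, m in machines.items()
    }
    best[prog] = max(results, key=results.get)
    print(prog, {name: round(value, 2) for name, value in results.items()}, "->", best[prog])
```

The higher clock rate of Machine 2 only pays off when memory instructions are rare enough that the structural stalls stay small, which is the case for Program 2 but not for Program 1.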
Question 4: Incorrect, 0.00 points out of 10.00

For the instruction sequence given below, without getting into the actual cycle-wise implementation (or introducing stalls in the pipeline), conceptually identify the appropriate data or structural hazards for a generic pipeline (pick all the potential hazards). Assume that add instructions take 2 cycles and multiply instructions take 4 cycles, that there is only one Adder and one Multiplier unit, and that neither one is pipelined. Further, assume that all the registers that do not depend on earlier instructions are readily available except for F5. Assume F5 is available only 5 cycles after the program control reaches instruction I3.

I1: F4 ← F3 + F2
I2: F1 ← F4 * F2
I3: F1 ← F2 + F5
I4: F2 ← F3 * F4
I5: F3 ← F4 + F1
I6: F2 ← F3 * F4

Select one or more:
a. I2 → I3 WAW (F1) (marked incorrect)
b. I1 → I2 RAW (F4)
c. I5 → I6 RAW (F3)
d. I2 → I4 Multiplier
e. I3 → I5 WAR (F1) (marked incorrect)
f. I3 → I4 WAR (F2) (marked incorrect)

Your answer is incorrect.

F1 is written in I3 and read in I5; this would produce a RAW hazard rather than a WAR hazard.

The correct answers are: I1 → I2 RAW (F4), I5 → I6 RAW (F3), I2 → I4 Multiplier
Question 5: Correct, 10.00 points out of 10.00

Fill in the pipeline diagram (on a separate sheet of paper) for the following sequence of instructions, assuming data forwarding and that load takes one cycle to execute, multiply takes three cycles to execute, and divide takes 6 cycles to execute. How many cycles will it take to execute this instruction sequence in a simple MIPS-like processor? Assume that there is only one execution unit and no two instructions can be in the Execute stage at the same time (for example, you cannot multiply and divide at the same time).

Instruction | 1 2 3 4 5 6 7 8 9 10 11 12
Ld f0, 0(r1)
Ld f1, 0(r2)
Mul f3, f1, f2
Ld f4, 0(r3)
Ld f5, 0(r4)
Div f6, f4, f5

Note: You are required to fill in only the total cycles for the execution of the given instruction sequence.

The correct answer is: 19
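The quiz does not show the filled-in diagram for Question 5, but the total of 19 cycles can be reproduced with a small timing model. The sketch below rests on several assumptions that are not stated in the quiz: a five-stage IF, ID, EX, MEM, WB pipeline with in-order issue, load data forwarded from the end of MEM, multiply and divide results forwarded from the end of EX, base registers r1 to r4 and f2 ready from the start, and the operands read as `Mul f3, f1, f2` and `Div f6, f4, f5`. The names `PROGRAM` and `total_cycles` are invented for this example.

```python
# (text, destination, sources, EX latency in cycles, is_load)
PROGRAM = [
    ("Ld f0, 0(r1)",   "f0", ["r1"],       1, True),
    ("Ld f1, 0(r2)",   "f1", ["r2"],       1, True),
    ("Mul f3, f1, f2", "f3", ["f1", "f2"], 3, False),
    ("Ld f4, 0(r3)",   "f4", ["r3"],       1, True),
    ("Ld f5, 0(r4)",   "f5", ["r4"],       1, True),
    ("Div f6, f4, f5", "f6", ["f4", "f5"], 6, False),
]


def total_cycles(program):
    """Return the cycle in which the last instruction completes WB."""
    ready_for_ex = {}  # register -> first cycle in which a consumer may start EX
    prev_ex_end = 0    # last cycle in which the single EX unit was busy
    last_wb = 0
    for index, (_, dest, sources, latency, is_load) in enumerate(program, start=1):
        earliest = index + 2  # IF in cycle `index`, ID in the next, EX after that
        operands = max((ready_for_ex.get(reg, 0) for reg in sources), default=0)
        ex_start = max(earliest, prev_ex_end + 1, operands)
        ex_end = ex_start + latency - 1
        # Loads forward after MEM (one cycle past EX); others forward after EX.
        ready_for_ex[dest] = ex_end + (2 if is_load else 1)
        prev_ex_end = ex_end
        last_wb = ex_end + 2  # MEM, then WB
    return last_wb


print(total_cycles(PROGRAM))  # prints 19
```

Under this model the divide cannot enter EX until cycle 12, because f5 only becomes available after its load finishes MEM in cycle 11; six EX cycles plus MEM and WB give cycle 19.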
Question 6: Correct, 10.00 points out of 10.00

A given pipelined processor executes floating-point instructions of the type Fi ← Fj op Fk. It has an I stage (for instruction fetch) and a D stage (for instruction decode and operand preparation), each taking one clock cycle. The execution phase of a floating-point ADD operation and a floating-point Multiply operation takes 2 and 4 cycles, respectively. Each execution is followed by a single (clock) cycle write back (W) into the floating-point register file, which can support simultaneous reads but only a single write per cycle. The processor executes the following sequence of floating-point instructions:
S1: F2 ← F1 + F4
S2: F3 ← F2 * F4
S3: F3 ← F4 + F5
S4: F4 ← F1 * F2
S5: F1 ← F2 + F3
S6: F4 ← F1 * F2

Assume that the pipelined processor has a single floating-point adder and a single floating-point multiplier. Also assume that the floating-point adder (with execution time of 2 cycles) and the floating-point multiplier (with execution time of 4 cycles) can operate in parallel and are fully pipelined (with a throughput of 1), and that any data hazard is detected and prevented by stalling the pipeline as needed. Show the exact timing for the above program (on a separate sheet of paper) for the case of the basic pipeline with data forwarding. How many cycles will the above program take to run once?

The correct answer is: 14
Question 7

A processor has separate instruction and data caches, each requiring two cycles for any operation. It includes a single 2-cycle execution unit responsible for executing all ALU and LOAD instructions. As a result, the instruction pipeline has the following eight stages: IF1, IF2, ID, EX1, EX2, MEM1, MEM2, WB. In the absence of hazards the pipeline has a CPI of 1. The processor has no hardware support for dynamic scheduling. The instruction stream executed by the processor consists of 25% Load instructions and 40% ALU instructions. The frequencies of RAW data dependencies between these two instructions and the instructions following them are:
Instruction i | LOAD | ALU
RAW in instruction i+1 | 30% | 20%
RAW in instruction i+2, but not in instruction i+1 | 12% | 10%
RAW in instruction i+3, but not in instruction i+1 nor i+2 | 10% | 5%
RAW in instruction i+4, but not in earlier instructions | 5% | 1%

Calculate the contribution to the CPI of the processor due to RAW data hazards assuming that NO DATA FORWARDING IS SUPPORTED. Note that the Register File can execute a write followed by a read in the same cycle. (Round to three decimal places.)

Answer: 0.937

Instruction | Load freq. | Load stalls w/o forwarding | Load stalls with forwarding | ALU freq. | ALU stalls w/o forwarding
RAW in i+1 | 30% | 4 | 3 | 20% | 4
RAW in i+2 ONLY | 12% | 3 | 2 | 10% | 3
RAW in i+3 ONLY | 10% | 2 | 1 | 5% | 2
RAW in i+4 ONLY | 5% | 1 | 0 | 1% | 1

Contribution to CPI = 0.25[0.3*4 + 0.12*3 + 0.1*2 + 0.05*1] + 0.4[0.2*4 + 0.1*3 + 0.05*2 + 0.01*1] = 0.937

The correct answer is: 0.937
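The weighted sum in the Question 7 feedback is the instruction mix multiplied by the dependency frequency and the stall count for each case. A minimal sketch follows, with the table values copied from the question and feedback; the names `MIX`, `RAW_FREQ`, `STALLS_NO_FWD` and `raw_cpi` are chosen for this example.

```python
# Fraction of the instruction stream for each producer class.
MIX = {"load": 0.25, "alu": 0.40}

# RAW frequency when the first dependent instruction is i+1, i+2, i+3, i+4.
RAW_FREQ = {
    "load": [0.30, 0.12, 0.10, 0.05],
    "alu":  [0.20, 0.10, 0.05, 0.01],
}

# Stall cycles without forwarding for the same four distances.
STALLS_NO_FWD = {
    "load": [4, 3, 2, 1],
    "alu":  [4, 3, 2, 1],
}


def raw_cpi(mix, raw_freq, stalls):
    """Sum over producer classes of mix * sum(frequency * stall cycles)."""
    return sum(
        mix[kind] * sum(f * s for f, s in zip(raw_freq[kind], stalls[kind]))
        for kind in mix
    )


cpi_no_fwd = raw_cpi(MIX, RAW_FREQ, STALLS_NO_FWD)
print(cpi_no_fwd)
```

The result is 0.9365 up to floating-point error, which the quiz reports as 0.937 after rounding to three decimal places.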
Question 8: Correct, 10.00 points out of 10.00

A processor has separate instruction and data caches, each requiring two cycles for any operation. It includes a single 2-cycle execution unit responsible for executing all ALU and LOAD instructions. As a result, the instruction pipeline has the following eight stages: IF1, IF2, ID, EX1, EX2, MEM1, MEM2, WB. In the absence of hazards the pipeline has a CPI of 1. The processor has no hardware support for dynamic scheduling. The instruction stream executed by the processor consists of 25% Load instructions and 40% ALU instructions. The frequencies of RAW data dependencies between these two instructions and the instructions following them are:
Instruction i | LOAD | ALU
RAW in instruction i+1 | 30% | 20%
RAW in instruction i+2, but not in instruction i+1 | 12% | 10%
RAW in instruction i+3, but not in instruction i+1 nor i+2 | 10% | 5%
RAW in instruction i+4, but not in earlier instructions | 5% | 1%

Calculate the contribution to the CPI of the processor due to RAW data hazards assuming that DATA FORWARDING IS SUPPORTED. Note that the Register File can execute a write followed by a read in the same cycle. (Round your answer to three decimal places.)

Answer: 0.390

Instruction | Load freq. | Load stalls w/o forwarding | Load stalls with forwarding | ALU freq. | ALU stalls w/o forwarding
RAW in i+1 | 30% | 4 | 3 | 20% | 4
RAW in i+2 ONLY | 12% | 3 | 2 | 10% | 3
RAW in i+3 ONLY | 10% | 2 | 1 | 5% | 2
RAW in i+4 ONLY | 5% | 1 | 0 | 1% | 1

With forwarding, an ALU result stalls only a dependent instruction at i+1, and only for one cycle, which gives the 0.2*1 term below.

Contribution to CPI = 0.25[0.3*3 + 0.12*2 + 0.1*1] + 0.4[0.2*1] = 0.390

The correct answer is: 0.39
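The stall counts in the feedback table can also be derived from the stage positions rather than looked up. The sketch below is one way to do that and is an assumption of this example, not something the quiz spells out: stages are numbered 1 to 8, a consumer at distance k would reach a stage k cycles after the producer did, forwarded results are consumed at EX1 in the cycle after the producing stage (EX2 for ALU, MEM2 for Load), and without forwarding the consumer's ID may share a cycle with the producer's WB. The names `STAGE_NO`, `stalls` and `cpi_fwd` are invented here.

```python
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MEM1", "MEM2", "WB"]
STAGE_NO = {name: i + 1 for i, name in enumerate(STAGES)}
DISTANCES = (1, 2, 3, 4)


def stalls(distance, ready_cycle, consume_stage):
    """Stall cycles for a consumer `distance` instructions behind the producer.

    `ready_cycle` is the first cycle (relative to the producer's IF1 in
    cycle 1) in which the consumer may occupy `consume_stage`.
    """
    unstalled_arrival = STAGE_NO[consume_stage] + distance
    return max(0, ready_cycle - unstalled_arrival)


# With forwarding: usable at EX1 the cycle after the producing stage ends.
load_fwd = [stalls(k, STAGE_NO["MEM2"] + 1, "EX1") for k in DISTANCES]
alu_fwd = [stalls(k, STAGE_NO["EX2"] + 1, "EX1") for k in DISTANCES]

# Without forwarding: ID reads the register file in the same cycle as WB.
no_fwd = [stalls(k, STAGE_NO["WB"], "ID") for k in DISTANCES]

load_freq = [0.30, 0.12, 0.10, 0.05]
alu_freq = [0.20, 0.10, 0.05, 0.01]
cpi_fwd = (
    0.25 * sum(f * s for f, s in zip(load_freq, load_fwd))
    + 0.40 * sum(f * s for f, s in zip(alu_freq, alu_fwd))
)
print(load_fwd, alu_fwd, no_fwd)  # [3, 2, 1, 0] [1, 0, 0, 0] [4, 3, 2, 1]
print(cpi_fwd)
```

The derived stall counts match the feedback table, and the forwarding case evaluates to 0.39 up to floating-point error.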
Question 9: Incorrect, 0.00 points out of 10.00

A program running on a new processor spends 40% of its execution time executing divide instructions. The new processor executes divide instructions 10 times faster than the older processor. The rest of the program runs the same on both processors. What is the overall speedup of running this program on the new processor compared to the old processor? (Round your answer to 2 decimal places.)

Answer: 1.56 (marked incorrect)

Let x = new execution time. The 40% is measured on the new processor, so the divide portion (0.4x) took 10 times as long on the old processor, while the remaining 0.6x is unchanged:

0.6x + 10 * 0.4x = 4.6x = old execution time
Speedup = 4.6x / x = 4.6

The correct answer is: 4.6
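The difference between the submitted answer (1.56) and the expected one (4.6) in Question 9 comes down to which execution time the 40% refers to. The sketch below contrasts the two readings; both function names are invented for this example, and the second is the usual form of Amdahl's law, where the fraction is measured on the old machine.

```python
def speedup_fraction_of_new_time(fraction_new, factor):
    """Fraction is measured on the NEW processor (the quiz's reading)."""
    new_time = 1.0  # normalize the new execution time
    old_time = (1 - fraction_new) + factor * fraction_new
    return old_time / new_time


def speedup_fraction_of_old_time(fraction_old, factor):
    """Fraction is measured on the OLD processor (classic Amdahl's law)."""
    return 1 / ((1 - fraction_old) + fraction_old / factor)


quiz_reading = speedup_fraction_of_new_time(0.4, 10)
amdahl_reading = speedup_fraction_of_old_time(0.4, 10)
print(f"{quiz_reading:.2f}")    # prints 4.60
print(f"{amdahl_reading:.2f}")  # prints 1.56
```

Because the question says the program spends 40% of its time on divides while running on the new processor, the first reading applies.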
Question 10: Correct, 10.00 points out of 10.00

A particular program consists of 21% ALU instructions (taking 4 cycles each), 20% Load instructions (taking 10 cycles), 13% Store instructions (8 cycles) and the rest Branch instructions (2 cycles). Also assume that while running, the program executes 24% ALU instructions, 23% Load instructions, 15% Store instructions and the rest Branch instructions. What percentage of the time is the CPU executing Branch instructions? (Round your answer to 2 decimal places.)

Answer: 14.56

The correct answer is: 14.56
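The quiz gives no working for Question 10. The reported 14.56% is reproduced if the cycle counts are weighted by the executed (dynamic) instruction mix rather than the static one, which is the assumption made in the sketch below; the variable names are invented for this example.

```python
# Cycles per instruction for each class, from the question.
cycles = {"alu": 4, "load": 10, "store": 8, "branch": 2}

# Executed (dynamic) instruction mix; branches are "the rest".
dynamic_mix = {"alu": 0.24, "load": 0.23, "store": 0.15}
dynamic_mix["branch"] = 1 - sum(dynamic_mix.values())

# Average cycles contributed per executed instruction by each class.
time_per_class = {kind: dynamic_mix[kind] * cycles[kind] for kind in cycles}

branch_share = time_per_class["branch"] / sum(time_per_class.values())
print(f"{branch_share * 100:.2f}%")  # prints 14.56%
```

Branches make up 38% of the executed instructions but take only 2 cycles each, so they account for 0.76 of the 5.22 average cycles per instruction.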