EGR 321: Computer Organization
Homework 1 Solutions
California Baptist University

Please typeset your answers to the following homework problems. All problems are from the ARM Edition of the text.

1. Exercise 1.1 in text (page 54).

Laptop computers, tablet computers, supercomputers, embedded microcontrollers

2. Exercise 1.3 in text (page 55).

Step 1. The compiler translates a program written in a high-level programming language into instructions in assembly language.
Step 2. The assembler then translates the symbolic version of the instructions in assembly language into the binary machine language program.

3. Exercise 1.5 in text (page 55).

(a) P2 has the highest performance, for the following reason. Assuming that we use instructions per second as an indication of performance, we have
Performance of P1 = $3 \times 10^{9} / 1.5 = 2 \times 10^{9}$ instructions/s
Performance of P2 = $2.5 \times 10^{9} / 1.0 = 2.5 \times 10^{9}$ instructions/s
Performance of P3 = $4 \times 10^{9} / 2.2 = 1.818 \times 10^{9}$ instructions/s

Note: We may also first calculate the CPU time and then note that
Performance of P1 / Performance of P2 = CPU time of P2 / CPU time of P1
By using the formula
CPU time = No. of instr. × CPI × Clock cycle time = No. of instr. × CPI / Clock rate
we may calculate the CPU time for each processor (taking the number of instructions as a given constant I).

(b) Note that
CPU time = No. of cycles × Clock cycle time = No. of cycles / Clock rate
and hence
No. of cycles = CPU time × Clock rate
No. of cycles for P1 = $10 \times 3 \times 10^{9} = 30 \times 10^{9}$
No. of cycles for P2 = $10 \times 2.5 \times 10^{9} = 25 \times 10^{9}$
No. of cycles for P3 = $10 \times 4 \times 10^{9} = 40 \times 10^{9}$

Next note that
CPU time = No. of instr. × CPI × Clock cycle time = No. of instr. × CPI / Clock rate
and hence
No. of instr. = No. of cycles / CPI
No. of instr. for P1 = $30 \times 10^{9} / 1.5 = 20 \times 10^{9}$
No. of instr. for P2 = $25 \times 10^{9} / 1.0 = 25 \times 10^{9}$
No. of instr. for P3 = $40 \times 10^{9} / 2.2 = 18.182 \times 10^{9}$

(c) CPU time (new) = CPU time (old) × 0.7 = 7 s
CPI (new) = CPI (old) × 1.2, so the new CPIs are
CPI of P1 = 1.5 × 1.2 = 1.8
CPI of P2 = 1.0 × 1.2 = 1.2
CPI of P3 = 2.2 × 1.2 = 2.64
Now using the formula
Clock rate = No. of instr. × CPI / CPU time
we have
Clock rate of P1 = $20 \times 10^{9} \times 1.8 / 7 = 5.143 \times 10^{9}$ Hz = 5.143 GHz
Clock rate of P2 = $25 \times 10^{9} \times 1.2 / 7 = 4.286 \times 10^{9}$ Hz = 4.286 GHz
Clock rate of P3 = $18.182 \times 10^{9} \times 2.64 / 7 = 6.857 \times 10^{9}$ Hz = 6.857 GHz

4. Exercise 1.6 in text (page 55).

P2 is faster, for the following reason.

Class A: $10^{5}$ instr.
Class B: $2 \times 10^{5}$ instr.
Class C: $5 \times 10^{5}$ instr.
Class D: $2 \times 10^{5}$ instr.

CPU time = No. of instr. × CPI / Clock rate

For P1:
CPU time for class A = $10^{5} \times 1 / (2.5 \times 10^{9}) = 0.4 \times 10^{-4}$ s
CPU time for class B = $2 \times 10^{5} \times 2 / (2.5 \times 10^{9}) = 1.6 \times 10^{-4}$ s
CPU time for class C = $5 \times 10^{5} \times 3 / (2.5 \times 10^{9}) = 6 \times 10^{-4}$ s
CPU time for class D = $2 \times 10^{5} \times 3 / (2.5 \times 10^{9}) = 2.4 \times 10^{-4}$ s
Total CPU time for P1 = $10.4 \times 10^{-4}$ s

For P2:
CPU time for class A = $10^{5} \times 2 / (3 \times 10^{9}) = 0.667 \times 10^{-4}$ s
CPU time for class B = $2 \times 10^{5} \times 2 / (3 \times 10^{9}) = 1.333 \times 10^{-4}$ s
CPU time for class C = $5 \times 10^{5} \times 2 / (3 \times 10^{9}) = 3.333 \times 10^{-4}$ s
CPU time for class D = $2 \times 10^{5} \times 2 / (3 \times 10^{9}) = 1.333 \times 10^{-4}$ s
Total CPU time for P2 = $6.667 \times 10^{-4}$ s

(a) CPI = CPU time × Clock rate / No. of instr.
CPI of P1 = $10.4 \times 10^{-4} \times 2.5 \times 10^{9} / 10^{6} = 2.6$
CPI of P2 = $6.667 \times 10^{-4} \times 3 \times 10^{9} / 10^{6} = 2.0$

(b) No. of clock cycles for P1 = $10^{5} \times 1 + 2 \times 10^{5} \times 2 + 5 \times 10^{5} \times 3 + 2 \times 10^{5} \times 3 = 26 \times 10^{5}$
No. of clock cycles for P2 = $10^{5} \times 2 + 2 \times 10^{5} \times 2 + 5 \times 10^{5} \times 2 + 2 \times 10^{5} \times 2 = 20 \times 10^{5}$

5. Exercise 1.7 in text (page 56) (for part (b), you may assume the average CPIs found in part (a)).

CPU time = No. of instr. × CPI × Clock cycle time = No. of instr. × CPI / Clock rate

(a) With compiler A:
CPI = CPU time / (No. of instr. × Clock cycle time) = $1.1 / (1 \times 10^{9} \times 10^{-9}) = 1.1$
With compiler B:
CPI = CPU time / (No. of instr. × Clock cycle time) = $1.5 / (1.20 \times 10^{9} \times 10^{-9}) = 1.25$

(b) Assume that the execution times (i.e., CPU times) are both equal to T seconds.
Clock rate A = No. of instr. A × CPI A / CPU time = $1 \times 10^{9} \times 1.1 / T = 1.1 \times 10^{9} / T$ (cycles/s)
Clock rate B = No. of instr. B × CPI B / CPU time = $1.20 \times 10^{9} \times 1.25 / T = 1.5 \times 10^{9} / T$ (cycles/s)
Clock rate A / Clock rate B = $(1.1 \times 10^{9} / T) / (1.5 \times 10^{9} / T) = 0.73$
So the clock rate of the processor running compiler A's code is 0.73 times the clock rate of the processor running compiler B's code.

(c) Comparing the new compiler with compiler A, we have
CPU time A / CPU time new = (No. of instr. A × CPI A) / (No. of instr. new × CPI new) = $(1.00 \times 10^{9} \times 1.1) / (600 \times 10^{6} \times 1.1) = 1.667$
The speed-up is 1.667 times versus compiler A.
Comparing the new compiler with compiler B, we have
CPU time B / CPU time new = (No. of instr. B × CPI B) / (No. of instr. new × CPI new) = $(1.20 \times 10^{9} \times 1.25) / (600 \times 10^{6} \times 1.1) = 2.273$
The speed-up is 2.273 times versus compiler B.

6. Exercise 1.11 in text (page 57) (for this problem, you only need to work on subproblems 1.11.1 to 1.11.8).

1.11.1 CPI = CPU time / (No. of instr. × Clock cycle time) = $750 / (2.389 \times 10^{12} \times 0.333 \times 10^{-9}) = 0.94$

1.11.2 SPECratio = 9650 / 750 = 12.87

1.11.3 CPU time = No. of instr. × CPI × Clock cycle time = No. of instr. × CPI / Clock rate
If the number of instructions of the benchmark is increased by 10% without affecting the CPI, the CPU time also increases by 10% (since the clock cycle time is not affected), i.e., the CPU time increases by 75 s.

1.11.4 CPU time = No. of instr. × CPI × Clock cycle time = No. of instr. × CPI / Clock rate
If the number of instructions of the benchmark is increased by 10% and the CPI is increased by 5%, the CPU time becomes 1.1 × 1.05 times the original CPU time, i.e., CPU time = 1.1 × 1.05 × 750 = 866.25 s.

1.11.5 For this change, the SPECratio = 9650 / 866.25 = 11.14, which is a 13.44% decrease (= (12.87 − 11.14) / 12.87 × 100%).

1.11.6 CPI = CPU time / (No. of instr. × Clock cycle time) = $700 / (0.85 \times 2.389 \times 10^{12} \times 0.25 \times 10^{-9}) = 1.38$

1.11.7 The increase in CPI is (1.38 − 0.94) / 0.94 × 100% = 47%, which is greater than the 33% increase in the clock rate. CPI is affected by a number of factors, e.g., the instruction mix and the clock rate, and it need not change in the same proportion as the clock rate.
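The arithmetic in 1.11.1 through 1.11.6 can be re-checked with a short script. This is only a sketch: the helper names cpi and spec_ratio and the constant names are illustrative choices made here, not anything from the text, and the input values are the ones quoted in the answers above.

```python
def cpi(cpu_time_s, instr_count, cycle_time_s):
    """CPI = CPU time / (No. of instr. x Clock cycle time)."""
    return cpu_time_s / (instr_count * cycle_time_s)


def spec_ratio(reference_time_s, cpu_time_s):
    """SPECratio = reference time / measured CPU time."""
    return reference_time_s / cpu_time_s


# Values quoted in the answers above (names are hypothetical).
INSTR = 2.389e12     # instruction count of the benchmark
BASE_TIME = 750.0    # measured CPU time in seconds
REF_TIME = 9650.0    # reference time in seconds

cpi_base = cpi(BASE_TIME, INSTR, 0.333e-9)       # 1.11.1
ratio_base = spec_ratio(REF_TIME, BASE_TIME)     # 1.11.2
time_new = 1.10 * 1.05 * BASE_TIME               # 1.11.4
ratio_new = spec_ratio(REF_TIME, time_new)       # 1.11.5
cpi_4ghz = cpi(700.0, 0.85 * INSTR, 0.25e-9)     # 1.11.6

print(round(cpi_base, 2), round(ratio_base, 2))
print(round(time_new, 2), round(ratio_new, 2), round(cpi_4ghz, 2))
```

Keeping the unrounded values in variables lets the percentage changes be recomputed without the rounding error that comes from working with two-decimal intermediates.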
1.11.8 The CPU time has been reduced by (750 − 700) / 750 × 100% = 6.67%.

7. Exercise 1.12 in text (page 58).

1.12.1 Consider the CPU time as follows:
For P1: CPU time = No. of instr. × CPI / Clock rate = $5.0 \times 10^{9} \times 0.9 / (4 \times 10^{9}) = 1.125$ s
For P2: CPU time = No. of instr. × CPI / Clock rate = $1.0 \times 10^{9} \times 0.75 / (3 \times 10^{9}) = 0.25$ s
Though P1 has a higher clock rate, the CPU time for P2 is smaller and hence P2 has higher performance.

1.12.2 For P1, the CPU time for executing $1.0 \times 10^{9}$ instructions is
CPU time = No. of instr. × CPI / Clock rate = $1.0 \times 10^{9} \times 0.9 / (4 \times 10^{9}) = 0.225$ s
The number of instructions P2 can execute in 0.225 s is
No. of instr. = CPU time × Clock rate / CPI = $0.225 \times 3 \times 10^{9} / 0.75 = 9 \times 10^{8}$

1.12.3 MIPS = Clock rate / (CPI × $10^{6}$)
For P1: Clock rate / CPI = $4.0 \times 10^{9} / 0.9 = 4.44 \times 10^{9}$ instructions/s, i.e., $4.44 \times 10^{3}$ MIPS
For P2: Clock rate / CPI = $3.0 \times 10^{9} / 0.75 = 4 \times 10^{9}$ instructions/s, i.e., $4 \times 10^{3}$ MIPS
Though P1 has a higher MIPS, the CPU time for P2 is smaller and hence P2 has higher performance (see 1.12.1).

1.12.4 MFLOPS = No. of FP operations / (execution time × $10^{6}$)
For P1: MFLOPS = $0.4 \times 5.0 \times 10^{9} / (1.125 \times 10^{6}) = 1.78 \times 10^{3}$ millions of floating-point operations per second
For P2: MFLOPS = $0.4 \times 1.0 \times 10^{9} / (0.25 \times 10^{6}) = 1.6 \times 10^{3}$ millions of floating-point operations per second
Though P1 has a higher MFLOPS, the CPU time for P2 is smaller and hence P2 has higher performance (see 1.12.1).

8. Exercise 1.13 in text (pages 58-59).

(Correction: It should be: Consider a computer running a program that requires 250 s, with 70 s spent executing FP instructions, 85 s executing INT instructions, 55 s executing L/S instructions, and 40 s executing Branch instructions.)

1.13.1 T fp = 70 × 0.8 = 56 s, so the total time is T = 56 + 85 + 55 + 40 = 236 s. Reduction: (250 − 236) / 250 × 100% = 5.6%

1.13.2 Total time T = 250 × 0.8 = 200 s, T fp + T l/s + T branch = 70 + 55 + 40 = 165 s, so T int = 200 − 165 = 35 s. Reduction in INT time: (85 − 35) / 85 × 100% = 58.82%

1.13.3 Total time T = 250 × 0.8 = 200 s, but T fp + T int + T l/s = 70 + 85 + 55 = 210 s > 200 s. No: the other three classes alone already exceed the 200 s target, so it cannot be reached by reducing the branch time only.
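
The three parts of Exercise 1.13 follow the same pattern: fix the time of every class except one and see what the remaining class must do. The sketch below reproduces that reasoning under the corrected problem statement. The dictionary keys and the helper name time_needed are illustrative assumptions, not names from the text.

```python
# Seconds spent per instruction class, from the corrected problem statement.
times = {"fp": 70.0, "int": 85.0, "ls": 55.0, "branch": 40.0}
total = sum(times.values())  # 250 s


def time_needed(name, target_total):
    """Time the named class must take for the program to finish in target_total.

    A negative result means the target cannot be reached by changing
    that class alone.
    """
    others = sum(t for key, t in times.items() if key != name)
    return target_total - others


# 1.13.1: FP time reduced by 20%, all other classes unchanged.
new_total = total - 0.20 * times["fp"]
reduction_pct = (total - new_total) / total * 100

# 1.13.2: total time reduced by 20% by changing INT time only.
target = 0.80 * total
int_needed = time_needed("int", target)
int_reduction_pct = (times["int"] - int_needed) / times["int"] * 100

# 1.13.3: same target, changing branch time only.
branch_needed = time_needed("branch", target)
feasible = branch_needed >= 0

print(new_total, round(reduction_pct, 2))
print(int_needed, round(int_reduction_pct, 2))
print(branch_needed, feasible)
```

A negative value from time_needed is the code's way of expressing the "No" in 1.13.3: the unchanged classes already take longer than the target.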