Saumil_HW05_CS514-1
School: Stevens Institute of Technology
Course: CS 514
Subject: Computer Science
Date: Dec 6, 2023
Type: pdf
Pages: 7
Uploaded by MajorCoyoteMaster800
Saumil Trivedi, CS514 (HW05), CWID: 20011349

1. (16 Points) Exercise B.1, parts a and b (cache hit: 1 cycle; cache miss: 110 cycles; main memory access with cache disabled: 105 cycles).

a. [10] <B.1> When you run a program with an overall miss rate of 3%, what will the average memory access time (in CPU cycles) be?

Answer:
Average memory access time = Hit time + Miss rate * Miss penalty = 1 + 0.03 * 110 = 4.3 cycles

b. [10] <B.1> Next, you run a program specifically designed to produce completely random data addresses with no locality. Toward that end, you use an array of size 1 GB (all of which fits in main memory). Accesses to random elements of this array are continuously made (using a uniform random number generator to generate the element indices). If your data cache size is 64 KB, what will the average memory access time be?

Answer:
Average memory access time = Probability of hit * Hit time + Miss rate * Miss penalty
Probability of hit = 64 KB / 1 GB = 0.00006104
Average memory access time = 0.00006104 * 1 + 0.99993896 * 110 = 109.99334664 cycles
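The two AMAT figures above can be double-checked with a short script (a minimal sketch; the hit probability in part b is simply the fraction of the 1 GB array that fits in the 64 KB cache):

```python
# AMAT sanity check for Exercise B.1 (parts a and b).

HIT_TIME = 1        # cycles on a cache hit
MISS_PENALTY = 110  # cycles on a cache miss

# Part a: overall miss rate of 3%.
amat_a = HIT_TIME + 0.03 * MISS_PENALTY
print(amat_a)  # approximately 4.3 cycles

# Part b: uniformly random accesses into a 1 GB array with a 64 KB cache,
# so the hit probability is just the covered fraction of the array.
p_hit = (64 * 1024) / (1024 ** 3)  # 2**-14, about 0.000061
amat_b = p_hit * HIT_TIME + (1 - p_hit) * MISS_PENALTY
print(amat_b)  # approximately 109.993 cycles
```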
2. (28 Points) Exercise B.5, Appendix B. Assume the L1 cache is write-through with no write-allocate and the L2 cache is write-back with write-allocate. Hint: A useful tool for solving this type of problem is to extract all of the available information from the problem description. It is possible that not all of the information will be necessary to solve the problem, but having it in summary form makes it easier to think about.

Answer:
- CPU: runs at 1.1 GHz, giving a cycle time of approximately 0.909 ns. The CPU's CPI, excluding memory accesses, is 1.35.
- Instruction mix: 20% loads, 10% stores, and 70% non-memory-access instructions.
- Caches: the system employs a split L1 cache with no hit penalty.
- L1 instruction cache: 2% miss rate, 32-byte blocks; a miss costs 15 ns plus 2 cycles.
- L1 data cache: 5% miss rate, 16-byte blocks, write-through; a miss costs 15 ns plus 1 cycle.
- L1/L2 bus: the connection between the L1 and L2 caches is a 128-bit bus running at 266 MHz.
- L2 unified cache: 512 KB, write-back, 80% hit rate; 50% of replaced cache blocks are dirty. It uses 64-byte blocks, and the miss penalty for an L2 miss is 67.52 ns.
- Memory: 128 bits (16 bytes) wide, with a first access time of 60 ns and subsequent accesses requiring 1 cycle at 133 MHz on a 128-bit bus.
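With those parameters summarized, the miss-penalty components and the per-class AMAT contributions used in parts a through c can be reproduced numerically. This sketch follows the write-up's rounded cycle times (3.75 ns per 16-byte transfer on the 266 MHz L1/L2 bus, 7.5 ns on the 133 MHz memory bus) and its miss rates:

```python
# Miss-penalty components for Exercise B.5, using rounded bus cycle times.

BUS_CYCLE = 3.75   # ns per 16-byte transfer on the 266 MHz L1/L2 bus (rounded)
MEM_CYCLE = 7.5    # ns per 16-byte transfer on the 133 MHz memory bus (rounded)

# L1 instruction miss: 15 ns access + two bus cycles for a 32-byte block.
l1i_miss = 15 + (32 // 16) * BUS_CYCLE   # 22.5 ns
# L1 data miss: 15 ns access + one bus cycle for a 16-byte block.
l1d_miss = 15 + (16 // 16) * BUS_CYCLE   # 18.75 ns
# L2 miss: 60 ns first access + four memory cycles for a 64-byte block.
l2_miss = 60 + (64 // 16) * MEM_CYCLE    # 90.0 ns

# Per-reference AMAT contributions, following the rates used in the write-up:
# a 2% L1 miss rate for instruction fetches and reads, a 5% rate for writes,
# L2 misses on 20% of L1 misses, and half of the replaced L2 blocks dirty.
amat_inst = 0.02 * l1i_miss + 0.02 * 0.2 * l2_miss + 0.02 * 0.2 * 0.5 * l2_miss
amat_read = 0.02 * l1d_miss + 0.02 * 0.2 * l2_miss + 0.02 * 0.2 * 0.5 * l2_miss
amat_write = 0.05 * l1d_miss + 0.05 * 0.2 * l2_miss + 0.05 * 0.2 * 0.5 * l2_miss
print(amat_inst, amat_read, amat_write)  # approximately 0.99, 0.92, 2.29 ns
```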
a) L1 instruction cache miss time in L2:
- L1_miss_time_in_L2 = 15 ns (access time) + 2 L1/L2 bus cycles = 15 + (32 / 16) * 3.75 = 22.5 ns
- L2_miss_time_in_memory = 60 ns (access time) + 4 memory cycles = 60 + (64 / 16) * 7.5 = 90 ns
- Average_memory_access_time = 0.02 * L1_miss_time_in_L2 + 0.02 * 0.2 * L2_miss_time_in_memory + 0.02 * 0.2 * 0.5 * access_time_for_L2_write_back = 0.45 + 0.36 + 0.18 = 0.99 ns

b) L1 read cache miss time in L2:
- L1_read_miss_time_in_L2 = 15 + 3.75 = 18.75 ns
- L2_miss_time_in_memory = 90 ns
- Average_memory_access_time_for_read = 0.02 * L1_read_miss_time_in_L2 + 0.02 * 0.2 * L2_miss_time_in_memory + 0.02 * 0.2 * 0.5 * access_time_for_L2_write_back = 0.375 + 0.36 + 0.18 ≈ 0.92 ns

c) L1 write time to L2:
- L1_write_time_to_L2 = 15 + 3.75 = 18.75 ns
- L2_miss_time_in_memory = 90 ns
- Average_memory_access_time_for_write = 0.05 * L1_write_time_to_L2 + 0.05 * 0.2 * L2_miss_time_in_memory + 0.05 * 0.2 * 0.5 * access_time_for_L2_write_back = 0.9375 + 0.9 + 0.45 ≈ 2.29 ns

d) Overall CPI (cycles per instruction):
CPI = 1.35 + 1.09 + 0.2 + 1.01 + 0.1 * 2.52 = 3.902

3. (16 Points) Consider the code segment:

      fld   f0, 0(x0)    ; load f0 from address 0+x0
Loop: fld   f2, 0(x2)    ; load f2 from address 0+x2
      fmult f2, f2, f0   ; f2 = f2 * f0
      fsd   f2, 0(x2)    ; store f2 at address 0+x2
      addwi x2, x2, 8    ; x2 = x2 + 8
      subwi x4, x3, x2   ; x4 = x3 - x2
      bne   x4, x0, Loop ; branch to Loop if x4 != 0

Assume the initial value of x3 is x2 + 64. Assume x0 = 0 and x2 = 16, and the memory contains:

Address:  0    8    16   24   32   40   48   56   64   72   80   88
Value:    4.0  1.0  7.0  9.0  5.0  3.0  1.0  2.0  5.0  8.0  7.0  3.0

The cache size is 64 bytes and the block size is 16 bytes. Make a table similar to Example 5 in the Lecture Slides.

a) Display the content of memory, the cache hit or miss status, and the content of the cache as the loop
executes, assuming the cache is direct mapped, with write-through and no write-allocate.
b) Repeat part (a) assuming the cache is 2-way set associative, with a FIFO replacement policy, write-back, and write-allocate.

A) The cache has 4 entries, so a block maps to entry N MOD 4.

Instruction       Memory Access   Cache Status   Cache Content
fld f0, 0(x0)     Load M[0]       Miss           0: 4.0
fld f2, 0(x2)     Load M[16]      Miss           0: 4.0, 16: 7.0
fsd f2, 0(x2)     Store M[16]     Hit            0: 4.0, 16: 28.0
fld f2, 0(x2)     Load M[24]      Hit            0: 4.0, 16: 28.0
fsd f2, 0(x2)     Store M[24]     Hit            0: 4.0, 16: 36.0
fld f2, 0(x2)     Load M[32]      Miss           0: 4.0, 32: 5.0
fsd f2, 0(x2)     Store M[32]     Hit            0: 4.0, 32: 20.0
fld f2, 0(x2)     Load M[40]      Hit            0: 4.0, 32: 20.0
fsd f2, 0(x2)     Store M[40]     Hit            0: 4.0, 32: 12.0
fld f2, 0(x2)     Load M[48]      Miss           0: 4.0, 48: 1.0
fsd f2, 0(x2)     Store M[48]     Hit            0: 4.0, 48: 4.0
fld f2, 0(x2)     Load M[56]      Hit            0: 4.0, 48: 4.0
fsd f2, 0(x2)     Store M[56]     Hit            0: 4.0, 48: 8.0
fld f2, 0(x2)     Load M[64]      Miss           8: 5.0, 24: 9.0
fsd f2, 0(x2)     Store M[64]     Hit            8: 20.0, 24: 9.0
fld f2, 0(x2)     Load M[72]      Hit            8: 20.0, 24: 9.0
fsd f2, 0(x2)     Store M[72]     Hit            8: 32.0, 24: 9.0
fld f2, 0(x2)     Load M[80]      Miss           8: 32.0, 80: 7.0
fsd f2, 0(x2)     Store M[80]     Hit            8: 32.0, 80: 28.0

B) With 2-way set associativity there are two sets, so a block maps to set N MOD 2.

Instruction       Memory Access   Cache Status   Cache Content
fld f0, 0(x0)     Load M[0]       Miss           0: 4.0
fld f2, 0(x2)     Load M[16]      Miss           0: 4.0, 16: 7.0
fsd f2, 0(x2)     Store M[16]     Hit            0: 4.0, 16: 28.0
fld f2, 0(x2)     Load M[24]      Hit            0: 4.0, 16: 28.0
fsd f2, 0(x2)     Store M[24]     Hit            0: 4.0, 16: 36.0
fld f2, 0(x2)     Load M[32]      Miss           0: 4.0, 32: 5.0
fsd f2, 0(x2)     Store M[32]     Hit            0: 4.0, 32: 20.0
fld f2, 0(x2)     Load M[40]      Hit            0: 4.0, 32: 20.0
fsd f2, 0(x2)     Store M[40]     Hit            0: 4.0, 32: 12.0
fld f2, 0(x2)     Load M[48]      Miss           0: 4.0, 48: 1.0
fsd f2, 0(x2)     Store M[48]     Hit            0: 4.0, 48: 4.0
fld f2, 0(x2)     Load M[56]      Hit            0: 4.0, 48: 4.0
fsd f2, 0(x2)     Store M[56]     Hit            0: 4.0, 48: 8.0
fld f2, 0(x2)     Load M[64]      Miss           8: 5.0, 24: 9.0
fsd f2, 0(x2)     Store M[64]     Hit            8: 20.0, 24: 9.0
fld f2, 0(x2)     Load M[72]      Hit            8: 20.0, 24: 9.0
fsd f2, 0(x2)     Store M[72]     Hit            8: 32.0, 24: 9.0
fld f2, 0(x2)     Load M[80]      Miss           8: 32.0, 80: 7.0
fsd f2, 0(x2)     Store M[80]     Hit            8: 32.0, 80: 28.0

4. (8 Points) A computer has an L1 data cache and an L1 instruction cache. The latencies of the different kinds of accesses are as follows: cache hit, 1 cycle; cache miss, 16 cycles. If the miss rate for instructions is 5% and the miss rate for data is 6%, what is the average memory access time in cycles?

Answer:
AMAT = Hit time + Miss rate * Miss penalty
For data accesses: AMAT_data = 1 + (0.06 * 16) = 1 + 0.96 = 1.96 cycles
For instruction accesses: AMAT_instruction = 1 + (0.05 * 16) = 1 + 0.8 = 1.8 cycles
Average = (1.96 + 1.8) / 2 = 1.88 cycles

5. (16 Points) Assume we have a 512-byte cache with 64-byte blocks. We also assume that the main memory is 2 KB. We can regard the memory as an array of 64-byte memory blocks M0, M1, M2, ..., M31. The table below displays the memory blocks that can reside in different cache blocks if the cache were fully associative.

Cache Block   Set   Memory blocks that can reside in this cache block
0             0     M0, M1, M2, ..., M31
1             0     M0, M1, M2, ..., M31
2             0     M0, M1, M2, ..., M31
3             0     M0, M1, M2, ..., M31
4             0     M0, M1, M2, ..., M31
5             0     M0, M1, M2, ..., M31
6             0     M0, M1, M2, ..., M31
7             0     M0, M1, M2, ..., M31
a) Show the elements of the table if the cache is organized as a direct-mapped cache.

Cache Block   Set   Memory blocks that can reside in this cache block
0             0     M0, M8, M16, M24
1             1     M1, M9, M17, M25
2             2     M2, M10, M18, M26
3             3     M3, M11, M19, M27
4             4     M4, M12, M20, M28
5             5     M5, M13, M21, M29
6             6     M6, M14, M22, M30
7             7     M7, M15, M23, M31

b) Show the elements of the table if the cache is organized as a two-way set associative cache. With 8 cache blocks and 2 ways there are 4 sets, so memory block M maps to set M MOD 4.

Cache Block   Set   Memory blocks that can reside in this cache block
0             0     M0, M4, M8, M12, M16, M20, M24, M28
1             0     M0, M4, M8, M12, M16, M20, M24, M28
2             1     M1, M5, M9, M13, M17, M21, M25, M29
3             1     M1, M5, M9, M13, M17, M21, M25, M29
4             2     M2, M6, M10, M14, M18, M22, M26, M30
5             2     M2, M6, M10, M14, M18, M22, M26, M30
6             3     M3, M7, M11, M15, M19, M23, M27, M31
7             3     M3, M7, M11, M15, M19, M23, M27, M31

6. (8 Points) Assume we have a computer where the CPI = 1 when all memory accesses hit in the cache. Only loads and stores access the data cache, and these total 20% of instructions. If the cache miss rate is 2% and the miss penalty is 25 cycles, how much faster would the computer be if all instructions were cache hits?

Answer:
Speedup = CPI with cache misses / CPI with all hits
CPI with cache misses = 1 + (0.2 * 0.02) * 25 = 1.1
CPI with all hits = 1
Speedup = 1.1 / 1 = 1.1
Therefore, the computer would be 10% faster if all instructions were cache hits.
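The mappings in problem 5 follow directly from the modulo rule: a 512 B cache with 64 B blocks has 8 block frames, so a direct-mapped cache places memory block M in frame M mod 8, while a two-way organization (8 frames / 2 ways = 4 sets) places it in set M mod 4. A quick sanity check:

```python
# Set index for each of the 32 memory blocks (M0..M31) under the two
# organizations in problem 5: 512 B cache, 64 B blocks -> 8 block frames.

NUM_FRAMES = 8
NUM_MEM_BLOCKS = 32  # 2 KB main memory / 64 B blocks

# Direct mapped: one frame per set, so the frame index is M mod 8.
direct_mapped_set = {m: m % NUM_FRAMES for m in range(NUM_MEM_BLOCKS)}

# Two-way set associative: 8 frames / 2 ways = 4 sets, index M mod 4.
num_sets = NUM_FRAMES // 2
two_way_set = {m: m % num_sets for m in range(NUM_MEM_BLOCKS)}

# e.g. M0, M8, M16, M24 all compete for frame 0 of the direct-mapped cache,
# while M0, M4, ..., M28 share set 0 of the two-way cache.
print(sorted(m for m, s in direct_mapped_set.items() if s == 0))  # [0, 8, 16, 24]
print(sorted(m for m, s in two_way_set.items() if s == 0))  # [0, 4, 8, 12, 16, 20, 24, 28]
```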
7. (8 Points) Assume a computer with an L2 cache that has a 256-byte cache block. It takes 10 clock cycles to transfer the first 8 bytes to L1, and then 1 clock cycle per 8 bytes to transfer the rest of the block. Compare the miss penalty with and without Critical Word First.

Answer:
Without Critical Word First: miss penalty = 10 cycles (first 8 bytes) + 31 cycles (the remaining 248 bytes take 31 more 8-byte transfers) = 41 clock cycles.
With Critical Word First: the requested word is transferred first, so the CPU can resume after 10 clock cycles while the rest of the block streams in behind it.
Therefore, with Critical Word First the L2 cache delivers the needed word 41 / 10 = 4.1 times faster.
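The arithmetic above can be checked with a short script (a sketch; note that what improves is the time until the critical word arrives, since the full block still takes 41 cycles to transfer either way):

```python
# Miss-penalty comparison for problem 7: 256-byte L2 block, 10 cycles for the
# first 8-byte transfer, 1 cycle for each subsequent 8-byte transfer.

BLOCK_BYTES = 256
WORD_BYTES = 8
FIRST_TRANSFER = 10  # cycles for the first 8 bytes
NEXT_TRANSFER = 1    # cycles for each remaining 8 bytes

transfers = BLOCK_BYTES // WORD_BYTES  # 32 transfers in total
full_block = FIRST_TRANSFER + (transfers - 1) * NEXT_TRANSFER

# Without Critical Word First the CPU waits for the whole block;
# with CWF the requested word is sent first, so the CPU resumes after 10.
print(full_block, FIRST_TRANSFER, full_block / FIRST_TRANSFER)  # 41 10 4.1
```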