CachesUpdated
School: Georgia Institute of Technology
Course: 3058
Subject: Computer Science
Date: Feb 20, 2024
Type: pdf, 14 pages
Uploaded by LieutenantPenguin2773
ECE 3058 Architecture, Systems, Concurrency, and Energy in Computation
Problem Set
________________________________________________________________________

Part A: Caches
------------------------------------------------------------------------

Direct-mapped Cache

The following diagram shows how a direct-mapped cache is organized. To read a word from the cache, the processor drives the input address. The index portion of the address is then decoded to access the proper row in the tag memory array and in the data memory array. The selected tag is compared to the tag portion of the input address to determine whether the access is a hit. At the same time, the corresponding cache block is read and the proper word is selected through a MUX.

Figure A.1: A direct-mapped cache implementation

In the tag and data arrays, each row corresponds to a line in the cache. For example, a row in the tag memory array contains one tag and two status bits (valid and dirty) for the cache line. For direct-mapped caches, a row in the data array holds one cache line.

Four-way Set-associative Cache

The implementation of a 4-way set-associative cache is shown in the following diagram. (An n-way set-associative cache can be implemented in a similar manner.) The index part of the input address is used to find the proper row in the data memory array and the tag memory array. In this case, however, each row (set) corresponds to four cache lines (four ways). A row in the data memory holds four cache lines (for 32-byte cache lines, 128 bytes), and a row in the tag memory array contains four tags and the status bits for those tags (2 bits per cache line). The tag memory and the data memory are accessed in parallel, but the output data driver is enabled only if there is a cache hit.

Figure A.2: A 4-way set-associative cache implementation

Problem A.3:
Now George P. Burdell is studying the effect of set-associativity on cache performance. Since he now knows the access time of each configuration, he wants to know the miss rate of each one. For the miss-rate analysis, George is considering two small caches: a direct-mapped cache with 8 lines of 16 bytes each, and a 4-way set-associative cache of the same size. For the set-associative cache, George tries out two replacement policies: least recently used (LRU) and round robin (FIFO).

George tests the caches by accessing the following sequence of hexadecimal byte addresses, starting with empty caches. For simplicity, assume that the addresses are only 12 bits. Complete the following tables for the direct-mapped cache and both types of 4-way set-associative caches, showing the progression of cache contents as accesses occur (in the tables, 'inv' = invalid, and the column of a particular cache line contains the {tag, index} contents of that line). You only need to fill in elements in the table when a value changes.

| Address | L0  | L1  | L2  | L3  | L4  | L5  | L6  | L7  | Hit? |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|------|
| 110     | inv | 11  | inv | inv | inv | inv | inv | inv | no   |
| 136     |     |     |     | 13  |     |     |     |     | no   |
| 202     | 20  |     |     |     |     |     |     |     | no   |
| 1A3     |     |     |     |     |     |     |     |     |      |
| 102     |     |     |     |     |     |     |     |     |      |
| 361     |     |     |     |     |     |     |     |     |      |
| 204     |     |     |     |     |     |     |     |     |      |
| 114     |     |     |     |     |     |     |     |     |      |
| 1A4     |     |     |     |     |     |     |     |     |      |
| 177     |     |     |     |     |     |     |     |     |      |
| 301     |     |     |     |     |     |     |     |     |      |
| 206     |     |     |     |     |     |     |     |     |      |
| 135     |     |     |     |     |     |     |     |     |      |

D-map Total Misses: ____    Total Accesses: ____
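Before filling in the tables, it can help to mechanically split each 12-bit address into its tag, index, and block-offset fields. The following is a minimal sketch for the direct-mapped configuration above (8 lines of 16 bytes, so 4 offset bits and 3 index bits); the function name `split_dmap` is ours, not from the problem set:

```python
def split_dmap(addr):
    """Split a 12-bit address for a direct-mapped cache with
    8 lines x 16 bytes/line: 4 offset bits, 3 index bits, 5 tag bits."""
    offset = addr & 0xF          # low 4 bits select the byte within the line
    index = (addr >> 4) & 0x7    # next 3 bits select the cache line (L0..L7)
    tag = addr >> 7              # remaining 5 bits are the tag
    line = addr >> 4             # combined {tag, index}, as written in the tables
    return tag, index, offset, line
```

For example, address 0x110 has index 1, so it lands in line L1, and its table entry is the combined {tag, index} value 0x11.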
| Address | Set0 way0 | Set0 way1 | Set0 way2 | Set0 way3 | Set1 way0 | Set1 way1 | Set1 way2 | Set1 way3 | Hit? |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------|
| 110     | inv       | inv       | inv       | inv       | 11        | inv       | inv       | inv       | no   |
| 136     |           |           |           |           |           | 13        |           |           | no   |
| 202     | 20        |           |           |           |           |           |           |           | no   |
| 1A3     |           |           |           |           |           |           |           |           |      |
| 102     |           |           |           |           |           |           |           |           |      |
| 361     |           |           |           |           |           |           |           |           |      |
| 204     |           |           |           |           |           |           |           |           |      |
| 114     |           |           |           |           |           |           |           |           |      |
| 1A4     |           |           |           |           |           |           |           |           |      |
| 177     |           |           |           |           |           |           |           |           |      |
| 301     |           |           |           |           |           |           |           |           |      |
| 206     |           |           |           |           |           |           |           |           |      |
| 135     |           |           |           |           |           |           |           |           |      |

4-way LRU Total Misses: ____    Total Accesses: ____
| Address | Set0 way0 | Set0 way1 | Set0 way2 | Set0 way3 | Set1 way0 | Set1 way1 | Set1 way2 | Set1 way3 | Hit? |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|------|
| 110     | inv       | inv       | inv       | inv       | 11        | inv       | inv       | inv       | no   |
| 136     |           |           |           |           |           | 13        |           |           | no   |
| 202     | 20        |           |           |           |           |           |           |           | no   |
| 1A3     |           |           |           |           |           |           |           |           |      |
| 102     |           |           |           |           |           |           |           |           |      |
| 361     |           |           |           |           |           |           |           |           |      |
| 204     |           |           |           |           |           |           |           |           |      |
| 114     |           |           |           |           |           |           |           |           |      |
| 1A4     |           |           |           |           |           |           |           |           |      |
| 177     |           |           |           |           |           |           |           |           |      |
| 301     |           |           |           |           |           |           |           |           |      |
| 206     |           |           |           |           |           |           |           |           |      |
| 135     |           |           |           |           |           |           |           |           |      |

4-way FIFO Total Misses: ____    Total Accesses: ____

Problem A.4:
For the two replacement policies for the set-associative cache, which one has the smaller cache miss rate for the address stream in Problem A.3? Explain why. Is that replacement policy always going to yield better miss rates? If not, give a counterexample using an address stream.
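To check your completed tables, a small replacement-policy simulator can replay the address stream. This is a sketch under the stated parameters (16-byte lines; 8 sets of 1 way for the direct-mapped cache, 2 sets of 4 ways for the set-associative caches); the function name `simulate` is ours, not from the problem set:

```python
def simulate(addresses, num_sets, num_ways, policy="LRU"):
    """Count misses for a cache with 16-byte lines.
    Each set is kept as a list ordered oldest -> newest."""
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr >> 4              # drop the 4 block-offset bits
        s = sets[line % num_sets]
        if line in s:
            if policy == "LRU":       # a hit refreshes recency under LRU,
                s.remove(line)        # but not under FIFO (round robin)
                s.append(line)
        else:
            misses += 1
            if len(s) == num_ways:
                s.pop(0)              # evict the oldest entry
            s.append(line)
    return misses

stream = [0x110, 0x136, 0x202, 0x1A3, 0x102, 0x361, 0x204,
          0x114, 0x1A4, 0x177, 0x301, 0x206, 0x135]
dmap_misses = simulate(stream, 8, 1)           # direct-mapped
lru_misses  = simulate(stream, 2, 4, "LRU")    # 4-way LRU
fifo_misses = simulate(stream, 2, 4, "FIFO")   # 4-way FIFO
```

With `num_ways = 1`, LRU and FIFO coincide, so the same function also checks the direct-mapped table.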
Part B: Direct Mapped vs Set Associative Caches
------------------------------------------------------------------------

Suppose we have a physically-indexed, physically-tagged L1 cache. All physical addresses are 12 bits. The L1 cache is direct-mapped and has 4 sets. The block size is 16 bytes.

Problem B.1:
L1 Cache Size = ____________ bytes
Index Size = ____________ bits
Tag Size = ____________ bits

Problem B.2:
Table B-1 tracks the contents of the L1 cache for a sequence of memory accesses. The first entry shows the initial state of the cache. "X" represents an invalid/empty entry. To simplify things, valid entries in the cache are represented by the {Tag, Index}. Complete the table for the given sequence of addresses. The first two have been filled in for you. You only need to fill in elements of the table when a value changes, except for the "L1 Hit?" column, which must be filled out for all accesses.

| Access Address | L1 Hit? (Yes/No) | Set0 | Set1 | Set2 | Set3 |
|----------------|------------------|------|------|------|------|
| (initial)      |                  | X    | 45   | F6   | X    |
| 0x7B0          | No               |      |      |      | 7B   |
| 0xDC4          | No               | DC   |      |      |      |
| 0xDCF          |                  |      |      |      |      |
| 0xF3C          |                  |      |      |      |      |
| 0x8CB          |                  |      |      |      |      |
| 0xB8B          |                  |      |      |      |      |

Table B-1: Contents of L1 Cache
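The B.1 blanks follow directly from the cache geometry. The arithmetic below is a sanity-check sketch using the standard size/index/tag relations, not values copied from an answer key:

```python
# Direct-mapped L1: 4 sets, 16-byte blocks, 12-bit physical addresses
addr_bits, num_sets, ways, block_bytes = 12, 4, 1, 16

cache_bytes = num_sets * ways * block_bytes      # data array size
offset_bits = block_bytes.bit_length() - 1       # log2 of the block size
index_bits = num_sets.bit_length() - 1           # log2 of the set count
tag_bits = addr_bits - index_bits - offset_bits  # whatever bits remain
```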
Problem B.3:
Suppose we double the size of the cache and make it 4-way set-associative. The block size is the same (16 bytes). Fill out the L1 cache contents for the given access pattern. The initial state of the L1 is given to you. "X" represents an invalid/empty entry. Valid entries are represented by the {Tag, Index}. Each L1 entry also tracks its insertion time/last-access time in order to implement LRU replacement within each set. The LRU way is the candidate for replacement only if all ways hold valid data; otherwise, any of the ways with invalid data is replaced.

Entries show {Tag, Index} (Access Time).

| Time Unit | Access Addr | Hit? (Yes/No/NA) | Set0 Way0 | Set0 Way1 | Set0 Way2 | Set0 Way3 | Set1 Way0 | Set1 Way1 | Set1 Way2 | Set1 Way3 |
|-----------|-------------|------------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| (initial) |             |                  | 90 (7)    | 8C (3)    | X         | X         | 47 (4)    | 4B (6)    | E1 (7)    | X         |
| 10        | 0x7B0       |                  |           |           |           |           |           |           |           |           |
| 11        | 0xDC4       |                  |           |           |           |           |           |           |           |           |
| 12        | 0xDCF       |                  |           |           |           |           |           |           |           |           |
| 13        | 0xF3C       |                  |           |           |           |           |           |           |           |           |
| 14        | 0x8CB       |                  |           |           |           |           |           |           |           |           |
| 15        | 0xB8B       |                  |           |           |           |           |           |           |           |           |

Table B.2: Contents of Updated L1 Cache
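Table B.2's timestamp notation can be simulated directly: keep a last-access time per valid way and, on a miss to a full set, evict the entry with the smallest timestamp. The sketch below assumes the initial state shown above (2 sets, so the index is bit 4 of the address); the dict-based layout and the name `lru_access` are our choices, not the problem's:

```python
def lru_access(cache_set, line, now, num_ways=4):
    """cache_set maps a {tag, index} line -> its last-access time.
    Returns True on a hit; evicts the LRU line when the set is full."""
    hit = line in cache_set
    if not hit and len(cache_set) == num_ways:
        victim = min(cache_set, key=cache_set.get)  # smallest timestamp
        del cache_set[victim]
    cache_set[line] = now                           # insert or refresh time
    return hit

# Initial state from Table B.2, as {line: access_time} per set
sets = [{0x90: 7, 0x8C: 3}, {0x47: 4, 0x4B: 6, 0xE1: 7}]
hits = []
for t, addr in enumerate([0x7B0, 0xDC4, 0xDCF, 0xF3C, 0x8CB, 0xB8B], start=10):
    line = addr >> 4                 # 16-byte blocks
    hits.append(lru_access(sets[line & 1], line, t))
```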
Part C: Cache Accesses and Writebacks
------------------------------------------------------------------------

[All addresses in this Part are physical addresses.]

Suppose we are running the following code:

#define ARRAY_SIZE 4
for (int i = 0; i < ARRAY_SIZE; i++) {
    S[i] = P[i] + Q[i];
}

The arrays S, P and Q have 4 entries each and hold integer values (4 bytes at each entry). The base address of each array appears in the register initialization below.

Suppose this code translates to the following assembly instruction sequence:

// Initial values:
// Rp = 0x1A20, Rq = 0x1C60, Rs = 0x2BA0
// Ri = 4
loop:
  LD   R1, (Rp)     // R1 ← Mem[Rp]
  LD   R2, (Rq)     // R2 ← Mem[Rq]
  ADD  R3, R1, R2   // R3 ← R1 + R2
  ST   R3, (Rs)     // Mem[Rs] ← R3
  ADDI Rp, 4        // Rp ← Rp + 0x4
  ADDI Rq, 4        // Rq ← Rq + 0x4
  ADDI Rs, 4        // Rs ← Rs + 0x4
  SUBI Ri, 1        // Ri ← Ri - 1
  BNEZ Ri, loop     // if Ri != 0, branch to loop

This code produces the following 12 memory accesses to the cache hierarchy:

LD 0x1A20
LD 0x1C60
ST 0x2BA0
LD 0x1A24
LD 0x1C64
ST 0x2BA4
LD 0x1A28
LD 0x1C68
ST 0x2BA8
LD 0x1A2C
LD 0x1C6C
ST 0x2BAC

We will analyze the miss rates for each.

Question C-1:
Consider an L1 cache connected to DRAM. The L1 is direct-mapped, with 4 sets. Dirty data in the L1 has to be written back to DRAM when the line is evicted. On the next page, the initial state of the L1 is given; dirty data is circled. Update the state of the L1 for each of the accesses. For each entry you can write the {tag, index} for simplicity and circle the dirty data. The writeback column specifies the cache line whose data is being written back to DRAM.

Question C-2:
What is the L1 miss rate (L1 Misses / L1 Accesses)?

Question C-3:
How many times does a writeback from the L1 to DRAM take place?
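The C-2 and C-3 counts can be cross-checked with a small write-back simulator. Since the initial L1 state appears in a figure not reproduced here, this sketch assumes an initially empty (all-clean) cache, so your writeback count may differ once the given dirty lines are included; the function name `run` is ours:

```python
def run(accesses, num_sets=4):
    """Direct-mapped L1, 16-byte lines, write-back/write-allocate.
    Each set holds a (line, dirty) tuple or None."""
    sets = [None] * num_sets
    misses = writebacks = 0
    for op, addr in accesses:
        line = addr >> 4
        idx = line % num_sets
        entry = sets[idx]
        if entry is None or entry[0] != line:
            misses += 1
            if entry is not None and entry[1]:   # evicting dirty data
                writebacks += 1
            entry = (line, False)                # fill with clean data
        if op == "ST":
            entry = (line, True)                 # a store marks the line dirty
        sets[idx] = entry
    return misses, writebacks

accesses = [("LD", 0x1A20), ("LD", 0x1C60), ("ST", 0x2BA0),
            ("LD", 0x1A24), ("LD", 0x1C64), ("ST", 0x2BA4),
            ("LD", 0x1A28), ("LD", 0x1C68), ("ST", 0x2BA8),
            ("LD", 0x1A2C), ("LD", 0x1C6C), ("ST", 0x2BAC)]
misses, writebacks = run(accesses)
```

Note that the lines for P (0x1A2x), Q (0x1C6x), and S (0x2BAx) all share the same index, so every access conflicts in the same set.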
Part D: Cache Basics
------------------------------------------------------------------------

Question D.1
For each of the following cache configurations, fill out the missing entries in the table. Assume a 32-bit address. The cache size refers to the data array (not tag array) size.

| Cache Block Size (Bytes) | Number of Sets | Associativity     | Cache Size (Bytes) |
|--------------------------|----------------|-------------------|--------------------|
| 64B                      | 1024           | 1 (Direct-Mapped) |                    |
| 64B                      |                | 2                 | 64kB               |
|                          | 1024           | 4                 | 128kB              |
| 16B                      | 256            |                   | 128kB              |

Question D.2
Assume the following cache configuration:
Cache Size = 4kB
Cache Line Size = 64B
Address = 32 bits

For the following sequence of memory accesses, identify the index and tag for each of the specific cache configurations.

| Memory Address | Direct-Mapped Index | Direct-Mapped Tag | 4-way SA Index | 4-way SA Tag | Fully Assoc. Index | Fully Assoc. Tag |
|----------------|---------------------|-------------------|----------------|--------------|--------------------|------------------|
| 0x7A348        |                     |                   |                |              |                    |                  |
| 0xDEED01       |                     |                   |                |              |                    |                  |
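Both questions reduce to the identity cache size = block size × number of sets × associativity, plus the usual offset/index/tag split. The sketch below solves the D.1 table relation for whichever entry is missing and splits D.2 addresses under the stated configuration (4 kB, 64-byte lines, 32-bit addresses); both function names are ours:

```python
def missing_d1_entry(block=None, sets=None, assoc=None, size=None):
    """Solve size = block * sets * assoc for the one missing value."""
    if size is None:
        return block * sets * assoc
    if sets is None:
        return size // (block * assoc)
    if block is None:
        return size // (sets * assoc)
    return size // (block * sets)       # associativity missing

def fields(addr, sets, block=64):
    """(index, tag) split for the D.2 cache (64-byte lines).
    Pass sets=1 for the fully-associative case (no index bits)."""
    offset_bits = block.bit_length() - 1
    index_bits = sets.bit_length() - 1
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return index, tag

# Direct-mapped: 4096 // 64 = 64 sets; 4-way: 16 sets; fully associative: 1 set
dm_index, dm_tag = fields(0x7A348, sets=64)
```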
Part E: Cache Access
------------------------------------------------------------------------

In this part, you will compare the performance of a direct-mapped and a fully-associative cache for the same access pattern. In both Question E.1 and Question E.2, you will need to complete the given tables for the given sequence of memory accesses. "X" represents an invalid/empty entry. The first entry shows the initial state of the cache. The first access has been filled in for you. You only need to fill in elements of the table when a value changes, except for the "L1 Hit?" column, which must be filled out for all accesses.

Question E.1
Suppose we have a direct-mapped L1 cache of size 64B. The cache block size is 16 bytes. Complete the following table.

| Access Address | L1 Hit? (Yes/No) | Set0 | Set1 | Set2 | Set3 |
|----------------|------------------|------|------|------|------|
| (initial)      |                  | X    | X    | F6   | X    |
| 0x710          | No               |      | 71   |      |      |
| 0xD54          |                  |      |      |      |      |
| 0xADC          |                  |      |      |      |      |
| 0x71C          |                  |      |      |      |      |
| 0xD5B          |                  |      |      |      |      |
| 0xAD0          |                  |      |      |      |      |

Table E-1: Contents of Direct-Mapped L1 Cache
Question E.2
Suppose we have a fully-associative L1 cache of size 64B. The cache block size is 16 bytes. Assume an LRU replacement policy. Complete the following table. Entries show {Tag, Index} (Access Time).

| Time | Access Address | L1 Hit? (Yes/No) | Way0   | Way1    | Way2   | Way3   |
|------|----------------|------------------|--------|---------|--------|--------|
| (initial) |           |                  | 50 (8) | X       | F6 (3) | 33 (4) |
| 10   | 0x710          | No               |        | 71 (10) |        |        |
| 11   | 0xD54          |                  |        |         |        |        |
| 12   | 0xADC          |                  |        |         |        |        |
| 13   | 0x71C          |                  |        |         |        |        |
| 14   | 0xD5B          |                  |        |         |        |        |
| 15   | 0xAD0          |                  |        |         |        |        |

Table E-2: Contents of Fully-Associative L1 Cache

Question E.3
For the access pattern shown above, which of the two caches, direct-mapped or fully-associative, has a higher hit rate? What characteristic of the access pattern leads to one of the caches having more misses than the other?
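E.3 can be sanity-checked by replaying the stream through both organizations. This sketch assumes the initial contents shown in the two tables (only the state the accesses can reach matters, so untouched sets are omitted):

```python
stream = [0x710, 0xD54, 0xADC, 0x71C, 0xD5B, 0xAD0]
lines = [a >> 4 for a in stream]          # 16-byte blocks

# Direct-mapped, 4 sets: every line in this stream has index (line & 3) == 1,
# so all six accesses collide in Set1 and evict each other (conflict misses).
dm_sets = {2: 0xF6}                       # initial state: only Set2 is valid
dm_misses = 0
for line in lines:
    idx = line & 3
    if dm_sets.get(idx) != line:
        dm_misses += 1
        dm_sets[idx] = line

# Fully associative, 4 ways, LRU: map line -> last-access time
fa = {0x50: 8, 0xF6: 3, 0x33: 4}          # initial state from Table E-2
fa_misses = 0
for t, line in enumerate(lines, start=10):
    if line not in fa:
        fa_misses += 1
        if len(fa) == 4:
            del fa[min(fa, key=fa.get)]   # evict the least recently used
    fa[line] = t
```

The working set here is only three distinct lines, so it fits in four fully-associative ways, while the direct-mapped cache thrashes on a single set.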