prelim1-20fa-soln (3)

pdf

School

Cornell University *

*We aren’t endorsed by this school

Course

3410

Subject

Computer Science

Date

Jan 9, 2024

Type

pdf

Pages

10

Uploaded by MinisterGalaxy9309

Report
Fall 2020 Prelim 1 Solutions CS 3410, Cornell University Please turn off and stow away all electronic devices including smart watches. You may not use them for any reason during the exam. Do not bring them with you if you leave the room temporarily. This is a closed book and notes examination. You may use the 4-sided reference sheet provided. There are 9 problems . Make sure you have the whole exam. You have 90 minutes to complete 90 points. Use your time accordingly. Question Points Score 1 4 2 6 3 9 4 18 5 10 6 6 7 5 8 12 9 20 Total: 90 It is a violation of the Academic Integrity Code to look at any exam other than your own, to look at any other reference material, or to otherwise give or receive unauthorized help. We also ask that you not discuss the exam with anyone who has not yet taken it. Academic Integrity is expected of all students of Cornell University at all times, whether in the presence or absence of members of the faculty. Understanding this, I declare I shall not give, use or receive unauthorized aid in this examination. Signature: Date Name: NetID
1. [4 points] “On the king’s gate the CMOS grew gray.” —Helen Hunt Jackson (sort of) Recall that a p-transistor connects source to drain when a negative/zero charge is applied to the gate. What function does the following CMOS diagram implement? (Hint: truth table!) Correct Answer: OR diagram shows how when A=1, source connects to OUT A B OUT ground Vdd “hop over” means wires do not connect + + 2. “I know numbers are beautiful. If they aren’t beautiful, nothing is.” —Paul Erdos (a) [2 points] Express the unsigned integer 11001010 in decimal (base 10). Correct Answer: 128 + 64 + 8 + 2 = 202 (b) [2 points] Express the 2s complement number 011001010 in octal (base 8). group in 3s: 011 001 010 Correct Answer: o312 (c) [2 points] Express the decimal (base 10) number -48 in a 16-bit 2’s complement number. Make sure your answer is 16 bits long. 48 10 in 2’s complement: 0110000 flip bits: 1001111 now add 1, yields - 48 10 in 2’s complement: 1010000 now extend to 16-bit: Correct Answer: 1111 1111 1101 0000 Page 2
3. De secret’s in de mux . A multiplexer , or mux , takes multiple inputs and—based on a control input—and selects one of the inputs to be the output. The converse of the multiplexer is the demultiplexer (or demux ). Also called a data distrib- utor, a demultiplexor is an important component in networking. A 1:4 demux takes a single input ( D ) and routes it to one of several outputs ( Y 0 - Y 3 ) based on the 2-bit select input S 1 S 0 . The non-selected outputs remain 0. (a) [4 points] Complete the truth table for a 1-bit 1:4 demux. (Use the diagram on the left to disambiguate which select signal is associated with which output.) 00 01 10 11 Y 0 Y 1 Y 2 Y 3 S 0 S 1 D 1-bit 1:4 Demux Example Input/Ouputs: D = 1 S = 10 O = 0 1 00 1 0 0 0 1 0 S 1 S 0 D Y 0 Y 1 Y 2 Y 3 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 1 (b) [5 points] Using only basic 1- and 2-input logic gates (see reference sheet), draw the gate- level implementation of a 1-bit 1:2 demux . Label all inputs and outputs using the names/conventions shown above. Solution: Y 0 Y 1 S D Page 3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
4. Register to vote! This is the 32-bit, 32 register, dual-read and single-write register file presented in lecture: (a) [5 points] We want the register file to have 64-bit registers instead of 32-bit registers . Which signals need to be modified/replicated/removed to support the new register file? Correct Answer: A,F,G Data signals need to be changed from 32 to 64 bit Which components need to be modified/replicated/removed to support the new register file? Correct Answer: X,Y,Z Registers need to hold twice as many bits. Internally, muxes need to select 64 bit values, not 32 bit values. Decoder remains entirely unchanged. (b) [5 points] We want the register file to be dual-write instead of single-write . Which signals need to be modified/replicated/removed to support the new register file? Correct Answer: A,B,C Need 2 data inputs (replicate A), 2 names of output registers (replicate B), 2 write enable signals (replicate C) Which components need to be modified/replicated/removed to support the new register file? Correct Answer: W,X Need two decoders (replicate W), need two data inputs to the register file X Page 4
3 shall be the number thou shalt count, and the number of the counting shall be 3. Suppose we want the RISC-V register file from the previous page to support three simul- taneous reads in order to support a new instruction that can add three register values at the same time. (c) [2 points] In the context of the performance equation, why might this be a good idea? If an instruction can add three values together then it does the work of 2 ADDS in a standard 2-input ISA. So the dynamic instruction count goes down, which is a good thing. (d) [2 points] In the context of the performance equation, why might this be a bad idea? This change is likely to slow down the clock because your register file just got larger, forwarding and stall logic just got more complicated, and the ALU will need to need to be modified to support adding 3 values (or you must perform 2x 2-input ADDs in the execute stage. (e) [2 points] Would this change be supported by RISC designers? Correct Answer: NO (f) [2 points] Why or why not? Lot of reasons (you only needed one): Hard to encode a third register input name in a 32-bit instruction. This focuses on the uncommon case rather than the common case. This change complicates the pipeline and the forwarding/stall logic. Page 5
5. [10 points] “If you RISC nothing, then you RISC everything.” — Geena Davis Given the start state of the 32-bit register file, step through the following assembly. Draw the final state of the register file in decimal (base 10) after the code completes. 0 x0 0 x1 1 x2 1 x3 0 x4 3 x5 10 x6 3 x7 Register File BEFORE x0 x1 x2 x3 x4 x5 x6 x7 Register File AFTER 2 x8 x8 We will only grade the contents of these 9 boxes. 0 2 3 1 0 9 10 3 8 SLL x8, x3, x5 XOR x5, x5, x6 LOOP: BGE x2, x7, END AND x4, x3, x2 BNE x4, x0, INC ADD x1, x1, x2 INC: ADDI x2, x2, 1 BLT x2, x7, LOOP END: Page 6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
6. [6 points] The Dukes of Hazzard. The following code will be run on a 5-stage (FDXMW) pipelined processor with no forwarding logic. The register file is written to in the first half of the cycle and read from during the second half. How many nop instructions need to be injected between each of the original instructions in order to avoid data hazards and preserve correctness? Enter the minimum amount . (This can be 0.) lw x1, 4(x3) Correct Answer: 0 nops add x2, x2, x3 Correct Answer: 1 nops lw x3, 2(x1) Correct Answer: 2 nops sra x1, x3, x5 7. [5 points] C is for Covid that’s not funny What is printed when this program runs? 3 int x = 13, y = 20, z = 3; 4 5 int smile( int y) { 6 int x = y - z; 7 printf( "2: x = %d\n" , x); 8 printf( "3: y = %d\n" , y); 9 return (z+y); 10 } 11 12 int main() { 13 int z = y; 14 y += x; 15 printf( "1: y = %d\n" , y); 16 x = smile(z); 17 printf( "4: x = %d\n" , x); 18 printf( "5: z = %d\n" , z); 19 } 1: y = 33 2: x = 17 3: y = 20 4: x = 23 5: z = 20 Page 7
8. Keep It Simple, Silly! . (a) [6 points] Minimize the following Boolean expression using the Laws and Theorems of Boolean Algebra (see reference sheet). Show your work, applying only one rule per line : a ¯ bc + ¯ a ¯ bc + bd + cd = ( a + ¯ a ) ¯ bc + bd + cd , Distributive Property = ¯ bc + bd + cd , Complementarity = ¯ bc + bd , Consensus (b) [6 points] Given the Karnaugh Map below, find the minimized sum-of-product expression. cd ab -→ 00 01 11 10 00 0 0 1 1 01 0 1 1 0 11 1 1 0 0 10 0 0 X 1 Correct Answer: a ¯ d + b ¯ cd + ¯ acd Page 8
9. “Speed is irrelevant if you are going in the wrong direction.” - Mahatma Gandhi Assume you have hardware components with the following delays: Program Memory: 11 ns, Register File: 8 ns to read, 7 ns to write, ALU: 9 ns, Plus-4-Adder (for the PC update): 3 ns, Data Memory: 13 ns. Assume all other delays are negligible. (a) [3 points] Estimate the clock period of a single-cycle processor created with the above components. add up all components except the PC adder which is not on the critical path 11+8+7+9+13 = 48 ns (b) [3 points] Estimate the clock period of a multi-cycle processor using the above compo- nents. Assume that registers are added after the Program Memory, Register File, ALU, and Data Memory (as you saw in class) to divide up the single cycle into multiple cycles. cycle time determined by the slowest component, in this case data memory 13 ns (c) [3 points] Estimate the clock period of a 5-stage (FDXMW) pipelined processor using the above components. since WB and Decode both use the register file, we need to support both reading and writing in half a cycle. reading takes longer (8 ns), so we would have to set the clock to be twice the time of a read. The half period calculation was a very subtle answer. Almost full points were also given if you simply added 7+8 and got 15 16 ns Page 9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
Now consider a workload of 100,000 instructions with the following instruction mix: 50% ALU, 20% Load (half of these are followed by a use), 10% Store, 20% Branches (75% of these are taken). (d) [2 points] Estimate the CPI of this workload on a single-cycle processor . = 1 CPI (e) [2 points] Estimate the CPI of this workload on a multi-cycle processor using the above components. As we did in class, assume that each instruction type has the number of cycles as dictated by the number of structures it must access in serial. ALU: .5 x 4 stages, Loads: .2 x 5 stages, Stores: .1 x 4 stages, Branches: .2 x 3 stages = 4.0 CPI (f) [3 points] Estimate the CPI of this workload of a 5-stage (FDXMW) pipelined processor using the above components. 1 + [Load-Use Stalls] .2 x .5 x 1 + [Taken Branches] .2 x .75 x 2 = 1.4 CPI Consider the single-cycle processor . You double the size of the register file, which makes it 3 ns slower for both reading and writing. But now the compiler can keep more values in the register file, so there are half as many loads in the workload as before. (g) [2 points] Is this new processor better at executing the newly compiled workload than the original processor was at executing the original workload? Correct Answer: No. (h) [2 points] Justify your answer. The single-cycle processor’s clock period just went up 6 ns (from 48 ns to 54 ns, 1/8 slower or 12.5% slower. Meanwhile the dynamic instruction count decreased by 10%. Unfortunately, the instruction decrease (10%) does not compensate for the slower clock (12.5%). There will be a net slowdown. Page 10