sp23-sols
pdf
keyboard_arrow_up
School
New York University *
*We aren’t endorsed by this school
Course
152
Subject
Computer Science
Date
Jan 9, 2024
Type
Pages
30
Uploaded by 15242001593a
CS 152/252A
Spring 2023
Computer Architecture
Final
•
You have 170 minutes unless you have DSP accommodations. Exam questions are roughly in the order they were covered
in lecture. If a question is used for a clobber, it’s labeled either (MT1) or (MT2) depending on the exam it clobbers.
•
You must write your student ID on the bottom-left of every page of the exam (except this first one). You risk losing credit
for any page you don’t write your student ID on.
•
For questions with length limits, do not use semicolons or dashes to lengthen your explanation.
•
The exam is closed book, no calculator, and closed notes, other than three double-sided cheat sheet that you may reference.
•
For multiple choice questions,
□
means mark
all options
that apply
*$
means mark a
single choice
First name
Last name
SID
Exam Room
Name and SID of person to the right
Name and SID of person to the left
Discussion TAs (or None)
While the statement of the Honor Code itself is brief, it is an affirmation of our highest ideals as Golden Bears.
Honor code
: “As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others.”
By signing below, I affirm that all work on this exam is my own work, and honestly reflects my own understanding of the
course material. I have not referenced any outside materials (other than one double-sided cheat sheet), nor collaborated
with any other human being on this exam. I understand that if the exam proctor catches me cheating on the exam, that I
may face the penalty of an automatic "F" grade in this class and a referral to the Center for Student Conduct.
Signature:
1
THIS PAGE IS INTENTIONALLY LEFT BLANK
2
Q1. [20 pts] Iron Law & Pipelining (MT 1)
For questions (a) to (d), determine whether each given statement is true. If false, point out and replace the incorrect part.
Example: Lowering CPU clock frequency will (1) decrease
(2) seconds-per-cycle
because (3) each clock cycle now takes longer
.
Answer:
The statement is false because part (1) is wrong.
Replace with:
increase
(a)
[3 pts] Adding a branch delay slot might (1) increase
(2) instructions-per-program
because (3) the branch predictor might
not be accurate
.
Which option best describes the statement above?
*$
The statement is true.
*$
The statement is false because part (1) is wrong.
*$
The statement is false because part (2) is wrong.
'!
The statement is false because part (3) is wrong.
If the statement is false, replace the incorrect part (under 1 sentence) so that the statement becomes true:
The compiler will have to add extra instructions to fill the slot
(b)
[3 pts] In a classic 5-stage pipeline, supporting precise exceptions might (1) increase
(2) cycles-per-instruction
due
to (3) added logic complexity
.
Which option best describes the statement above?
*$
The statement is true.
*$
The statement is false because part (1) is wrong.
'!
The statement is false because part (2) is wrong.
*$
The statement is false because part (3) is wrong.
If the statement is false, replace the incorrect part (under 1 sentence) so that the statement becomes true:
seconds-per-cycle
SID: ________________________
3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(c)
[3 pts] Moving from a single-threaded core to SMT-enabled core might (1) increase
(2) time-per-cycle
due
to (3) duplicated microarchitecture structures (PC, ArchRF, ...) and scheduling logic
.
Which option best describes the statement above?
'!
The statement is true.
*$
The statement is false because part (1) is wrong.
*$
The statement is false because part (2) is wrong.
*$
The statement is false because part (3) is wrong.
If the statement is false, replace the incorrect part (under 1 sentence) so that the statement becomes true:
N/A
(d)
[3 pts] Stripmining on a vector processor might (1) increase
(2) instructions-per-program
due to (3) handling cases where
the iteration count is not divisible by the vector length
.
Which option best describes the statement above?
'!
The statement is true.
*$
The statement is false because part (1) is wrong.
*$
The statement is false because part (2) is wrong.
*$
The statement is false because part (3) is wrong.
If the statement is false, replace the incorrect part (under 1 sentence) so that the statement becomes true:
N/A
(e)
[4 pts] A 5-stage pipeline that has a 1-cycle ALU and a multi-cycle
unpipelined
FPU has WAW and structural hazards.
We would like to pipeline the FPU to potentially help with these issues.
(i)
[2 pts] Does pipelining the FPU help with WAW hazards?
*$
Yes
'!
No
(ii)
[2 pts] Does pipelining the FPU help with structural hazards?
'!
Yes
*$
No
(f)
[4 pts] Assume an out-of-order core is executing a store instruction before a load instruction to different addresses (i.e.
instruction 0 = store to address A, instruction 1 = load to address B). For this particular core, the core designer allows for
the load to complete before the store (the load is reordered before the store) if the store takes longer than the load to issue
or execute. By completing before the store, the load is allowed to bring its data into the cache, potentially evicting older
data. Note that the load can only complete, i.e the register for the load will not be modified until it commits.
Assume the older store throws an exception
after
the younger load completes. Is this behavior still valid for precise
exceptions?
'!
Yes
*$
No
SID: ________________________
4
Q2. [20 pts] Microcode Grab Bag (MT 1)
The questions in this section may be answered independently of one another.
It may be helpful to refer to
Appendix A
on microcoding while answering this question.
(a)
[4 pts] The developer before you had tried to implement an instruction in microcode. However, since they didn’t take
CS 152/252A, their implementation might have a bug! They’ve left you the instruction and pseudocode, as well as their
potentially buggy microcode.
Instruction
:
BUGGY rd, rs1, rs2
Pseudocode
:
if (R[rs1] != 0) {
R[rd] = R[rd] + M[R[rs2]];
}
Microcode implementation:
(i)
[2 pts] If there is an incorrect line of microcode in the above implementation, what line contains an error? If no lines
contain errors, please mark "None of the above".
*$
Line 1
*$
Line 2
'!
Line 3
*$
Line 4
*$
Line 5
*$
None of the above
(ii)
[2 pts] Why is the line you marked above incorrect? You may write
at most 2 sentences
of explanation.
Note
: if you marked "None of the above" for the previous subpart, leave this part blank.
Line 3 should be a S (spin) for uBr instead of N (next). This is since we must wait for memory to
load into A, which requires us to spin on that line until the memory value has arrived. (
Note:
This
was the intended error. However, there were two additional, accidental errors: ALUEn should be 0
to prevent the case of both memory and ALU writing to the bus at the same time; MemWr should
be 0 to prevent writing garbage or some other value to the memory address. Credit was given if the
student selected the correct option in part (i) and had at least one of these reasons.)
SID: ________________________
5
(b)
[16 pts]
Reverse Engineering Microcode
The aforementioned developer was, unfortunately, also a firm believer in self-documenting code and chose not to explain
what some of the microcoded instructions do! In this part, we consider an implementation of the microcoded instruction
mystery
. The microcode for this instruction can be found on page 7 of your exam booklet.
(i)
[9 pts] Analyze the encoded control signals for the
mystery
instruction and
complete the pseudocode
in the space
provided in the table on page 7. If a row encodes multiple pseudo-operations, write both operations in the same
pseudocode box. Unless the pseudocode for a row is already provided, you need to fill out the pseudocode for every
row with microcode signals in the table.
(ii)
[3 pts] In
one sentence
(or less), name or describe the
high-level operation
that this instruction implements.
No credit will be given for simply translating the pseudocode to English.
This is an implementation of the
strlen
function.
—or—
This instruction counts the number of sequential, non-zero valued bytes in a buffer.
(iii)
[2 pts] What is/are the input register(s) of the mystery instruction?
■
rs1
□
rs2
□
rd
□
A
□
B
(iv)
[2 pts] What is/are the output register(s) of the mystery instruction?
□
rs1
□
rs2
■
rd
□
A
□
B
SID: ________________________
6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Microcode Implementation of the MYSTERY Instruction
State
Pseudocode
IR
Ld
Reg
Sel
Reg
Wr
Reg
En
A
Ld
B
Ld
ALUOp
ALU
En
MA
Ld
Mem
Wr
Mem
En
Imm
Sel
Imm
En
μBr
Next State
FETCH0:
MA ← PC;
A ← PC
*
PC
0
1
1
*
*
0
1
0
0
*
0
N
*
IR ← Mem
1
*
0
0
0
*
*
0
0
0
1
*
0
S
*
PC ← A+4
0
PC
1
0
0
*
INC_A_4
1
*
0
0
*
0
D
*
. . .
NOP0:
microbranch
back to FETCH0
*
*
0
0
*
*
*
0
*
0
0
*
0
J
FETCH0
MYSTERY:
0
*
0
0
0
1
COPY_A
1
*
0
0
*
0
N
*
0
Rd
1
0
*
*
SUB
1
*
0
0
*
0
N
*
0
Rs1
0
1
1
*
*
0
*
0
0
*
0
N
*
AGAIN:
0
*
0
0
0
*
COPY_A
1
1
0
0
*
0
N
*
0
*
0
0
0
1
*
0
0
0
1
*
0
S
*
0
*
0
0
0
0
COPY_B
0
*
0
0
*
0
EZ
FETCH0
0
*
0
0
1
0
INC_A
1
*
0
0
*
0
N
*
0
Rd
0
1
0
1
*
0
*
0
0
*
0
N
*
0
Rd
1
0
0
*
INC_B
1
*
0
0
*
0
J
AGAIN
SID: ________________________
7
Microcode Implementation of the MYSTERY Instruction
State
Pseudocode
IR
Ld
Reg
Sel
Reg
Wr
Reg
En
A
Ld
B
Ld
ALUOp
ALU
En
MA
Ld
Mem
Wr
Mem
En
Imm
Sel
Imm
En
μBr
Next State
FETCH0:
MA ← PC;
A ← PC
*
PC
0
1
1
*
*
0
1
0
0
*
0
N
*
IR ← Mem
1
*
0
0
0
*
*
0
0
0
1
*
0
S
*
PC ← A+4
0
PC
1
0
0
*
INC_A_4
1
*
0
0
*
0
D
*
. . .
NOP0:
microbranch
back to FETCH0
*
*
0
0
*
*
*
0
*
0
0
*
0
J
FETCH0
MYSTERY:
B ← A
0
*
0
0
0
1
COPY_A
1
*
0
0
*
0
N
*
R[Rd] ← A – B
0
Rd
1
0
*
*
SUB
1
*
0
0
*
0
N
*
A ← R[Rs1]
0
Rs1
0
1
1
*
*
0
*
0
0
*
0
N
*
AGAIN:
MA ← A
0
*
0
0
0
*
COPY_A
1
1
0
0
*
0
N
*
B ← M
0
*
0
0
0
1
*
0
0
0
1
*
0
S
*
if EZ(B)
goto FETCH0
0
*
0
0
0
0
COPY_B
0
*
0
0
*
0
EZ
FETCH0
A ← A + 1
0
*
0
0
1
0
INC_A
1
*
0
0
*
0
N
*
B ← R[Rd]
0
Rd
0
1
0
1
*
0
*
0
0
*
0
N
*
R[Rd] ← B + 1
goto AGAIN
0
Rd
1
0
0
*
INC_B
1
*
0
0
*
0
J
AGAIN
SID: ________________________
8
Q3. [12 pts] Caches (MT 1)
Way prediction is an optimization technique used in set-associative caches. The principle is that we predict which cache way is
most likely going to be accessed for a particular memory request. If our prediction is correct, there is no need to check the other
ways in the cache. If it is incorrect, we proceed as though with a normal set associative access.
(a)
First, consider a two way set associative cache which is designed with either way-prediction (one data way is read at a
time) or concurrent data access (both data ways are read at the same time).
(i)
[2 pts] True or False: In all circumstances, conventional concurrent data access caches have an AMAT
less than or
equal
to that of way-predicted cache.
'!
True
*$
False
In no more than 2 sentences, justify your answer:
Even with 100% prediction accuracy, the way-predicted cache will perform a tag check and data
access, which is the same as the concurrent access cache.
(ii)
[2 pts] True or False: A concurrent data access cache will consume
equal or more
power than a way predicted cache
in all circumstances.
'!
True
*$
False
In no more than 2 sentences, justify your answer:
In the case of a correct prediction, the way-predicting cache only has to activate 1 way, but in all cases
the concurrent access cache will activate all the ways.
Figure 1: General 2-Way Set Associative Cache
(b)
(i)
[2 pts] Consider a 16 kB two-way set associative cache
without
way prediction. Given a hit time of 5 cycles, a hit
rate of 80%, and an L2 access time of 30 cycles, what is the AMAT of this cache?
5+0.2*30 = 11
SID: ________________________
9
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(ii)
[2 pts] Consider an 8kB
direct mapped
cache. Given a hit time of 2 cycles, a hit rate of 60%, and an L2 access time
of 30 cycles, what is the AMAT of this cache?
2+0.4*30 = 14
(iii)
[1 pt] Now consider a 16 kB two-way set associative cache
with
way prediction. What is the hit rate of this cache?
*$
60%
*$
70%
'!
80%
*$
None of the above
(iv)
[3 pts] What is the AMAT of this cache?
(0.6*2 + 0.4*5) + 0.2*30 = 9.2
(v)
[1 pt] In the case of a data cache, which of the following is the preferred input to the way-predictor (reduces latency
to cache access)?
*$
The data address
'!
PC
(vi)
[2 pts] Justify your answer to the previous part.
The predicted way must be available before the actual data access to as to reduce latency. This is best
done with the load PC.
SID: ________________________
10
Q4. [18 pts] Virtual Memory (MT 1)
Recall from discussion that superpages are memory pages of large sizes. Most general purpose processors support superpages
because of many benefits they can bring to the table. Processes can specify if they would like to allocate superpages or "regular
pages" for their workload. This question will explore superpaging in more detail.
L1 Index [23:18]
L2 Index [17:12]
Offset [11:0]
PPN [23:18]
Offset [17:0]
Virtual Address
Physical Address
PTE
L1 Table
(a) VM Concept Blitz
(i)
[2 pts] Let’s recap some virtual memory concepts.
Select all that are true.
■
Systems with virtual memory can give the illusion of more memory than is physically available.
■
Paging provides a layer of security.
□
All modern systems must have virtual memory.
■
Virtual memory is expensive from a hardware and runtime perspective.
*$
None of the above
(ii)
[2 pts] Let’s consider some superpaging concepts.
Select all that are true.
□
Superpaging reduces hardware complexity.
□
A system that supports superpaging is less prone to external fragmentation.
■
With superpaging, TLB memory scope increases.
■
With superpaging, disk traffic can increase.
*$
None of the above
SID: ________________________
11
(b)
[10 pts]
Super Translation
Consider a system that uses
32-bit
words,
10-bit
virtual addresses,
16-byte
pages, and
three-level
page tables. This
system supports superpages. The system memory has a latency of
90 ns
. A secondary storage (disk) is attached to the
system. The disk has a latency of
2.5 ms
and a speed of
0.1 byte/ns
.
In addition to contents, each PTE contains 1 bit indicating whether the pointed page is on disk, and 1 bit indicating if that
page is a superpage. If a page is on disk, it needs to be transferred into memory before its content can be read. Note that
here we have simplified the page fault handling process.
Below are the virtual addresses of two memory accesses, the content of the Page Table Base Register, and a table showing
the contents of a portion of physical memory used for page tables. Recall that the content of a PTE stores the page number
of the next-level page table.
$ddr
&ontents
6uperpage"
2n 'isk"
0x00
0x04
0
0
0x04
0x06
0
1
0x08
0x0C
0x20
1
0
0x10
0x14
0x18
0xA1
0
1
0x1C
0x20
0x24
0x15
0
1
0x28
0x0A
0
1
0x2C
0x78
0
0
0x30
0x05
0
0
0x34
0x38
0x3C
0x90
1
1
0x40
0x44
0x48
0x12
1
1
0x4C
0x01
0x50
0x09
1
0
0x54
0x58
0x5C
0x02
0
1
$ccess 2rder
9irtual $ddress
1
0x0FA
2
0x349
3age 7able %ase 5egister
0x30
(i)
[4 pts] What is the physical address of the first memory access?
0x78A
(ii)
[4 pts] What is the physical address of the second memory access?
0x9049
(iii)
[2 pts] Which of the two memory accesses has a
lower
latency? Here, latency is defined as the time between when
an access begins the translation process and when the target byte is retrieved from memory. Assume that no other
latency is involved except the ones mentioned above.
Hint: disk access time = latency + size-of-transfer / rate-of-transfer
'!
First memory access
*$
Second memory access
SID: ________________________
12
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(c) Superpage Scenarios
In the following scenarios, which paging mechanism would be better and why?
(i)
[2 pts] Context switching among many apps that have a working set of 1 MB on a system with 4 MB pages and 16
MB superpages.
*$
Superpaging
'!
Regular Paging
*$
Neither is better than the other
In
no more than 2 sentences
, justify your answer:
We do not need superpages for the size of the working sets because it ends up wasting memory. This
would result in potentially hitting disk more with enough apps running in parallel.
(ii)
[2 pts] Training a machine learning model with a massive dataset (4 GB) – requiring multiple iterations through the
data – on a system with 4 MB pages and 16 MB superpages.
'!
Superpaging
*$
Regular Paging
*$
Neither is better than the other
In
no more than 2 sentences
, justify your answer:
The spatial locality of the workload, paired with the need for more memory scope at the TLB level,
points to superpaging. With larger TLB scope, CPI is less (no need for as many expensive page table
walks during memory accesses) and hence, a faster completion of the task at hand.
SID: ________________________
13
Q5. [20 pts] Out-of-Order Pipelines (MT 2)
(a)
[8 pts]
Avenue (Issue) Q
In this part, you are working on the design of an out-of-order processor which has separate functional units for integer
operations, floating point, and memory. Each functional unit has its own issue queue (also referred to as a reservation
station or issue buffer) from which it is able to issue dispatched instructions.
(i)
[1 pt] A coworker suggests that it is a good idea to use a large ROB (>200 entries) and size the issue queue of each
individual functional unit such that it is the size of the ROB. When considering power, performance, and area trade
offs, is your coworker’s suggestion a good one?
*$
Yes
'!
No
(ii)
[3 pts] In no more than two sentences, justify your answer:
A 200 entry issue queue for each functional unit is not feasible. Issue queues are expensive from a logic and
area perspective because they are responsible for evaluating both whether an instruction can issue at all and,
amongst the ready candidates, which should issue. Additionally, updating large IQs when information about
writebacks becomes available is very expensive as it requires essentially treating the IQ as content addressable
memory (equivalently, a very highly associative cache!).
—
or
—
It doesn’t make sense to size each IQ to the ROB as it is very unlikely that we’d ever actually
use
such capacity
as most reasonable programs are a mix of integer, memory, and floating point operations.
(iii)
[1 pt] Another coworker suggests that the ROB should have more entries than the total sum of all issue queue entries
across all functional units. When considering power, performance, and area trade offs, is your coworker’s suggestion
a good one? You may ignore considerations around scaling the physical register file and free list.
'!
Yes
*$
No
(iv)
[3 pts] In no more than two sentences, justify your answer:
Scaling the ROB is cheap. Instructions are dispatched in-order and the ROB does not have any complex content
addressable memory behaviors. Since the ROB sets the maximum number of in-flight instructions in the pipeline
(including undispatched and issued instructions as well as completed but uncommitted instructions!) whereas the
IQ sizes determines the maximum number of instructions which can be considered for issue in a cycle. If the ROB
is smaller or equal to the sum of the issue queues, the issue queues (and thus the amount of ILP available to us!)
will be underutilized as the ROB will fill before we reach capacity. To keep the backend saturated, we want to be
able to refill all IQs whenever an instruction issues as well as to track all possibly instructions as the flow through
execution. We always want to scale the cheaper structure to keep the more expensive structure saturated: why the
ROB grows to fully cover the IQs.
SID: ________________________
14
(b)
[12 pts]
They see me rollin’, they hatin’
In the tables below, update the ROB, rename table, and freelist to reflect the state of the processor after executing the given
program and completing any necessary rollbacks using a multi-cycle unwind procedure. Assume that all instructions
which can be committed are committed
before
any rollback operations begin. Additionally, assume that the pagefault
exception is detected after the
sub
instruction has already begun executing and that the branch mispredict is resolved at
some point after the pagefault exception.
•
The free list operates as a FIFO queue; entries are popped from the left and freed entries are pushed on the right.
•
When removing an item from the freelist, do not cross out entries; instead, mark an “X” in the row immediately
below.
•
If an instruction does not write to the register file, mark an “X” in the ROB.
•
When completing the rename table, do not cross out entries. Instead, write the new physical register in the next box
to the right. You may not need to use all spaces.
The first instruction has been completed in the tables for you.
All three of the tables below will be graded.
PC
Instruction
00
addi x2, x2, #1
04
ld
x2, 0(x2)
08
beq
x2, x0, label ; mispredicted as not taken, resolved very late
0c
mul
x3, x2, x2
10
st
x3, 0(x2)
; pagefault exception, detected early
14
sub
x2, x2, x2
...
ff
label: /* ommitted */
ROB
#
Operation
Rd
Previous Rd
Committed
Rolledback?
0
addi
p8
p0
Y
N
1
ld
p2
p8
Y
N
2
beq
X
X
Y
N
3
mul
p9
p5
N
Y
4
st
X
X
N
Y
5
sub
p6
p2
N
Y
Freelist
p8
p2
p9
p6
p0
p8
p6
p9
X
X
X
X
Rename Table
Arch. Register
Physical Register
x2
p0
p8
p2
p6
p2
x3
p5
p9
p5
(i)
[1 pt] After completing rollback, should an exception be raised?
*$
Yes
'!
No
The pagefault occurred along a mispredict path. Exceptions caused by rolledback instructions must be suppressed.
(ii)
[1 pt] After completing rollback, at what
PC
should execution continue at? Write
exception
if execution should
continue at the exception handler.
ff
SID: ________________________
15
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Q6. [10 pts] Multithreading (MT2)
(a)
Match each of advantages and disadvantages to the most appropriate type of multi-threading. Each advantage/disadvan-
tage should only be used once, so use it for the type of multithreading it
best
applies to.
Keep scratch work away from
the multiple choice options for each question.
1. Not possible on a single-issue processor.
2. Can, but not necessarily effective at, hiding the throughput losses from both long and very short stalls.
3. Useful only for reducing the penalty of very high-cost stalls, where pipeline refill is negligible compared to the stall
time.
4. Most effective at minimizing both horizontal and vertical waste.
5. In general, slows down execution of an individual thread, even a thread that is ready to execute and doesn’t have
stalls.
6. Doesn’t need thread switching to be extremely low overhead.
(i)
[2 pts] Coarse-grained multithreading:
Advantage:
*$
1
*$
2
*$
3
*$
4
*$
5
'!
6
Disadvantage:
*$
1
*$
2
'!
3
*$
4
*$
5
*$
6
(ii)
[2 pts] Fine-grained multithreading:
Advantage:
*$
1
'!
2
*$
3
*$
4
*$
5
*$
6
Disadvantage:
*$
1
*$
2
*$
3
*$
4
'!
5
*$
6
(iii)
[2 pts] Simultaneous multithreading:
Advantage:
*$
1
*$
2
*$
3
'!
4
*$
5
*$
6
Disadvantage:
'!
1
*$
2
*$
3
*$
4
*$
5
*$
6
(b)
[4 pts] Suppose we have a superscalar out-of-order CPU and want to add support for simultaneous multithreading to it.
Which of the following CPU components need to be duplicated to maintain program correctness?
■
Program Counter (PC) register
□
Physical registers
□
Functional units
□
Functional unit issue queues
□
Data memory ports
■
Architectural Register file
□
Branch predictor
*$
None of the above
SID: ________________________
16
Q7. [20 pts] Vectorizing Data Processing (MT2)
(a)
In data processing, one of the most basic types of processing is summing a column of a table (otherwise known as a
reduction) to a single overall value. Here is the psuedocode for an iterative sum reduction of a table’s column:
# Assumptions:
#
- ‘table’ is a 2D array of type table[row][col] and has
#
‘numRows’ rows and ‘numCols‘ columns
#
- ‘table’ is stored in column-major order
#
(all column values are stored contiguously in memory)
#
- ‘col‘ < ‘numCols‘ (within the bounds of the array)
int sum_col(int table[][], int col, int numRows):
int sum = 0;
for (int i = 0; i < numRows; ++i):
sum += table[i][col];
We would like to convert the code to a vectorized implementation for a substantial performance speedup. You initially
describe the following pseudocode for the vector implementation:
1. Set a scalar register sum to
0
2. Run a stripmine loop (each iteration does a vector length (
VL
) element partial sum)
(a) Naively load VL contiguous elements of column
col
into a vector register
(b) Sum the entire vector register and add partial sum to overall scalar register sum
(c) If more rows exist, loop back to (a) while also modifying next
VL
to be
max(VL, number of rows left)
3. Return the overall scalar register sum
(i)
[2+2 pts] If the table given was stored in row-major order (i.e. row values are stored in contiguous memory locations),
would the prior vector implementation break?
'!
Yes
*$
No
Explain in at most
two
sentences.
(Option 1) Now you would need to have strided load instructions that skip across row elements.
(Option 2) You were already using strided load instructions (of stride 1) so the prior instruction is
fine (just need to modify the stride).
(ii)
[2+2 pts] The sum operation in the vector implementation is done iteratively (partial sums are iteratively added in
the stripmine loop to the overall scalar sum). Is it possible to also vectorize these sums?
'!
Yes
*$
No
Explain in at most
three
sentences.
Use a binary-addition tree. Store all partial sums in another vector register, then reduce that vector
register.
SID: ________________________
17
(b)
While getting the sum of column values is great, sometimes data processing requires sorting the output values and returning
a new table. For the sake of simplicity, ignore the table from the previous parts. Instead we would like to sort a single
array in a vectorized way. To do this, we can use a vectorized version of the quicksort algorithm, which recursively sorts
an array by partitioning it (splitting it into 2 arrays) based on a chosen pivot element. This algorithm is known to be fast
if the partitioning step in quicksort can be vectorized. The
iterative
pseudocode for this partitioning operation is given
below assuming that the partition is done for an array that fits within a vector register completely:
1. set ‘pivotValue’ to a random element
2. move all values < ‘pivotValue’ one-by-one to left side of the array
3. move all values >= ‘pivotValue’ one-by-one to right side of the array
To help with this you are given new instruction called
vcompress
that allows elements selected by a vector mask register
from a source vector register to be packed into contiguous elements at the start of a destination vector register.
Example use of ‘vcompress’ instruction
8 7 6 5 4 3 2 1 0
Element number
1 1 0 1 0 0 1 0 1
v0 = mask reg.
8 7 6 5 4 3 2 1 0
v1 = source reg.
1 2 3 4 5 6 7 8 9
v2 = dest. reg.
<--- execute vcompress v2, v1, v0 <---
0 0 0 0 8 7 5 2 0
v2 *after* vcompress
(i)
[1+1+1+1 pts] You are given the following pseudocode that uses this new
vcompress
instruction to implement the
partitioning step in a vectorized/optimized way. Fill in the blanks with phrases.
1. Make a mask based on if elements are
greater (or less) than
the
pivotValue
.
2. Use
vcompress
with that mask to move elements into the vector register.
3. Store the ‘compressed’ destination vector to memory.
4.
Invert
the mask.
5. Repeat step(s)
2
.
6. Store values to memory
at an offset (memory now has contiguous vector)
.
7. Load contiguous vector length memory back into the vector register.
(ii)
[3 pts] If the partitioning step also required you to return the index of the
pivotValue
in the final vector register,
how could you use the mask register given to the
vcompress
to determine the index?
'!
(1) Sum all
1
’s in mask and use sum as index
*$
(2) Use highest index of 1 as index
*$
Use either (1) or (2)
*$
You can’t use the mask to determine the index
SID: ________________________
18
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(iii)
[2+3 pts] Assuming the
vcompress
instruction
doesn’t
exist, with strictly vector instructions (i.e.
no iterative
scalar loops), what single type of vector instruction not used in the pseudocode might allow you to implement this
functionality?
Scatter/Gather Load/Store
Why? Explain in at most
three
sentences.
(3 sentence version) Same as before, you just would replace the vcompress step and final vector
register load with scatter (and potentially gather) operations to store the values in contiguous memory.
Make a mask based on if elements are greater than the pivot. Scatter the values into consecutive
memory locations. Invert mask. Scatter the values into consecutive memory locations with offset.
Load the value back into the vector register.
SID: ________________________
19
Q8. [24 pts] Cache Coherence
(a)
[9 pts]
MOESI: Must Observe, Every State Identified!
Important
assumptions
for this question:
1.
?
is a processor with a private cache. So is
?
, and so is
?
.
2. All private caches follow the MOESI protocol, and can snoop on other caches using a shared bus.
3. The MOESI protocol referred to in this part can be viewed in the below diagram that contains the possible MOESI
state transitions.
SID: ________________________
20
For each of the following subparts, please select the initial and next cache states to represent the transition which the given
description best describes.
Here are what the options refer to: M (Modified), O (Owned), E (Exclusive), S (Shared), I (Invalid).
Important:
Consider each subpart
independently
of the others.
Assume location
?
is initially neither
in
?
’s nor
?
’s
cache, at the
start of each subpart
.
(i)
[2 pts] Both
?
and
?
have read from location
?
. At this point,
?
’s cache is in state
StateA1
, and
?
’s cache is in
state
StateB1
. Now,
?
writes to location
?
. At this point,
?
’s cache is in state
StateA2
, and
?
’s cache is in state
StateB2
.
?
’s cache transitions from
StateA1
to
StateA2
.
StateA1
:
*$
M
*$
O
*$
E
'!
S
*$
I
StateA2
:
'!
M
*$
O
*$
E
*$
S
*$
I
?
’s cache transitions from
StateB1
to
StateB2
.
StateB1
:
*$
M
*$
O
*$
E
'!
S
*$
I
StateB2
:
*$
M
*$
O
*$
E
*$
S
'!
I
(ii)
[2 pts]
?
reads
from location
?
. At this point,
?
’s cache is in state
StateA1
, and
?
’s cache is in state
StateB1
.
Now,
?
reads from location
?
. At this point,
?
’s cache is in state
StateA2
, and
?
’s cache is in state
StateB2
.
?
’s cache transitions from
StateA1
to
StateA2
.
StateA1
:
*$
M
*$
O
*$
E
*$
S
'!
I
StateA2
:
*$
M
*$
O
*$
E
'!
S
*$
I
?
’s cache transitions from
StateB1
to
StateB2
.
StateB1
:
*$
M
*$
O
'!
E
*$
S
*$
I
StateB2
:
*$
M
*$
O
*$
E
'!
S
*$
I
(iii)
[2 pts]
?
writes
to location
?
. At this point,
?
’s cache is in state
StateA1
, and
?
’s cache is in state
StateB1
. Now,
?
reads from location
?
. At this point,
?
’s cache is in state
StateA2
, and
?
’s cache is in state
StateB2
.
?
’s cache transitions from
StateA1
to
StateA2
.
StateA1
:
*$
M
*$
O
*$
E
*$
S
'!
I
StateA2
:
*$
M
*$
O
*$
E
'!
S
*$
I
?
’s cache transitions from
StateB1
to
StateB2
.
StateB1
:
'!
M
*$
O
*$
E
*$
S
*$
I
StateB2
:
*$
M
'!
O
*$
E
*$
S
*$
I
(iv)
[3 pts]
?
,
?
,
?
read from location
?
.
?
writes to location
?
.
?
reads from location
?
.
?
writes to location
?
.
?
reads from location
?
. At this point,
?
’s cache is in state
StateA1
,
?
’s cache is in state
StateB1
, and
?
’s cache is
in state
StateC1
.
StateA1
:
*$
M
*$
O
*$
E
'!
S
*$
I
StateB1
:
*$
M
*$
O
*$
E
*$
S
'!
I
StateC1
:
*$
M
'!
O
*$
E
*$
S
*$
I
SID: ________________________
21
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(b)
[15 pts]
Right Writes Despite Network Plights...
For directory based cache coherence, we have so far assumed that the network is reliable. What if it’s not? Without this
assumption, say we now have an
unreliable
network between the caches and the directory controller.
If a cache sends a request or response to the directory controller, the message might get dropped by the network instead
of reaching the directory controller. In that case, the directory controller would never see
that specific
message, since it
did not make it through the unreliable network
successfully. Similarly, a message
from the directory controller
may
never reach the cache it was intended for.
How can we still
ensure coherency
under these conditions?
For this problem, consider the following scenario.
Let
?
and
?
be cores. Let
??
represent the directory controller. The available messages are:
•
WriteReq(X)
: a write request to store data
?
into memory location
?
, from a cache to
??
.
•
WriteRsp()
: a write response for memory location
?
, from
??
to a cache.
•
ReadReq()
: a read request to load from memory location
?
, from a cache to
??
.
•
ReadRsp(X)
: a read response containing the data
?
that was at memory location
?
, from
??
to a cache.
•
InvReq()
: an invalidate request for removing memory location
?
from the cache, from
??
to a cache.
•
InvRsp()
: an invalidate response that memory location
?
has been removed from the cache, from a cache to
??
.
For this question,
assume
that
?
and
?
read and write from a
single memory location
?
, which is initialized to 0.
?
’s code:
read()
write(X)
?
’s code:
if read() == X:
write(Y)
R = read()
else:
R = Z
(i)
[2 pts] Assuming that
?
’s and
?
’s caches are coherent, what are the possible values of
R
?
□
0
□
?
■
?
■
?
SID: ________________________
22
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
As
?
and
?
execute their code, the following series of events takes place:
The same series of events in a list format:
1.
?
sends
??
a
ReadReq()
.
2.
??
receives
?
’s
ReadReq()
, and sends
?
a
ReadRsp(0)
.
3.
?
receives
??
’s
ReadRsp(0)
.
4.
?
sends
??
a
WriteReq(X)
.
5.
??
receives
?
’s
WriteReq(X)
.
6.
??
sends
?
a
WriteRsp()
, but this never makes it through the network to
?
.
7.
?
sends
??
a
ReadReq()
.
8.
??
receives
?
’s
ReadReq()
, and sends
?
a
ReadRsp(X)
.
9.
?
receives
??
’s
ReadRsp(X)
.
10.
?
sends
??
a
WriteReq(Y)
.
11.
??
receives
?
’s
WriteReq(Y)
, and sends
?
an
InvReq()
.
12.
?
receives
??
’s
InvReq()
, and sends
??
an
InvRsp()
.
13.
??
receives
?
’s
InvRsp()
, and sends
?
a
WriteRsp()
.
(ii)
[1 pt] After the above events have occurred, from the view point of
?
, what value does the memory location
?
currently have?
*$
0
*$
?
*$
?
*$
?
'!
Not in
?
’s cache
(iii)
[1 pt] After the above events have occurred, from the view point of
?
, what value does the memory location
?
currently have?
*$
0
*$
?
'!
?
*$
?
*$
Not in
?
’s cache
(iv)
[1 pt] After the above events have occurred, from the view point of
??
, what value does the memory location
?
currently have?
*$
0
*$
?
'!
?
*$
?
SID: ________________________
23
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
So far, from
?
’s perspective, its
WriteReq(X)
never receives a response. Let’s assume that to resolve the issue of
?
never
receiving a write response from
??
,
?
sets up a timer.
With this new addition, when
?
sends out its initial write request, it starts the timer. If
?
has not received a corresponding
write response after some amount of time
?
1
,
?
will send another write request, and restart the timer.
Assume that
no
changes are made to
??
so far, and that upon receiving any write request from
?
,
??
will be able to
successfully send
?
a write response through the network.
?
’s timer goes off after
??
sends a
WriteRsp()
to
?
in the initial series of events. Then, the following events occur:
1.
?
sends another
WriteReq(X)
to
??
.
2.
??
receives
?
’s
WriteReq(X)
.
3.
??
sends
?
a
InvReq()
.
4.
?
receives
??
’s
InvReq()
and sends an
InvRsp()
.
5.
??
receives
?
’s
InvRsp()
and sends
?
an
WriteRsp()
.
(v)
[2 pts] After all the above events have occurred, from the view point of
?
, what value does the memory location
?
currently have?
*$
0
'!
?
*$
?
*$
?
*$
Not in
?
’s cache
(vi)
[2 pts] After all the above events have occurred, from the view point of
?
, what value does the memory location
?
currently have?
*$
0
*$
?
*$
?
*$
?
'!
Not in
?
’s cache
(vii)
[2 pts] After all the above events have occurred, from the view point of
??
, what value does the memory location
?
currently have?
*$
0
'!
?
*$
?
*$
?
Now,
?
wants to read what the value at memory location
?
actually is. The following events occur:
1.
?
sends
??
a
ReadReq()
.
2.
??
receives
?
’s
ReadReq()
.
3.
??
sends
?
a
ReadRsp(_)
.
4.
?
receives
??
’s
ReadRsp(_)
.
(viii)
[3 pts] What is the blank character (
_
) that was sent in the
ReadRsp(_)
in the above events? In other words, what
value does
?
actually get from its second
read()
?
*$
0
'!
?
*$
?
*$
?
*$
Not in
?
’s cache
(ix)
[1 pt] Considering the example broken down in the previous parts, should
?
send multiple write requests at any time
for the same
write()
function call in its code?
*$
Yes
'!
No
SID: ________________________
24
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Q9. [24 pts] Memory Consistency and Synchronization
(a) Memory Consistency True/False
For parts (i) through (iv), determine whether each given statement about memory consistency is true or false. Explain
your reasoning
in no more than two sentences
.
(i)
[1 pt] Memory consistency models are not applicable to uniprocessor systems.
*$
True
'!
False
A single processor can still have multiple threads that can share memory and thus, warrants a memory
consistency model.
(ii)
[1 pt] Memory consistency models are not applicable to systems without caches.
*$
True
'!
False
Though caches can certainly complicate implementations of memory consistency models, they are
not the reason for existence of memory consistency models. Memory consistency models pertain to
the order of execution of loads and stores in a single processor, and interactions with other processors
can affect the memory values.
(iii)
[2 pts] On an multicore system with 4 processors that utilize out-of-order completion, it is possible to implement
sequential consistency.
'!
True
*$
False
An example of this would be if the out-of-order cores implemented in-order commit of instructions,
and memory operations directly entered a FIFO queue connected to main memory.
(iv)
[2 pts] Adding a data prefetch unit alters the behavior of a sequentially consistent system.
'!
True
*$
False
Although the viable results of the code on the sequentially consistent system will not change, the
probability
of the results will change. The orderings of execution are dependent on the speed of
loading and storing data in memory, which the data prefetch unit affects.
SID: ________________________
25
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(b) Building a Strong Fence
(i)
[2 pts] In
no more than 2 sentences
, explain how the code below can digress from the desired functionality – refer
to the commented code to understand the goal. Assume the processor abides by a
weak memory consistency model
(fully relaxed constraints).
Ready
can be set to 1 without
A
being updated, allowing
B
to be updated with the wrong value of
A
.
Since there is no coded relationship between
Ready
and
A
, the processor can set
B
to
A
before checking
whether
Ready
equals 1.
(ii)
[4 pts]
Optimally insert fences in the code below for it to achieve the desired functionality.
Recall that
fence w,r
means that all write instructions prior to the fence must complete before all read instructions
after the fence. Combining constraints into one fence instruction will count as multiple fences;
fence w, wr
will
count as two fences, for instance. So optimally inserting fences would require choosing minimally invasive fences.
Write in your fence instructions (with proper syntax) between the lines of assembly code below.
Note that
x2
and
x3
in both P1 and P2 point to the memory address for
A
and
Ready
respectively. Note that
x6
in
P2 points to the memory address for
B
.
P1
P2
li x1
1
li
x1
1
sw x1
0(x2)
#A = 1
loop:
lw
x5
0(x3)
#While (Ready != 1);
sw x1
0(x3)
#Ready = 1;
bne x5
x1
loop
lw
x4
0(x2)
sw
x4
0(x6)
#B = A
P1
li x1 1
sw x1 0(x2) #A = 1
FENCE W, W
sw x1 0(x3) #Ready = 1;
P2
li x1 1
loop: lw x5 0(x3x) #While (Ready != 1);
bne x5 x1 loop
FENCE R, R
lw x4 0(x2)
sw x4 0(x6) # B = A
SID: ________________________
26
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(c)
[12 pts]
Implementing Synchronization Primitives
Recall the
load-reserved
and
store-conditional
synchronization primitives discussed in lecture. Your pet hamster gives
you the following two RISC-V atomic instructions, and tasks you with implementing a lock function for critical sections
of code in his new indie video game. The instructions use special register(s) to hold the reservation flag and address, and
the outcome of store-conditional.
lr.w rd, rs1
•
R[rd] = M[R[rs1]]
•
place reservation on M[R[rs1]]
sc.w rd, rs1, rs2
•
if M[R[rs1]] is reserved, then R[rd] = 0 and M[R[rs1]] = R[rs2]
•
else, R[rd] = 1
(i)
[4 pts] The first step is to implement the EXCH function, which uses the load-reserved and store-conditional syn-
chronization primitives to atomically exchange the value stored in
Mem[a0]
with
a1
. Fill in the first empty box with
the instruction that’s supposed to be in [BLANK 1], and the second empty box with the instruction that’s supposed
to be in [BLANK 2].
// Arguments:
//
a0: The memory address for the atomic exchange
//
a1: The value to be atomically written to Mem[a0]
// Returns:
//
a0: The previous value of Mem[a0]
EXCH:
lr.w t0, a0
[BLANK 1]
[BLANK 2]
mv a0, t0
ret
BLANK 1
sc.w t1, a0, a1
BLANK 2
bnez t1, EXCH
SID: ________________________
27
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(ii)
[3 pts] With this new atomic exchange synchronization function, you work with your hamster to develop the following
lock function for critical sections of code:
// Arguments:
//
a0: The memory address of the lock
LOCKIT:
addi, sp, sp, -8
sw ra, 0(sp)
sw s0, 4(sp)
mv s0, a0
spin:
mv a0, s0
li a1, 1
jal ra, EXCH
bnez a0, spin
lw ra, 0(sp)
lw s0, 4(sp)
addi sp, sp, 8
ret
However, you begin to notice significant performance issues in certain sections of the code when multiple threads are
competing for a lock. You are currently running the game on a potato which implements the
MSI
(Modified, Shared,
Invalid) cache coherence protocol. Why might the above lock function not be ideal for our particular coherence
setup?
Explain using (2) sentences max
.
Under the MSI cache coherence protocol, each processor would be generating write requests during
the EXCH function and thus invalidating each other during the spin procedure. This would generate
excessive bus traffic and degrade performance.
SID: ________________________
28
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
(iii)
[5 pts] Concerned about the portability of the new indie game to platforms that implement MSI coherence, you
work with your hamster to implement a new LOCKIT function that avoids the issue above. Fill in the corresponding
blanks in the skeleton below to complete the LOCKIT function.
// Arguments:
//
a0: The memory address of the lock
LOCKIT:
addi, sp, sp, -8
sw ra, 0(sp)
sw s0, 4(sp)
mv s0, a0
spin1:
mv a0, s0
li a1, 1
spin2:
[BLANK 1]
[BLANK 2]
[BLANK 3]
bnez a0, spin1
lw ra, 0(sp)
lw s0, 4(sp)
addi sp, sp, 8
ret
BLANK 1
lw t0, 0(s0)
BLANK 2
bnez t0, spin2
BLANK 3
jal ra, EXCH
SID: ________________________
29
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
CS152 Handout #1 8 Appendix A. A Cheat Sheet for the Bus-based RISC-V Implementation
For your reference, we have reproduced the bus-based datapath diagram as well as a summary of some important information about microprogramming in the bus-based architecture. Remember that you can use the following ALU operations: ALUOp ALU Result Output COPY_A A COPY_B B INC_A_1 A+1 DEC_A_1 A-1 INC_A_4 A+4 DEC_A_4 A-4 ADD A+B SUB A-B SLT Signed(A) < Signed(B) SLTU A < B Table H1-2: Available ALU operations Also remember that
Br (
micro
branch) column in Table H1-3 represents a 3-bit field with six possible values: N, J, EZ, NZ, D, and S. If
Br is N (next), then the next state is simply (
current state
+ 1). If it is J (jump), then the next state is unconditionally
the state specified in the Next State column (i.e., an unconditional microbranch). If it is EZ (branch-if-equal-zero), then the next state depends on the value of the ALU’s zero
output signal (i.e., a conditional microbranch). If zero
is asserted (== 1), then the next state is that specified in the Next State column, otherwise, it is (
current state
+ 1). NZ (branch-if-not-zero) behaves exactly like EZ, but instead performs a microbranch if zero is not asserted (!= 1). If
Br is D (dispatch), then the FSM looks at the opcode and function fields in the IR and goes into the corresponding state. If S, the
PC spins if busy? is asserted, otherwise goes to (
current state +1). IR
A
B
32 GPRs + PC (32-bit)
RegWr
RegEn
MemWr
MemEn
MA
addr
addr
data
data
rs2
rs1
1(RA)
RegSel
Memory
zero?
ALUOp
Opcode
rd
32(PC)
busy?
lRLd
IntRq
Bus
ALd
BLd
MALd
ALU
ALUEn
Immed
Select
ImmEn
ImmSel
SID: ________________________
30
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Documents
Recommended textbooks for you

Programming with Microsoft Visual Basic 2017
Computer Science
ISBN:9781337102124
Author:Diane Zak
Publisher:Cengage Learning

EBK JAVA PROGRAMMING
Computer Science
ISBN:9781337671385
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT
Programming Logic & Design Comprehensive
Computer Science
ISBN:9781337669405
Author:FARRELL
Publisher:Cengage

Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,

EBK JAVA PROGRAMMING
Computer Science
ISBN:9781305480537
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole
Recommended textbooks for you
- Programming with Microsoft Visual Basic 2017Computer ScienceISBN:9781337102124Author:Diane ZakPublisher:Cengage LearningEBK JAVA PROGRAMMINGComputer ScienceISBN:9781337671385Author:FARRELLPublisher:CENGAGE LEARNING - CONSIGNMENTProgramming Logic & Design ComprehensiveComputer ScienceISBN:9781337669405Author:FARRELLPublisher:Cengage
- Microsoft Visual C#Computer ScienceISBN:9781337102100Author:Joyce, Farrell.Publisher:Cengage Learning,EBK JAVA PROGRAMMINGComputer ScienceISBN:9781305480537Author:FARRELLPublisher:CENGAGE LEARNING - CONSIGNMENTOperations Research : Applications and AlgorithmsComputer ScienceISBN:9780534380588Author:Wayne L. WinstonPublisher:Brooks Cole

Programming with Microsoft Visual Basic 2017
Computer Science
ISBN:9781337102124
Author:Diane Zak
Publisher:Cengage Learning

EBK JAVA PROGRAMMING
Computer Science
ISBN:9781337671385
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT
Programming Logic & Design Comprehensive
Computer Science
ISBN:9781337669405
Author:FARRELL
Publisher:Cengage

Microsoft Visual C#
Computer Science
ISBN:9781337102100
Author:Joyce, Farrell.
Publisher:Cengage Learning,

EBK JAVA PROGRAMMING
Computer Science
ISBN:9781305480537
Author:FARRELL
Publisher:CENGAGE LEARNING - CONSIGNMENT

Operations Research : Applications and Algorithms
Computer Science
ISBN:9780534380588
Author:Wayne L. Winston
Publisher:Brooks Cole