Homework 6 (docx, 4 pages)

School: Montclair State University
Course: 280 (Computer Science)
Date: Feb 20, 2024
Uploaded by PresidentTeamOkapi31
CSIT 545 (Fall 2023) Homework # 6
(Due December 13, 2023 by 11:59 PM on Canvas "Assignment")

1. [40 Points] Assuming a benchmark program (i.e., workload) with p% of the instructions parallelizable, please analyze the speedup of a 10-processor system vs. a 1-processor system using the following two methods:

a. [15 Points] Strong scaling (i.e., assuming a fixed workload when analyzing the two systems).

Answer: In strong scaling, the workload is fixed and we ask how much faster the same work finishes on 10 processors. Writing f = p/100 for the parallelizable fraction, the serial part (1 - f) takes the same time on both systems, while the parallel part f is divided among the 10 processors. By Amdahl's law, the speedup is

S_strong = T1 / T10 = 1 / ((1 - f) + f/10)

For example, with p = 90 (f = 0.9), S_strong = 1 / (0.1 + 0.09) ≈ 5.26.

b. [15 Points] Weak scaling (i.e., assuming the workload of the parallelizable part grows proportionally to the increase in the number of processors).

Answer: In weak scaling, the parallelizable part of the workload grows 10x, so each processor keeps the same amount of parallel work and the 10-processor run still takes about 1 time unit. The scaled workload would take (1 - f) + 10f time units on a single processor, so by Gustafson's law the speedup is

S_weak = (1 - f) + 10f

For example, with p = 90 (f = 0.9), S_weak = 0.1 + 9 = 9.1.

c. [10 points] Are the speedup results you obtained in (a) and (b) the same? If they are the same, please explain why you obtained the same results using two different methods; if they are different, do they lead to opposite conclusions about whether the performance is improved when scaling from 1 to 10 processors? Please justify your answer.

Answer: The two results are not the same, except in the degenerate cases f = 0 (both give 1) and f = 1 (both give 10). For any 0 < f < 1, the weak-scaling speedup is larger: strong scaling is limited by the fixed serial part (Amdahl's bound of 1/(1 - f) no matter how many processors are added), while weak scaling dilutes the serial part by growing the parallel work. The numerical example above shows this: 9.1 vs. about 5.26 for p = 90. However, the two methods do not lead to opposite conclusions about scaling from 1 to 10 processors: both speedups exceed 1 whenever f > 0, so both say performance improves. They simply answer different questions — strong scaling asks how much faster a fixed job finishes, while weak scaling asks how much more work can be completed in the same time.

2. [60 Points] Given the following C program:

for (i = 0; i < 50; i = i + 1)
    A[i] = A[i] + s;

We assume that the base address of array A in memory is in register x18, and variable s is in register x22.

a. [20 Points] Please write the functionally equivalent assembly code using the regular single-processor version of RISC-V, with a comment (using "//") to explain each line of the code.

Answer (assuming the elements of A are 8-byte doublewords; since the base address of A is already in x18 and s is already in x22, no loads are needed to set them up):

        mv   x19, x18            // x19 = working pointer, starting at A's base address
        li   x30, 0              // loop counter: i = 0
        li   x31, 50             // loop bound (bge compares two registers, so 50 must be in a register)
loop_start:
        bge  x30, x31, loop_end  // exit the loop when i >= 50
        ld   x24, 0(x19)         // x24 = A[i]
        add  x24, x24, x22       // x24 = A[i] + s
        sd   x24, 0(x19)         // A[i] = A[i] + s
        addi x19, x19, 8         // advance the pointer by 8 bytes (one doubleword)
        addi x30, x30, 1         // i = i + 1
        j    loop_start          // repeat
loop_end:

b. [20 Points] Please write the functionally equivalent assembly code using the vector extension of RISC-V, with a comment (using "//") to explain each line of the code. (Note: You can make assumptions about the vector length and element size for the vector processor, which can be expressed in either a comment in English or using a RISC-V instruction. Also, the RISC-V vector instruction descriptions in the page https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc may help you understand and choose the right instructions to use for this program.)
Answer (using the RISC-V vector extension with 64-bit / 8-byte elements; the code uses vsetvli-based strip-mining, so it works for any hardware vector length VLMAX):

        li   x28, 50              // number of elements left to process
        mv   x19, x18             // working pointer, starting at A's base address (given in x18)
loop_start:
        vsetvli t0, x28, e64, m1  // set vl = min(remaining, VLMAX) with 64-bit elements; t0 = vl
        vle64.v v0, (x19)         // load vl doublewords of A into vector register v0
        vadd.vx v0, v0, x22       // add the scalar s (in x22) to every element of v0
        vse64.v v0, (x19)         // store the vl updated elements back to A
        slli t1, t0, 3            // t1 = vl * 8 = number of bytes processed this iteration
        add  x19, x19, t1         // advance the pointer past the processed elements
        sub  x28, x28, t0         // decrement the count of remaining elements
        bnez x28, loop_start      // repeat until all 50 elements are done

c. [20 Points] Comparing your code in (a) and (b), please discuss at least two benefits of using a vector processor and justify your answers using the actual code you wrote. (Hint: Page 14 of Slides 8 may provide you some insights about the benefits you may consider discussing.)

Answer: Benefits of using a vector processor:

1. Data-level parallelism: a vector processor performs the same operation on many data elements at once. In the vectorized code, a single instruction (vadd.vx v0, v0, x22) adds s to up to VLMAX elements simultaneously, whereas the scalar code in (a) executes one add per element. This parallelism improves throughput.

2. Reduced instruction overhead: the scalar loop in (a) executes 7 instructions (branch, load, add, store, two increments, jump) for every single element — on the order of 350 dynamic instructions for 50 elements. The vector loop executes its 8 instructions once per strip of vl elements; with vl = 8, for example, the whole array is processed in 7 iterations, about 56 dynamic instructions. Fetching and decoding far fewer instructions reduces loop overhead and execution time.

Both benefits are visible directly in the code: one vector load/add/store triple replaces an entire strip of scalar load/add/store triples, and the loop-control instructions are amortized over vl elements instead of being paid per element.