ECE-530 High Performance VLSI IC Systems
Illinois Institute of Technology

Homework 1: System-Level Power Reduction Using WATTCH

Professor: Kyuwon Choi
TA: Nader Alnatsheh
Submitted by: Abhishek Varma
Date: 02-01-2024
A) Sorting Algorithm Comparison

1) In fewer than 50 words each, describe the five sorting algorithms, explain the pros and cons of each, and state which algorithm you expect to have the best power estimation results.

Estimating the power drawn by the five sorting algorithms on the same buffer size of 100 shows that Shell sort is the most power-efficient algorithm, while Selection sort is the most power-intensive.

Shell SA - The fastest of the five algorithms. Shell sort is similar to Insertion sort; the main difference is the gap value, which lets us compare two elements whose indexes are a gap apart. This optimizes the sorting process and conserves computational resources. Shell sort becomes equivalent to Insertion sort when gap = 1. In our code, the gap is halved after every pass of the outer for loop. Pros: consumes the least power of the five. Cons: not effective if the gap value is not selected properly.

Merge SA - Merge sort repeatedly subdivides the unsorted array into smaller portions until only individual elements remain, then merges the pieces back into sorted subarrays until a single sorted array of the original size is rebuilt (a divide-and-conquer approach). Pros: fast. Cons: dividing and merging cause a lot of data movement and use more memory.

Heap SA - Heap sort builds a max heap (parent > child), an ordered binary tree, from the unsorted array. Comparisons move the largest element to the top of the tree; the top element is then swapped with the bottom one, and the process repeats on the remaining heap. Pros: efficient insertion and deletion of elements. Cons: slow and complex.
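The homework's sorting sources are not reproduced here, but the gap-halving Shell sort described above can be sketched as follows (the function and parameter names are illustrative, not the assignment's):

```c
#include <stddef.h>

/* Shell sort with the gap halved after each pass, as described above.
   Elements `gap` positions apart are compared; when the gap reaches 1,
   the final pass is an ordinary insertion sort over nearly sorted data. */
void shell_sort(int *buf, size_t n)
{
    for (size_t gap = n / 2; gap > 0; gap /= 2) {
        for (size_t i = gap; i < n; i++) {
            int tmp = buf[i];
            size_t j = i;
            /* Shift larger elements that are `gap` apart to the right. */
            while (j >= gap && buf[j - gap] > tmp) {
                buf[j] = buf[j - gap];
                j -= gap;
            }
            buf[j] = tmp;
        }
    }
}
```

Because later passes operate on data that earlier, larger-gap passes have already roughly ordered, the inner while loop runs few iterations, which is consistent with the low comparison and swap counts behind Shell sort's power result.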
Insert SA - In Insertion sort the array has a sorted side (left) and an unsorted side (right). Starting at index 1, each element is compared with the one before it; if the earlier value is larger, the two are swapped through a temporary variable, and this continues until the rightmost index is reached. Pros: simple to implement. Cons: a lot of data movement.

Selection SA - The most power-intensive of the five. It scans every remaining index to find the smallest value and swaps it into index 0, then scans again for the next-smallest and places it at index 1, and so on until the array is sorted. Pros: simple to implement. Cons: consumes a lot of power.

2)

Table 1. Power Estimation Results

    Sorting Algorithm    Average Power
    Shell SA             15.4417
    Merge SA             15.6744
    Heap SA              15.7788
    Insert SA            16.1978
    Selection SA         16.3992
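The full-buffer scan that makes Selection sort the most power-intensive entry in Table 1 can be sketched like this (names are illustrative, not the assignment's code):

```c
#include <stddef.h>

/* Selection sort as described above: for each position, scan the entire
   remaining buffer for the minimum, then swap it into place through a
   temporary. For a 100-element buffer this means ~99 comparisons for
   index 0, 98 for index 1, and so on -- the full O(n^2) comparison count
   every run, no matter how sorted the input already is. */
void selection_sort(int *buf, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        size_t min = i;
        for (size_t j = i + 1; j < n; j++)
            if (buf[j] < buf[min])
                min = j;
        if (min != i) {
            int tmp = buf[i];   /* swap via a temporary variable */
            buf[i] = buf[min];
            buf[min] = tmp;
        }
    }
}
```

Unlike Shell sort, the comparison count here does not shrink for nearly sorted inputs, which matches the power gap between the two in Table 1.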
3) Did your answer in 2 match your answer in 1?

Yes. Shell sort is the most power-efficient algorithm of the five. It performs especially well when the data is already almost sorted, because in that case it rarely needs to swap values; this is one reason it is power-efficient.

4) Comparison analysis for each algorithm (why some sorting algorithms consume more or less power)

The same data produces different power results for different sorting algorithms because:

1. Some algorithms take advantage of how the data is arranged initially; if the data is spread sparsely, an algorithm such as Shell sort will require more power to sort it.

2. How the algorithm implements the sorting also matters. Selection sort, which draws the most power, scans the entire buffer for each element: for the element at index 0 it compares against all 99 other elements and stores the minimum in a temporary variable, which consumes a lot of power, and it repeats this for every remaining element. In general, algorithms with more comparison logic and more swapping incur more power.

With Loop Unrolling & Default Cache Configuration

Before unrolling, the power was 14.7937 and the CPI was 0.9921. By applying loop unrolling repeatedly across multiple for loops, I unrolled a total of five for loops, each an innermost loop. When I attempted to unroll an outer for loop, the weights in the output file (output.txt) remained unchanged and the overall power consumption increased. The table below shows that the third unrolling increased power; however, the fourth unrolling then produced a significant reduction, from 14.7387 to 14.4364.
This reduction would not have been achievable without including the third loop in the unrolling process. After implementing loop unrolling on the given file Slack.c, the overall power was reduced from 14.7937 to 14.3617, a 2.96% decrease.

Figure 1. Average Power Before Unrolling
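Slack.c itself is not reproduced here, but the transformation applied to each innermost loop follows the standard unrolling pattern sketched below (the array, function, and unroll factor are illustrative assumptions):

```c
#include <stddef.h>

/* Original style of inner loop: one element per iteration, with a
   branch and an index update paid on every element. */
int sum_rolled(const int *buf, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* The same loop unrolled by a factor of 4: four elements per iteration,
   so the branch and index-update overhead is amortized across them.
   This overhead reduction is where the power/CPI changes come from. */
int sum_unrolled(const int *buf, size_t n)
{
    int sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += buf[i];
        sum += buf[i + 1];
        sum += buf[i + 2];
        sum += buf[i + 3];
    }
    for (; i < n; i++)      /* epilogue: handle leftover elements */
        sum += buf[i];
    return sum;
}
```

Both versions compute the same result; only the iteration structure, and hence the instruction mix seen by the simulator, changes.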
Figure 2. Average Power After Unrolling

Table 2. Average Power and CPI Values with Each Unrolling

    Slack.c                      ORIGINAL   FIRST     SECOND    THIRD     FOURTH    FIFTH
    Avg_total_power_cycle_cc3    14.7937    14.6862   14.6466   14.7387   14.4364   14.3617
    Sim_CPI                      0.9921     1.0070    1.0086    1.0033    1.0439    1.0575

Table 3. Average Power Before and After Unrolling

    Average Power (Before)    Average Power (After)
    14.7937                   14.3617 (2.96%)

Figure 3. Slack.output
With Loop Unrolling & Modified Cache Configuration

Figure 4. Average Power After Unrolling & Default Cache

Here are four tables displaying various cache configurations. The most energy-efficient parameters for both the level-1 (L1) and level-2 (L2) caches are achieved when both cache sizes are set to 128 sets.

Table 4. Average Power and CPI Values for L2 Data Cache = 128 Sets

    Average Power / CPI     L1 data (128 sets)   L1 data (256 sets)   L1 data (512 sets)
    L1 inst (128 sets)      10.8832 / 1.5338     11.5957 / 1.5323     12.6141 / 1.5323
    L1 inst (256 sets)      12.8008 / 1.1358     13.6374 / 1.1343     14.8305 / 1.1343
    L1 inst (512 sets)      13.8797 / 1.0681     14.7381 / 1.0666     15.9598 / 1.0666

Table 5. Average Power and CPI Values for L2 Data Cache = 256 Sets

    Average Power / CPI     L1 data (128 sets)   L1 data (256 sets)   L1 data (512 sets)
    L1 inst (128 sets)      10.9776 / 1.5249     11.6896 / 1.5246     12.7102 / 1.5246
    L1 inst (256 sets)      12.9054 / 1.1268     13.7407 / 1.1266     14.9381 / 1.1266
    L1 inst (512 sets)      13.9932 / 1.0584     14.8503 / 1.0581     16.0774 / 1.0581

Table 6. Average Power and CPI Values for L2 Data Cache = 512 Sets

    Average Power / CPI     L1 data (128 sets)   L1 data (256 sets)   L1 data (512 sets)
    L1 inst (128 sets)      11.0107 / 1.5240     11.7223 / 1.5240     12.7432 / 1.5240
    L1 inst (256 sets)      12.9385 / 1.1260     13.7730 / 1.1260     14.9707 / 1.1260
    L1 inst (512 sets)      14.0266 / 1.0575     14.8827 / 1.0575     16.1102 / 1.0575
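The set counts swept in the tables above map onto the cache flags of the SimpleScalar-based Wattch simulator, whose cache descriptors take the form name:nsets:bsize:assoc:repl. A sketch of such an invocation is shown below; only the set counts come from the tables, while the binary name, block size, associativity, replacement policy, and benchmark name are assumptions to be checked against the local build.

```shell
# Assumed sim-wattch invocation for the best-performing configuration
# (128 sets for L1 inst, L1 data, and L2 data). Descriptor fields are
# name:nsets:bsize:assoc:repl; everything but the set counts is illustrative.
./sim-wattch \
    -cache:il1 il1:128:32:1:l \
    -cache:dl1 dl1:128:32:1:l \
    -cache:dl2 dl2:128:64:4:l \
    ./slack
```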
Table 7. Average Power and CPI Values for L2 Data Cache = 1024 Sets

    Average Power / CPI     L1 data (128 sets)   L1 data (256 sets)   L1 data (512 sets)
    L1 inst (128 sets)      11.3743 / 1.5240     12.0854 / 1.5240     13.1062 / 1.5240
    L1 inst (256 sets)      13.2804 / 1.1260     14.1142 / 1.1260     15.3119 / 1.1260
    L1 inst (512 sets)      14.3617 / 1.0575     15.2171 / 1.0575     16.4446 / 1.0575

EXTRA WORK - Using the Block Partitioning Method (Second Loop)

Below I compare my previous result with the code using block partitioning. Without the modified cache there is little difference in average power (0.22%), but block partitioning combined with the modified cache yields a significant power reduction (1.10%).

Figure 8. Average Power With Loop Unrolling using Block Partitioning

    Average Power (Before)    Average Power (After)    Average Power (After Block Partitioning)
    14.7937                   14.3617 (2.96%)          14.3301 (0.22%)

Table 9. Average Power With Loop Unrolling & Modified Cache using Block Partitioning

    L2 data = 128           L1 data (128 sets)   L1 data (128 sets, block partitioning)
    L1 inst (128 sets)      10.8832 / 1.5338     10.7637 / 1.5439 (1.10%)

Figure 5. Average Power With Loop Unrolling & Modified Cache using Block Partitioning
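The block partitioning (loop blocking) applied to the second loop is not shown in the report, but the general technique can be sketched as follows; the function, tile size, and workload are illustrative assumptions, not the assignment's code:

```c
#include <stddef.h>

#define BLOCK 16   /* illustrative tile size; the homework's value is not shown */

/* Loop blocking: walk the index space in BLOCK-sized chunks so each
   chunk's data stays resident in the small L1 data cache while it is
   being processed, reducing cache misses and hence power. */
void scale_blocked(int *buf, size_t n, int factor)
{
    for (size_t b = 0; b < n; b += BLOCK) {
        size_t end = (b + BLOCK < n) ? b + BLOCK : n;
        for (size_t i = b; i < end; i++)
            buf[i] *= factor;
    }
}
```

The payoff of blocking grows as the cache shrinks, which is consistent with the larger gain (1.10%) appearing in the 128-set modified-cache configuration rather than in the default one (0.22%).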