practice_final

pdf

School

University at Buffalo *

*We aren’t endorsed by this school

Course

578

Subject

Computer Science

Date

Jan 9, 2024

Type

pdf

Pages

Uploaded by ProfessorBarracuda3939

CSE 4/587 - Final Exam 12/14/22 @ 8AM, NSC 201 Name: UBIT: Person #: Seat #: Academic Integrity My signature on this cover sheet indicates that I agree to abide by the academic integrity policies of this course, the department, and university, and that this exam is my own work. Signature: ____________________________________ Date: __________________________ Instructions 1. This exam contains 7 total pages (including this cover sheet). Be sure you have all the pages before you begin. 2. Clearly write your name, UBIT name, person number, and seat number above. Additionally, write your UBIT name at the top of every page now. 3. You have 3 hours to complete this exam. Show all work where appropriate, but keep your answers concise and to the point. 4. After completing the exam, sign the academic integrity statement above. Be prepared to present your UB card upon submission of the exam paper. 5. You must turn in all of your work. No part of this exam booklet may leave the classroom. DO NOT WRITE BELOW Q1 Q2 Q3 Q4 Q5 Q6 Total 20 30 40 15 35 10 150

UBIT Name: CSE 4/587 - Fall 2022 Question 1 - Ethics in Data Science [20 Points] The diagram to the right shows the data science pipeline we've referred to over the course of the semester. Each stage of the pipeline has been labeled with a number, which you can use in the following questions related to ethics in data science. a) Name and define ( in at most one sentence ) 4 of the 6 types of colloquial [12 points] biases discussed in class. b) For each of the 4 types of bias mentioned above, name one of the stages [4 points] in the data science pipeline where this type of bias may show up. c) Choose ONE of the 4 types of bias mentioned above and briefly describe a [4 points] solution to correct for that type of bias, and which stage(s) of the pipeline this fix would involve. 2

UBIT Name: CSE 4/587 - Fall 2022 Question 2 - Distributed Systems [30 Points] a. List 2 distinct benefits that Spark has over MapReduce [6 points] b. Briefly explain how HIVE improves programmer productivity when [2 points] compared to MapReduce c. In one sentence explain the primary way fault-tolerance is achieved in: i. HDFS [4 points] ii. Spark [4 points] d. Explain the difference between a transformation and an action in Spark [4 points] 3

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

UBIT Name: CSE 4/587 - Fall 2022 e. Name one transformation that results in a narrow dependency and draw a [5 points] DAG showing how this transformation acts on the RDDs and their partitions f. Name one transformation that results in a wide dependency and draw a [5 points] DAG showing how this transformation acts on the RDDs and their partitions 4

UBIT Name: CSE 4/587 - Fall 2022 Question 3 - Spark [40 Points] For Question 3, refer to the following Spark code. Assume that the number of files in the files list is 2, and that fullOuterJoin is a transformation resulting in a narrow dependency. 1 2 3 4 5 6 7 8 9 10 11 12 result = None for file in files: start = sc.textFile(file) middle = start.flatMap(lambda x: x.split(' ')) \ .map(lambda x: (x, 1)) \ .reduceByKey(lambda a,b: a + b) if result == None: result = middle else: result = result.fullOuterJoin(middle) result.collect() a. Draw the lineage graph for the RDD result right before the call to [10 points] result.collect() . Include nodes for all intermediate RDDs, even if they are unnamed. 5

UBIT Name: CSE 4/587 - Fall 2022 b. How many stages will your DAG be broken up into for execution? [10 points] Draw the stage boundaries in your diagram. c. Identify in the above code ( excluding line 11 ) one instance of the following: i. A transformation that results in a wide dependency [4 points] ii. A transformation that results in a narrow dependency [4 points] iii. An action [4 points] d. How many "jobs" will the above code run? [4 points] e. What algorithm is the above code an implementation of? [4 points] 6

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

UBIT Name: CSE 4/587 - Fall 2022 Question 4 - Classifiers [15 Points] a) Name the three classification algorithms we described in lecture. For each [6 points] algorithm state whether it is a structural classifier, or statistical classifier. b) In the second half of the class we used two motivating examples when [4 points] discussing classifiers: spam email classification, and ad click probability. State what was used as the features for each example, and give one reason why k-NN may not be effective for these two use cases. c) Naive bayes is an exceedingly simple algorithm. Name one reason why it [2 points] can still be effective for something like spam classification. d) How can we adapt binary classifiers to multi-class problems? [3 points] 7

UBIT Name: CSE 4/587 - Fall 2022 Question 6 - Misc [10 Points] a) What are the four Vs of data intensive computing? [2 points] b) Name two sources of uncertainty in data. [2 points] c) Briefly explain why fault tolerance in distributed systems is important. [2 points] d) Name one concrete example of a data intensive application that you interact [2 points] with on a frequent basis. e) Name two solutions you could employ if the dataset you are working on [2 points] becomes too big to store/process in a timely manner. 11

UBIT Name: CSE 4/587 - Fall 2022 Scrap Paper 12

Your preview ends here

Eager to read complete document? Join bartleby learn and gain access to the full version

Access to all documents
Unlimited textbook solutions
24/7 expert homework help

UBIT Name: CSE 4/587 - Fall 2022 Scrap Paper 13

UBIT Name: CSE 4/587 - Fall 2022 Scrap Paper 14

Related Documents

Lab 11-15-23.docx

SCS 100 Module Four Activity Template.docx

SCS 100 Module Five Activity Template.docx

7-1 project.docx

Lab7_Answer_Sheet_Pro JA.docx

Leela Srija Alla.pdf

F23 Midterm Exam.pdf

practice_final_key.pdf

StudyGuideForExam2-2023F.pdf

Impact of Computing Resources.pdf

Kyle Jones Milestone One.docx

M08-Assignment.docx

Recommended textbooks for you

Programming Logic & Design Comprehensive

Computer Science

ISBN:9781337669405

Author:FARRELL

Publisher:Cengage

Database Systems: Design, Implementation, & Manag...

Computer Science

ISBN:9781305627482

Author:Carlos Coronel, Steven Morris

Publisher:Cengage Learning

Np Ms Office 365/Excel 2016 I Ntermed

Computer Science

ISBN:9781337508841

Author:Carey

Publisher:Cengage

COMPREHENSIVE MICROSOFT OFFICE 365 EXCE

Computer Science

ISBN:9780357392676

Author:FREUND, Steven

Publisher:CENGAGE L

Fundamentals of Information Systems

Computer Science

ISBN:9781305082168

Author:Ralph Stair, George Reynolds

Publisher:Cengage Learning

Fundamentals of Information Systems

Computer Science

ISBN:9781337097536

Author:Ralph Stair, George Reynolds

Publisher:Cengage Learning

SEE MORE TEXTBOOKS

Recommended textbooks for you

Programming Logic & Design Comprehensive
Computer Science
ISBN:9781337669405
Author:FARRELL
Publisher:Cengage
Database Systems: Design, Implementation, & Manag...
Computer Science
ISBN:9781305627482
Author:Carlos Coronel, Steven Morris
Publisher:Cengage Learning
Np Ms Office 365/Excel 2016 I Ntermed
Computer Science
ISBN:9781337508841
Author:Carey
Publisher:Cengage
COMPREHENSIVE MICROSOFT OFFICE 365 EXCE
Computer Science
ISBN:9780357392676
Author:FREUND, Steven
Publisher:CENGAGE L
Fundamentals of Information Systems
Computer Science
ISBN:9781305082168
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning
Fundamentals of Information Systems
Computer Science
ISBN:9781337097536
Author:Ralph Stair, George Reynolds
Publisher:Cengage Learning