Leela Srija Alla

School: University at Buffalo
Course: 4
Subject: Computer Science
Date: Jan 9, 2024
Type: pdf
Pages: 8

Uploaded by ProfessorBarracuda3939

Fall 2023 CSE 487/587B Midterm Exam
10/06/2023 @ 13:00 hrs, Norton 190

Name: Leela Srija Alla
Person #: 5539261
UBIT: lalla
Seat #: ES

Academic integrity

My signature on this cover sheet indicates that I agree to abide by the academic integrity policies of this course, the department, and university, and that this exam is my own work.

Signature: [handwritten]    Date: [illegible] October 2023

Instructions

1. This exam contains 7 total pages (including this cover sheet). Be sure you have all the pages before you begin.
2. Clearly write your name, UBIT name, person number, and seat number above. Additionally, write your UBIT name at the top of every page now.
3. You have 3 hours to complete this exam. Show all work where appropriate, but keep your answers concise and to the point.
4. After completing the exam, sign the academic integrity statement above. Be prepared to present your UB card upon submission of the exam paper.
5. You must turn in all of your work. No part of this exam booklet may leave the classroom.

DO NOT WRITE BELOW

            Q1    Q2    Q3    Total
Maximum     20    20    20    60
Handwritten scores (partly illegible): [illegible] 15 45
UBIT Name:                                        CSE 4/587 - Fall 2023

Question 1 - [20 Points]

a) Differentiate between structured and unstructured data. [3 points]

   Structured data: data that has a schema before it is stored in a database. Ex: relational database.
   Unstructured data: data that has no schema until it is read. Ex: social media.

b) Name one concrete example of a data-intensive application that you interact with on a frequent basis. [1 points]

   Spam classification, search engines. A search engine like Google. [remainder illegible]

c) In one sentence, mention the objectives of: [4 points]

   i) Data cleaning
      Data with no missing values and all the data in the same format. [partly illegible]

   ii) EDA
      Understand the data with tools like histograms and [illegible] plots so we can perform data cleaning, feature engineering, and feature selection later on. [partly illegible]

d) What are the different types of analysis that we can perform on a dataset? [4 points]

   Descriptive analysis: summarization of data.
   Diagnostic analysis: data cleaning, [illegible] the format.
   Predictive analysis: use machine learning algorithms to make decisions.
   Prescriptive analysis: use optimization methods to optimize the models.
e) For each of the following, determine whether you would use an algorithm to predict, cluster, or classify. [6 points]

   i) Determining the sale price of a stock given historical prices
      Predict ✓
   iv) Categorize forum users into distinct categories based on usage habits
      Cluster [partly illegible]
   v) Determine whether or not it will rain on a given day
      Predict / Classify [partly illegible, with crossed-out text]
   vi) Fill in missing artists in your music collection
      Predict ✗
   vii) Estimate how many people will show up to a concert
      Predict ✓
   viii) Divide your dataset in order to apply different models to each segment
      Cluster ✓ [partly illegible]

f) In linear regression, what does the error term, ε, capture? Why do we need to use it in the linear regression model? [2 point]

   ε captures the noise in the data. [The rest of the handwritten answer, which appears to compare two candidate lines and which of them is the best fit, is largely illegible.]
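A minimal sketch of what the error term in part f) looks like numerically, assuming the usual simple model y = b0 + b1*x + ε fit by ordinary least squares. The data points and the helper names `fit_line` and `residuals` are made up for illustration and do not come from the exam. The residuals are the sample estimates of ε: the part of each observation the fitted line does not explain.

```python
# Sketch only: simple linear regression fit by ordinary least squares.
# The five (x, y) points are illustrative, not exam data.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(xs, ys, intercept, slope):
    """Observed y minus fitted y: the estimated error term for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)

print("intercept:", round(b0, 4), "slope:", round(b1, 4))
print("residuals:", [round(r, 4) for r in res])
print("sum of squared residuals:", round(sum(r * r for r in res), 4))
```

Among all straight lines, the least-squares line is the one with the smallest sum of squared residuals, which is one way to compare two candidate lines.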
Question 2 - [20 Points]

a) List two weaknesses of k-Means. [4 points]

   - Requires domain knowledge to give an optimal k as input.
   - Uses spherical distance and thus does not handle outliers well.
   - Cannot handle overlapping clusters (sometimes a data point can belong to both clusters).

b) Name the 3 classifiers we have covered in the class. For each one, state whether it is structural or statistical. [6 points]

   - k-Nearest Neighbors (kNN): structural
   - Naive Bayes classifier: statistical
   - Linear regression and logistic regression: statistical

c) Can we use linear regression or the K-NN algorithm to detect spam emails? If not, then why? [2 points]

   - Linear regression cannot be used for spam email classification, as it does not classify the data but predicts.
   - kNN can be used for spam classification but can lead to poor results because of high-dimensional data.

d) From the following classification data, compute accuracy, precision, and recall. [6 points]

                         Actually Positive   Actually Negative
   Predicted Positive            5                  30
   Predicted Negative           20                  45

   Accuracy  = (TP + TN) / total observations
   Precision = TP / (TP + FP)
   Recall    = TP / (TP + FN)
   [handwritten arithmetic illegible]
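As a worked check of part d), here is a short sketch that applies the standard definitions of accuracy, precision, and recall to the four counts in the table. It assumes rows are predictions and columns are true labels, so TP = 5, FP = 30, FN = 20, TN = 45. The function name `classification_metrics` is illustrative.

```python
# Sketch: accuracy, precision, and recall from confusion-matrix counts.
# Assumes the table's rows are predicted labels and columns are actual labels.

def classification_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total   # fraction of all predictions that are correct
    precision = tp / (tp + fp)     # of predicted positives, fraction truly positive
    recall = tp / (tp + fn)        # of actual positives, fraction that were found
    return accuracy, precision, recall

accuracy, precision, recall = classification_metrics(tp=5, fp=30, fn=20, tn=45)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# prints: accuracy=0.500 precision=0.143 recall=0.200
```

With these counts, accuracy is 50/100, precision is 5/35, and recall is 5/25.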
e) How does SVM allow us to classify higher dimensional datasets? [2 Point]

   SVM can use any classifiers that work for higher dimensional datasets. It also handles outliers by allowing misclassification. It can handle high-dimensional datasets. [partly illegible]

Question 3 - [20 Points]

a. Imagine we are trying to predict the weather based on the hat-wearing habits of the UB student population. Specifically, we have conducted a study over the past 4 academic years with 1000 student volunteers. Each day we have collected data about whether it was sunny or not, and which student volunteers wore hats that day.

   - The dataset contains information for 1200 days
   - 180 of these days were categorized as sunny
   - Each day also has recorded whether or not each of the 1000 volunteers wore a hat

   1. What are the "features" of this dataset, and how many features are there? [3 points]

      [Partly illegible, with crossed-out text.] The features are the student volunteers. There are 1000.

   2. Based on our analysis, Alice wore a hat on 400 days over the duration of the experiment. 150 of these days were sunny days. Given these counts, use Bayes Law to estimate the probability that Alice is wearing a hat on a sunny day. [3 points]

      P(A | S) = P(S | A) P(A) / P(S), where A = Alice wearing a hat and S = sunny day.
      [handwritten arithmetic illegible]
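A minimal worked sketch of the Bayes Law computation in part 2, using only the counts stated in the question (1200 days, 180 sunny, 400 hat days for Alice, 150 of them sunny). The variable names are illustrative, and exact fractions are used so the arithmetic can be checked by hand.

```python
# Sketch: P(hat | sunny) = P(sunny | hat) * P(hat) / P(sunny), using exam counts.
from fractions import Fraction

total_days = 1200
sunny_days = 180
hat_days = 400            # days on which Alice wore a hat
hat_and_sunny_days = 150  # of those hat days, how many were sunny

p_sunny = Fraction(sunny_days, total_days)                  # P(S) = 180/1200
p_hat = Fraction(hat_days, total_days)                      # P(A) = 400/1200
p_sunny_given_hat = Fraction(hat_and_sunny_days, hat_days)  # P(S | A) = 150/400

p_hat_given_sunny = p_sunny_given_hat * p_hat / p_sunny     # Bayes Law
print(p_hat_given_sunny, round(float(p_hat_given_sunny), 3))
```

The result is 5/6, about 0.833, which agrees with the direct count of 150 sunny hat days out of 180 sunny days.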
b. What are the four V's of Big data? [4 points]

   Volume: handling large volumes of data.
   Variety: handling data in different formats from different domains.
   Velocity: speed with which data is handled.
   Veracity: reliability of data. Data can be uncertain sometimes.

c. List two differences between K-means and K-NN. [4 points]

   - kNN is a classifying algorithm while k-means is a clustering algorithm.
   - k-means is unsupervised whereas kNN is supervised.

d. [6 points]

       CRIM     ZN    INDUS  CHAS  NOX    RM     AGE    DIS     RAD  TAX    PTRATIO  B       LSTAT
   0   0.00632  18.0  2.31   0.0   0.538  6.575  65.2   4.0900  1.0  296.0  15.3     396.90  4.98
   1   0.02731  0.0   7.07   0.0   0.469  6.421  78.9   4.9671  2.0  242.0  17.8     396.90  9.14
   2   0.02729  0.0   7.07   0.0   0.469  7.185  61.1   4.9671  2.0  242.0  17.8     392.83  4.03
   3   0.03237  0.0   2.18   0.0   0.458  6.998  45.8   6.0622  3.0  222.0  18.7     394.63  2.94
   4   0.06905  0.0   2.18   0.0   0.458  7.147  54.2   6.0622  3.0  222.0  18.7     396.90  5.33
   5   0.02985  0.0   2.18   0.0   0.458  6.430  58.7   6.0622  3.0  222.0  18.7     394.12  5.21
   6   0.08829  12.5  7.87   0.0   0.524  6.012  66.6   5.5605  5.0  311.0  15.2     395.60  12.43
   7   0.14455  12.5  7.87   0.0   0.524  6.172  96.1   5.9505  5.0  311.0  15.2     396.90  19.15
   8   0.21124  12.5  7.87   0.0   0.524  5.631  100.0  6.0821  5.0  311.0  15.2     386.63  29.93
   9   0.17004  12.5  7.87   0.0   0.524  6.004  85.9   6.5921  5.0  311.0  15.2     386.71  17.10

   1. For the given dataset above, what are the features?

      Features are CRIM, ZN, INDUS, CHAS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, [remainder illegible].

   2. How many rows should there be if there are 'N' columns in the dataset?

      The number of rows should be much larger than N: rows >> N.

   3. What if some features are missing for a particular row? What will you do?

      We can either drop the row or fill the missing feature using the mean and other techniques.
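The two options named in the answer to part 3 (drop the row, or fill with the column mean) can be sketched in a few lines of plain Python. The tiny two-column table and the function names below are made up for illustration, with `None` standing in for a missing value. The sketch assumes every column has at least one observed value.

```python
# Sketch: two simple ways to handle missing feature values.
# Rows are lists of numbers; None marks a missing entry (illustrative data).

def drop_incomplete_rows(rows):
    """Keep only rows that have no missing values."""
    return [row for row in rows if None not in row]

def fill_with_column_mean(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for col in range(n_cols):
        observed = [row[col] for row in rows if row[col] is not None]
        means.append(sum(observed) / len(observed))  # assumes at least one observed value
    return [
        [means[col] if value is None else value for col, value in enumerate(row)]
        for row in rows
    ]

rows = [
    [6.5, 65.0],
    [6.4, None],   # second feature is missing in this row
    [7.2, 61.0],
]

dropped = drop_incomplete_rows(rows)
filled = fill_with_column_mean(rows)
print(dropped)
print(filled)
```

Dropping rows is simplest but discards data, which matters when rows are scarce relative to the number of columns. Mean filling keeps every row at the cost of shrinking that column's variance.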
Scrap Paper

[handwritten scratch work, illegible]