FT-Review-NS


School

University of Houston

Course

3337

Subject

Computer Science

Date

Jan 9, 2024


Uploaded by samiyakhtar

COSC 3337: Data Science I, Fall 2023
Sample Final Exam Questions
Duration: 105 minutes

1. Hierarchical Clustering

a) A dataset consisting of objects A, B, C, D, E, and F with the following distance matrix is given:

distance   A   B   C   D   E   F
A          0   9   8   1   3  11
B              0   2   6   5  12
C                  0   7  10   4
D                      0  15  13
E                          0  14
F                              0

Assume single-link¹ hierarchical clustering is applied to the dataset. What dendrogram will be returned? [7]

b) How does hierarchical clustering differ from more classical clustering algorithms, such as K-Means and DBSCAN?

2. Outlier Detection

a) What is an outlier?

b) Why is it usually more challenging to detect object anomalies than attribute anomalies?

c) Propose either a model-based outlier detection approach or an outlier detection approach that uses clustering to determine object anomalies in a dataset². Describe how your proposed approach detects outliers in some detail!

¹ When assessing the distance between clusters, the minimum distance is used.
² If you use an approach that is neither model-based nor cluster-based, you will not receive any points for your answer!
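For readers who want to check a hand-drawn dendrogram for Question 1 a), the following is a minimal sketch of single-link agglomerative clustering run on the distance matrix given there. The function and variable names (`DIST`, `single_link`, `agglomerate`) are illustrative and not part of the exam; only the distances come from the question.

```python
from itertools import combinations

# Upper-triangular distances copied from the matrix in Question 1 a).
DIST = {
    ("A", "B"): 9, ("A", "C"): 8, ("A", "D"): 1, ("A", "E"): 3, ("A", "F"): 11,
    ("B", "C"): 2, ("B", "D"): 6, ("B", "E"): 5, ("B", "F"): 12,
    ("C", "D"): 7, ("C", "E"): 10, ("C", "F"): 4,
    ("D", "E"): 15, ("D", "F"): 13,
    ("E", "F"): 14,
}


def d(x, y):
    """Symmetric lookup of the distance between two objects."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def single_link(c1, c2):
    """Single-link cluster distance: the minimum pairwise object distance."""
    return min(d(x, y) for x in c1 for y in c2)


def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [frozenset(p) for p in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: single_link(*pair))
        merges.append((sorted(a), sorted(b), single_link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


merges = agglomerate("ABCDEF")
for left, right, height in merges:
    print(left, "+", right, "at distance", height)
```

Each entry of the merge log corresponds to one internal node of the dendrogram, with the merge distance as its height. For this matrix the merges happen at distances 1, 2, 3, 4, and 5, and all pairwise distances are distinct, so no tie-breaking rule is needed.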

3. Data Science

Data Science has gained a lot of importance in the last 5 years. Why do you believe this is the case? Write an essay answering this question, limiting yourself to 7-14 sentences! Use full sentences in your writing!

4. Preprocessing

a) Dimensionality reduction is quite important in many data mining/data analysis projects. Why do you believe this is the case?

b) What is the goal of Principal Component Analysis? What is a principal component? How are principal components used for dimensionality reduction?

5. SVM and Kernels

a) The soft margin support vector machine solves the following optimization problem:

[optimization problem not reproduced here]

What does the first term minimize (be precise!)? What role does C play? How many examples are misclassified in the figure below?

[figure not reproduced here]
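The formula and figure for Question 5 are missing from this copy, so the sketch below assumes the commonly used linear soft-margin objective, 0.5 * ||w||^2 + C * sum(xi_i) with slack xi_i = max(0, 1 - y_i * (w . x_i + b)); the exam's exact notation may differ. The data, weights, and the function name `soft_margin_objective` are hypothetical and are not taken from the exam figure. The sketch shows how the two terms are computed and how slack values relate to misclassification.

```python
def soft_margin_objective(w, b, C, X, y):
    """Return (margin_term, slacks, objective) for a linear soft-margin SVM.

    Assumed form: 0.5 * ||w||^2 + C * sum of slacks, where each slack is
    max(0, 1 - y_i * (w . x_i + b)).
    """
    # First term: half the squared norm of w (small norm means wide margin).
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slacks = []
    for xi, yi in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slacks.append(max(0.0, 1.0 - yi * score))
    # Second term: C weights the total margin violation against margin width.
    objective = margin_term + C * sum(slacks)
    return margin_term, slacks, objective


# Hypothetical toy data: four 2-D examples with labels in {+1, -1}.
X = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
y = [1, 1, 1, -1]
w, b = (1.0, 0.0), 0.0

margin_term, slacks, objective = soft_margin_objective(w, b, C=10.0, X=X, y=y)

# A slack above 1 means the example lies on the wrong side of the hyperplane.
misclassified = sum(1 for s in slacks if s > 1.0)
print(margin_term, slacks, objective, misclassified)
```

Under this assumed formulation, an example with slack between 0 and 1 is inside the margin but still correctly classified, while a slack greater than 1 indicates a misclassified example. A larger C penalizes slack more heavily, which favors fewer margin violations at the cost of a narrower margin.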

6. DBSCAN

a) What are the characteristics of a border point in DBSCAN?

b) Assume you run DBSCAN with MinPoints=6 and epsilon=0.1 for a dataset and obtain 4 clusters, and 2% of the objects in the dataset are classified as outliers/noise points. Now you run DBSCAN with MinPoints=8 and epsilon=0.1. How do you expect the clustering results to change?
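The point types referred to in Question 6 can be illustrated with a small sketch that labels each point as core, border, or noise for a given epsilon and MinPoints. The point coordinates and parameter values below are hypothetical and much smaller than those in the question, and the function name `classify_points` is illustrative. The sketch also assumes the convention that a point's epsilon-neighborhood includes the point itself.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' under DBSCAN's definitions."""

    def neighbors(i):
        # Assumption: the eps-neighborhood includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    hoods = [neighbors(i) for i in range(len(points))]
    # Core points have at least min_pts points in their eps-neighborhood.
    core = {i for i, hood in enumerate(hoods) if len(hood) >= min_pts}
    labels = []
    for i, hood in enumerate(hoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in hood):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Hypothetical toy dataset: a small dense group plus one isolated point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (10, 0)]

labels_low = classify_points(points, eps=1.5, min_pts=3)
labels_high = classify_points(points, eps=1.5, min_pts=4)
labels_strict = classify_points(points, eps=1.5, min_pts=5)
print(labels_low)
print(labels_high)
print(labels_strict)
```

With epsilon fixed, raising MinPoints can only shrink the set of core points: a former core point may become a border or noise point, and a former border point may become noise, so the set of noise points never gets smaller. In this toy example the first point changes from core to border when MinPoints goes from 3 to 4, and every point becomes noise at MinPoints=5.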


