FT-Review-NS

School

University of Houston

Course

3337

Subject

Computer Science

Date

Jan 9, 2024

Type

pdf

Pages

3

Uploaded by samiyakhtar

COSC 3337: Data Science I
Fall 2023
Sample Final Exam Questions
Duration: 105 minutes

1. Hierarchical Clustering

a) A dataset consisting of objects A, B, C, D, E and F with the following distance matrix is given:

distance   A   B   C   D   E   F
A          0   9   8   1   3  11
B              0   2   6   5  12
C                  0   7  10   4
D                      0  15  13
E                          0  14
F                              0

Assume single-link¹ hierarchical clustering is applied to the dataset. What dendrogram will be returned? [7]

b) How does hierarchical clustering differ from more classical clustering algorithms, such as K-Means and DBSCAN?

2. Outlier Detection

a) What is an outlier?

b) Why is it usually more challenging to detect object anomalies than attribute anomalies?

c) Propose either a model-based outlier detection approach or an outlier detection approach that uses clustering to determine object anomalies in a dataset². Describe how your proposed approach detects outliers in some detail!

¹ When assessing the distance between clusters, the minimum distance is used.
² If you use an approach that is neither model-based nor cluster-based, you will not receive any points for your answer!
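The single-link procedure asked about in Question 1 a) can be traced mechanically. The following is a minimal sketch in plain Python, not part of the exam; the names `DIST`, `d`, and `single_link` are illustrative. It repeatedly merges the two clusters with the smallest minimum pairwise distance and records the height of each merge, which is the information a dendrogram encodes.

```python
from itertools import combinations

# Pairwise distances copied from the upper-triangular matrix in Question 1 a).
DIST = {
    ("A", "B"): 9, ("A", "C"): 8, ("A", "D"): 1, ("A", "E"): 3, ("A", "F"): 11,
    ("B", "C"): 2, ("B", "D"): 6, ("B", "E"): 5, ("B", "F"): 12,
    ("C", "D"): 7, ("C", "E"): 10, ("C", "F"): 4,
    ("D", "E"): 15, ("D", "F"): 13,
    ("E", "F"): 14,
}


def d(x, y):
    """Symmetric lookup into the distance table."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def cluster_distance(c1, c2):
    """Single link: distance between clusters is the minimum pairwise distance."""
    return min(d(x, y) for x in c1 for y in c2)


def single_link(points):
    """Return the list of merges as (left cluster, right cluster, height)."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        # Pick the pair of current clusters that are closest to each other.
        left, right = min(combinations(clusters, 2),
                          key=lambda pair: cluster_distance(*pair))
        height = cluster_distance(left, right)
        clusters = [c for c in clusters if c not in (left, right)]
        clusters.append(left | right)
        merges.append((sorted(left), sorted(right), height))
    return merges


merges = single_link("ABCDEF")
for left, right, height in merges:
    print(left, "+", right, "at height", height)
```

With this matrix the merges happen at heights 1, 2, 3, 4 and 5: A joins D, B joins C, E joins {A, D}, F joins {B, C}, and finally {A, D, E} joins {B, C, F}. All distances in the matrix are distinct, so no tie-breaking rule is needed here.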
3. Data Science

Data Science has gained a lot of importance in the last 5 years. Why do you believe this is the case? Write an essay answering this question, limiting yourself to 7-14 sentences. Use full sentences in your writing!

4. Preprocessing

a) Dimensionality reduction is quite important in many data mining/data analysis projects. Why do you believe this is the case?

b) What is the goal of Principal Component Analysis? What is a principal component? How are principal components used for dimensionality reduction?

5. SVM and Kernels

a) The soft margin support vector machine solves the following optimization problem:

[optimization problem not reproduced in the source text]

What does the first term minimize (be precise!)? What role does C play? How many examples are misclassified in the figure below?

[figure not reproduced in the source text]
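As a study aid for Question 4 b), the sketch below illustrates one way to obtain the first principal component without external libraries: center the data, build the sample covariance matrix, and run power iteration to approximate its dominant eigenvector. The function names, the toy data, and the choice of power iteration are assumptions made for this illustration; in practice a library routine based on eigendecomposition or SVD would normally be used.

```python
import math


def first_principal_component(rows, iterations=100):
    """Return (feature means, unit vector along the direction of largest variance)."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1)
            for j in range(dims)] for i in range(dims)]
    # Power iteration; the start vector is an arbitrary choice (assumption).
    v = [1.0 / (j + 1) for j in range(dims)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # start vector orthogonal to all variance; give up
            break
        v = [x / norm for x in w]
    return means, v


def project(rows, means, component):
    """Reduce each row to a single coordinate along the given component."""
    return [sum((x - m) * c for x, m, c in zip(r, means, component))
            for r in rows]


# Toy data lying exactly on the line y = x (made up for this example).
rows = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
means, pc1 = first_principal_component(rows)
scores = project(rows, means, pc1)
print(pc1)
print(scores)
```

Because the toy points lie on a line, one coordinate per point (its score along `pc1`) preserves all of the variance. That is the sense in which keeping only the leading principal components reduces dimensionality.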
6. DBSCAN

a) What are the characteristics of a border point in DBSCAN?

b) Assume you run DBSCAN with MinPoints=6 and epsilon=0.1 for a dataset and obtain 4 clusters, and 2% of the objects in the dataset are classified as outliers/noise points. Now you run DBSCAN with MinPoints=8 and epsilon=0.1. How do you expect the clustering results to change?
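To make the core/border/noise distinction in Question 6 concrete, here is a small sketch in plain Python. It only labels points and does not build the clusters themselves. The function name `classify_points` and the toy data are made up for this illustration, and it assumes that a point counts as a member of its own epsilon-neighborhood; some presentations of DBSCAN count differently.

```python
import math


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise' using DBSCAN's definitions."""
    # Epsilon-neighborhood of every point (assumption: includes the point itself).
    neighbors = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Not dense enough itself, but within eps of a core point.
            labels.append("border")
        else:
            labels.append("noise")
    return labels


# Five points spaced 0.04 apart on a line plus one far-away point (toy data).
points = [(0.0, 0.0), (0.04, 0.0), (0.08, 0.0), (0.12, 0.0), (0.16, 0.0), (1.0, 0.0)]

for min_pts in (4, 5, 6):
    print(min_pts, classify_points(points, eps=0.1, min_pts=min_pts))
```

On this toy data, raising `min_pts` from 4 to 5 turns two core points into border points, and at 6 no point is dense enough to be a core point, so everything becomes noise. The values 4, 5 and 6 were chosen to fit the six-point example and are not the settings from the exam question.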