PS#9

Course: IDS575
Subject: Computer Science
Date: Apr 3, 2024
Q1 K-means (55 Points)

Q1.1 (5 Points)
Which of the following tasks might be suitable for applying the K-means algorithm?
• Given historical weather records, predict if tomorrow's weather will be sunny or rainy.
• Given many emails, you want to determine if they are Spam or Non-Spam emails.
• From the user usage patterns on a website, figure out what different groups of users exist.
• Given sales data from a large number of products in a supermarket, estimate future sales for each of these products.

Q1.2 (7 Points)
Suppose our current three cluster centroids are μ1: (1, 2), μ2: (-3, 0), and μ3: (4, 2), and we have a training example x(i): (-1, 2). After a cluster assignment step, what will c(i) be?
• 1
• 2
• 3
• not assigned
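As a quick check of the assignment step, the squared distances from x(i) to each of the three centroids in Q1.2 can be computed directly. The short sketch below (Python with NumPy, not part of the graded submission) simply picks the centroid with the smallest squared distance:

    import numpy as np

    # Centroids and training example from Q1.2
    centroids = np.array([[1, 2], [-3, 0], [4, 2]])   # mu1, mu2, mu3
    x_i = np.array([-1, 2])

    # Cluster assignment step: pick the centroid with the smallest squared distance
    squared_dists = np.sum((centroids - x_i) ** 2, axis=1)
    c_i = np.argmin(squared_dists) + 1                # 1-indexed cluster label

    print(squared_dists)   # [ 4  8 25]
    print(c_i)             # 1, i.e. x(i) is closest to mu1

The squared distances come out as 4, 8, and 25, so μ1 is the nearest centroid.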
Q1.3 (7 Points)
Select all that are correct about K-means:
• Once an example has been assigned to a particular centroid, it will never be reassigned to another centroid.
• K-means will always give the same results regardless of the initialization of the centroids.
• On every iteration of K-means, the cost function J(c(1), ..., c(m), μ1, ..., μK) (the distortion function) should either stay the same or decrease; in particular, it should not increase.
• For some datasets, the "right" or "correct" value of K (the number of clusters) can be ambiguous, and hard even for a human expert looking carefully at the data to decide.
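For reference, the distortion function named in the third statement is, as standardly defined, the average squared distance from each example to the centroid it is currently assigned to:

    J(c(1), ..., c(m), μ1, ..., μK) = (1/m) * Σ_{i=1..m} ||x(i) − μ_c(i)||²

The assignment step minimizes J over the labels c(i) with the centroids fixed, and the update step minimizes J over the centroids with the labels fixed, which is why J can never increase between iterations.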
Q1.4 (12 Points)
Given the following 6 data points, simulate K-means clustering manually with K = 2. Each example consists of two features and is initially assigned to a cluster k. Compute the centroid of each cluster.
Answer: μ1 = (3, 11/4), μ2 = (5/2, 3/2)

Q1.5 (12 Points)
Continuing from the question above, assign each data point to its closest centroid and report the new cluster label k for each point, as a list in order of i (e.g. [1, 2, 1, 1, 1, 2]).
Answer: [1, 1, 1, 2, 1, 2]

Q1.6 (12 Points)
Repeat the above two steps until convergence. Once the centroids and the cluster labels stop changing, report the cluster label k for each data point and the final centroids.
Answer:
Cluster labels: [1, 1, 1, 2, 2, 2]
Final centroids: k = 1 centroid (2/3, 11/3); k = 2 centroid (5, 1)
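The manual simulation in Q1.4 through Q1.6 follows the standard two-step loop, sketched below in Python. The original table of six data points is not reproduced in this document, so the X and labels values here are placeholders rather than the actual assignment data:

    import numpy as np

    # Placeholder points and initial labels: the original table of six examples
    # is not reproduced in this document, so these values are illustrative only.
    X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
    labels = np.array([1, 1, 2, 2, 1, 2])

    for _ in range(100):
        # Centroid update (Q1.4): mean of the points currently in each cluster
        centroids = np.array([X[labels == k].mean(axis=0) for k in (1, 2)])
        # Assignment (Q1.5): move every point to its closest centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1) + 1
        if np.array_equal(new_labels, labels):   # Q1.6: stop once labels are stable
            break
        labels = new_labels

    print(labels, centroids)

Run with the real data points and initial assignment, this loop carries out the same centroid update, reassignment, and convergence check that the reported answers describe.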
Q2 PCA (45 Points)

Q2.1 (7 Points)
Which of the following tasks might be suitable for applying PCA? (select all correct)
• Data compression: reduce the dimension of your data.
• As a replacement for (or alternative to) linear regression.
• Data visualization: take 2D data, and find a different way of plotting it in 2D (using k = 2).

Q2.2 (7 Points)
What happens when you get features in lower dimensions using PCA? (select all correct)
• The features will still have interpretability.
• The features will lose interpretability.
• The features must carry all information present in the data.
• The features may not carry all information present in the data.
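A minimal sketch of the data-compression use case from Q2.1, which also shows why the reduced features "may not carry all information present in the data" (Q2.2). The dataset is synthetic, and only scikit-learn's PCA is assumed:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # make two columns nearly
    X[:, 4] = X[:, 1] + 0.1 * rng.normal(size=200)   # redundant with the others

    pca = PCA(n_components=3)
    Z = pca.fit_transform(X)           # compression: 5 features -> 3 components
    X_hat = pca.inverse_transform(Z)   # reconstruction from the compressed features

    print(pca.explained_variance_ratio_.sum())   # below 1.0: some variance is dropped
    print(np.mean((X - X_hat) ** 2))              # nonzero reconstruction error

The explained-variance ratio sums to less than 1 and the reconstruction error is nonzero, so some information is discarded; the compressed components are also linear mixtures of the original columns rather than directly interpretable features.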
Q2.3 (7 Points)
Consider the following dataset. Which vector could be the first principal component? (select all possible ones)
[Figure: scatter plot of the dataset with candidate direction vectors μA, μB, μC, μD]
• μA
• μB
• μC
• μD

Q2.4 (5 Points)
Suppose we have eigenvalue and eigenvector pairs of the scatter matrix S given as (0.1, μa), (5, μb), and (1, μc). What should be the first principal component direction?
• μa
• μb
• μc
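The rule behind Q2.4: the first principal component is the eigenvector of the scatter matrix S with the largest eigenvalue, so among the pairs (0.1, μa), (5, μb), (1, μc) the direction paired with eigenvalue 5 is the one to pick. A generic sketch of that computation (the data and scatter matrix below are made up for illustration):

    import numpy as np

    # Made-up 2-D data; S is its scatter matrix
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc

    eigvals, eigvecs = np.linalg.eigh(S)        # eigh: S is symmetric
    first_pc = eigvecs[:, np.argmax(eigvals)]   # eigenvector with the largest eigenvalue
    print(eigvals, first_pc)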
Q2.5 (7 Points)
If μ1 and μ2 are the first two principal component vectors, which statements are correct about them? (select all)
• μ1 is orthogonal to μ2.
• μ1 is parallel to μ2.
• The variance along μ1 is bigger than the variance along μ2.
• The variance along μ2 is bigger than the variance along μ1.

Q2.6 (7 Points)
Suppose we are using PCA to generate k-dimensional representations of our data. We tried different values of k and have the variability of each principal component as shown in the figure, where the x-axis is k and the y-axis is λj. Which k might be a good cutoff?
[Figure: variability λj of each principal component plotted against k]
• 1
• 2
• 5
• 10
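The properties in Q2.5 and the cutoff idea in Q2.6 can be checked numerically. The sketch below uses synthetic data, so the exact variance values (and hence where a "good" cutoff would fall) are illustrative only:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(2)
    X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))

    pca = PCA().fit(X)
    v1, v2 = pca.components_[0], pca.components_[1]

    print(np.dot(v1, v2))                            # ~0: the first two PCs are orthogonal
    print(pca.explained_variance_[:4])               # decreasing: variance along PC1 >= PC2 >= ...
    print(np.cumsum(pca.explained_variance_ratio_))  # scree-style view for choosing a cutoff k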
Q2.7 (5 Points)
When do we need kernel PCA?
• When we need to capture non-linearity in the data.
• When there are kernels in the data.
• When any two features are highly correlated.
• When we need to reduce the number of dimensions to half of the original dimension.
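For Q2.7, kernel PCA is the variant to reach for when the structure in the data is non-linear. A minimal scikit-learn sketch using the standard concentric-circles toy dataset (the gamma value is an illustrative choice):

    from sklearn.datasets import make_circles
    from sklearn.decomposition import KernelPCA, PCA

    # Two concentric circles: structure that no linear projection can untangle
    X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

    Z_lin = PCA(n_components=1).fit_transform(X)                       # linear PCA
    Z_rbf = KernelPCA(n_components=1, kernel="rbf", gamma=10).fit_transform(X)

    # The linear projection of the two rings overlaps heavily; the RBF-kernel
    # projection tends to map the inner and outer ring to separable ranges.
    print(Z_lin[:3].ravel(), Z_rbf[:3].ravel())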
GRADED Problem Set (PS) #09
STUDENT: Urvashiben Patel
TOTAL POINTS: 100 / 100 pts

QUESTION 1 K-means: 55 / 55 pts
1.1: 5 / 5 pts
1.2: 7 / 7 pts
1.3: 7 / 7 pts
1.4: 12 / 12 pts
1.5: 12 / 12 pts
1.6: 12 / 12 pts

QUESTION 2 PCA: 45 / 45 pts
2.1: 7 / 7 pts
2.2: 7 / 7 pts
2.3: 7 / 7 pts
2.4: 5 / 5 pts
2.5: 7 / 7 pts
2.6: 7 / 7 pts
2.7: 5 / 5 pts