PS#8

School: California Lutheran University
Course: IDS575
Subject: Computer Science
Date: Apr 3, 2024

Q1 Gaussian Discriminant Analysis (60 Points)

Q1.1 (5 Points) What is the goal of linear discriminant analysis?
- to minimize the variance between the classes
- to maximize the within-class variance
- to maximize the variance between the classes
- None of the above

Q1.2 (5 Points) In GDA, the μ_y are defined as
- the median of the independent variables for a group
- the average of the independent variables for a group
- the center values of the independent variables for all groups
- the averages of the independent variables for all groups

Q1.3 (5 Points) Which of the following is not true when using GDA?
- The prediction of GDA is the same as logistic regression
- Discriminant analysis is one of the generative models
- Two Gaussian distributions must share the same covariance
- If p(x|y) is Gaussian, GDA is better than logistic regression given the same data
Q1.4 (5 Points) There are two classes following Gaussian distributions, centered at (−1, 2) and (1, 4). They have identical covariance matrices. Which is the separating decision boundary?
- y − x = 3
- x + y = 3
- x + y = 6
- x + y = 3 and x + y = 6 are both possible
- None of the above

Q1.5 (5 Points) We have four data points with two classes in different colors as shown below. Which of the following models (without additional complexity) can achieve zero training error for classification?
- Logistic regression
- PCA
- LDA
- None of the above
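As a worked check for Q1.4: with a shared covariance the GDA boundary is linear, and under the additional assumption of an isotropic covariance and equal priors (the problem does not pin these down), it is the perpendicular bisector of the segment joining the means:

```latex
% Means and their midpoint:
\[
\mu_0 = (-1,\,2), \qquad \mu_1 = (1,\,4), \qquad
\frac{\mu_0 + \mu_1}{2} = (0,\,3).
\]
% With \Sigma = \sigma^2 I and equal priors, the boundary is the set of
% points equidistant from the two means:
\[
(\mu_1 - \mu_0)^\top \big( (x,\,y) - (0,\,3) \big) = 0
\;\Longrightarrow\; 2x + 2(y - 3) = 0
\;\Longrightarrow\; x + y = 3.
\]
```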
Q1.6 (5 Points) A teacher wants to evaluate high school students based on their Math and English grades. Students are classified as either successful (group 1) or not successful (group 2) in their college applications. The teacher has data on 20 students as follows [table not shown in this excerpt]:
Assume the data follow a Gaussian distribution and the two groups share the same shape (covariance). Please use MLE to estimate the parameters:

ϕ = 0.5

Q1.7 (10 Points)
μ_group1 = [683.8, 654.2]

Q1.8 (10 Points)
μ_group2 = [610.7, 605.7]

Q1.9 (10 Points)
Σ = [ 1094.35   −296.15 ]
    [ −296.15   2867.55 ]
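A minimal sketch of the MLE computations behind Q1.6 through Q1.9, assuming NumPy and placeholder grades, since the 20-student table is not part of this excerpt:

```python
import numpy as np

# Placeholder stand-in for the 20-student table (not shown in the excerpt):
# each row of X is [math_grade, english_grade]; y = 1 marks group 1.
X = np.array([[690.0, 660.0], [670.0, 650.0], [600.0, 610.0], [620.0, 600.0]])
y = np.array([1, 1, 0, 0])

phi = y.mean()                      # MLE of P(y = 1); 0.5 in the problem
mu1 = X[y == 1].mean(axis=0)        # group-1 mean (Q1.7)
mu2 = X[y == 0].mean(axis=0)        # group-2 mean (Q1.8)

# Shared ("same shape") covariance: pool residuals around each group's mean.
resid = X - np.where((y == 1)[:, None], mu1, mu2)
Sigma = resid.T @ resid / len(y)    # pooled MLE covariance (Q1.9)
print(phi, mu1, mu2, Sigma, sep="\n")
```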
Q2 Naive Bayes Model (40 Points)

Q2.1 (5 Points) What is the number of parameters needed to represent a Bernoulli Naive Bayes classifier with n Boolean variables and a Boolean label?
- 2n + 1
- n + 1
- 2n
- n

Q2.2 (10 Points) Our data has three Boolean input variables a, b, c and a single Boolean output K, as shown below [table not shown in this excerpt]. According to the Naive Bayes classifier, what is P(a = 1 ∧ b = 0 ∧ c = 1 | K = 1)?

0.1875

Q2.3 (10 Points) Assume three word types a, b, c. Consider a Naive Bayes model with the following conditional probability table [not shown in this excerpt]. Given a new sample x = (1, 0, 1), compute the probability of predicting x as positive. (Please write detailed steps rather than just the result.)

Q2.3.pdf (attachment)
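Both Q2.2 and Q2.3 rest on the same conditional-independence factorization. For Q2.2 it reads as follows; each factor is an empirical frequency taken from the omitted data table, and with that table's counts the product evaluates to the 0.1875 above:

```latex
\[
P(a{=}1 \wedge b{=}0 \wedge c{=}1 \mid K{=}1)
  = P(a{=}1 \mid K{=}1)\; P(b{=}0 \mid K{=}1)\; P(c{=}1 \mid K{=}1).
\]
```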
Q2.4 (5 Points) Which class should it be predicted as?
- positive
- negative
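A minimal sketch of the Q2.3/Q2.4 posterior computation; the conditional probabilities below are placeholders, since the real table lives in the omitted attachment:

```python
# Placeholder model: prior and per-word conditionals P(word = 1 | class).
p_pos = 0.5                                 # prior P(y = positive), assumed
cond = {
    "positive": {"a": 0.6, "b": 0.3, "c": 0.7},   # placeholders
    "negative": {"a": 0.2, "b": 0.5, "c": 0.4},   # placeholders
}
x = {"a": 1, "b": 0, "c": 1}                # the new sample x = (1, 0, 1)

def joint(label, prior):
    """Unnormalized P(x, y) under the Naive Bayes factorization."""
    p = prior
    for word, present in x.items():
        q = cond[label][word]
        p *= q if present else 1.0 - q
    return p

num = joint("positive", p_pos)
den = num + joint("negative", 1.0 - p_pos)
print("P(positive | x) =", num / den)       # Q2.4: predict positive iff > 0.5
```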
Q2.5 (5 Points) Consider a 5-letter string (containing only lowercase letters). If we want to classify whether it is a valid English word using a multinomial Naive Bayes model, this task is equivalent to preparing a die with N faces and tossing it L times. (Assume the whole English vocabulary has 5000 words.)
- N = 5000, L = 5
- N = 26, L = 5000
- N = 26, L = 5
- N = 5, L = 26

Q2.6 (5 Points) When applying Laplace smoothing to a multinomial Naive Bayes model, if we add 0.5 to the numerator instead of 1, what should be added in the denominator?

In this case, we should add 0.5 times the number of classes (0.5 · |V|) to the denominator.
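For Q2.6, the general additive-smoothing estimate makes the rule mechanical: whatever pseudo-count α goes in the numerator, α times the number of possible outcomes goes in the denominator. Here α = 0.5:

```latex
\[
\hat{P}(w_j \mid y)
  = \frac{\operatorname{count}(w_j, y) + \alpha}
         {\sum_{k=1}^{|V|} \operatorname{count}(w_k, y) + \alpha\,|V|},
\qquad \alpha = 0.5 .
\]
```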
Q3 Kernel Methods (from PS6) (35 Points)

Q3.1 (4 Points) If we consider all M-order interactions as features, given an original X with N attributes (x_1, x_2, ..., x_n), what is the computational complexity, in general, of naively computing the feature maps?
- O(N^2)
- O(M^2)
- O(N^M)
- O(M^N)

Q3.2 (7 Points) Given that K_1 and K_2 are valid kernels, select all valid kernels from:
- K(x, z) := K_1(x, z) + K_2(x, z)
- K(x, z) := K_1(x, z) − K_2(x, z)
- K(x, z) := a·K_1(x, z), with a > 0
- K(x, z) := b·K_1(x, z), with b < 0
- K(x, z) := K_1(x, z)·K_2(x, z) (note that the matrix K is the Hadamard product, i.e. the element-by-element product, of K_1 and K_2)
- K(x, z) := p(K_1(x, z)), where p(x) is a polynomial function with positive coefficients
- K(x, z) := f(x)·f(z), where f: R^n → R is a real-valued function
- K(x, z) := K_3(ϕ(x), ϕ(z)), where K_3 is another valid kernel over R^d × R^d
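For Q3.2, a quick empirical filter (not a proof) is to check that a candidate's Gram matrix stays positive semidefinite on a random sample; a sketch assuming NumPy and an RBF base kernel for K_1:

```python
import numpy as np

# A valid kernel must yield a PSD Gram matrix on any finite sample, so a
# negative eigenvalue exposes an invalid candidate (e.g. b*K1 with b < 0).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))

def rbf(x, z):
    return np.exp(-0.5 * np.sum((x - z) ** 2))

K1 = np.array([[rbf(x, z) for z in X] for x in X])

for name, K in [("K1 + K1", K1 + K1),
                ("2 * K1 (a > 0)", 2.0 * K1),
                ("-1 * K1 (b < 0)", -1.0 * K1),    # should fail the check
                ("K1 * K1 (Hadamard)", K1 * K1)]:
    min_eig = np.linalg.eigvalsh(K).min()
    print(f"{name}: min eigenvalue = {min_eig:.3e}")
```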
Q3.3 (10 Points) Given the unit-variance radial basis kernel K(x, z) = exp{−||x − z||^2 / 2}, prove that the feature mappings of x and z are distanced by at most √2. (Hint: think about ||ϕ(x) − ϕ(z)||^2.)

Q3.3.pdf (attachment)
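Following the hint, the bound drops out of expanding the squared feature-space distance and using K(x, x) = 1:

```latex
\[
\|\phi(x) - \phi(z)\|^2
  = \langle \phi(x), \phi(x) \rangle - 2\langle \phi(x), \phi(z) \rangle
    + \langle \phi(z), \phi(z) \rangle
  = K(x, x) - 2K(x, z) + K(z, z)
\]
\[
  = 2 - 2\exp\!\big\{-\tfrac{1}{2}\|x - z\|^2\big\} \;\le\; 2
  \quad\Longrightarrow\quad \|\phi(x) - \phi(z)\| \le \sqrt{2}.
\]
```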
Q3.4 (7 Points) Choose all the factors that affect the ability of an SVM to reduce errors and overfitting:
- selection of kernel
- kernel parameters
- soft-margin parameter C
- how to assign {−1, 1} to different labels

Q3.5 (7 Points) Select all true statements about kernels in SVMs:
- Kernel functions map low-dimensional data to a high-dimensional space
- A valid kernel has only one feature mapping
- They transform the problem from nonlinear to linear
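An illustration of the Q3.4 factors using scikit-learn's SVC (assumed available): kernel choice, kernel parameters, and C are the knobs that shape the fit, while flipping which class gets −1 or +1 leaves the learned boundary unchanged. The toy data here are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# XOR-style toy data: not linearly separable, so the kernel does the work.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(
    kernel="rbf",   # selection of kernel
    gamma=2.0,      # kernel parameter
    C=10.0,         # soft-margin parameter C
)
clf.fit(X, y)
print(clf.predict(X))  # with these settings the RBF kernel can fit the XOR labels
```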
GRADED: Problem Set (PS) #08
STUDENT: Urvashiben Patel
TOTAL POINTS: 128 / 135 pts

QUESTION 1, Gaussian Discriminant Analysis: 60 / 60 pts
  1.1: 5/5 · 1.2: 5/5 · 1.3: 5/5 · 1.4: 5/5 · 1.5: 5/5 · 1.6: 5/5 · 1.7: 10/10 · 1.8: 10/10 · 1.9: 10/10

QUESTION 2, Naive Bayes Model: 40 / 40 pts
  2.1: 5/5 · 2.2: 10/10 · 2.3: 10/10 · 2.4: 5/5 · 2.5: 5/5 · 2.6: 5/5

QUESTION 3, Kernel Methods (from PS6): 28 / 35 pts
  3.1: 4/4 · 3.2: 0/7 · 3.3: 10/10 · 3.4: 7/7 · 3.5: 7/7