Homework 8C

Problem 3

Suppose that we use bootstrapping (i.e., sampling with replacement) to sample 5 data points out of the training dataset {x1, x2, ..., x10}. Answer the following questions.

1. Is it possible for the sampled bootstrap to be {x1, x2, x3, x4, x5}? If no, please explain why; otherwise calculate the probability of getting this specific bootstrap sample.

Yes, {x1, x2, x3, x4, x5} can be obtained as a bootstrap sample, because bootstrapping samples with replacement and every data point has the same chance of being selected on each draw. The probability of drawing these five points in a particular order is:

Probability of selecting x1 on one draw = 1/10 (since there are 10 data points)
Probability of selecting x2, x3, x4, and x5 on the subsequent draws = 1/10 each
Probability of the ordered sequence (x1, x2, x3, x4, x5) = (1/10)^5 = 1/100,000

If the bootstrap sample is instead treated as an unordered collection, these five distinct points can appear in any of 5! = 120 orders, so the probability of the set {x1, x2, x3, x4, x5} is 120 * (1/10)^5 = 0.0012.

2. Is it possible for the sampled bootstrap to be {x1, x1, x1, x1, x1}? If no, please explain why; otherwise calculate the probability of getting this specific bootstrap sample.

Yes, {x1, x1, x1, x1, x1} is a possible bootstrap sample. Because sampling is done with replacement, the same data point can be chosen on every draw. The probability of selecting x1 on all five draws is (1/10)^5 = 1/100,000.

3. Compute the probability of a data sample, say x1, NOT being included in the bootstrap.

Because there are 9 other data points besides x1, the probability that x1 is not chosen on a single draw is 9/10. Since the five draws are independent, the probability that x1 is not included in any of them is:

Probability of x1 not being selected on one draw = 9/10
Probability of x1 not being selected on any of the five draws = (9/10)^5 = 59049/100000 = 0.59049 ≈ 59.05%
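As a sanity check on these numbers, here is a minimal Python sketch (assuming NumPy is available; index 0 stands in for x1) that estimates by simulation both the ordered-sequence probability from part 1 and the probability from part 3 that x1 never appears in the bootstrap sample:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_draws, n_trials = 10, 5, 200_000

# Draw n_trials bootstrap samples of size 5 from indices 0..9 (with replacement).
samples = rng.integers(0, n_points, size=(n_trials, n_draws))

# Fraction of samples that are exactly (x1, x2, x3, x4, x5) in order -- compare with (1/10)^5.
in_order = np.mean(np.all(samples == np.arange(n_draws), axis=1))

# Fraction of samples in which x1 (index 0) never appears -- compare with (9/10)^5 = 0.59049.
x1_missing = np.mean(np.all(samples != 0, axis=1))

print(f"ordered (x1..x5): {in_order:.6f}  (exact {0.1**5:.6f})")
print(f"x1 not in sample: {x1_missing:.5f}  (exact {0.9**5:.5f})")
```

The ordered-sequence event is rare (one in 100,000), so its simulated estimate is noisy at this trial count, while the (9/10)^5 estimate converges closely to 0.59049.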
Problem 4

In a trained classification tree, each leaf node may contain data points from different classes. For instance, a leaf node might have 10 data points from the positive class (+1) and 5 data points from the negative class (-1). Suppose that we use this classification tree to predict a test input x, and as the tree's decision process unfolds, this input eventually arrives at a specific leaf node L.

1. If the leaf node L contains 2 positive training samples and 5 negative samples, how will this tree classify x? What is the estimated probability, P[x is +1], that x is positive?

The classification is determined by the majority class at the leaf node. Since leaf node L contains more negative samples (5) than positive samples (2), the tree classifies the test input x as belonging to the negative class (-1). The estimated probability that x is positive is

P[x is +1] = (number of positive samples at leaf node L) / (total number of samples at leaf node L) = 2 / (2 + 5) = 2/7.

2. Suppose we produce ten bootstrapped samples from a data set containing +1 and -1 classes. We then apply a classification tree to each bootstrapped sample and, for a specific test input x, produce 10 estimates of P[x is +1]: 0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75. There are two common ways to combine these results together into a single class prediction. One is the majority vote approach discussed in this chapter. The second approach is to classify based on the average probability. In this example, what is the final classification under each of these two approaches?

Majority Vote Approach: In the majority vote approach, the final classification is determined by the majority of the individual classifications. Counting the estimates greater than or equal to 0.5, we find 6 of the 10 (0.55, 0.6, 0.6, 0.65, 0.7, 0.75), so 6 trees vote for the positive class. Therefore, the majority vote results in a classification of +1.

Average Probability Approach:
In the average probability approach, the final classification is based on the average of the individual probability estimates:

Average probability = (0.1 + 0.15 + 0.2 + 0.2 + 0.55 + 0.6 + 0.6 + 0.65 + 0.7 + 0.75) / 10 = 4.5 / 10 = 0.45.

Since 0.45 is less than 0.5, the final classification under the average probability approach is -1.

As a result, for this test input x and the given set of probability estimates, the final classification is +1 under the majority vote approach but -1 under the average probability approach.
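A short Python sketch of the two combination rules applied to these ten estimates (assuming the usual convention that a tree votes +1 when its estimate is at least 0.5):

```python
# Per-tree estimates of P[x is +1] from the ten bootstrapped trees.
probs = [0.1, 0.15, 0.2, 0.2, 0.55, 0.6, 0.6, 0.65, 0.7, 0.75]

# Majority vote: each tree predicts +1 if its probability estimate >= 0.5.
votes_plus = sum(p >= 0.5 for p in probs)
majority_label = +1 if votes_plus > len(probs) / 2 else -1

# Average probability: predict +1 if the mean estimate >= 0.5.
avg = sum(probs) / len(probs)
average_label = +1 if avg >= 0.5 else -1

print(f"majority vote: {votes_plus}/{len(probs)} trees vote +1 -> {majority_label:+d}")
print(f"average prob : {avg:.2f} -> {average_label:+d}")
```

Running this prints 6/10 votes for +1 (majority vote gives +1) and an average of 0.45 (average probability gives -1), matching the calculations above.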
Problem 5

Single-layer perceptrons are simplified neural networks. In this exercise, we use the sign function (the "hard" version of the logistic function) as the activation function and assume that the input layer has only two variables, x1 and x2. Formally, such a perceptron can be expressed in the following way:

G_p(x1, x2) := sign(w1*x1 + w2*x2 + w3),

where w1, w2, w3 represent the parameters.

1. Given two binary variables x1 and x2, the AND operator is defined as:

AND(x1, x2) := 1 if x1 = x2 = 1, and 0 otherwise.

Is it possible to find parameters w1, w2, w3 such that G_p(x1, x2) = AND(x1, x2) for all x1, x2 in {0, 1}? If no, please explain why; otherwise identify the parameters and show that your answer is correct.

Yes, it is possible to find parameters w1, w2, w3 such that G_p(x1, x2) = AND(x1, x2) for all x1, x2 in {0, 1}. Set the parameters to w1 = 1, w2 = 1, and w3 = -1.5. With these values, w1*x1 + w2*x2 + w3 is positive only for x1 = x2 = 1 (1 + 1 - 1.5 = 0.5 > 0) and negative for the other three input combinations, so the perceptron outputs 1 exactly when both inputs are 1 and 0 otherwise. This is consistent with the behavior of the AND operator.

2. Given two binary variables x1 and x2, the OR operator is defined as:

OR(x1, x2) := 1 if x1 = 1 or x2 = 1, and 0 otherwise.
Is it possible to find parameters w1, w2, w3 such that G_p(x1, x2) = OR(x1, x2) for all x1, x2 in {0, 1}? If no, please explain why; otherwise identify the parameters and show that your answer is correct.

Yes, it is possible to find parameters w1, w2, w3 such that G_p(x1, x2) = OR(x1, x2) for all x1, x2 in {0, 1}. The parameters can be set as w1 = 1, w2 = 1, w3 = -0.5. With these values, w1*x1 + w2*x2 + w3 is negative only for x1 = x2 = 0 (0 + 0 - 0.5 = -0.5 < 0) and positive whenever at least one input is 1, so the perceptron outputs 1 whenever x1 = 1 or x2 = 1 and 0 otherwise. This matches the behavior of the OR operator.

3. Given two binary variables x1 and x2, the XOR operator is defined as:

XOR(x1, x2) := 1 if x1 ≠ x2, and 0 otherwise.

Is it possible to find parameters w1, w2, w3 such that G_p(x1, x2) = XOR(x1, x2) for all x1, x2 in {0, 1}? If no, please explain why; otherwise identify the parameters and show that your answer is correct.

No, it is not possible to find parameters w1, w2, w3 such that G_p(x1, x2) = XOR(x1, x2) for all x1, x2 in {0, 1}. XOR is not linearly separable: a single-layer perceptron can only draw one straight line w1*x1 + w2*x2 + w3 = 0 in the (x1, x2) plane, and no single line places (0, 1) and (1, 0) on one side while keeping (0, 0) and (1, 1) on the other. Representing XOR therefore requires a more complex network structure, such as a multi-layer perceptron with a hidden layer.
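A small Python check of the AND and OR parameters above (assuming the threshold-at-zero convention that the hard unit outputs 1 when the weighted sum is strictly positive and 0 otherwise):

```python
from itertools import product

def perceptron(w1, w2, w3, x1, x2):
    """Hard-threshold unit: 1 if w1*x1 + w2*x2 + w3 > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + w3 > 0 else 0

# Weight settings proposed above, paired with the target Boolean operator.
gates = {
    "AND": ((1, 1, -1.5), lambda a, b: int(a == 1 and b == 1)),
    "OR":  ((1, 1, -0.5), lambda a, b: int(a == 1 or b == 1)),
}

for name, ((w1, w2, w3), truth) in gates.items():
    # Check all four binary input combinations against the truth table.
    ok = all(perceptron(w1, w2, w3, a, b) == truth(a, b)
             for a, b in product((0, 1), repeat=2))
    print(f"{name}: weights ({w1}, {w2}, {w3}) reproduce the gate: {ok}")

# No choice of (w1, w2, w3) passes the same check for XOR, since XOR is not
# linearly separable.
```

Both checks print True, confirming the chosen weights for AND and OR.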