EECS189 Dis10 Solution

CS 189/289A Introduction to Machine Learning
Fall 2023, Jennifer Listgarten, Jitendra Malik
DIS10

1 Probabilistic Graphical Models

Recall that we can represent joint probability distributions with directed acyclic graphs (DAGs). Let G be a DAG with vertices X_1, ..., X_k. If P is a (joint) distribution for X_1, ..., X_k with (joint) probability mass function p, we say that G represents P if

p(x_1, ..., x_k) = ∏_{i=1}^{k} P(X_i = x_i | pa(X_i)),    (1)

where pa(X_i) denotes the parent nodes of X_i. (Recall that in a DAG, node Z is a parent of node X iff there is a directed edge going out of Z into X.)

Consider the following DAG:

[Figure 1: G, a DAG with nodes X, Z, Y, S and edges X → Y, Z → Y, X → S, Y → S.]

(a) Write down the joint factorization of P_{S,X,Y,Z}(s, x, y, z) implied by the DAG G shown in Figure 1.

Solution:

P_{S,X,Y,Z}(s, x, y, z) = P(X = x) P(Z = z) P(Y = y | X = x, Z = z) P(S = s | X = x, Y = y).

(b) Is S ⊥ Z | Y?

Solution: No. As a counterexample, consider the case where all nodes represent binary random variables, P(X = 1) = P(Z = 1) = 0.5, Y = X ⊕ Z, and S = X ⊕ Y, where ⊕ is the XOR operator. Then we can see that S = Z, whereas knowing Y does not fully determine S or Z.

A version of these solutions from a previous semester erroneously said that this conditional independence did hold. As a result, you may have wrongly heard in section that this statement is true, via faulty algebraic manipulation and/or other algorithms such as Bayes ball (d-separation). Running these algorithms correctly should show that S and Z are indeed not conditionally independent given Y.
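A direct enumeration of the counterexample confirms this. The short Python sketch below (the helper names are our own) lists the four equally likely (X, Z) outcomes and compares P(S = 1 | Y = 0) with P(S = 1 | Y = 0, Z = 1):

```python
from itertools import product

# Counterexample from part (b): X and Z are independent fair coin flips,
# Y = X XOR Z, and S = X XOR Y.  Each of the four (X, Z) outcomes has
# probability 0.25.
outcomes = []
for x, z in product([0, 1], repeat=2):
    y = x ^ z          # Y = X XOR Z
    s = x ^ y          # S = X XOR Y, which always equals Z here
    outcomes.append({"s": s, "x": x, "y": y, "z": z, "p": 0.25})

def cond_prob(event, given):
    """P(event | given), computed from the enumerated joint distribution."""
    num = sum(o["p"] for o in outcomes if event(o) and given(o))
    den = sum(o["p"] for o in outcomes if given(o))
    return num / den

# Given only Y = 0, S is still a fair coin flip ...
print(cond_prob(lambda o: o["s"] == 1, lambda o: o["y"] == 0))                  # 0.5
# ... but additionally conditioning on Z = 1 determines S, so S and Z are
# dependent given Y.
print(cond_prob(lambda o: o["s"] == 1, lambda o: o["y"] == 0 and o["z"] == 1))  # 1.0
```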
If X is fully removed from G, then we do indeed have S ⊥ Z | Y. This is left as an exercise in algebraic manipulation of probability distributions.

(c) Is S ⊥ X | Y?

Solution: No. Consider the same example from above with binary random variables. Knowing Y does not determine S, but knowing both X and Y does.
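The same XOR construction can again be checked by enumeration; a minimal sketch:

```python
from itertools import product

# Same construction as part (b), now comparing P(S | Y) with P(S | X, Y):
# knowing Y alone leaves S uncertain, but knowing X and Y together
# determines S = X XOR Y, so S is not independent of X given Y.
joint = {}                      # keys are (s, x, y); values are probabilities
for x, z in product([0, 1], repeat=2):
    y = x ^ z
    s = x ^ y
    joint[(s, x, y)] = joint.get((s, x, y), 0.0) + 0.25

def prob(pred):
    """Total probability of the outcomes (s, x, y) satisfying `pred`."""
    return sum(p for (s, x, y), p in joint.items() if pred(s, x, y))

print(prob(lambda s, x, y: s == 1 and y == 0)
      / prob(lambda s, x, y: y == 0))               # 0.5: Y alone leaves S uncertain
print(prob(lambda s, x, y: s == 1 and x == 1 and y == 0)
      / prob(lambda s, x, y: x == 1 and y == 0))    # 1.0: X and Y together determine S
```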
2 PGMs: Sleeping in Class

In this question, you'll be reasoning about a Dynamic Bayesian Network (DBN), a form of probabilistic graphical model. Your favorite discussion section TA wants to know if their students are getting enough sleep. Each day, the TA observes the students in their section, noting if they fall asleep in class or have red eyes. The TA makes the following conclusions:

1. The prior probability of getting enough sleep, S, with no observations, is 0.7.
2. The probability of getting enough sleep on night t is 0.8 given that the student got enough sleep the previous night, and 0.3 if not.
3. The probability of having red eyes R is 0.2 if the student got enough sleep, and 0.7 if not.
4. The probability of sleeping in class C is 0.1 if the student got enough sleep, and 0.3 if not.

(a) Formulate this information as a dynamic Bayesian network that the TA could use to filter or predict from a sequence of observations. If you were to reformulate this network as a hidden Markov model instead (one that has only a single observation variable), how would you do so? Give a high-level description (probability tables for the HMM formulation are not necessary).

Solution: Our Bayesian network has three variables per timestep: S_t, whether the student gets enough sleep; R_t, whether they have red eyes in class; and C_t, whether the student sleeps in class. S_t is a parent of S_{t+1}, R_t, and C_t. The network can be presented pictorially, or fully through conditional probability tables (CPTs). The CPTs for this problem are:

P(s_0) = 0.7
P(s_{t+1} | s_t) = 0.8,    P(s_{t+1} | ¬s_t) = 0.3
P(r_t | s_t) = 0.2,        P(r_t | ¬s_t) = 0.7
P(c_t | s_t) = 0.1,        P(c_t | ¬s_t) = 0.3

To reformulate this problem as an HMM with a single observation node, we can combine the two binary variables r_t and c_t into a single 4-valued observation variable, multiplying together the emission probabilities.
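The HMM reformulation can be sketched directly in NumPy; the snippet below (variable names are our own) builds the merged four-valued emission model by multiplying the red-eyes and sleeps-in-class likelihoods:

```python
import numpy as np

# State order throughout: [enough sleep, not enough sleep].
prior = np.array([0.7, 0.3])                  # P(S_0)
transition = np.array([[0.8, 0.2],            # P(S_{t+1} | S_t = enough sleep)
                       [0.3, 0.7]])           # P(S_{t+1} | S_t = not enough sleep)
p_red = np.array([0.2, 0.7])                  # P(R_t = 1 | S_t)
p_class = np.array([0.1, 0.3])                # P(C_t = 1 | S_t)

# Merged observation variable with four values, one per (R, C) pair.
# Rows: states; columns: (R, C) = (0, 0), (0, 1), (1, 0), (1, 1).
emission = np.stack(
    [(p_red if r else 1 - p_red) * (p_class if c else 1 - p_class)
     for r in (0, 1) for c in (0, 1)],
    axis=1,
)
print(emission)        # each row sums to 1, as an emission distribution should
```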
(b) Consider the following evidence values at timesteps 1, 2, and 3:

• e_1 = not red eyes, not sleeping in class
• e_2 = red eyes, not sleeping in class
• e_3 = red eyes, sleeping in class

Compute state estimates for timesteps t = 1, 2, and 3; that is, calculate P(S_t | e_{1:t}). Assume a prior P(s_0) consistent with the prior in the previous part; that is, P(s_0) = 0.7.

Solution: We can apply the filtering (forward computation) method. We walk through the computation step by step:

P(S_0) = ⟨0.7, 0.3⟩

P(S_1) = ∑_{s_0} P(S_1 | s_0) P(s_0) = ⟨0.8, 0.2⟩ · 0.7 + ⟨0.3, 0.7⟩ · 0.3 = ⟨0.65, 0.35⟩

P(S_1 | e_1) = α P(e_1 | S_1) P(S_1) = α ⟨0.8 · 0.9, 0.3 · 0.7⟩ ⟨0.65, 0.35⟩

After normalizing, we get the following (the remaining steps normalize implicitly):

= ⟨0.8643, 0.1357⟩

P(S_2 | e_1) = ∑_{s_1} P(S_2 | s_1) P(s_1 | e_1) = ⟨0.7321, 0.2679⟩

P(S_2 | e_{1:2}) = α P(e_2 | S_2) P(S_2 | e_1) = ⟨0.5010, 0.4990⟩

P(S_3 | e_{1:2}) = ∑_{s_2} P(S_3 | s_2) P(s_2 | e_{1:2}) = ⟨0.5505, 0.4495⟩

P(S_3 | e_{1:3}) = α P(e_3 | S_3) P(S_3 | e_{1:2}) = ⟨0.1045, 0.8955⟩
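The forward pass can be reproduced with a short NumPy sketch; the arrays restate the CPTs from part (a), with state order [enough sleep, not enough sleep]:

```python
import numpy as np

prior = np.array([0.7, 0.3])
T = np.array([[0.8, 0.2],        # P(S_{t+1} | S_t = enough sleep)
              [0.3, 0.7]])       # P(S_{t+1} | S_t = not enough sleep)
p_red = np.array([0.2, 0.7])     # P(red eyes | S_t)
p_class = np.array([0.1, 0.3])   # P(sleeps in class | S_t)

def likelihood(red, sleeps):
    """P(e_t | S_t) for one observation pair, as a length-2 vector."""
    return (p_red if red else 1 - p_red) * (p_class if sleeps else 1 - p_class)

evidence = [(False, False),  # e_1: not red eyes, not sleeping in class
            (True, False),   # e_2: red eyes, not sleeping in class
            (True, True)]    # e_3: red eyes, sleeping in class

f = prior
for red, sleeps in evidence:
    predicted = T.T @ f                  # one-step prediction P(S_t | e_{1:t-1})
    f = likelihood(red, sleeps) * predicted
    f = f / f.sum()                      # normalize to get P(S_t | e_{1:t})
    print(np.round(f, 4))
# Prints approximately [0.8643 0.1357], [0.501 0.499], [0.1045 0.8955].
```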
(c) Compute smoothing estimates P(S_t | e_{1:3}) for each timestep, using the same evidence as the previous part.

Solution: First, we do the backward computations:

P(e_3 | S_3) = ⟨0.2 · 0.1, 0.7 · 0.3⟩ = ⟨0.02, 0.21⟩

P(e_3 | S_2) = ∑_{s_3} P(e_3 | s_3) P(s_3 | S_2) = ⟨0.02 · 0.8 + 0.21 · 0.2, 0.02 · 0.3 + 0.21 · 0.7⟩ = ⟨0.058, 0.153⟩

P(e_{2:3} | S_1) = ∑_{s_2} P(e_2 | s_2) P(e_3 | s_2) P(s_2 | S_1) = ⟨0.0233, 0.0556⟩

Now, we can combine them with the forward computation and normalize:

P(S_1 | e_{1:3}) = α P(S_1 | e_1) P(e_{2:3} | S_1) = ⟨0.7277, 0.2723⟩

P(S_2 | e_{1:3}) = α P(S_2 | e_{1:2}) P(e_3 | S_2) = ⟨0.2757, 0.7243⟩

P(S_3 | e_{1:3}) = ⟨0.1045, 0.8955⟩

(d) Compare, in plain English, the filtered estimates you computed for timesteps 1 and 2 with the smoothed estimates. How do the two analyses differ?

Solution: The filtered estimate at t = 2 still gives the student a slightly better-than-even chance of having had enough sleep (⟨0.5010, 0.4990⟩), and the estimate only drops sharply at t = 3. The smoothed estimate at t = 2 is ⟨0.2757, 0.7243⟩: by incorporating the later observation, which indicates a lack of sleep, smoothing places the likely onset of poor sleep one timestep earlier than the filtering-only computation does.
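A matching backward pass in NumPy reproduces these numbers; the filtered distributions and evidence likelihoods are hardcoded from part (b) so the sketch runs on its own, and its output also illustrates the comparison in part (d):

```python
import numpy as np

T = np.array([[0.8, 0.2],                 # P(S_{t+1} | S_t), rows = current state
              [0.3, 0.7]])
filtered = [np.array([0.8643, 0.1357]),   # P(S_1 | e_1)
            np.array([0.5010, 0.4990]),   # P(S_2 | e_{1:2})
            np.array([0.1045, 0.8955])]   # P(S_3 | e_{1:3})
likelihoods = [np.array([0.72, 0.21]),    # P(e_1 | S_1)
               np.array([0.18, 0.49]),    # P(e_2 | S_2)
               np.array([0.02, 0.21])]    # P(e_3 | S_3)

b = np.ones(2)                            # backward message P(e_{4:3} | S_3) = ⟨1, 1⟩
for t in reversed(range(3)):
    smoothed = filtered[t] * b            # forward message times backward message
    smoothed = smoothed / smoothed.sum()
    print("t =", t + 1, np.round(smoothed, 4))
    b = T @ (likelihoods[t] * b)          # recurse: backward message P(e_{t:3} | S_{t-1})
# Prints, up to rounding of the hardcoded inputs, ⟨0.1045, 0.8955⟩ for t = 3,
# ⟨0.2757, 0.7243⟩ for t = 2, and ⟨0.7277, 0.2723⟩ for t = 1.
```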