CS 188 Spring 2023 Final Review: HMMs & VPI

Q1. Vehicle Perception Indication

A vehicle is trying to identify the situation of the world around it using a set of sensors located around the vehicle. Each sensor reading (SRead) is based on an object's location (LOC) and an object's movement (MOVE). The sensor reading then produces various values for its predicted location (SLoc) and predicted movement (SMove). The user receives these readings, as well as the image (IMAGE), as feedback.

(a) The vehicle takes an action, and we assign some utility to the action based on the object's location and movement. Possible actions are MOVE TOWARDS, MOVE AWAY, and STOP. Suppose the decision network faced by the vehicle is the following.

[Decision network diagram omitted.]

(i) Based on the diagram above, which of the following could possibly be true?

  □ VPI(Image) = 0
  □ VPI(SRead) < 0
  □ VPI(SMove, SRead) > VPI(SRead)
  □ VPI(Feedback) = 0
  □ None of the above

(ii) Based on the diagram above, which of the following must necessarily be true?

  □ VPI(Image) = 0
  □ VPI(SRead) = 0
  □ VPI(SMove, SRead) = VPI(SRead)
  □ VPI(Feedback) = 0
  □ None of the above
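Both parts turn on general properties of VPI rather than on specific numbers; for reference, with $e$ denoting the evidence already observed:

\[
\begin{aligned}
\text{Nonnegativity: } & \mathrm{VPI}(E \mid e) \ge 0 \text{ for any variable } E, \\
\text{Order independence: } & \mathrm{VPI}(E, F \mid e) = \mathrm{VPI}(E \mid e) + \mathrm{VPI}(F \mid e, E) = \mathrm{VPI}(F \mid e) + \mathrm{VPI}(E \mid e, F), \\
\text{Irrelevance: } & \mathrm{VPI}(E \mid e) = 0 \text{ whenever observing } E \text{ cannot change the optimal decision.}
\end{aligned}
\]

In particular, nonnegativity combined with order independence gives $\mathrm{VPI}(E, F \mid e) \ge \mathrm{VPI}(E \mid e)$.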
Let's assume that your startup has less money, so we use a simpler sensor network. One possible sensor network can be represented as follows. You have distributions $P(LOC)$, $P(MOVE)$, $P(SRead \mid LOC, MOVE)$, $P(SLoc \mid SRead)$, and utility values $U(a, l, m)$.

(b) Complete the equation for determining the expected utility for some ACTION $a$:

  $EU(a) = \text{(i)}\;\; \text{(ii)}\;\; \text{(iii)}\;\; U(a, l, m)$

(i)
  □ $\sum_{l} P(l)$
  □ $\sum_{sloc} P(sloc \mid l)$
  □ $\sum_{l} \sum_{sloc} P(sloc \mid l)$
  □ $1$

(ii)
  □ $\sum_{m} P(m)$
  □ $\sum_{m} P(sloc \mid m)$
  □ $\sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l)\, P(sloc \mid m)$
  □ $1$

(iii)
  □ $\sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l)\, P(sloc \mid m)$
  □ $+ \sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l)\, P(sloc \mid m)$
  □ $+ \sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l, m)\, P(l)\, P(m)$
  □ $1$

(c) Your colleague Bob invented a new sensor to observe values of SLoc.

(i) Suppose that your company had no sensors until this point. Which of the following expressions is equivalent to VPI(SLoc)?

  □ $VPI(SLoc) = \left( \sum_{sloc} P(sloc)\, MEU(SLoc = sloc) \right) - \max_{a} EU(a)$
  □ $VPI(SLoc) = MEU(SLoc) - MEU(\varnothing)$
  □ $VPI(SLoc) = \max_{sloc} MEU(SLoc = sloc) - MEU(\varnothing)$
  □ None of the above

(ii) Gaagle, an established company, wants to sell your startup a device that gives you SRead. Given that you already have Bob's device (which gives you SLoc), what is the maximum amount of money you should pay for Gaagle's device? Suppose you value $1 at 1 utility.

  □ $VPI(SRead)$
  □ $VPI(SRead) - VPI(SLoc)$
  □ $VPI(SRead, SLoc)$
  □ $VPI(SRead, SLoc) - VPI(SLoc)$
  □ None of the above
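As a concreteness check for parts (b) and (c)(i), here is a minimal Python sketch that evaluates EU, MEU, and VPI(SLoc) by direct enumeration. The dictionary-based tables P_loc, P_move, P_sread, P_sloc, and U are hypothetical stand-ins for the distributions and utilities listed above; the names are illustrative, not from the handout.

```python
from itertools import product

def expected_utility(a, P_loc, P_move, U):
    """EU(a) with nothing observed: sum over l, m of P(l) P(m) U(a, l, m)."""
    return sum(P_loc[l] * P_move[m] * U[(a, l, m)]
               for l, m in product(P_loc, P_move))

def meu_no_evidence(actions, P_loc, P_move, U):
    """MEU with no evidence: max over actions of EU(a)."""
    return max(expected_utility(a, P_loc, P_move, U) for a in actions)

def vpi_sloc(actions, P_loc, P_move, P_sread, P_sloc, U):
    """VPI(SLoc) = MEU(SLoc) - MEU(no evidence), computed by enumeration."""
    # Joint P(l, m, sloc) = P(l) P(m) * sum_sread P(sread | l, m) P(sloc | sread).
    joint = {}
    for l, m in product(P_loc, P_move):
        for sread, p_sr in P_sread[(l, m)].items():
            for sloc, p_sl in P_sloc[sread].items():
                joint[(l, m, sloc)] = (joint.get((l, m, sloc), 0.0)
                                       + P_loc[l] * P_move[m] * p_sr * p_sl)
    slocs = {sloc for (_, _, sloc) in joint}
    # MEU(SLoc) = sum_sloc max_a sum_{l,m} P(l, m, sloc) U(a, l, m);
    # the factor P(sloc) is folded into the joint, so no normalization is needed.
    meu_sloc = sum(
        max(sum(joint.get((l, m, sloc), 0.0) * U[(a, l, m)]
                for l, m in product(P_loc, P_move))
            for a in actions)
        for sloc in slocs)
    return meu_sloc - meu_no_evidence(actions, P_loc, P_move, U)
```

With these helpers, the maximum amount the startup should pay for a sensor revealing SLoc (valuing $1 at 1 utility) is exactly vpi_sloc(actions, P_loc, P_move, P_sread, P_sloc, U).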
Q2. Particle Filtering Apprenticeship

We are observing an agent's actions in an MDP and are trying to determine which policy, out of a set $\{\pi_1, \ldots, \pi_n\}$, the agent is following. Let the random variable $\Pi$ take values in that set and represent the policy that the agent is acting under. We consider only stochastic policies, so that $A_t$ is a random variable with a distribution conditioned on $S_t$ and $\Pi$. As in a typical MDP, $S_t$ is a random variable with a distribution conditioned on $S_{t-1}$ and $A_{t-1}$. The full Bayes net is shown below.

The agent acting in the environment knows what state it is currently in (as is typical in the MDP setting). Unfortunately, however, we, the observers, cannot see the states $S_t$. Thus we are forced to use an adapted particle filtering algorithm to solve this problem. Concretely, we will develop an efficient algorithm to estimate $P(\Pi \mid a_{1:t})$.

(a) The Bayes net for part (a) is:

[Bayes net diagram with nodes $\Pi$; $\ldots, S_{t-1}, S_t, S_{t+1}, \ldots$; and $\ldots, A_{t-1}, A_t, A_{t+1}, \ldots$]

(i) Select all of the following that are guaranteed to be true in this model for $t > 3$:

  □ $S_t \perp\!\!\!\perp S_{t-2} \mid S_{t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid S_{t-1}, A_{1:t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi, A_{1:t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi, S_{t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi, S_{t-1}, A_{1:t-1}$
  □ None of the above

We will compute our estimate for $P(\Pi \mid a_{1:t})$ by coming up with a recursive algorithm for computing $P(\Pi, S_t \mid a_{1:t})$. (We can then sum out $S_t$ to get the desired distribution; in this problem we ignore that step.)

(ii) Write a recursive expression for $P(\Pi, S_t \mid a_{1:t})$ in terms of the CPTs in the Bayes net above.

  $P(\Pi, S_t \mid a_{1:t}) =$ __________

We now try to adapt particle filtering to approximate this value. Each particle will contain a single state $s_t$ and a potential policy $\pi_i$.

(iii) The following is pseudocode for the body of the loop in our adapted particle filtering algorithm. Fill in the boxes with the correct values so that the algorithm will approximate $P(\Pi, S_t \mid a_{1:t})$ (a sketch of one possible instantiation follows the steps below).

  1. Elapse time: for each particle $(s_t, \pi_i)$, sample a successor $s_{t+1}$ from __________. The policy $\pi$ in the new particle is __________.
  2. Incorporate evidence: to each new particle $(s_{t+1}, \pi)$, assign weight __________.
  3. Resample particles from the weighted particle distribution.
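For reference, here is a minimal Python sketch of one way the loop body above could be instantiated. The helpers transition_model(s, a) and policy_model(s, pi) are hypothetical stand-ins for the CPTs $P(S_{t+1} \mid S_t, A_t)$ and $P(A_t \mid S_t, \Pi)$, and the particular sampling distribution and weight shown are one consistent choice, not necessarily the intended official blanks.

```python
import random

def elapse_and_weight(particles, a_t, a_t_plus_1, transition_model, policy_model):
    """One iteration of the loop body for approximating P(Pi, S_t | a_{1:t}).

    particles        -- list of (s_t, pi) pairs
    a_t              -- observed action at time t (used when elapsing time)
    a_t_plus_1       -- observed action at time t+1 (used as evidence)
    transition_model -- hypothetical: transition_model(s, a) returns {s': P(s' | s, a)}
    policy_model     -- hypothetical: policy_model(s, pi) returns {a: P(a | s, pi)}
    """
    weighted = []
    for s_t, pi in particles:
        # 1. Elapse time: sample s_{t+1} given s_t and the observed action a_t;
        #    the policy component of the particle is unchanged.
        succ = transition_model(s_t, a_t)
        states, probs = zip(*succ.items())
        s_next = random.choices(states, weights=probs)[0]
        # 2. Incorporate evidence: weight by how likely policy pi is to take the
        #    observed action a_{t+1} in state s_{t+1}.
        w = policy_model(s_next, pi).get(a_t_plus_1, 0.0)
        weighted.append(((s_next, pi), w))
    # 3. Resample from the weighted particle distribution
    #    (assumes at least one particle received nonzero weight).
    return random.choices(
        [particle for particle, _ in weighted],
        weights=[w for _, w in weighted],
        k=len(particles),
    )
```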
(b) We now observe the acting agent's actions and rewards at each time step (but we still don't know the states). Unlike the MDPs in lecture, here we use a stochastic reward function, so that $R_t$ is a random variable with a distribution conditioned on $S_t$ and $A_t$. The new Bayes net is given below.

[Bayes net diagram with nodes $\Pi$; $\ldots, S_{t-1}, S_t, S_{t+1}, \ldots$; $\ldots, A_{t-1}, A_t, A_{t+1}, \ldots$; and $\ldots, R_{t-1}, R_t, R_{t+1}, \ldots$]

Notice that the observed rewards do in fact give useful information, since d-separation does not give that $R_t \perp\!\!\!\perp \Pi \mid A_{1:t}$.

(i) Give an active path connecting $R_t$ and $\Pi$ when $A_{1:t}$ are observed. Your answer should be an ordered list of nodes in the graph, for example "$S_t, S_{t+1}, A_t, \Pi, A_{t-1}, R_{t-1}$".

(ii) Write a recursive expression for $P(\Pi, S_t \mid a_{1:t}, r_{1:t})$ in terms of the CPTs in the Bayes net above.

  $P(\Pi, S_t \mid a_{1:t}, r_{1:t}) =$ __________

(c) We now observe only the sequence of rewards and no longer observe the sequence of actions. The new Bayes net is shown below.

[Bayes net diagram with nodes $\Pi$; $\ldots, S_{t-1}, S_t, S_{t+1}, \ldots$; $\ldots, A_{t-1}, A_t, A_{t+1}, \ldots$; and $\ldots, R_{t-1}, R_t, R_{t+1}, \ldots$]

(i) Write a recursive expression for $P(\Pi, S_t, A_t \mid r_{1:t})$ in terms of the CPTs in the Bayes net above.

  $P(\Pi, S_t, A_t \mid r_{1:t}) =$ __________

We now try to adapt particle filtering to approximate this value. Each particle will contain a single state $s_t$, a single action $a_t$, and a potential policy $\pi_i$.

(ii) The following is pseudocode for the body of the loop in our adapted particle filtering algorithm. Fill in the boxes with the correct values so that the algorithm will approximate $P(\Pi, S_t, A_t \mid r_{1:t})$ (a sketch of one possible instantiation follows the steps below).

  1. Elapse time: for each particle $(s_t, a_t, \pi_i)$, sample a successor state $s_{t+1}$ from __________. Then, sample a successor action $a_{t+1}$ from __________. The policy $\pi$ in the new particle is __________.
  2. Incorporate evidence: to each new particle $(s_{t+1}, a_{t+1}, \pi)$, assign weight __________.
  3. Resample particles from the weighted particle distribution.
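As with part (a), here is a minimal Python sketch of one way the loop body in (c)(ii) could be instantiated. The helpers transition_model(s, a), policy_model(s, pi), and reward_model(s, a) are hypothetical stand-ins for the CPTs $P(S_{t+1} \mid S_t, A_t)$, $P(A_t \mid S_t, \Pi)$, and $P(R_t \mid S_t, A_t)$; the specific sampling distributions and weight are one consistent choice, not necessarily the intended official blanks.

```python
import random

def sample_from(dist):
    """Sample a key from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def elapse_and_weight_rewards_only(particles, r_t_plus_1,
                                   transition_model, policy_model, reward_model):
    """One iteration of the loop body for approximating P(Pi, S_t, A_t | r_{1:t}).

    particles        -- list of (s_t, a_t, pi) triples
    r_t_plus_1       -- observed reward at time t+1 (used as evidence)
    transition_model -- hypothetical: transition_model(s, a) returns {s': P(s' | s, a)}
    policy_model     -- hypothetical: policy_model(s, pi) returns {a: P(a | s, pi)}
    reward_model     -- hypothetical: reward_model(s, a) returns {r: P(r | s, a)}
    """
    weighted = []
    for s_t, a_t, pi in particles:
        # 1. Elapse time: sample s_{t+1} given (s_t, a_t), then a_{t+1} given
        #    (s_{t+1}, pi); the policy component of the particle is unchanged.
        s_next = sample_from(transition_model(s_t, a_t))
        a_next = sample_from(policy_model(s_next, pi))
        # 2. Incorporate evidence: weight by the probability of the observed reward.
        w = reward_model(s_next, a_next).get(r_t_plus_1, 0.0)
        weighted.append(((s_next, a_next, pi), w))
    # 3. Resample from the weighted particle distribution
    #    (assumes at least one particle received nonzero weight).
    return random.choices(
        [particle for particle, _ in weighted],
        weights=[w for _, w in weighted],
        k=len(particles),
    )
```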