CS 188 Spring 2023 Final Review: HMMs & VPI

Q1. Vehicle Perception Indication

A vehicle is trying to identify the situation of the world around it using a set of sensors located around the vehicle. Each sensor reading (SRead) is based on an object's location (LOC) and an object's movement (MOVE). The sensor reading then produces various values for its predicted location (SLoc) and predicted movement (SMove). The user receives these readings, as well as the image (IMAGE), as feedback.

(a) The vehicle takes an action, and we assign some utility to the action based on the object's location and movement. Possible actions are MOVE TOWARDS, MOVE AWAY, and STOP. Suppose the decision network faced by the vehicle is the following.

[Decision network diagram omitted.]

(i) Based on the diagram above, which of the following could possibly be true?

  □ VPI(Image) = 0
  □ VPI(SRead) < 0
  □ VPI(SMove, SRead) > VPI(SRead)
  □ VPI(Feedback) = 0
  □ None of the above

(ii) Based on the diagram above, which of the following must necessarily be true?

  □ VPI(Image) = 0
  □ VPI(SRead) = 0
  □ VPI(SMove, SRead) = VPI(SRead)
  □ VPI(Feedback) = 0
  □ None of the above
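Both parts turn on general properties of VPI rather than on specific numbers; for reference, with $e$ denoting the evidence already observed:

\[
\begin{aligned}
\text{Nonnegativity: } & \mathrm{VPI}(E \mid e) \ge 0 \text{ for any variable } E, \\
\text{Order independence: } & \mathrm{VPI}(E, F \mid e) = \mathrm{VPI}(E \mid e) + \mathrm{VPI}(F \mid e, E) = \mathrm{VPI}(F \mid e) + \mathrm{VPI}(E \mid e, F), \\
\text{Irrelevance: } & \mathrm{VPI}(E \mid e) = 0 \text{ whenever observing } E \text{ cannot change the optimal decision.}
\end{aligned}
\]

In particular, nonnegativity combined with order independence gives $\mathrm{VPI}(E, F \mid e) \ge \mathrm{VPI}(E \mid e)$.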
Let's assume that your startup has less money, so we use a simpler sensor network. One possible sensor network can be represented as follows. You have distributions $P(LOC)$, $P(MOVE)$, $P(SRead \mid LOC, MOVE)$, $P(SLoc \mid SRead)$, and utility values $U(a, l, m)$.

(b) Complete the equation for determining the expected utility for some ACTION $a$:

  $EU(a) = \text{(i)}\;\; \text{(ii)}\;\; \text{(iii)}\;\; U(a, l, m)$

(i)
  □ $\sum_{l} P(l)$
  □ $\sum_{sloc} P(sloc \mid l)$
  □ $\sum_{l} \sum_{sloc} P(sloc \mid l)$
  □ $1$

(ii)
  □ $\sum_{m} P(m)$
  □ $\sum_{m} P(sloc \mid m)$
  □ $\sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l)\, P(sloc \mid m)$
  □ $1$

(iii)
  □ $\sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l)\, P(sloc \mid m)$
  □ $+ \sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l)\, P(sloc \mid m)$
  □ $+ \sum_{l} \sum_{m} \sum_{sloc} P(sloc \mid l, m)\, P(l)\, P(m)$
  □ $1$

(c) Your colleague Bob invented a new sensor to observe values of SLoc.

(i) Suppose that your company had no sensors until this point. Which of the following expressions is equivalent to VPI(SLoc)?

  □ $VPI(SLoc) = \left( \sum_{sloc} P(sloc)\, MEU(SLoc = sloc) \right) - \max_{a} EU(a)$
  □ $VPI(SLoc) = MEU(SLoc) - MEU(\varnothing)$
  □ $VPI(SLoc) = \max_{sloc} MEU(SLoc = sloc) - MEU(\varnothing)$
  □ None of the above

(ii) Gaagle, an established company, wants to sell your startup a device that gives you SRead. Given that you already have Bob's device (which gives you SLoc), what is the maximum amount of money you should pay for Gaagle's device? Suppose you value $1 at 1 utility.

  □ $VPI(SRead)$
  □ $VPI(SRead) - VPI(SLoc)$
  □ $VPI(SRead, SLoc)$
  □ $VPI(SRead, SLoc) - VPI(SLoc)$
  □ None of the above
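As a concreteness check for parts (b) and (c)(i), here is a minimal Python sketch that evaluates EU, MEU, and VPI(SLoc) by direct enumeration. The dictionary-based tables P_loc, P_move, P_sread, P_sloc, and U are hypothetical stand-ins for the distributions and utilities listed above; the names are illustrative, not from the handout.

```python
from itertools import product

def expected_utility(a, P_loc, P_move, U):
    """EU(a) with nothing observed: sum over l, m of P(l) P(m) U(a, l, m)."""
    return sum(P_loc[l] * P_move[m] * U[(a, l, m)]
               for l, m in product(P_loc, P_move))

def meu_no_evidence(actions, P_loc, P_move, U):
    """MEU with no evidence: max over actions of EU(a)."""
    return max(expected_utility(a, P_loc, P_move, U) for a in actions)

def vpi_sloc(actions, P_loc, P_move, P_sread, P_sloc, U):
    """VPI(SLoc) = MEU(SLoc) - MEU(no evidence), computed by enumeration."""
    # Joint P(l, m, sloc) = P(l) P(m) * sum_sread P(sread | l, m) P(sloc | sread).
    joint = {}
    for l, m in product(P_loc, P_move):
        for sread, p_sr in P_sread[(l, m)].items():
            for sloc, p_sl in P_sloc[sread].items():
                joint[(l, m, sloc)] = (joint.get((l, m, sloc), 0.0)
                                       + P_loc[l] * P_move[m] * p_sr * p_sl)
    slocs = {sloc for (_, _, sloc) in joint}
    # MEU(SLoc) = sum_sloc max_a sum_{l,m} P(l, m, sloc) U(a, l, m);
    # the factor P(sloc) is folded into the joint, so no normalization is needed.
    meu_sloc = sum(
        max(sum(joint.get((l, m, sloc), 0.0) * U[(a, l, m)]
                for l, m in product(P_loc, P_move))
            for a in actions)
        for sloc in slocs)
    return meu_sloc - meu_no_evidence(actions, P_loc, P_move, U)
```

With these helpers, the maximum amount the startup should pay for a sensor revealing SLoc (valuing $1 at 1 utility) is exactly vpi_sloc(actions, P_loc, P_move, P_sread, P_sloc, U).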
Q2. Particle Filtering Apprenticeship

We are observing an agent's actions in an MDP and are trying to determine which policy, out of a set $\{\pi_1, \ldots, \pi_n\}$, the agent is following. Let the random variable $\Pi$ take values in that set and represent the policy that the agent is acting under. We consider only stochastic policies, so that $A_t$ is a random variable with a distribution conditioned on $S_t$ and $\Pi$. As in a typical MDP, $S_t$ is a random variable with a distribution conditioned on $S_{t-1}$ and $A_{t-1}$. The full Bayes net is shown below.

The agent acting in the environment knows what state it is currently in (as is typical in the MDP setting). Unfortunately, however, we, the observers, cannot see the states $S_t$. Thus we are forced to use an adapted particle filtering algorithm to solve this problem. Concretely, we will develop an efficient algorithm to estimate $P(\Pi \mid a_{1:t})$.

(a) The Bayes net for part (a) is:

[Bayes net diagram with nodes $\Pi$; $\ldots, S_{t-1}, S_t, S_{t+1}, \ldots$; and $\ldots, A_{t-1}, A_t, A_{t+1}, \ldots$]

(i) Select all of the following that are guaranteed to be true in this model for $t > 3$:

  □ $S_t \perp\!\!\!\perp S_{t-2} \mid S_{t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid S_{t-1}, A_{1:t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi, A_{1:t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi, S_{t-1}$
  □ $S_t \perp\!\!\!\perp S_{t-2} \mid \Pi, S_{t-1}, A_{1:t-1}$
  □ None of the above

We will compute our estimate for $P(\Pi \mid a_{1:t})$ by coming up with a recursive algorithm for computing $P(\Pi, S_t \mid a_{1:t})$. (We can then sum out $S_t$ to get the desired distribution; in this problem we ignore that step.)

(ii) Write a recursive expression for $P(\Pi, S_t \mid a_{1:t})$ in terms of the CPTs in the Bayes net above.

  $P(\Pi, S_t \mid a_{1:t}) =$ __________

We now try to adapt particle filtering to approximate this value. Each particle will contain a single state $s_t$ and a potential policy $\pi_i$.

(iii) The following is pseudocode for the body of the loop in our adapted particle filtering algorithm. Fill in the boxes with the correct values so that the algorithm will approximate $P(\Pi, S_t \mid a_{1:t})$ (a sketch of one possible instantiation follows the steps below).

  1. Elapse time: for each particle $(s_t, \pi_i)$, sample a successor $s_{t+1}$ from __________. The policy $\pi$ in the new particle is __________.
  2. Incorporate evidence: to each new particle $(s_{t+1}, \pi)$, assign weight __________.
  3. Resample particles from the weighted particle distribution.
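For reference, here is a minimal Python sketch of one way the loop body above could be instantiated. The helpers transition_model(s, a) and policy_model(s, pi) are hypothetical stand-ins for the CPTs $P(S_{t+1} \mid S_t, A_t)$ and $P(A_t \mid S_t, \Pi)$, and the particular sampling distribution and weight shown are one consistent choice, not necessarily the intended official blanks.

```python
import random

def elapse_and_weight(particles, a_t, a_t_plus_1, transition_model, policy_model):
    """One iteration of the loop body for approximating P(Pi, S_t | a_{1:t}).

    particles        -- list of (s_t, pi) pairs
    a_t              -- observed action at time t (used when elapsing time)
    a_t_plus_1       -- observed action at time t+1 (used as evidence)
    transition_model -- hypothetical: transition_model(s, a) returns {s': P(s' | s, a)}
    policy_model     -- hypothetical: policy_model(s, pi) returns {a: P(a | s, pi)}
    """
    weighted = []
    for s_t, pi in particles:
        # 1. Elapse time: sample s_{t+1} given s_t and the observed action a_t;
        #    the policy component of the particle is unchanged.
        succ = transition_model(s_t, a_t)
        states, probs = zip(*succ.items())
        s_next = random.choices(states, weights=probs)[0]
        # 2. Incorporate evidence: weight by how likely policy pi is to take the
        #    observed action a_{t+1} in state s_{t+1}.
        w = policy_model(s_next, pi).get(a_t_plus_1, 0.0)
        weighted.append(((s_next, pi), w))
    # 3. Resample from the weighted particle distribution
    #    (assumes at least one particle received nonzero weight).
    return random.choices(
        [particle for particle, _ in weighted],
        weights=[w for _, w in weighted],
        k=len(particles),
    )
```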
(b) We now observe the acting agent's actions and rewards at each time step (but we still don't know the states). Unlike the MDPs in lecture, here we use a stochastic reward function, so that $R_t$ is a random variable with a distribution conditioned on $S_t$ and $A_t$. The new Bayes net is given below.

[Bayes net diagram with nodes $\Pi$; $\ldots, S_{t-1}, S_t, S_{t+1}, \ldots$; $\ldots, A_{t-1}, A_t, A_{t+1}, \ldots$; and $\ldots, R_{t-1}, R_t, R_{t+1}, \ldots$]

Notice that the observed rewards do in fact give useful information, since d-separation does not give that $R_t \perp\!\!\!\perp \Pi \mid A_{1:t}$.

(i) Give an active path connecting $R_t$ and $\Pi$ when $A_{1:t}$ are observed. Your answer should be an ordered list of nodes in the graph, for example "$S_t, S_{t+1}, A_t, \Pi, A_{t-1}, R_{t-1}$".

(ii) Write a recursive expression for $P(\Pi, S_t \mid a_{1:t}, r_{1:t})$ in terms of the CPTs in the Bayes net above.

  $P(\Pi, S_t \mid a_{1:t}, r_{1:t}) =$ __________

(c) We now observe only the sequence of rewards and no longer observe the sequence of actions. The new Bayes net is shown below.

[Bayes net diagram with nodes $\Pi$; $\ldots, S_{t-1}, S_t, S_{t+1}, \ldots$; $\ldots, A_{t-1}, A_t, A_{t+1}, \ldots$; and $\ldots, R_{t-1}, R_t, R_{t+1}, \ldots$]

(i) Write a recursive expression for $P(\Pi, S_t, A_t \mid r_{1:t})$ in terms of the CPTs in the Bayes net above.

  $P(\Pi, S_t, A_t \mid r_{1:t}) =$ __________

We now try to adapt particle filtering to approximate this value. Each particle will contain a single state $s_t$, a single action $a_t$, and a potential policy $\pi_i$.

(ii) The following is pseudocode for the body of the loop in our adapted particle filtering algorithm. Fill in the boxes with the correct values so that the algorithm will approximate $P(\Pi, S_t, A_t \mid r_{1:t})$ (a sketch of one possible instantiation follows the steps below).

  1. Elapse time: for each particle $(s_t, a_t, \pi_i)$, sample a successor state $s_{t+1}$ from __________. Then, sample a successor action $a_{t+1}$ from __________. The policy $\pi$ in the new particle is __________.
  2. Incorporate evidence: to each new particle $(s_{t+1}, a_{t+1}, \pi)$, assign weight __________.
  3. Resample particles from the weighted particle distribution.
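As with part (a), here is a minimal Python sketch of one way the loop body in (c)(ii) could be instantiated. The helpers transition_model(s, a), policy_model(s, pi), and reward_model(s, a) are hypothetical stand-ins for the CPTs $P(S_{t+1} \mid S_t, A_t)$, $P(A_t \mid S_t, \Pi)$, and $P(R_t \mid S_t, A_t)$; the specific sampling distributions and weight are one consistent choice, not necessarily the intended official blanks.

```python
import random

def sample_from(dist):
    """Sample a key from a {outcome: probability} dictionary."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs)[0]

def elapse_and_weight_rewards_only(particles, r_t_plus_1,
                                   transition_model, policy_model, reward_model):
    """One iteration of the loop body for approximating P(Pi, S_t, A_t | r_{1:t}).

    particles        -- list of (s_t, a_t, pi) triples
    r_t_plus_1       -- observed reward at time t+1 (used as evidence)
    transition_model -- hypothetical: transition_model(s, a) returns {s': P(s' | s, a)}
    policy_model     -- hypothetical: policy_model(s, pi) returns {a: P(a | s, pi)}
    reward_model     -- hypothetical: reward_model(s, a) returns {r: P(r | s, a)}
    """
    weighted = []
    for s_t, a_t, pi in particles:
        # 1. Elapse time: sample s_{t+1} given (s_t, a_t), then a_{t+1} given
        #    (s_{t+1}, pi); the policy component of the particle is unchanged.
        s_next = sample_from(transition_model(s_t, a_t))
        a_next = sample_from(policy_model(s_next, pi))
        # 2. Incorporate evidence: weight by the probability of the observed reward.
        w = reward_model(s_next, a_next).get(r_t_plus_1, 0.0)
        weighted.append(((s_next, a_next, pi), w))
    # 3. Resample from the weighted particle distribution
    #    (assumes at least one particle received nonzero weight).
    return random.choices(
        [particle for particle, _ in weighted],
        weights=[w for _, w in weighted],
        k=len(particles),
    )
```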