CS 188 Summer 2023                                             Discussion 6A

Q1. RL: Amusement Park

After the disastrous waterslide experience, you decide to go to an amusement park instead. In the previous questions the MDP was based on a single ride (a water slide). Here our MDP is about choosing a ride from a set of many rides.

You start off feeling well, getting positive rewards from rides, some larger than others. However, there is some chance of each ride making you sick. If you continue going on rides while sick, there is some chance of becoming well again, but you don't enjoy the rides as much, receiving lower rewards (possibly negative).

You have never been to an amusement park before, so you don't know how much reward you will get from each ride (while well or sick). You also don't know how likely you are to get sick on each ride, or how likely you are to become well again. What you do know about the rides is:

    Actions / Rides    Type            Wait    Speed
    Big Dipper         Rollercoaster   Long    Fast
    Wild Mouse         Rollercoaster   Short   Slow
    Hair Raiser        Drop tower      Short   Fast
    Moon Ranger        Pendulum        Short   Slow
    Leave the Park     Leave           Short   Slow

We will formulate this as an MDP with two states, Well and Sick. Each ride corresponds to an action. The 'Leave the Park' action ends the current run through the MDP. Taking a ride will lead back to the same state with some probability, or take you to the other state.

We will use a feature-based approximation to the Q-values, defined by the following four features and associated weights:

    Features                                                                Initial Weights
    f_0(state, action) = 1  (a bias feature that is always 1)               w_0 = 1
    f_1(state, action) = 1 if action type is Rollercoaster, 0 otherwise     w_1 = 2
    f_2(state, action) = 1 if action wait is Short, 0 otherwise             w_2 = 1
    f_3(state, action) = 1 if action speed is Fast, 0 otherwise             w_3 = 0.5

(a) Calculate Q('Well', 'Big Dipper'):
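The approximate Q-function above is just a weighted sum of the features, Q(s, a) = Σ_i w_i · f_i(s, a). The following Python sketch (not part of the worksheet; the ride dictionary and helper names are only illustrative transcriptions of the tables) computes that sum for any state-action pair:

    # A minimal sketch of the linear Q-value approximation
    # Q(s, a) = sum_i w_i * f_i(s, a) described above.

    RIDES = {
        "Big Dipper":     {"type": "Rollercoaster", "wait": "Long",  "speed": "Fast"},
        "Wild Mouse":     {"type": "Rollercoaster", "wait": "Short", "speed": "Slow"},
        "Hair Raiser":    {"type": "Drop tower",    "wait": "Short", "speed": "Fast"},
        "Moon Ranger":    {"type": "Pendulum",      "wait": "Short", "speed": "Slow"},
        "Leave the Park": {"type": "Leave",         "wait": "Short", "speed": "Slow"},
    }

    def features(state, action):
        """f_0..f_3 from the table; note that none of them depend on the state."""
        ride = RIDES[action]
        return [
            1.0,                                                # f_0: bias
            1.0 if ride["type"] == "Rollercoaster" else 0.0,    # f_1: rollercoaster?
            1.0 if ride["wait"] == "Short" else 0.0,            # f_2: short wait?
            1.0 if ride["speed"] == "Fast" else 0.0,            # f_3: fast?
        ]

    def q_value(weights, state, action):
        return sum(w * f for w, f in zip(weights, features(state, action)))

    weights = [1.0, 2.0, 1.0, 0.5]                  # initial weights from the table
    print(q_value(weights, "Well", "Big Dipper"))   # 1 + 2 + 0 + 0.5 = 3.5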
(b) Apply a Q-learning update based on the sample ('Well', 'Big Dipper', 'Sick', 10.5), using a learning rate of α = 0.5 and a discount of γ = 0.5. What are the new weights?

    w_0 =        w_1 =        w_2 =        w_3 =
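For reference, the standard approximate Q-learning update moves each weight along the TD error: difference = [r + γ max_a' Q(s', a')] - Q(s, a), then w_i ← w_i + α · difference · f_i(s, a). A minimal continuation of the sketch above applies that update to the given sample (reusing the hypothetical helpers defined there):

    # One approximate Q-learning weight update for the sample
    # (s='Well', a='Big Dipper', s'='Sick', r=10.5), alpha = 0.5, gamma = 0.5.
    alpha, gamma = 0.5, 0.5
    s, a, s_next, r = "Well", "Big Dipper", "Sick", 10.5

    # difference = [r + gamma * max_a' Q(s', a')] - Q(s, a)
    max_q_next = max(q_value(weights, s_next, act) for act in RIDES)
    difference = (r + gamma * max_q_next) - q_value(weights, s, a)

    # w_i <- w_i + alpha * difference * f_i(s, a)
    f = features(s, a)
    weights = [w + alpha * difference * f_i for w, f_i in zip(weights, f)]
    print(weights)   # the updated w_0..w_3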
(c) Using our approximation, are the Q-values that involve the Sick state the same as, or different from, the corresponding Q-values that involve the Well state? In other words, is Q('Well', action) = Q('Sick', action) for each possible action? Why / why not? (in just one sentence)

    Same        Different

Now we will consider the exploration / exploitation tradeoff in this amusement park.

(d) Assume we have the original weights from the feature table above. What action will an ε-greedy approach choose from the Well state? If multiple actions could be chosen, give each action and its probability.

(e) When running Q-learning, another approach to dealing with this tradeoff is using an exploration function:

    f(u, n) = u + k/n

(i) How is this function used in the Q-learning equations? (a single sentence)

What are each of the following? (a single sentence each)

(ii) u:

(iii) n:

(iv) k:
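As background (not the worksheet's answer key): ε-greedy picks a uniformly random action with probability ε and otherwise the highest-Q action, breaking ties uniformly; an exploration function such as f(u, n) = u + k/n is typically substituted into the update target in place of the raw Q-value, using a visit count for n. Below is a sketch of the ε-greedy part only, reusing the hypothetical helpers from the earlier sketch; the value of ε is illustrative, since the worksheet does not specify one.

    import random

    def epsilon_greedy(weights, state, epsilon=0.1):
        # With probability epsilon, explore: pick a uniformly random action.
        if random.random() < epsilon:
            return random.choice(list(RIDES))
        # Otherwise exploit: pick the highest-Q action, breaking ties uniformly.
        best_q = max(q_value(weights, state, act) for act in RIDES)
        best = [act for act in RIDES if q_value(weights, state, act) == best_q]
        return random.choice(best)

    # Part (d) asks about the original weights, so pass them in explicitly.
    print(epsilon_greedy([1.0, 2.0, 1.0, 0.5], "Well"))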
Q2. Deep inside Q-learning

Consider the grid-world given below and an agent who is trying to learn the optimal policy. Rewards are only awarded for taking the Exit action from one of the shaded states. Taking this action moves the agent to the Done state, and the MDP terminates. Assume γ = 1 and α = 0.5 for all calculations. All equations need to explicitly mention γ and α if necessary.

(a) The agent starts from the top left corner and you are given the following episodes from runs of the agent through this grid-world. Each line in an episode is a tuple (s, a, s', r).

    Episode 1              Episode 2               Episode 3              Episode 4              Episode 5
    (1,3), S, (1,2), 0     (1,3), S, (1,2), 0      (1,3), S, (1,2), 0     (1,3), S, (1,2), 0     (1,3), S, (1,2), 0
    (1,2), E, (2,2), 0     (1,2), E, (2,2), 0      (1,2), E, (2,2), 0     (1,2), E, (2,2), 0     (1,2), E, (2,2), 0
    (2,2), E, (3,2), 0     (2,2), S, (2,1), 0      (2,2), E, (3,2), 0     (2,2), E, (3,2), 0     (2,2), E, (3,2), 0
    (3,2), N, (3,3), 0     (2,1), Exit, D, -100    (3,2), S, (3,1), 0     (3,2), N, (3,3), 0     (3,2), S, (3,1), 0
    (3,3), Exit, D, +50                            (3,1), Exit, D, +30    (3,3), Exit, D, +50    (3,1), Exit, D, +30

Fill in the following Q-values obtained from direct evaluation from the samples:

    Q((3,2), N) =
    Q((3,2), S) =
    Q((2,2), E) =

(b) Q-learning is an online algorithm to learn optimal Q-values in an MDP with unknown rewards and transition function. The update equation is:

    Q(s_t, a_t) = (1 - α) Q(s_t, a_t) + α (r_t + γ max_a Q(s_{t+1}, a))

where γ is the discount factor, α is the learning rate, and the sequence of observations is (..., s_t, a_t, s_{t+1}, r_t, ...).

Given the episodes in (a), fill in the time at which the following Q-values first become non-zero. When updating the Q-values, you should only go through each transition once, and the order in which you are to go through them is: the transitions in Episode 1, then the transitions in Episode 2, and so on. Your answer should be of the form (episode#, iter#), where iter# is the Q-learning update iteration in that episode. If the specified Q-value never becomes non-zero, write never.

    Q((1,2), E) =
    Q((2,2), E) =
    Q((3,2), S) =

(c) In Q-learning, we look at a window of (s_t, a_t, s_{t+1}, r_t) to update our Q-values. One can think of using an update rule that uses a larger window to update these values. Give an update rule for Q(s_t, a_t) given the window (s_t, a_t, r_t, s_{t+1}, a_{t+1}, r_{t+1}, s_{t+2}).

    Q(s_t, a_t) =
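To make part (b) concrete, here is a minimal Python sketch (not part of the worksheet; the data structures and helper names are illustrative) that replays the five episodes above in order, applies the tabular update Q(s_t, a_t) = (1 - α) Q(s_t, a_t) + α (r_t + γ max_a Q(s_{t+1}, a)) with γ = 1 and α = 0.5, and records when each Q-value first becomes non-zero:

    from collections import defaultdict

    GAMMA, ALPHA = 1.0, 0.5

    # Transitions (s, a, s', r) transcribed from the episodes in part (a).
    # "D" is the terminal Done state; its Q-values stay 0.
    episodes = [
        [((1,3),"S",(1,2),0), ((1,2),"E",(2,2),0), ((2,2),"E",(3,2),0),
         ((3,2),"N",(3,3),0), ((3,3),"Exit","D",50)],
        [((1,3),"S",(1,2),0), ((1,2),"E",(2,2),0), ((2,2),"S",(2,1),0),
         ((2,1),"Exit","D",-100)],
        [((1,3),"S",(1,2),0), ((1,2),"E",(2,2),0), ((2,2),"E",(3,2),0),
         ((3,2),"S",(3,1),0), ((3,1),"Exit","D",30)],
        [((1,3),"S",(1,2),0), ((1,2),"E",(2,2),0), ((2,2),"E",(3,2),0),
         ((3,2),"N",(3,3),0), ((3,3),"Exit","D",50)],
        [((1,3),"S",(1,2),0), ((1,2),"E",(2,2),0), ((2,2),"E",(3,2),0),
         ((3,2),"S",(3,1),0), ((3,1),"Exit","D",30)],
    ]

    Q = defaultdict(float)   # Q[(s, a)], initialised to 0

    def max_q(state):
        # Max over actions seen so far from this state (0 if none, e.g. the Done state).
        vals = [v for (s, a), v in Q.items() if s == state]
        return max(vals) if vals else 0.0

    first_nonzero = {}
    for ep_num, ep in enumerate(episodes, start=1):
        for it, (s, a, s_next, r) in enumerate(ep, start=1):
            Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * max_q(s_next))
            if Q[(s, a)] != 0 and (s, a) not in first_nonzero:
                first_nonzero[(s, a)] = (ep_num, it)   # (episode#, iter#)

    for query in [((1,2), "E"), ((2,2), "E"), ((3,2), "S")]:
        print(query, first_nonzero.get(query, "never"))

Running the loop transition by transition in this order is exactly the "go through each transition once" procedure that part (b) describes.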