Chapter 7: Uncertainty
Section: Chapter Questions
Problem 7.3P
Question
Below is a 3x2 version of the reinforcement learning task from class. Other than being a smaller world, the details are the same. The agent can execute four actions: up, down, left, or right. When executing an action, the agent has an 80% chance of actually moving in that direction, a 10% chance of moving in the -90 degrees direction, and a 10% chance of moving in the +90 degrees direction. If the agent attempts to move into a wall, it stays in the same location. If the agent moves into location (3,1), it receives a +1 reward and the task is over. If the agent moves into location (3,2), it receives a -1 reward and the task is over. For all other actions, the agent receives a -0.04 reward.

[Figure: a 3x2 grid with columns labeled 1-3 and rows labeled 1-2; the +1 terminal is in cell (3,1), the -1 terminal is in cell (3,2), and each remaining cell contains an arrow showing the policy's action there.]

(a) Show the utility equations for U(1,1), U(1,2), U(2,1), and U(2,2) for the policy in the above picture, assuming the discount factor gamma = 0.9.

(b) Show the final utility values for U(1,1), U(1,2), U(2,1), and U(2,2) for this policy. You do not need to show the computations, just the final values rounded to two-digit precision.
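For part (a), each non-terminal state's utility satisfies the fixed-policy Bellman equation from class,

$$U(s) = R(s) + \gamma \sum_{s'} P(s' \mid s, \pi(s))\, U(s'),$$

with $R(s) = -0.04$, $\gamma = 0.9$, and the terminal utilities fixed at $U(3,1) = +1$ and $U(3,2) = -1$. As an illustration only (the actual arrows are given in the figure), if the policy at (1,1) were "right", its equation would read

$$U(1,1) = -0.04 + 0.9\big[\,0.8\,U(2,1) + 0.1\,U(1,2) + 0.1\,U(1,1)\,\big],$$

since the upward slip lands in (1,2) and the downward slip bounces off the bottom wall, leaving the agent in (1,1).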
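For part (b), the utilities are the fixed point of the four equations from part (a), which can be found by iterative policy evaluation: repeatedly sweep the Bellman update until the values stop changing. Below is a minimal sketch in Python. Since the policy arrows from the original figure are not reproduced here, the `policy` dictionary is a hypothetical stand-in; substitute the arrows from the picture before reading off the values.

```python
# Policy evaluation for the 3x2 grid world described above.
# Coordinates are (x, y): columns x in {1, 2, 3}, rows y in {1, 2}.
# (3,1) is the +1 terminal and (3,2) is the -1 terminal.
GAMMA = 0.9
STEP_REWARD = -0.04
TERMINALS = {(3, 1): 1.0, (3, 2): -1.0}
STATES = [(1, 1), (1, 2), (2, 1), (2, 2)]  # non-terminal states

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# The two perpendicular (-90 / +90 degree) slip directions for each action.
PERP = {"up": ("left", "right"), "down": ("left", "right"),
        "left": ("up", "down"), "right": ("up", "down")}

def step(state, direction):
    """Move one cell in `direction`; bounce off walls by staying put."""
    dx, dy = MOVES[direction]
    nx, ny = state[0] + dx, state[1] + dy
    return (nx, ny) if 1 <= nx <= 3 and 1 <= ny <= 2 else state

def transitions(state, action):
    """(probability, successor) pairs under the 80/10/10 action model."""
    slip_a, slip_b = PERP[action]
    return [(0.8, step(state, action)),
            (0.1, step(state, slip_a)),
            (0.1, step(state, slip_b))]

def evaluate_policy(policy, tol=1e-10):
    """Iterate the fixed-policy Bellman equation to convergence."""
    U = {s: 0.0 for s in STATES}
    U.update(TERMINALS)  # terminal utilities stay fixed at +1 / -1
    while True:
        delta = 0.0
        for s in STATES:
            u = STEP_REWARD + GAMMA * sum(
                p * U[t] for p, t in transitions(s, policy[s]))
            delta = max(delta, abs(u - U[s]))
            U[s] = u
        if delta < tol:
            return U

# Hypothetical policy -- replace with the arrows from the figure.
policy = {(1, 1): "right", (1, 2): "down", (2, 1): "right", (2, 2): "down"}
U = evaluate_policy(policy)
for s in STATES:
    print(f"U{s} = {U[s]:.2f}")
```

Rounding the converged values to two decimal places gives the answers requested in part (b).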