Backward View

Below we show a gridworld and a particular trajectory taken by the agent. The agent gets a reward of 0 at every step, except for the last step, when it reaches the goal and receives a reward of 1. We label some of the states and will refer to them by the labeled number. We use a tabular representation, meaning that the features are represented as a one-hot vector: the feature corresponding to each square is 1 if the agent is in that square and 0 otherwise, so only one feature is non-zero at any given time.

[Figure: gridworld showing the goal state G and labeled states 1-5 along the agent's trajectory.]

Let us look more closely at the goal state and states 1-5, and consider the one-hot features just for those 6 states. When the agent is at state 5, its features will be x(5) = (0, 0, 0, 0, 0, 1), and the features for state 1 will be x(1) = (0, 1, 0, 0, 0, 0). This means that V(5, w) = w⊤x(5), i.e. the component of w associated with state 5. For this exercise, the discount γ = 0.9 and the trace decay λ = 0.3.

In the trajectory given above, what will be the last component of z, the eligibility trace, after processing state 3? That is, what will be the value of the component of z associated with state 5 after the component associated with state 3 has been added to it?

TD Error

Now consider that the states are initialized with the following values: V(1, w) = 10, V(2, w) = 15, V(3, w) = 20, V(4, w) = 25, V(5, w) = 30, and the discount rate γ = 0.9. What is the TD error δ for the transition from state 5 to state 4?

Update

Now consider that the states are initialized with the following values: V(1, w) = 10, V(2, w) = 15, V(3, w) = 20, V(4, w) = 25, V(5, w) = 30, V(goal, w) = 0, the discount rate γ = 0.9, and the learning rate α = 0.1. What will be the new value of state 5 (i.e. V(5, w)) if we make updates according to TD(λ) at each step, with λ = 0.3, at the end of the episode? Please specify the value up to 4 decimal digits.
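The three questions above all follow from the standard backward-view TD(λ) mechanics with an accumulating trace: at each step the trace decays by γλ, the current state's trace is incremented, the TD error δ = r + γV(s′, w) − V(s, w) is computed, and every weight is nudged by αδz. The sketch below simulates this. Since the figure is not available here, it ASSUMES the labeled states are consecutive steps of the trajectory, 5 → 4 → 3 → 2 → 1 → goal, with no intermediate squares; if the actual trajectory passes through unlabeled squares, the trace components (and final values) change accordingly.

```python
import numpy as np

# Backward-view TD(lambda) with tabular (one-hot) features.
GAMMA, LAM, ALPHA = 0.9, 0.3, 0.1

# Component order matches x(5) = (0,0,0,0,0,1): (goal, 1, 2, 3, 4, 5).
IDX = {"G": 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
w = np.array([0.0, 10.0, 15.0, 20.0, 25.0, 30.0])  # initial V(s, w)

# (state, next_state, reward) triples; reward 1 only on reaching the goal.
# ASSUMED trajectory -- the figure with the true path is unavailable.
trajectory = [(5, 4, 0.0), (4, 3, 0.0), (3, 2, 0.0), (2, 1, 0.0), (1, "G", 1.0)]

z = np.zeros(6)  # eligibility trace, one component per state
for s, s_next, r in trajectory:
    z = GAMMA * LAM * z                             # decay every component
    z[IDX[s]] += 1.0                                # accumulate for visited state
    delta = r + GAMMA * w[IDX[s_next]] - w[IDX[s]]  # TD error
    w += ALPHA * delta * z                          # update all traced states
    if s == 5:
        print(f"TD error for 5 -> 4: {delta}")      # -7.5 under these values
    if s == 3:
        # Under the assumed path, state 5's trace has decayed twice:
        # (gamma * lambda)^2 = 0.27^2 = 0.0729
        print(f"z component for state 5 after processing state 3: {z[IDX[5]]}")
print(f"V(5, w) at end of episode: {w[IDX[5]]:.4f}")
```

Note that the TD error for the 5 → 4 transition, δ = 0 + 0.9·25 − 30 = −7.5, does not depend on the assumed trajectory, only on the given values; the trace component and the final V(5, w) do.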