Pacman is in an unknown MDP with three states [A, B, C] and two actions [Stop, Go]. We are given the following samples, generated by taking actions in the unknown MDP and listed in the order in which they were run. Find the Q-value estimates after each episode, as obtained by Q-learning. Assume that all Q-values are initialized to 0, γ = 1, and α = 0.5.
Episode s s' a r 1 A Go B 4 2 Stop Stop B A 2 3 B A -4 4 C Stop Go 1 5 C A 2 A Go A -2
Question
Pls help
![Q-learning problem statement and sample table (columns: Episode, s, s', a, r)](/v2/_next/image?url=https%3A%2F%2Fcontent.bartleby.com%2Fqna-images%2Fquestion%2F792cdf00-9bc6-47a1-9d09-82095a59469d%2F5dfa76ff-fcb7-43f3-93a5-c424d219dfd8%2Fu7pcr0p_processed.png&w=3840&q=75)
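For reference, here is a minimal sketch of the tabular Q-learning update the question asks for, taking the discount as 1 and the learning rate as 0.5 from the problem statement. The function name `q_update` and the dictionary layout are illustrative choices, not part of the original question. Only the first sample is applied, on the assumption that it reads as s = A, a = Go, s' = B, r = 4; the remaining rows should be taken from the original table in the image, since the transcription is scrambled.

```python
from collections import defaultdict

ACTIONS = ("Stop", "Go")


def q_update(q, s, a, s_next, r, alpha=0.5, gamma=1.0):
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_b Q(s', b))
    """
    best_next = max(q[(s_next, b)] for b in ACTIONS)
    sample = r + gamma * best_next
    q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * sample
    return q[(s, a)]


# All Q-values start at 0, as stated in the question.
q = defaultdict(float)

# First sample, assumed to read (s=A, a=Go, s'=B, r=4).
first = q_update(q, "A", "Go", "B", 4)
print(first)  # 2.0
```

Each later sample is processed the same way, in order, by calling `q_update` again on the same `q` table so that earlier updates feed into the `max` over the next state's actions.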