I need to implement a python code for Q learning and SARSA method. My example involves a cliff walking experiment where the rewards are -1 except for the region marked as cliff if the agent steps there the reward is -100 and the agent is sent back to the start. The values used are alpha = 0.1, y or gamma = 1 and the e- greedy action is 0.1.
I need to implement a python code for Q learning and SARSA method. My example involves a cliff walking experiment where the rewards are -1 except for the region marked as cliff if the agent steps there the reward is -100 and the agent is sent back to the start. The values used are alpha = 0.1, y or gamma = 1 and the e- greedy action is 0.1. After using these values on both
Trending now
This is a popular solution!
Step by step
Solved in 2 steps