Practice Quiz: Value Functions and Bellman Equations
Value Functions and Bellman Equations Practice Quiz
45 min | Congratulations! You passed! | To pass: 80% or higher | Total points: 10

1. A policy is a function which maps ___ to ___.
- States to actions.
- States to values.
- Actions to probability distributions over values.
- States to probability distributions over actions. ✓
- Actions to probabilities.
Correct!

2. The term "backup" most closely resembles the term ___ in meaning.
- Value
- Update ✓
- Diagram
Correct!

3. At least one deterministic optimal policy exists in every Markov decision process.
- True ✓
- False
Correct! Let's say there is a policy \pi_1 which does well in some states, while policy \pi_2 does well in others. We could combine these policies into a third policy \pi_3, which always chooses actions according to whichever of \pi_1 and \pi_2 has the higher value in the current state. \pi_3 will necessarily have a value greater than or equal to both \pi_1 and \pi_2 in every state, so we will never have a situation where doing well in one state requires sacrificing value in another. Because of this, there always exists some policy which is best in every state. This is of course only an informal argument, but there is in fact a rigorous proof showing that there must always exist at least one optimal deterministic policy.

4. The optimal state-value function:
- Is not guaranteed to be unique, even in finite Markov decision processes.
- Is unique in every finite Markov decision process. ✓
Correct! The Bellman optimality equation is actually a system of equations, one for each state: if there are N states, then there are N equations in N unknowns. If the dynamics of the environment are known, then in principle one can solve this system of equations for the optimal value function using any one of a variety of methods for solving systems of nonlinear equations. All optimal policies share the same optimal state-value function.

5. Does adding a constant to all rewards change the set of optimal policies in episodic tasks?
- Yes, adding a constant to all rewards changes the set of optimal policies. ✓
- No, as long as the relative differences between rewards remain the same, the set of optimal policies is the same.
Correct! Adding a constant to the reward signal can make longer episodes more or less advantageous, depending on whether the constant is positive or negative.

6. Does adding a constant to all rewards change the set of optimal policies in continuing tasks?
- Yes, adding a constant to all rewards changes the set of optimal policies.
- No, as long as the relative differences between rewards remain the same, the set of optimal policies is the same. ✓
Correct! Since the task is continuing, the agent will accumulate the same amount of extra reward independent of its behavior.

7. Select the equation that correctly relates v_* to q_*. Assume \pi is the uniform random policy.
- v_*(s) = \sum_a \pi(a|s) \sum_{s', r} p(s', r | s, a) [r + q_*(s')]
- v_*(s) = \max_a q_*(s, a) ✓
- v_*(s) = \sum_a \pi(a|s) \sum_{s', r} p(s', r | s, a) [r + \gamma q_*(s')]
- v_*(s) = \sum_a \pi(a|s) \sum_{s', r} p(s', r | s, a) q_*(s')
Correct!

8. Select the equation that correctly relates q_* to v_* using the four-argument function p.
- q_*(s, a) = \sum_{s', r} p(s', r | s, a) [r + v_*(s')]
- q_*(s, a) = \sum_{s', r} p(s', r | s, a) \gamma [r + v_*(s')]
- q_*(s, a) = \sum_{s', r} p(s', r | s, a) [r + \gamma v_*(s')] ✓
Correct!

9. Write a policy \pi_* in terms of q_*.
- \pi_*(a|s) = q_*(s, a)
- \pi_*(a|s) = \max_{a'} q_*(s, a')
- \pi_*(a|s) = 1 if a = \mathrm{argmax}_{a'} q_*(s, a'), else 0 ✓
Correct!
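The relations in questions 7 and 9 are easy to sanity-check numerically. Below is a minimal sketch (not part of the quiz) that assumes a hypothetical q_* table with made-up values for a 2-state, 3-action problem: v_* is the row-wise max over actions, and a greedy deterministic \pi_* puts probability 1 on an argmax action.

```python
import numpy as np

# Hypothetical optimal action-value table q_*(s, a): 2 states x 3 actions.
# The numbers are made up purely for illustration.
q_star = np.array([[1.0, 2.5, 2.0],
                   [0.0, -1.0, 4.0]])

# Question 7: v_*(s) = max_a q_*(s, a)
v_star = q_star.max(axis=1)

# Question 9: pi_*(a|s) = 1 if a = argmax_{a'} q_*(s, a'), else 0
pi_star = np.zeros_like(q_star)
pi_star[np.arange(q_star.shape[0]), q_star.argmax(axis=1)] = 1.0

print(v_star)   # [2.5 4. ]
print(pi_star)  # [[0. 1. 0.]
                #  [0. 0. 1.]]
```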
10. Give an equation for some \pi_* in terms of v_* and the four-argument p.
- \pi_*(a|s) = 1 if v_*(s) = \max_{a'} \sum_{s', r} p(s', r | s, a') [r + \gamma v_*(s')], else 0
- \pi_*(a|s) = \max_{a'} \sum_{s', r} p(s', r | s, a') [r + \gamma v_*(s')]
- \pi_*(a|s) = \sum_{s', r} p(s', r | s, a) [r + \gamma v_*(s')]
- \pi_*(a|s) = 1 if v_*(s) = \sum_{s', r} p(s', r | s, a) [r + \gamma v_*(s')], else 0 ✓
Correct!
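Questions 8 and 10 can be checked the same way once the dynamics are known. The sketch below (again not part of the quiz, using a made-up 2-state, 2-action MDP and an assumed discount of 0.9) represents the four-argument dynamics p(s', r | s, a) explicitly, solves the Bellman optimality equation by value iteration, and then recovers a deterministic optimal policy by testing which actions satisfy v_*(s) = \sum_{s', r} p(s', r | s, a) [r + \gamma v_*(s')]. It also illustrates questions 3 and 4: value iteration converges to a single optimal v_*, and every state has at least one greedy action.

```python
import numpy as np

# Made-up 2-state, 2-action MDP. p[(s, a)] lists (s_next, reward, prob) triples,
# i.e. the four-argument dynamics p(s', r | s, a).
p = {
    (0, 0): [(0, 0.0, 1.0)],
    (0, 1): [(1, 1.0, 0.9), (0, 0.0, 0.1)],
    (1, 0): [(0, 5.0, 1.0)],
    (1, 1): [(1, 0.0, 1.0)],
}
n_states, n_actions, gamma = 2, 2, 0.9

def q_from_v(s, a, v):
    # Question 8: q_*(s, a) = sum_{s', r} p(s', r | s, a) [r + gamma * v_*(s')]
    return sum(prob * (r + gamma * v[s_next]) for s_next, r, prob in p[(s, a)])

# Value iteration: repeatedly apply v(s) <- max_a q(s, a); plenty of sweeps for gamma = 0.9.
v = np.zeros(n_states)
for _ in range(1000):
    v = np.array([max(q_from_v(s, a, v) for a in range(n_actions))
                  for s in range(n_states)])

# Question 10: pi_*(a|s) = 1 if v_*(s) = sum_{s', r} p(s', r | s, a) [r + gamma * v_*(s')], else 0.
greedy = {s: [a for a in range(n_actions) if np.isclose(q_from_v(s, a, v), v[s])]
          for s in range(n_states)}

print(v)       # the unique optimal state values (question 4)
print(greedy)  # at least one greedy action per state -> a deterministic optimal policy (question 3)
```

For these made-up dynamics the script prints v_* of roughly [27.35, 29.61] and greedy == {0: [1], 1: [0]}, i.e. a single deterministic optimal policy.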