COMS W4701: Artificial Intelligence, Spring 2023
Homework 2

Instructions: Compile all solutions to the written problems on this assignment in a single PDF file (typed, or handwritten legibly if absolutely necessary). Coding solutions may be directly implemented in the provided Python file(s). When ready, submit all files on Gradescope in the appropriate assignment bins. Please be mindful of the deadline, as late submissions are not accepted, as well as of our course policies on academic honesty.

Problem 1 (18 points)

Two players are playing a misère tic-tac-toe game in which grid spaces have point values, shown on the left grid below. The players take turns marking a grid space with their own designation (X or O) until either a) one player gets three marks in a row or b) the board has no empty spaces left. When the game ends, a score is computed as the sum of the values of the O spaces minus the sum of the values of the X spaces. In addition, if O has three in a row, 3 points are added to the score; if X has three in a row, 3 points are subtracted from the score. X seeks to maximize the total score and O seeks to minimize it.

1. The right grid shows the current board configuration. It is X's turn to move. Draw out the entire game tree with the root corresponding to the current board; be sure to draw the MAX and MIN nodes according to game tree convention. Label each node in the tree top-bottom, left-right using letters starting from A.

2. Sketch out the tic-tac-toe board configuration corresponding to each of the terminal nodes. Find the minimax value of all nodes (both terminal and nonterminal). What is the best move for X to make, and what is the expected score of the game assuming both players play optimally?

3. Suppose we perform alpha-beta search. Give an ordering of the tree nodes that would be processed such that no nodes are actually pruned. This ordering should include all nodes, starting from the root all the way to the last leaf node.

4. Give an ordering of the tree nodes that would be processed such that as many nodes are pruned as possible. Which nodes end up with incorrect minimax values as a result of pruning?
5. Suppose that player O plays stochastically. The probability of placing an O in cell j is $p_j = v_j / \sum_i v_i$, where $\sum_i v_i$ is the sum of all currently free cell values. For example, if there are two available cells with values 5 and 2, their probabilities are 5/7 and 2/7, respectively. Explain how the game tree will change (you do not have to redraw it), and compute the new value and best move for X starting from the current state.

6. Suppose that player X is considering playing either in the center or the upper right cells, and that player O will still play randomly following X in the latter case. Solve for the probabilities of player O making either of the two remaining moves so that X becomes indifferent between the two options on the first turn.

Problem 2 (16 points)

The partial game tree below was discussed in class on the topic of Monte Carlo tree search. Each node shows the win rate $w/N$: the number of playout wins over the total number of playouts from that node. The leaf node labeled 0/0 was just expanded in the middle of an MCTS iteration.

1. Suppose that a rollout is performed and the player corresponding to the orange nodes (second and fourth layers) wins. Give the new win rates of all nodes that are updated, in order from leaf to root (either the w or N values or both).

2. Using the new win rates and the exploration parameter α = 1, compute the UCT values of each of the nodes in the second layer of the tree (the immediate children of the root node). Which of these three nodes is traversed by the selection policy in the next MCTS iteration?

3. Solve for the minimum value of α for which a different child node of the root would be selected. Explain why we need a higher, not lower, α value than that in part 2 in order to select one of the other two nodes.

4. Briefly explain why the UCT selection policy never completely eliminates the possibility of selecting any tree node, even if it has a very low (or even zero) win rate. Assume α > 0.
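For reference in parts 2 and 3 (and again in Problem 4.2), the UCT value of a child node n is assumed here to take the standard UCB1-style form with exploration parameter α; if the course notes define it slightly differently, that definition takes precedence:

\[
\mathrm{UCT}(n) = \frac{w_n}{N_n} + \alpha \sqrt{\frac{\ln N_{\mathrm{parent}(n)}}{N_n}}
\]

where $w_n$ and $N_n$ are the win and playout counts of node $n$, and $N_{\mathrm{parent}(n)}$ is the playout count of its parent.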
Problem 3 (14 points)

Consider the following non-cooperative game. Actions for P1 are given in the rows and those for P2 in the columns. The first value of each utility pair is that of P1, and the second is that of P2.

P1 \ P2      X        Y        Z
   A       (2,0)    (1,1)    (4,2)
   B       (3,4)    (1,2)    (2,3)
   C       (1,3)    (0,2)    (3,0)

1. Identify any dominant strategies for either player, dominant strategy equilibria, and pure strategy Nash equilibria. Briefly explain your reasoning for each. It is possible that only some or none of these exist.

2. Identify any dominated strategies for either player. Briefly explain your reasoning.

3. Solve for the mixed strategy Nash equilibria.

4. Solve for the expected utilities of each player following the mixed strategies.

Problem 4: (m, n, k)-Game (52 points)

The game of (m, n, k) is a generalization of tic-tac-toe. Two players take turns placing marks (one using X and the other using O) on an m × n grid, trying to be the first to achieve k consecutive marks. You will design an agent to play this game using two different methods: minimax search with alpha-beta pruning (alpha-beta search) and Monte Carlo tree search. We provide a simple Python scaffold in which you will implement these functionalities.

A game state is represented as an m × n 2D NumPy array. Element values may be 'X', 'O', or the character '.', the latter representing an empty cell. The player whose turn it is to move from a given state is likewise indicated by one of the strings 'X' and 'O'.

There are two functions in utils.py that will be useful for your implementation. Given a game state and k value, utility() returns 1 if player X has won, -1 if player O has won, 0 if it is a draw (no one has k in a row and the board is full), and None if the state is not terminal. Given a game state and the player whose turn it is to move, successors() returns a list of possible successor states (again, each as a separate array). You should not need to invoke the k_in_row() function. Lastly, there are two classes, Node and Tree, whose usage we will describe in Problem 4.2 (MCTS).
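As a quick illustration of the state representation and these two helpers, the snippet below builds small (3, 3, 3) boards and queries them. The call signatures utility(state, k) and successors(state, player) are inferred from the descriptions above and may differ slightly in the actual scaffold.

import numpy as np
from utils import utility, successors  # helpers provided in the scaffold

# A finished 3x3 board; '.' marks empty cells.
board = np.array([['X', 'X', 'X'],
                  ['O', 'O', '.'],
                  ['.', '.', '.']])
# X has three in a row, so per the description this should print 1
# (signature utility(state, k) is assumed here).
print(utility(board, 3))

# A board that is not yet terminal.
ongoing = np.array([['X', 'O', '.'],
                    ['.', 'X', '.'],
                    ['.', '.', '.']])
# Not terminal, so utility() should return None; with O to move there are
# six empty cells, hence six successor states (signature assumed).
print(utility(ongoing, 3))
print(len(successors(ongoing, 'O')))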
mnk_game.py contains functionality to simulate a game. Given the starting game state, the player to start, the k value, strategies for each player, and parameters for MCTS, game_loop() will simulate each player playing an entire game from start to finish. There are three possible strategies for each player, each indicated by an integer value: 0 for random play, 1 for alpha-beta search, and 2 for MCTS. There is also a print option to show the progress of the game, and the function returns the utility of the final game state when finished.

4.1: Alpha-Beta Search (12 points)

The first task is to implement alpha-beta search, which you will do by completing the two internal recursive functions max_value() and min_value(). The parameters of each are the game state, the alpha and beta values, and k. Each one should return the minimax value of the given state, as well as the successor state that corresponds to this value (note that we return states in lieu of actions, unlike the presentation in the book and lecture notes).

You can test the two functions individually by finding values of terminal states or states close to terminal states, for which you know the winner with certainty. For example, in the (3, 3, 3) game, max_value() of a state with two X's and a blank in a row should certainly return a move leading to a winning state for X.

When you are finished with both, you can test your implementation on a full game. Run main.py after setting num_games=1, Xstrat=1, Ostrat=1, and print_result=True in mnk_configs.yaml. This will repeatedly call the ABS wrapper function to decide each individual move that a player makes, which in turn will use your implementation of alpha-beta search. You should see the expected optimal result for a (3, 3, 3) game. You can also try (4, 3, 3) and (4, 3, 4) games to further validate your implementation.
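A minimal sketch of how the two recursive functions might be structured is shown below. It assumes the utility(state, k) and successors(state, player) behavior described earlier, treats X as the maximizer and O as the minimizer (consistent with the utility values above), and returns None as the successor at terminal states; in the scaffold these functions are internal to the provided wrapper, so its stubs and conventions take precedence.

import math
from utils import utility, successors  # scaffold helpers described above

def max_value(state, alpha, beta, k):
    """Return (value, best successor state) for the maximizing player X."""
    u = utility(state, k)
    if u is not None:            # terminal state: no successor to return
        return u, None
    best_val, best_succ = -math.inf, None
    for succ in successors(state, 'X'):
        val, _ = min_value(succ, alpha, beta, k)
        if val > best_val:
            best_val, best_succ = val, succ
        if best_val >= beta:     # a MIN ancestor will never allow this branch
            return best_val, best_succ
        alpha = max(alpha, best_val)
    return best_val, best_succ

def min_value(state, alpha, beta, k):
    """Return (value, best successor state) for the minimizing player O."""
    u = utility(state, k)
    if u is not None:
        return u, None
    best_val, best_succ = math.inf, None
    for succ in successors(state, 'O'):
        val, _ = max_value(succ, alpha, beta, k)
        if val < best_val:
            best_val, best_succ = val, succ
        if best_val <= alpha:    # a MAX ancestor will never allow this branch
            return best_val, best_succ
        beta = min(beta, best_val)
    return best_val, best_succ

The cutoff tests (best_val >= beta in max_value and best_val <= alpha in min_value) are what implement the pruning; everything else is plain minimax over the successor states.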
4.2: Monte Carlo Tree Search (24 points)

Unfortunately, alpha-beta search will not be able to handle boards larger than 4 × 4 (at least not in a reasonable amount of time), so we turn to MCTS, for which we will need to specify a few more parameters. In return, we can more finely control the tradeoff between time spent searching and finding good moves.

The wrapper function for MCTS first creates a Tree object containing a Node object corresponding to the starting state. Nodes contain the corresponding state, parent, player (whose turn it is to move from the current state), w (number of wins from this state), and N (number of rollouts from this state). A Tree simply maintains a dictionary of state-to-node mappings. Given a Tree object, we can add new Nodes to it or get the Node corresponding to a state; the latter method returns None if the given state is not in the Tree.

After initialization, MCTS proceeds to update the Tree and run game simulations by cycling through the four helper functions over and over. Once it is done, the immediate successor of the original state that was rolled out the most is returned. Your task is to write each of the four helper functions used by MCTS:

select() traverses the tree from the given state by repeatedly computing the UCT values of the current node's successors and moving to the one with the highest value. This continues until we find a node in the tree that either has successors not yet added to the tree or is a terminal state (this may be the root itself).

expand() potentially adds a new successor node of the provided state to the tree. If the state is already terminal, no expansion takes place, and both the tree and state are returned as they are. Otherwise, find a successor that does not yet exist in the tree, create a new Node for it, and add it to the Tree. The player value of the new node should be opposite that of the parent. Return both the modified tree and the new state.

simulate() rolls out the rest of the game from the provided state until a terminal state is reached and then returns the resultant utility. Here you may simply use a random rollout policy, choosing a single random valid move for each player until the game ends.

backprop() backpropagates the resultant utility starting from the node of the given state up to the root of the tree (the node with no parent). Every node along the path should have N incremented by 1. If the result is 0 (a tie), every node increments w by 0.5. Otherwise, w is incremented by 1 if either (player is X and result is -1) or (player is O and result is 1). Note that these conditions are opposite to the meaning of the utility function, as the win/loss result is a consequence of the parent node (the opposing player) choosing the current one.

You are encouraged to test each function above individually. You can start with a state close to a terminal one, initialize the tree, and run through each of the functions in sequence. Since the tree may be updated after a runthrough of all four functions, you may want to do so multiple times to see whether the tree is updating in the expected manner. Once you are ready to run the entire game loop, you can run main.py after setting Xstrat and Ostrat to 2 and rollouts and alpha to values like 500 and 1.

4.3: Analysis (16 points)

For each of the following except part 1, you are recommended to set print_result=False to speed up the game runs. The final statistics will still be printed.

1. Find the results of the (3, 3, 3), (4, 3, 3), and (4, 3, 4) games when both players employ alpha-beta search (Xstrat=Ostrat=1). Briefly explain whether these results are deterministic or whether they can change. Which player has an advantage in each game, if any, and why?

2. Suppose that X plays randomly (Xstrat=0) and O employs alpha-beta search. Simulate the (3, 3, 3), (4, 3, 3), and (4, 3, 4) games with num_games=10 and report the number of wins and draws for each setting. Explain any differences you observe from those above.

3. Now suppose that X plays randomly while O uses MCTS (Ostrat=2) with rollouts=500 and alpha=1. Simulate 10 each of the (3, 3, 3), (4, 4, 4), and (5, 5, 5) games. Report your findings, and explain why MCTS is able to handle the larger games while alpha-beta search cannot.

4. For the final matchup, let's have X use alpha-beta search and O use MCTS in the (3, 3, 3) game. Run 10 games for each of alpha=1 and alpha=5. Which value seems to lead to more "optimal" games? (If it is not conclusive, try increasing alpha to 10 or 20, or increase num_games.) Briefly explain what we can conclude about the role of exploration in the performance of MCTS in this context.

Submission

You should have one PDF document containing your solutions for Problems 1-3, as well as your analysis in Problem 4.3. You should also have the completed code file implementing the functions for Problem 4; be sure that you did not modify any code outside of the completed functions. Submit the document and code files to the respective assignment bins on Gradescope. For full credit, you must tag your pages for each problem in the PDF document.