CSI 4106 - Fall 2024
Version: Dec 3, 2024 12:59
This lecture examines competitive environments where multiple agents have conflicting objectives, resulting in adversarial search problems.
Zero-sum games are competitive scenarios where one player’s gain is exactly balanced by another player’s loss, resulting in a net change of zero in total wealth or benefit.
Develop a policy (a strategy) for playing from the initial state \(S_0\) to a terminal state \(S_\mathrm{final}\).
get_valid_moves
make_move
is_terminal
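Together with get_opponent, evaluate, and display (used below), these methods form the interface of the Game base class that TicTacToe extends later. The base class itself does not appear in these notes; a minimal hypothetical sketch:
from abc import ABC, abstractmethod

class Game(ABC):
    # Hypothetical sketch of the interface assumed by TicTacToe(Game) below;
    # the actual base class is not shown in these notes.

    @abstractmethod
    def get_valid_moves(self, state): ...        # Legal moves available in this state

    @abstractmethod
    def make_move(self, state, move, player): ...  # Successor state after the move

    @abstractmethod
    def is_terminal(self, state): ...            # Has the game ended (win or draw)?

    @abstractmethod
    def evaluate(self, state): ...               # Numeric score of the state

    @abstractmethod
    def get_opponent(self, player): ...          # The other player

    @abstractmethod
    def display(self, state): ...                # Render the board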
import numpy as np

def is_terminal(state):
    # Check rows, columns, and diagonals for a win
    lines = []
    lines.extend(state)                          # Rows
    lines.extend(state.T)                        # Columns
    lines.append(np.diagonal(state))             # Main diagonal
    lines.append(np.diagonal(np.fliplr(state)))  # Anti-diagonal
    for line in lines:
        if np.all(line == 'X') or np.all(line == 'O'):
            return True
    # Check for a draw (no empty spaces)
    if ' ' not in state:
        return True
    return False
get_opponent
count_valid_sequences
The total number of valid sequences is: 255,168
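The implementation of count_valid_sequences is not shown in these notes; a plausible sketch, enumerating complete games depth-first through the same interface:
def count_valid_sequences(game, state, player):
    # Count the distinct sequences of moves from this state to a terminal state.
    if game.is_terminal(state):
        return 1
    total = 0
    for move in game.get_valid_moves(state):
        new_state = game.make_move(state, move, player)
        total += count_valid_sequences(game, new_state, game.get_opponent(player))
    return total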
Tic-tac-toe has 8 symmetrical transformations (4 rotations and 4 reflections).
By considering these, many game sequences that differ only by a rotation or reflection of the board become equivalent.
The number of unique sequences of moves is 26,830, whereas the number of unique board positions is 765.
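One way to exploit these symmetries (a sketch; the helper name canonical_form is ours, not from the notes) is to map each board to a canonical representative among its 8 variants, so that symmetric positions compare equal:
def canonical_form(state):
    # Generate the 8 symmetric variants (4 rotations, each optionally
    # mirrored) and return a canonical one, e.g., the smallest byte string.
    variants = []
    s = state
    for _ in range(4):
        s = np.rot90(s)
        variants.append(s)
        variants.append(np.fliplr(s))
    return min(variants, key=lambda v: v.tobytes())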
The search tree size for the tic-tac-toe game is relatively small, making it suitable for use as a running example in later discussions.
How does this compare to the search trees for chess and Go?
Chess: \(35^{80} \sim 10^{123}\)
Go: \(361! \sim 10^{768}\)
Optimal play involves executing the best possible move at each step to maximize winning chances or outcomes.
In perfect information games like tic-tac-toe or chess, it requires anticipating the opponent’s moves and choosing actions that enhance one’s position or minimize losses.
What should be player 2’s strategy and why?
For move \(A\):
For move \(B\):
What should now be the strategy for Player 1?
Player 1, being the maximizer, will choose move \(A\), as it leads to the higher score of 3 after Player 2 minimizes.
Player 1 is the maximizing player, seeking the highest score.
Player 2 is the minimizing player, seeking the lowest score.
Evaluation:
Player 2 evaluates the potential outcomes for each of their moves and chooses the least favorable outcome for Player 1.
Player 1 then evaluates these outcomes, choosing the move that maximizes their minimum guaranteed score.
The minimax algorithm operates by exploring all possible moves in a game tree, evaluating the outcomes to minimize the possible loss in a worst-case scenario. At each node, the maximizing player backs up the highest value among its children, while the minimizing player backs up the lowest.
By backtracking from the terminal nodes to the root, the algorithm selects the move that maximizes the player’s minimum gain, effectively anticipating and countering the opponent’s best strategies.
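Equivalently, writing \(\mathrm{moves}(s)\) for the legal moves in state \(s\) and \(\mathrm{result}(s, a)\) for the state reached by playing \(a\), the value backed up at each node is:
\[
\mathrm{minimax}(s) =
\begin{cases}
\mathrm{evaluate}(s) & \text{if } s \text{ is terminal,} \\
\max_{a \in \mathrm{moves}(s)} \mathrm{minimax}(\mathrm{result}(s, a)) & \text{if the maximizer is to move,} \\
\min_{a \in \mathrm{moves}(s)} \mathrm{minimax}(\mathrm{result}(s, a)) & \text{if the minimizer is to move.}
\end{cases}
\]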
# Tic-Tac-Toe game class
class TicTacToe(Game):

    def __init__(self):
        self.size = 3
        self.board = np.full((self.size, self.size), ' ')

    def get_valid_moves(self, state):
        # Returns a list of available positions
        moves = []
        for i in range(self.size):
            for j in range(self.size):
                if state[i][j] == ' ':
                    moves.append((i, j))
        return moves

    def make_move(self, state, move, player):
        # Returns a new state after making the move
        new_state = state.copy()
        new_state[move] = player
        return new_state

    def is_terminal(self, state):
        # Check rows, columns, and diagonals for a win
        lines = []
        lines.extend(state)                          # Rows
        lines.extend(state.T)                        # Columns
        lines.append(np.diagonal(state))             # Main diagonal
        lines.append(np.diagonal(np.fliplr(state)))  # Anti-diagonal
        for line in lines:
            if np.all(line == 'X') or np.all(line == 'O'):
                return True
        # Check for a draw (no empty spaces)
        if ' ' not in state:
            return True
        return False

    def evaluate(self, state):
        # Simple evaluation function
        lines = []
        lines.extend(state)                          # Rows
        lines.extend(state.T)                        # Columns
        lines.append(np.diagonal(state))             # Main diagonal
        lines.append(np.diagonal(np.fliplr(state)))  # Anti-diagonal
        for line in lines:
            if np.all(line == 'X'):
                return 1   # X wins
            if np.all(line == 'O'):
                return -1  # O wins
        return 0  # Draw or ongoing

    def display(self, state):
        display_tic_tac_toe(state, title=None)

    def get_opponent(self, player):
        return 'O' if player == 'X' else 'X'
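display_tic_tac_toe is a rendering helper defined elsewhere in the course materials; a minimal text-based stand-in could be:
def display_tic_tac_toe(state, title=None):
    # Print the board as a plain text grid.
    if title:
        print(title)
    for i, row in enumerate(state):
        print(' ' + ' | '.join(row))
        if i < len(state) - 1:
            print('---+---+---')
    print()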
import math

def minimax(game, state, depth, player, maximizing_player):
    if game.is_terminal(state) or depth == 0:
        return game.evaluate(state), None
    valid_moves = game.get_valid_moves(state)
    best_move = None
    if maximizing_player:
        max_eval = -math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), False)
            if eval_score > max_eval:
                max_eval = eval_score
                best_move = move
        return max_eval, best_move
    else:
        min_eval = math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), True)
            if eval_score < min_eval:
                min_eval = eval_score
                best_move = move
        return min_eval, best_move
def test_tic_tac_toe():
    game = TicTacToe()
    current_state = game.board.copy()
    player = 'X'
    maximizing_player = True
    # Simulate a game
    while not game.is_terminal(current_state):
        game.display(current_state)
        _, move = minimax(game, current_state, depth=9, player=player, maximizing_player=maximizing_player)
        if move is None:
            print("Game Over!")
            break
        current_state = game.make_move(current_state, move, player)
        player = game.get_opponent(player)
        maximizing_player = not maximizing_player
    game.display(current_state)
    result = game.evaluate(current_state)
    if result == 1:
        print("X wins!")
    elif result == -1:
        print("O wins!")
    else:
        print("It's a draw!")
It's a draw!
Elapsed time: 24.410518 seconds
Is test_tic_tac_toe slower than expected?
Do you see an area for improvement?
def memoize_minimax(f):
    cache = {}
    def wrapper(game, state, depth, player, maximizing_player):
        state_key = tuple(map(tuple, state))  # hashable state
        key = (state_key, depth, player, maximizing_player)
        if key in cache:
            return cache[key]
        result = f(game, state, depth, player, maximizing_player)
        cache[key] = result
        return result
    return wrapper
@memoize_minimax
def minimax(game, state, depth, player, maximizing_player):
    # The minimax code remains the same, without any cache handling
    if game.is_terminal(state) or depth == 0:
        return game.evaluate(state), None
    valid_moves = game.get_valid_moves(state)
    best_move = None
    if maximizing_player:
        max_eval = -math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), False)
            if eval_score > max_eval:
                max_eval = eval_score
                best_move = move
        return max_eval, best_move
    else:
        min_eval = math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), True)
            if eval_score < min_eval:
                min_eval = eval_score
                best_move = move
        return min_eval, best_move
It's a draw!
Elapsed time: 0.605073 seconds
Compare the reduction in execution time achieved through symmetry considerations versus caching techniques. Evaluate the combined effect of both approaches.
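As a starting point, one way to bring symmetry into the search (reusing the hypothetical canonical_form helper sketched earlier) is to generate only one move per symmetry class of successor positions, since symmetric positions have the same minimax value:
def unique_moves(game, state, player):
    # Keep a single representative move per symmetry class of successors.
    seen = set()
    moves = []
    for move in game.get_valid_moves(state):
        key = canonical_form(game.make_move(state, move, player)).tobytes()
        if key not in seen:
            seen.add(key)
            moves.append(move)
    return moves
On the empty board, this reduces the 9 opening moves to 3: corner, edge, and center.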
Develop a Connect Four game implementation employing a minimax search algorithm.
Connect Four is symmetric across its vertical axis. Develop a new implementation that leverages this symmetry.
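As a hint, a cache key that identifies a board with its left-right mirror might look like this (the helper name is ours):
def connect_four_key(state):
    # A board and its left-right mirror have the same value;
    # store both under one canonical key.
    return min(state.tobytes(), np.fliplr(state).tobytes())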
The number of valid sequences of actions grows factorially, with particularly large growth observed in games like chess and Go.
To enhance the efficiency of the minimax algorithm, one could possibly prune certain parts of the search tree, thereby avoiding the exploration of descendant nodes.
How would you implement this modification? What factors would you take into account?
A subtree should be pruned only when it can be demonstrated that it cannot yield a better solution.
Alpha-beta pruning is an optimization technique for the minimax algorithm that reduces the number of nodes evaluated in the search tree.
It achieves this by eliminating branches that cannot possibly influence the final decision, using two parameters:
alpha, the maximum score that the maximizing player is assured, and
beta, the minimum score that the minimizing player is assured.
At a maximizing node:
The maximizer aims to maximize the score.
Alpha (\(\alpha\)) is updated to the highest value found so far among child nodes.
Process:
Initialize \(\alpha = -\infty\).
For each child node:
Compute the evaluation score.
Update \(\alpha = \max(\alpha, \mathrm{child\_score})\).
At a minimizing node:
The minimizer aims to minimize the score.
Beta (\(\beta\)) is updated to the lowest value found so far among child nodes.
Process:
Initialize \(\beta = \infty\).
For each child node:
Compute the evaluation score.
Update \(\beta = \min(\beta, \mathrm{child\_score})\).
When a node’s evaluation proves it cannot improve on the current alpha or beta, further exploration of that branch is halted, thereby enhancing computational efficiency without affecting the outcome.
Pruning Condition:
If \(\beta \leq \alpha\), further exploration of the current node’s remaining children is unnecessary.
Rationale:
The maximizer has a guaranteed score of at least \(\alpha\).
The minimizer can ensure that the maximizer cannot get a better score than \(\beta\).
If \(\beta \leq \alpha\), the maximizer won’t find a better option in this branch.
The effectiveness of pruning is influenced by the order in which nodes are evaluated.
Greater pruning is achieved if nodes are ordered from most to least promising.
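A simple heuristic sketch of such an ordering (not from the notes): sort the successors by a shallow static evaluation so that likely-best moves are examined first.
def ordered_moves(game, state, player, maximizing_player):
    # Examining strong moves first triggers earlier cut-offs.
    moves = game.get_valid_moves(state)
    moves.sort(key=lambda m: game.evaluate(game.make_move(state, m, player)),
               reverse=maximizing_player)
    return moves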
# Minimax algorithm with Alpha-Beta Pruning
def alpha_beta_search(game, state, depth, player, alpha, beta, maximizing_player):
    """
    Minimax algorithm with alpha-beta pruning.

    :param game: The game instance.
    :param state: The current game state.
    :param depth: The maximum depth to search.
    :param player: The current player ('X' or 'O').
    :param alpha: The best value that the maximizer currently can guarantee at that level or above.
    :param beta: The best value that the minimizer currently can guarantee at that level or above.
    :param maximizing_player: True if the current move is for the maximizer.
    :return: A tuple of (evaluation score, best move).
    """
    if game.is_terminal(state) or depth == 0:
        return game.evaluate(state), None
    valid_moves = game.get_valid_moves(state)
    best_move = None
    if maximizing_player:
        max_eval = -math.inf  # Initialize maximum evaluation
        for move in valid_moves:
            # Simulate the move
            new_state = game.make_move(state, move, player)
            # Recursive call to alpha_beta_search for the minimizing player
            eval_score, _ = alpha_beta_search(game, new_state, depth - 1, game.get_opponent(player), alpha, beta, False)
            if eval_score > max_eval:
                max_eval = eval_score  # Update maximum evaluation
                best_move = move       # Update best move
            alpha = max(alpha, eval_score)  # Update alpha
            if beta <= alpha:
                break  # Beta cut-off (prune the remaining branches)
        return max_eval, best_move
    else:
        min_eval = math.inf  # Initialize minimum evaluation
        for move in valid_moves:
            # Simulate the move
            new_state = game.make_move(state, move, player)
            # Recursive call to alpha_beta_search for the maximizing player
            eval_score, _ = alpha_beta_search(game, new_state, depth - 1, game.get_opponent(player), alpha, beta, True)
            if eval_score < min_eval:
                min_eval = eval_score  # Update minimum evaluation
                best_move = move       # Update best move
            beta = min(beta, eval_score)  # Update beta
            if beta <= alpha:
                break  # Alpha cut-off (prune the remaining branches)
        return min_eval, best_move
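The initial call opens the search window as wide as possible; for tic-tac-toe, with the maximizer ('X') playing first:
game = TicTacToe()
score, move = alpha_beta_search(game, game.board.copy(), depth=9, player='X',
                                alpha=-math.inf, beta=math.inf, maximizing_player=True)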
Grasping why alpha-beta pruning boosts minimax efficiency without altering outcomes requires careful thought.
The algorithm changes are minimal.
Is the enhancement justified?
Number of sequences explored by the Minimax Search algorithm: 255,168
Number of sequences explored by the Alpha-Beta Search algorithm: 7,330
A 97.13% reduction in the number of sequences visited!
Implement a Connect Four game using the Alpha-Beta Search algorithm. Conduct a comparative analysis between the Minimax and Alpha-Beta Search implementations.
Marcel Turcotte
School of Electrical Engineering and Computer Science (EECS)
University of Ottawa