Adversarial Search

CSI 4106 - Fall 2024

Marcel Turcotte

Version: Dec 3, 2024 12:59

Preamble

Quote of the Day

Learning objectives

  • Explain zero-sum game concepts
  • Formulate never-lose strategies in Tic-Tac-Toe regardless of opponent moves
  • Utilize the minimax algorithm to determine optimal moves in adversarial settings
  • Articulate how alpha-beta pruning reduces the number of nodes evaluated without affecting outcomes

Introduction

Types of Games

  • Deterministic or stochastic
  • One, two, or more players
  • Zero-sum or not
  • Perfect information or not

Definition

Zero-sum games are competitive scenarios where one player’s gain is exactly balanced by another player’s loss, resulting in a net change of zero in total wealth or benefit.
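
Formally, in the two-player case the terminal payoffs always satisfy \(u_1 + u_2 = 0\): whatever one player gains, the other loses.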

Deterministic Games

  • States: \(S\) (\(S_0\) to \(S_k\))
  • Players: \(P = \{1, \dots, N\}\)
  • Actions: \(A\) (depends on \(P\) and \(S\))
  • Transition function: \(S \times A \rightarrow S\)
  • A final state: \(S_\mathrm{final}\)
  • Reward or utility: \(U(S_\mathrm{final}, p)\), the value of the terminal state for player \(p\)

Objective: develop a policy that selects actions leading from \(S_0\) to the best achievable \(S_\mathrm{final}\).

What do you think?

  • Consider playing tic-tac-toe.
  • Can you ensure a never-lose strategy, irrespective of your opponent’s moves?
  • Extend this analysis to games like chess or Go.

Tic-Tac-Toe

Let’s represent the state of a tic-tac-toe game with a NumPy array:

import numpy as np

current_state = np.full((3, 3), ' ')

get_valid_moves

def get_valid_moves(state):

    size = state.shape[0]

    # Returns a list of available positions
    moves = []
    for i in range(size):
        for j in range(size):
            if state[i][j] == ' ':
                moves.append((i, j))

    return moves

make_move

def make_move(state, move, player):

    # Returns a new state after making the move
    new_state = state.copy()
    new_state[move] = player

    return new_state

is_terminal

def is_terminal(state):

    # Check rows, columns, and diagonals for a win
    lines = []
    lines.extend(state)  # Rows
    lines.extend(state.T)  # Columns
    lines.append(np.diagonal(state))  # Main diagonal
    lines.append(np.diagonal(np.fliplr(state)))  # Anti-diagonal

    for line in lines:
        if np.all(line == 'X') or np.all(line == 'O'):
            return True

    # Check for a draw (no empty spaces)
    if ' ' not in state:
        return True

    return False

get_opponent

def get_opponent(player):
    return 'O' if player == 'X' else 'X'

count_valid_sequences

def count_valid_sequences(state, player):

    if is_terminal(state):
        return 1

    valid_moves = get_valid_moves(state)

    total = 0
    for move in valid_moves:
        new_state = make_move(state, move, player)
        total += count_valid_sequences(new_state, get_opponent(player))

    return total
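
Starting from the empty board, the reported total is presumably obtained with a call along these lines (the exact print statement is my assumption):

total = count_valid_sequences(current_state, 'X')
print(f"The total number of valid sequences is: {total:,}")
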
The total number of valid sequences is: 255,168

Symmetry (digression)

Tic-tac-toe has 8 symmetry transformations: 4 rotations (including the identity) and 4 reflections.

By considering these, many game sequences that are different in raw move order become equivalent.

The number of unique sequences of moves is 26,830, whereas the number of unique board positions is 765.
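
A standard way to exploit these symmetries is to map every board to a canonical representative before counting or caching. A minimal sketch, assuming the earlier NumPy board representation (the helper canonical_key is mine, not part of the lecture code):

def canonical_key(state):
    # Enumerate all 8 symmetric variants of the board:
    # 4 rotations, each with and without a left-right reflection.
    variants = []
    board = state
    for _ in range(4):
        board = np.rot90(board)
        variants.append(board)
        variants.append(np.fliplr(board))
    # Return the lexicographically smallest variant as a hashable key.
    return min(tuple(map(tuple, v)) for v in variants)

Two boards that differ only by a rotation or a reflection then share the same key, which is how the board positions collapse to the 765 mentioned above.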

Search Tree

The search tree size for the tic-tac-toe game is relatively small, making it suitable for use as a running example in later discussions.

How does this compare to the search trees for chess and Go?

Search Tree

  • Chess: \(35^{80} \sim 10^{123}\)

  • Go: \(361! \sim 10^{768}\)

Definition

Optimal play involves executing the best possible move at each step to maximize winning chances or outcomes.

In perfect information games like tic-tac-toe or chess, it requires anticipating the opponent’s moves and choosing actions that enhance one’s position or minimize losses.

Two-Move Game

Game Setup

  • The game starts with a single decision point for Player 1, who has two possible moves: \(A\) and \(B\).
  • Each of these moves leads to a decision point for Player 2, who also has two possible responses: \(C\) and \(D\).
  • The game ends after Player 2’s move, resulting in a terminal state with predefined scores.

Search Tree

  • Root Node: Represents the initial state before Player 1’s move.
  • Ply 1: Player 1 chooses between moves \(A\) and \(B\).
  • Ply 2: For each of Player 1’s moves, Player 2 chooses between moves \(C\) and \(D\).
  • Leaf Nodes: Each branch’s endpoint is a terminal state with an associated score.

Scores

  • \((A, C)\) results in a score of 3.
  • \((A, D)\) results in a score of 5.
  • \((B, C)\) results in a score of 2.
  • \((B, D)\) results in a score of 1.

Strategy

What should be Player 2’s strategy, and why?

Strategy

  • For move \(A\):

    • Player 2 can choose \(C\) (score = 3) or \(D\) (score = 5); they choose \(C\) (minimizing to 3).
  • For move \(B\):

    • Player 2 can choose \(C\) (score = 2) or \(D\) (score = 1); they choose \(D\) (minimizing to 1).

Strategy

What should now be the strategy for Player 1?

Strategy

Player 1, being the maximizer, will choose move \(A\), as it leads to the higher score of 3 after Player 2 minimizes.

Minimax

  • Player 1 is the maximizing player, seeking the highest score.

  • Player 2 is the minimizing player, seeking the lowest score.

Evaluation:

  • Player 2 evaluates the potential outcomes for each of their moves and chooses the least favorable outcome for Player 1.

  • Player 1 then evaluates these outcomes, choosing the move that maximizes their minimum guaranteed score.
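
To make this concrete, here is a minimal sketch that evaluates the two-move game directly (the nested-dict encoding is mine, not from the slides):

# The game tree: Player 1's moves map to Player 2's replies and their scores.
tree = {'A': {'C': 3, 'D': 5},
        'B': {'C': 2, 'D': 1}}

# Player 2 minimizes within each branch; Player 1 maximizes over those minima.
values = {move: min(replies.values()) for move, replies in tree.items()}
best = max(values, key=values.get)

print(best, values[best])  # A 3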

Minimax Search

The minimax algorithm operates by exploring all possible moves in a game tree, evaluating the outcomes to minimize the possible loss for a worst-case scenario. At each node:

  • Maximizing Player’s Turn: Choose the move with the highest possible value.
  • Minimizing Player’s Turn: Choose the move with the lowest possible value.

By backtracking from the terminal nodes to the root, the algorithm selects the move that maximizes the player’s minimum gain, effectively anticipating and countering the opponent’s best strategies.
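
Equivalently, the minimax value of a state \(s\) is defined by the standard recurrence, using the game elements defined earlier:

\[
\mathrm{Minimax}(s) =
\begin{cases}
U(s) & \text{if } s \text{ is terminal} \\
\max_{a \in A(s)} \mathrm{Minimax}(\mathrm{Result}(s, a)) & \text{if the maximizer moves} \\
\min_{a \in A(s)} \mathrm{Minimax}(\mathrm{Result}(s, a)) & \text{if the minimizer moves}
\end{cases}
\]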

Minimax Search

Walkthrough (first 4 minutes)

Base

# Base class for the game
class Game:
    def __init__(self):
        pass

    def get_valid_moves(self, state):
        pass

    def make_move(self, state, move, player):
        pass

    def is_terminal(self, state):
        pass

    def evaluate(self, state):
        pass

    def display(self, state):
        pass

    def get_opponent(self, player):
        pass

Tic-Tac-Toe

# Tic-Tac-Toe game class
class TicTacToe(Game):

    def __init__(self):
        self.size = 3
        self.board = np.full((self.size, self.size), ' ')

    def get_valid_moves(self, state):
        # Returns a list of available positions
        moves = []
        for i in range(self.size):
            for j in range(self.size):
                if state[i][j] == ' ':
                    moves.append((i, j))
        return moves

    def make_move(self, state, move, player):
        # Returns a new state after making the move
        new_state = state.copy()
        new_state[move] = player
        return new_state

    def is_terminal(self, state):

        # Check rows, columns, and diagonals for a win
        lines = []
        lines.extend(state)  # Rows
        lines.extend(state.T)  # Columns
        lines.append(np.diagonal(state))  # Main diagonal
        lines.append(np.diagonal(np.fliplr(state)))  # Anti-diagonal

        for line in lines:
            if np.all(line == 'X') or np.all(line == 'O'):
                return True

        # Check for a draw (no empty spaces)
        if ' ' not in state:
            return True

        return False

    def evaluate(self, state):

        # Simple evaluation function
        lines = []
        lines.extend(state)  # Rows
        lines.extend(state.T)  # Columns
        lines.append(np.diagonal(state))  # Main diagonal
        lines.append(np.diagonal(np.fliplr(state)))  # Anti-diagonal

        for line in lines:
            if np.all(line == 'X'):
                return 1  # X wins
            if np.all(line == 'O'):
                return -1  # O wins

        return 0  # Draw or ongoing

    def display(self, state):
        # display_tic_tac_toe is a plotting helper defined elsewhere
        # in the course material; it renders the board graphically.
        display_tic_tac_toe(state, title=None)

    def get_opponent(self, player):
        return 'O' if player == 'X' else 'X'

Minimax

import math

def minimax(game, state, depth, player, maximizing_player):

    if game.is_terminal(state) or depth == 0:
        return game.evaluate(state), None

    valid_moves = game.get_valid_moves(state)
    best_move = None

    if maximizing_player:
        max_eval = -math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), False)
            if eval_score > max_eval:
                max_eval = eval_score
                best_move = move
        return max_eval, best_move
    else:
        min_eval = math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), True)
            if eval_score < min_eval:
                min_eval = eval_score
                best_move = move
        return min_eval, best_move

Run

def test_tic_tac_toe():

    game = TicTacToe()
    current_state = game.board.copy()
    player = 'X'
    maximizing_player = True

    # Simulate a game
    while not game.is_terminal(current_state):

        game.display(current_state)

        _, move = minimax(game, current_state, depth=9, player=player, maximizing_player=maximizing_player)

        if move is None:
            print("Game Over!")
            break

        current_state = game.make_move(current_state, move, player)

        player = game.get_opponent(player)
        maximizing_player = not maximizing_player

    game.display(current_state)
    result = game.evaluate(current_state)
    if result == 1:
        print("X wins!")
    elif result == -1:
        print("O wins!")
    else:
        print("It's a draw!")

Run (1/2)

It's a draw!
Elapsed time: 24.410518 seconds

Faster Execution (digression)

  • Is test_tic_tac_toe slower than expected?

  • Do you see an area for improvement?

Caching

def memoize_minimax(f):

    cache = {}

    def wrapper(game, state, depth, player, maximizing_player):

        state_key = tuple(map(tuple, state)) # hashable state
        key = (state_key, depth, player, maximizing_player)

        if key in cache:
            return cache[key]

        result = f(game, state, depth, player, maximizing_player)
        cache[key] = result

        return result

    return wrapper
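
A design note: functools.lru_cache cannot be applied here directly, because NumPy arrays are not hashable; converting the state to a tuple of tuples, as the wrapper does, is what yields a valid dictionary key.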

Caching

@memoize_minimax
def minimax(game, state, depth, player, maximizing_player):

    # The minimax code remains the same, without any cache handling
    if game.is_terminal(state) or depth == 0:
        return game.evaluate(state), None

    valid_moves = game.get_valid_moves(state)
    best_move = None

    if maximizing_player:
        max_eval = -math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), False)
            if eval_score > max_eval:
                max_eval = eval_score
                best_move = move
        return max_eval, best_move
    else:
        min_eval = math.inf
        for move in valid_moves:
            new_state = game.make_move(state, move, player)
            eval_score, _ = minimax(game, new_state, depth - 1, game.get_opponent(player), True)
            if eval_score < min_eval:
                min_eval = eval_score
                best_move = move
        return min_eval, best_move

Run (2/2)

It's a draw!
Elapsed time: 0.605073 seconds

Lower Predictability (digression)

import random

class TicTacToe(Game):

    def get_valid_moves(self, state):
        # Returns a list of available positions
        moves = []
        for i in range(self.size):
            for j in range(self.size):
                if state[i][j] == ' ':
                    moves.append((i, j))

        # random.shuffle shuffles in place and returns None;
        # shuffle first, then return the list.
        random.shuffle(moves)
        return moves

    # All the other methods stay the same

Exploration

  • Compare the reduction in execution time achieved through symmetry considerations versus caching techniques. Evaluate the combined effect of both approaches.

  • Develop a Connect Four game implementation employing a minimax search algorithm.

  • Connect Four is symmetric across its vertical axis. Develop a new implementation that leverages this symmetry.

Remark

The number of valid sequences of actions grows roughly factorially with the number of moves: for tic-tac-toe the crude upper bound is \(9! = 362{,}880\) move orders, and the growth is dramatically larger in games like chess and Go.

Pruning

To enhance the efficiency of the minimax algorithm, one can prune parts of the search tree, thereby avoiding the exploration of entire subtrees.

Pruning

How would you implement this modification? What factors would you take into account?

Pruning

Tree pruning should be performed only when it can be demonstrated that those subtrees cannot yield better solutions.

Criteria for Pruning

Alpha-Beta Pruning

Alpha-beta pruning is an optimization technique for the minimax algorithm that reduces the number of nodes evaluated in the search tree.

Alpha-Beta Pruning

It achieves this by eliminating branches that cannot possibly influence the final decision, using two parameters:

  • alpha, the maximum score that the maximizing player is assured, and

  • beta, the minimum score that the minimizing player is assured.

Maximizing Player’s Perspective

At a maximizing node:

  • The maximizer aims to maximize the score.

  • Alpha (\(\alpha\)) is updated to the highest value found so far among child nodes.

  • Process:

    • Initialize \(\alpha = -\infty\).

    • For each child node:

      • Compute the evaluation score.

      • Update \(\alpha = \max(\alpha, \mathrm{child\_score})\).

Minimizing Player’s Perspective

At a minimizing node:

  • The minimizer aims to minimize the score.

  • Beta (\(\beta\)) is updated to the lowest value found so far among child nodes.

  • Process:

    • Initialize \(\beta = \infty\).

    • For each child node:

      • Compute the evaluation score.

      • Update \(\beta = \min(\beta, \mathrm{child\_score})\).

Alpha-Beta Pruning

When a node’s evaluation proves it cannot improve on the current alpha or beta, further exploration of that branch is halted, thereby enhancing computational efficiency without affecting the outcome.

Role of Alpha and Beta in Pruning

Pruning Condition:

  • If \(\beta \leq \alpha\), further exploration of the current node’s remaining children is unnecessary.

  • Rationale:

    • The maximizer has a guaranteed score of at least \(\alpha\).

    • The minimizer can ensure that the maximizer cannot get a better score than \(\beta\).

    • If \(\beta \leq \alpha\), this branch cannot affect the final decision: the maximizer already has a better option elsewhere.

    • Example: in the two-move game, after branch \(A\) the maximizer has \(\alpha = 3\); while exploring branch \(B\), the reply \((B, C)\) with score 2 sets \(\beta = 2 \leq \alpha\), so \((B, D)\) need not be examined.

Walkthrough (6:21 to 8:10)

Node Order

  • The effectiveness of pruning is influenced by the order in which nodes are evaluated.

  • Greater pruning is achieved if nodes are ordered from most to least promising, as in the sketch below.
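
For instance, in tic-tac-toe central squares tend to be strongest, so trying them first is likely to produce earlier cut-offs. A minimal sketch, reusing get_valid_moves from earlier (ordered_moves is a hypothetical helper, not part of the lecture code):

def ordered_moves(state):
    # Order moves by Manhattan distance to the centre of the board,
    # so the (usually stronger) central squares are searched first.
    centre = (state.shape[0] - 1) / 2
    return sorted(get_valid_moves(state),
                  key=lambda m: abs(m[0] - centre) + abs(m[1] - centre))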

Alpha-Beta Search

# Minimax algorithm with Alpha-Beta Pruning

def alpha_beta_search(game, state, depth, player, alpha, beta, maximizing_player):

    """
    Minimax algorithm with alpha-beta pruning.

    :param game: The game instance.
    :param state: The current game state.
    :param depth: The maximum depth to search.
    :param player: The current player ('X' or 'O').
    :param alpha: The best value that the maximizer currently can guarantee at that level or above.
    :param beta: The best value that the minimizer currently can guarantee at that level or above.
    :param maximizing_player: True if the current move is for the maximizer.
    :return: A tuple of (evaluation score, best move).
    """

Alpha-Beta Search

    # Base case: check for terminal state or maximum depth

    if game.is_terminal(state) or depth == 0:
        score = game.evaluate(state)
        return score, None  # Return the evaluation score and no move

    valid_moves = game.get_valid_moves(state)
    best_move = None  # Initialize the best move

Alpha-Beta Search

    if maximizing_player:

        max_eval = -math.inf  # Initialize maximum evaluation

        for move in valid_moves:

            # Simulate the move
            new_state = game.make_move(state, move, player)

            # Recursive call to alpha_beta_search for the minimizing player
            eval_score, _ = alpha_beta_search(game, new_state, depth - 1, game.get_opponent(player), alpha, beta, False)

            if eval_score > max_eval:
                max_eval = eval_score  # Update maximum evaluation
                best_move = move       # Update best move

            alpha = max(alpha, eval_score)  # Update alpha

            if beta <= alpha:
                break  # Beta cut-off (prune the remaining branches)

        return max_eval, best_move

Alpha-Beta Search

    else:

        min_eval = math.inf  # Initialize minimum evaluation

        for move in valid_moves:

            # Simulate the move
            new_state = game.make_move(state, move, player)

            # Recursive call to alpha_beta_search for the maximizing player
            eval_score, _ = alpha_beta_search(game, new_state, depth - 1, game.get_opponent(player), alpha, beta, True)

            if eval_score < min_eval:
                min_eval = eval_score  # Update minimum evaluation
                best_move = move       # Update best move

            beta = min(beta, eval_score)  # Update beta

            if beta <= alpha:
                break  # Alpha cut-off (prune the remaining branches)

        return min_eval, best_move
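
For reference, the initial call seeds \(\alpha\) and \(\beta\) with \(-\infty\) and \(+\infty\). A minimal driver, mirroring test_tic_tac_toe (my sketch, not from the slides):

game = TicTacToe()
score, move = alpha_beta_search(game, game.board.copy(), depth=9, player='X',
                                alpha=-math.inf, beta=math.inf,
                                maximizing_player=True)
print(score, move)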

Walkthrough

Summary

  • Alpha Cutoff: Occurs at minimizer nodes when \(\beta \le \alpha\).
  • Beta Cutoff: Occurs at maximizer nodes when \(\alpha \ge \beta\).

Minimax vs. Alpha-Beta Pruning

  • Grasping why alpha-beta pruning boosts minimax efficiency without altering outcomes requires careful thought.

  • The algorithm changes are minimal.

  • Is the enhancement justified?

Minimax vs. Alpha-Beta Pruning

Number of sequences explored by the Minimax Search algorithm: 255,168

Number of sequences explored by the Alpha-Beta Search algorithm: 7,330

A 97.13% reduction in the number of sequences visited!

Exploration

Implement a Connect Four game using the Alpha-Beta Search algorithm. Conduct a comparative analysis between the Minimax and Alpha-Beta Search implementations.

Prologue

Further exploration

  • Expectimax search: handling opponents that do not play optimally;
  • Expectiminimax: handling chance in games such as backgammon.

Summary

  • Introduction to adversarial search
  • Zero-sum games
  • Introduction to the minimax search method
  • Role of alpha and beta pruning in minimax search

Next lecture

  • We will look at the Monte Carlo Tree Search (MCTS) algorithm

References

Russell, Stuart, and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach. 4th ed. Pearson. http://aima.cs.berkeley.edu/.
Shannon, Claude E. 1950. “Programming a Computer for Playing Chess.” Philosophical Magazine, Ser. 7, 41 (314): 256–75.

Marcel Turcotte

Marcel.Turcotte@uOttawa.ca

School of Electrical Engineering and Computer Science (EECS)

University of Ottawa