Adversarial Search

CSI 4106 Introduction to Artificial Intelligence

Marcel Turcotte

Version: Nov 26, 2025 10:46

Preamble

Message of the Day

Learning objectives

  • Explain zero-sum game concepts
  • Formulate never-lose strategies in Tic-Tac-Toe regardless of opponent moves
  • Utilize the minimax algorithm to determine optimal moves in adversarial settings
  • Articulate how alpha-beta pruning reduces the number of nodes evaluated without affecting outcomes

Core Game Framework

Motivation

This lecture and the next will utilize tic-tac-toe to explore concepts in adversarial search and reinforcement learning.

We will implement and compare a range of solution strategies, including a random policy, three minimax variants, three alpha-beta pruning variants, and two Monte Carlo Tree Search algorithms.

Game

Code
class Game:

    """
    Abstract interface for a deterministic, 2-player, zero-sum,
    turn-taking game.

    Conventions (used by Tic-Tac-Toe and the solvers below):
    - Players are identified by strings "X" and "O".
    - evaluate(state) returns:
        > 0  if the position is good for "X"
        < 0  if the position is good for "O"
        == 0 for a draw or non-terminal equal position
    """

    def initial_state(self):

        """Return an object representing the starting position of the game."""

        raise NotImplementedError

    def get_valid_moves(self, state):

        """
        Given a state, return an iterable of legal moves.
        The type of 'move' is game-dependent (e.g., (row, col) for Tic-Tac-Toe).
        """

        raise NotImplementedError

    def make_move(self, state, move, player):

        """
        Return the successor state obtained by applying 'move' for 'player'
        to 'state'. The original state should not be modified in-place.
        """

        raise NotImplementedError

    def get_opponent(self, player):

        """Return the opponent of 'player'."""

        raise NotImplementedError

    def is_terminal(self, state):

        """
        Return True if 'state' is a terminal position (win, loss, or draw),
        False otherwise.
        """

        raise NotImplementedError

    def evaluate(self, state):

        """
        Return a scalar evaluation of 'state':
            +1 for X win, -1 for O win, 0 otherwise (for Tic-Tac-Toe).
        For other games this may be generalized, but here we keep it simple.
        """

        raise NotImplementedError

    def display(self, state):

        """Print a human-readable representation of 'state' (for debugging)."""

        raise NotImplementedError

Game

class Game:

    def initial_state(self):
        raise NotImplementedError

    def get_valid_moves(self, state):
        raise NotImplementedError

    def make_move(self, state, move, player):
        raise NotImplementedError

    def get_opponent(self, player):
        raise NotImplementedError

    def is_terminal(self, state):
        raise NotImplementedError

    def evaluate(self, state):
        raise NotImplementedError

    def display(self, state):
        raise NotImplementedError

Required Libraries

import math
import random
import numpy as np
import matplotlib.pyplot as plt

TicTacToe

Code
class TicTacToe(Game):

    """
    Classic 3x3 Tic-Tac-Toe implementation using a NumPy array of strings.
    Empty squares are represented by " ".
    Player "X" is assumed to be the maximizing player.
    """

    def __init__(self):
        self.size = 3

    def initial_state(self):

        """Return an empty 3x3 board."""

        return np.full((self.size, self.size), " ")

    def get_valid_moves(self, state):

        """All (i, j) pairs where the board cell is empty."""

        return [
            (i, j)
            for i in range(self.size)
            for j in range(self.size)
            if state[i, j] == " "
        ]

    def make_move(self, state, move, player):

        """
        Return a new board with 'player' placed at 'move' (row, col).
        The original state is not modified.
        """

        new_state = state.copy()
        new_state[move] = player
        return new_state

    def get_opponent(self, player):

        """Swap player labels between 'X' and 'O'."""

        return "O" if player == "X" else "X"

    def is_terminal(self, state):

        """
        A state is terminal if:
        - Either player has a 3-in-a-row (evaluate != 0), or
        - There are no empty squares left (draw).
        """

        if self.evaluate(state) != 0:
            return True
        return " " not in state

    def evaluate(self, state):

        """
        Return +1 if X has three in a row, -1 if O has three in a row,
        and 0 otherwise (including non-terminal states and draws).

        This is a "game-theoretic" evaluation at terminal states; for
        non-terminal positions we simply return 0.
        """

        lines = []

        # Rows and columns
        for i in range(self.size):
            lines.append(state[i, :])   # row i
            lines.append(state[:, i])   # column i

        # Main diagonals
        lines.append(np.diag(state))
        lines.append(np.diag(np.fliplr(state)))

        # Check each line for a win
        for line in lines:
            if np.all(line == "X"):
                return 1
            if np.all(line == "O"):
                return -1
        return 0

    def display(self, state):

        """
        Visualize a Tic-Tac-Toe board using matplotlib.

        Parameters
        ----------
        state : np.ndarray of shape (size, size)
            Board containing ' ', 'X', or 'O'.
        """

        size = self.size

        fig, ax = plt.subplots()
        ax.set_aspect('equal')
        ax.set_xlim(0, size)
        ax.set_ylim(0, size)

        # Draw grid lines
        for i in range(1, size):
            ax.axhline(i, color='black')
            ax.axvline(i, color='black')

        # Hide axes completely
        ax.axis('off')

        # Draw X and O symbols
        for i in range(size):
            for j in range(size):
                cx = j + 0.5
                cy = size - i - 0.5     # invert y-axis for correct row orientation

                symbol = state[i, j]

                if symbol == "X":
                    ax.plot(cx, cy, marker='x',
                            markersize=40 * (3/size),
                            color='blue',
                            markeredgewidth=3)
                elif symbol == "O":
                    circle = plt.Circle((cx, cy),
                                        radius=0.30 * (3/size),
                                        fill=False,
                                        color='red',
                                        linewidth=3)
                    ax.add_patch(circle)

        plt.show()

TicTacToe

class TicTacToe(Game):

    """
    Classic 3x3 Tic-Tac-Toe implementation using a NumPy array of strings.
    Empty squares are represented by " ".
    Player "X" is assumed to be the maximizing player.
    """

    def __init__(self):
        self.size = 3

initial_state

    def initial_state(self):

        """Return an empty 3x3 board."""

        return np.full((self.size, self.size), " ")

get_valid_moves

    def get_valid_moves(self, state):

        """All (i, j) pairs where the board cell is empty."""

        return [
            (i, j)
            for i in range(self.size)
            for j in range(self.size)
            if state[i, j] == " "
        ]

make_move

    def make_move(self, state, move, player):

        """
        Return a new board with 'player' placed at 'move' (row, col).
        The original state is not modified.
        """

        new_state = state.copy()
        new_state[move] = player

        return new_state

get_opponent

    def get_opponent(self, player):

        """Swap player labels between 'X' and 'O'."""

        return "O" if player == "X" else "X"

evaluate

    def evaluate(self, state):

        lines = []

        # Rows and columns
        for i in range(self.size):
            lines.append(state[i, :])   # row i
            lines.append(state[:, i])   # column i

        # Main diagonals
        lines.append(np.diag(state))
        lines.append(np.diag(np.fliplr(state)))

        # Check each line for a win
        for line in lines:
            if np.all(line == "X"):
                return 1
            if np.all(line == "O"):
                return -1
        return 0

is_terminal

    def is_terminal(self, state):

        """
        A state is terminal if:
        - Either player has a 3-in-a-row (evaluate != 0), or
        - There are no empty squares left (draw).
        """

        if self.evaluate(state) != 0:
            return True
        return " " not in state

display

Code
    def display(self, state):

        """
        Visualize a Tic-Tac-Toe board using matplotlib.

        Parameters
        ----------
        state : np.ndarray of shape (size, size)
            Board containing ' ', 'X', or 'O'.
        """

        size = self.size

        fig, ax = plt.subplots()
        ax.set_aspect('equal')
        ax.set_xlim(0, size)
        ax.set_ylim(0, size)

        # Draw grid lines
        for i in range(1, size):
            ax.axhline(i, color='black')
            ax.axvline(i, color='black')

        # Hide axes completely
        ax.axis('off')

        # Draw X and O symbols
        for i in range(size):
            for j in range(size):
                cx = j + 0.5
                cy = size - i - 0.5     # invert y-axis for correct row orientation

                symbol = state[i, j]

                if symbol == "X":
                    ax.plot(cx, cy, marker='x',
                            markersize=40 * (3/size),
                            color='blue',
                            markeredgewidth=3)
                elif symbol == "O":
                    circle = plt.Circle((cx, cy),
                                        radius=0.30 * (3/size),
                                        fill=False,
                                        color='red',
                                        linewidth=3)
                    ax.add_patch(circle)

        plt.show()

Solver

Code
class Solver:

    """
    Base class for all solvers (Random, Minimax, AlphaBeta, MCTS, etc.).

    Solvers must implement:
        - select_move(game, state, player)

    Solvers may optionally implement:
        - reset()           : called at the start of each game
        - opponent_played() : used by persistent solvers (e.g., MCTS)

    Notes
    -----
    • Solvers may keep internal state that persists across moves.
    • GameRunner may call reset() automatically before every match.
    """

    def select_move(self, game, state, player):

        """
        Must be implemented by subclasses.
        Returns a legal move for the given player.
        """

        raise NotImplementedError

    def get_name(self):

        """
        Return the solver's name for reporting, logging, or tournament results.

        The default returns the class name, but solvers may override
        to include parameters (e.g., "MCTS(num_simulations=500)").
        """
        
        return self.__class__.__name__

    def opponent_played(self, move):
        """
        Optional. Called after the opponent moves.
        Useful for stateful solvers like MCTS.
        Stateless solvers can ignore it.
        """
        pass

    def reset(self):

        """
        Optional. Called once at the beginning of each game.
        Override only if the solver maintains internal state
        (e.g., MCTS tree, cached analysis, heuristic tables).
        """

        pass

Solver

class Solver:

    def select_move(self, game, state, player):
        raise NotImplementedError

    def opponent_played(self, move):
        pass

    def reset(self):
        pass

    def get_name(self):
        return self.__class__.__name__

RandomSolver

Code
class RandomSolver(Solver):

    """
    A simple baseline solver:
    - At each move, chooses uniformly at random among all legal moves.
    - Does not maintain any internal state (no learning).
    """

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def select_move(self, game, state, player):

        """Return a random legal move for the current player."""

        moves = game.get_valid_moves(state)

        return self.rng.choice(moves)

    def opponent_played(self, move):

        """Random solver has no internal state to update."""

        pass
RandomSolver

class RandomSolver(Solver):

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def select_move(self, game, state, player):
        moves = game.get_valid_moves(state)
        return self.rng.choice(moves)

GameRunner

Code
class GameRunner:

    """
    Utility to run a single game between two solvers on a given Game.

    This class is deliberately simple: it alternates moves between "X" and "O"
    until a terminal state is reached.
    """

    def __init__(self, game, verbose=False):
        self.game = game
        self.verbose = verbose

    def play_game(self, solver_X, solver_O):

        """
        Play one full game:
        - solver_X controls player "X"
        - solver_O controls player "O"

        Returns
        -------
        result : int
            +1 if X wins, -1 if O wins, 0 for a draw.
        """

        state = self.game.initial_state()
        player = "X"
        solvers = {"X": solver_X, "O": solver_O}

        # Play until terminal position
        while not self.game.is_terminal(state):
            # Current player selects a move
            move = solvers[player].select_move(self.game, state, player)

            # Apply the move
            state = self.game.make_move(state, move, player)

            if self.verbose:
                self.game.display(state)

            # Notify the opponent (for persistent solvers like MCTS)
            opp = self.game.get_opponent(player)
            solvers[opp].opponent_played(move)

            # Switch active player
            player = opp

        if self.verbose:
            print(self.game.evaluate(state), "\n")

        # Final evaluation from X's perspective
        return self.game.evaluate(state)

GameRunner

class GameRunner:

    def __init__(self, game):
        self.game = game

    def play_game(self, solver_X, solver_O):
        
        state = self.game.initial_state()
        player = "X"
        solvers = {"X": solver_X, "O": solver_O}

        while not self.game.is_terminal(state):
            move = solvers[player].select_move(self.game, state, player)
            state = self.game.make_move(state, move, player)
            opp = self.game.get_opponent(player)
            solvers[opp].opponent_played(move)
            player = opp

        return self.game.evaluate(state)

Example

game = TicTacToe()

runner = GameRunner(game, verbose=True)

a = RandomSolver(123)
b = RandomSolver(456)

outcome = runner.play_game(a, b)

Example

-1 

evaluate_solvers

Code
def evaluate_solvers(game, solver_X, solver_O, num_games, verbose=False):

    """
    Evaluate two solvers head-to-head on a given game.

    Parameters
    ----------
    game      : Game
        An instance of a Game (e.g., TicTacToe).
    solver_X  : Solver
        Solver controlling player "X" (the maximizing player).
    solver_O  : Solver
        Solver controlling player "O" (the minimizing player).
    num_games : int
        Number of games to play with these fixed roles.

    Notes
    -----
    - The same solver instances are reused across games.
      This allows *persistent* solvers (e.g., MCTS) to accumulate
      experience across games.
    - Outcomes are interpreted from X's perspective:
        +1 -> X wins
        -1 -> O wins
         0 -> draw
    """

    runner = GameRunner(game)

    # Aggregate statistics over all games
    results = {
        "X_wins": 0,
        "O_wins": 0,
        "draws": 0,
    }

    for i in range(num_games):
        # Play one game with solver_X as "X" and solver_O as "O"
        outcome = runner.play_game(solver_X, solver_O)

        # Update counters based on outcome (+1, -1, or 0)
        if outcome == 1:
            results["X_wins"] += 1
            if verbose:
                print(f"Game {i + 1}: X wins") 
        elif outcome == -1:
            results["O_wins"] += 1
            if verbose:
                print(f"Game {i + 1}: O wins") 
        else:
            results["draws"] += 1
            if verbose:
                print(f"Game {i + 1}: Draw")

    # Print final summary
    if verbose:
        print(f"\nAfter {num_games} games:")
        print(f"  X ({solver_X.get_name()}) wins: {results['X_wins']}")
        print(f"  O ({solver_O.get_name()}) wins: {results['O_wins']}")
        print(f"  Draws: {results['draws']}")

    return results

Can You Guess the Outcome?

game = TicTacToe()

a = RandomSolver(7)
b = RandomSolver(42)

results = evaluate_solvers(game, a, b, num_games=1000)

results

Can You Guess the Outcome?

{'X_wins': 581, 'O_wins': 290, 'draws': 129}

Introduction

Types of Games

  • Deterministic or stochastic
  • One, two, or more players
  • Zero-sum or not
  • Perfect information or not

Definition

Zero-sum games are competitive scenarios where one player’s gain is exactly balanced by another player’s loss, resulting in a net change of zero in total wealth or benefit.

Deterministic Games

  • States: \(S\) (\(S_0\) to \(S_k\))
  • Players: \(P = \{1, \dots, N\}\)
  • Actions: \(A\) (depends on \(P\) and \(S\))
  • Transition function: \(S \times A \rightarrow S\)
  • Terminal states: \(S_\mathrm{final}\)
  • Reward or utility: \(U(S_\mathrm{final}, p)\)

Develop a policy \(S_0 \rightarrow S_\mathrm{final}\).
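In our code, these components map directly onto the Game interface introduced above (a reading of the formalism in terms of our own API; the variable names are illustrative):

game = TicTacToe()

s0 = game.initial_state()               # S0: the initial state
moves = game.get_valid_moves(s0)        # A: the actions available in s0
s1 = game.make_move(s0, moves[0], "X")  # transition function S x A -> S
print(game.is_terminal(s1))             # terminal test: False after one move
print(game.evaluate(s1))                # reward/utility from X's perspective: 0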

What do you think?

  • Consider playing tic-tac-toe.
  • Can you ensure a never-lose strategy, irrespective of your opponent’s moves?

count_valid_sequences

def count_valid_sequences(game, state, player):

    if game.is_terminal(state):
        return 1

    valid_moves = game.get_valid_moves(state)

    total = 0
    for move in valid_moves:
        new_state = game.make_move(state, move, player)
        total += count_valid_sequences(game, new_state, game.get_opponent(player))

    return total
Code
game = TicTacToe()
state = game.initial_state()
player = "X"
total = count_valid_sequences(game, state, player)
print(f"The total number of valid sequences is: {total:,}")
The total number of valid sequences is: 255,168

Symmetry (Digression)

Tic-tac-toe has 8 symmetrical transformations (4 rotations and 4 reflections).

By considering these, many game sequences that are different in raw move order become equivalent.

The number of unique sequences of moves is 26,830, whereas the number of unique board positions is 765.
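As a sketch of how these transformations can be generated (not part of the lecture code), NumPy's rot90 and fliplr produce all eight variants of a board:

def symmetries(state):
    """Return the eight boards equivalent to 'state' under the
    rotations and reflections of the square."""
    boards = []
    b = state
    for _ in range(4):                 # four rotations, including identity
        boards.append(b)
        boards.append(np.fliplr(b))    # a reflection of each rotation
        b = np.rot90(b)
    return boards

game = TicTacToe()
state = game.make_move(game.initial_state(), (0, 0), "X")
distinct = {tuple(map(tuple, b)) for b in symmetries(state)}
print(len(distinct))  # 4: a corner opening has four symmetric variants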

Search Tree

The search tree size for the tic-tac-toe game is relatively small, making it suitable for use as a running example in later discussions.

How does this compare to the search trees for chess and Go?

Search Tree

  • Chess: \(35^{80} \sim 10^{123}\)

  • Go: \(361! \sim 10^{768}\)

Definition

Optimal play involves executing the best possible move at each step to maximize winning chances, assuming the opponent also plays optimally.

In perfect information games like tic-tac-toe or chess, it requires anticipating the opponent’s moves and choosing actions that enhance one’s position or minimize losses.

Two-Move Game

Game Setup

  • The game starts with a single decision point for Player 1, who has two possible moves: \(A\) and \(B\).
  • Each of these moves leads to a decision point for Player 2, who also has two possible responses: \(C\) and \(D\).
  • The game ends after Player 2’s move, resulting in a terminal state with predefined scores.

Search Tree

  • Root Node: Represents the initial state before Player 1’s move.
  • Ply 1: Player 1 chooses between moves \(A\) and \(B\).
  • Ply 2: For each of Player 1’s moves, Player 2 chooses between moves \(C\) and \(D\).
  • Leaf Nodes: Each branch’s endpoint is a terminal state with an associated score.

Scores

  • \((A, C)\) results in a score of 3.
  • \((A, D)\) results in a score of 5.
  • \((B, C)\) results in a score of 2.
  • \((B, D)\) results in a score of 1.

Strategy

What should be player 2’s strategy and why?

Strategy

  • For move \(A\):

    • Player 2 can choose \(C\) (score = 3) or \(D\) (score = 5); they choose \(C\) (minimizing to 3).
  • For move \(B\):

    • Player 2 can choose \(C\) (score = 2) or \(D\) (score = 1); they choose \(D\) (minimizing to 1).

Strategy

What should now be the strategy for Player 1?

Strategy

Player 1, being the maximizer, will choose move \(A\), as it leads to the higher score of 3 after Player 2 minimizes.
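The entire analysis fits in one expression, a direct transcription of the reasoning above:

# A -> min(3, 5) = 3;  B -> min(2, 1) = 1;  root -> max(3, 1) = 3
print(max(min(3, 5), min(2, 1)))  # 3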

Minimax

Minimax

  • Player 1 is the maximizing player, seeking the highest score.

  • Player 2 is the minimizing player, seeking the lowest score.

Evaluation:

  • Player 2 evaluates the potential outcomes for each of their moves and chooses the least favorable outcome for Player 1.

  • Player 1 then evaluates these outcomes, choosing the move that maximizes their minimum guaranteed score.

Minimax Search

The minimax algorithm operates by exploring all possible moves in a game tree, evaluating the outcomes to minimize the possible loss for a worst-case scenario. At each node:

  • Maximizing Player’s Turn: Choose the move with the highest possible value.
  • Minimizing Player’s Turn: Choose the move with the lowest possible value.

By backtracking from the terminal nodes to the root, the algorithm selects the move that maximizes the player’s minimum gain, effectively anticipating and countering the opponent’s best strategies.

Minimax Search

Walkthrough (first 4 minutes)

MinimaxSolverV1

Code
class MinimaxSolverV1(Solver):

    """
    A classic, exact Minimax solver for Tic-Tac-Toe.

    Key properties:
    - Assumes that "X" is the maximizing player.
    - Performs a full game-tree search (Tic-Tac-Toe is small enough).
    """

    # ----------------------------------------------------------------------
    # PUBLIC API — Solver interface
    # ----------------------------------------------------------------------

    def select_move(self, game, state, player):

        """
        Return the optimal move for `player` using the full minimax search.

        Notes
        -----
        - Because Tic-Tac-Toe is small, a full search is instantaneous.
        - X is always maximizing; O is always minimizing.
        """

        maximizing = (player == "X")
        _, move = self._minimax(game, state, player, maximizing)
        return move

    # ----------------------------------------------------------------------
    # CORE MINIMAX
    # ----------------------------------------------------------------------

    def _minimax(self, game, state, player, maximizing):

        """
        The core recursive minimax procedure.

        Parameters
        ----------
        game       : the TicTacToe instance
        state      : NumPy array representing the current board
        player     : the player to move ("X" or "O")
        maximizing : True if this node corresponds to the maximizing player
                     (i.e., X must play optimally), False for minimizing (O)

        Returns
        -------
        (value, move)
            value : +1, 0, or -1 from the perspective of X
            move  : the best move found at this node
        """

        # ------------------------------------------------------------
        # TERMINAL TEST
        # ------------------------------------------------------------

        if game.is_terminal(state):
            # game.evaluate() returns 1 for X win, -1 for O win, 0 otherwise
            return game.evaluate(state), None

        # ------------------------------------------------------------
        # GENERATE ALL LEGAL MOVES
        # ------------------------------------------------------------

        moves = game.get_valid_moves(state)

        # ------------------------------------------------------------
        # MAX NODE: X to play
        # ------------------------------------------------------------

        if maximizing:
            best_value = -math.inf
            best_move = None

            for move in moves:
                next_state = game.make_move(state, move, player)
                # After X plays, it's O's turn (minimizing)
                value, _ = self._minimax(
                    game,
                    next_state,
                    game.get_opponent(player),
                    maximizing=False
                )
                if value > best_value:
                    best_value = value
                    best_move = move

            return best_value, best_move

        # ------------------------------------------------------------
        # MIN NODE: O to play
        # ------------------------------------------------------------

        else:
            best_value = math.inf
            best_move = None

            for move in moves:
                next_state = game.make_move(state, move, player)
                # After O plays, it's X's turn (maximizing)
                value, _ = self._minimax(
                    game,
                    next_state,
                    game.get_opponent(player),
                    maximizing=True
                )
                if value < best_value:
                    best_value = value
                    best_move = move

            return best_value, best_move

MinimaxSolverV1

class MinimaxSolverV1(Solver):

    def select_move(self, game, state, player):

        maximizing = (player == "X")

        _, move = self._minimax(game, state, player, maximizing)

        return move

_minimax

    def _minimax(self, game, state, player, maximizing):

        if game.is_terminal(state):
            return game.evaluate(state), None

        moves = game.get_valid_moves(state)

_minimax: if maximizing

        if maximizing:
            best_value = -math.inf
            best_move = None

            for move in moves:
                next_state = game.make_move(state, move, player)
                value, _ = self._minimax(
                    game,
                    next_state,
                    game.get_opponent(player),
                    maximizing=False
                )
                if value > best_value:
                    best_value = value
                    best_move = move

            return best_value, best_move

_minimax: if minimizing

        else:
            best_value = math.inf
            best_move = None

            for move in moves:
                next_state = game.make_move(state, move, player)
                value, _ = self._minimax(
                    game,
                    next_state,
                    game.get_opponent(player),
                    maximizing=True
                )
                if value < best_value:
                    best_value = value
                    best_move = move

            return best_value, best_move

Execution (1/2)

Code
import time

# Record the start time
start_time = time.perf_counter()
a = RandomSolver(13)
b = MinimaxSolverV1()

results = evaluate_solvers(game, a, b, num_games=1)
results
{'X_wins': 0, 'O_wins': 0, 'draws': 1}
Code
# Record the end time
end_time = time.perf_counter()

# Calculate the elapsed time
elapsed_time = end_time - start_time

# Print the elapsed time in seconds
print(f"Elapsed time: {elapsed_time:.6f} seconds!")

Execution (1/2)

Elapsed time: 2.232875 seconds!

Discussion (Digression)

  • Is this execution faster or slower than you expected?

  • Do you see an area for improvement?

Discussion (Digression)

Memoization (Caching)

from functools import lru_cache

def canonical(state):

    """
    Convert a NumPy array board into a hashable, immutable representation
    (tuple of tuples). This allows us to use it as a key in dicts or
    as an argument to lru_cache. MCTS can also reuse this representation.
    """
    
    return tuple(map(tuple, state))
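A quick check of the representation (illustrative only):

game = TicTacToe()
state = game.make_move(game.initial_state(), (0, 0), "X")

key = canonical(state)
print(key == canonical(state.copy()))  # True: equal boards yield equal keys
print(key in {key})                    # True: hashable, hence usable as a cache key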

MinimaxSolver

class MinimaxSolver(Solver):

    """
    A classic, exact Minimax solver for Tic-Tac-Toe.

    - Assumes "X" is the maximizing player.
    - Uses memoization (lru_cache) to avoid recomputing values for
      identical positions.
    """

    def select_move(self, game, state, player):
        
        """
        Public interface: choose the best move for 'player' using Minimax.
        For Tic-Tac-Toe we can safely search the full game tree.
        """

        # Store game on self so _minimax can use it
        self.game = game

        # From X's perspective: X is maximizing, O is minimizing
        maximizing = (player == "X")

        # For Tic-Tac-Toe, depth=9 is enough to cover all remaining moves.
        _, move = self._minimax(canonical(state), player, maximizing, 9)
        return move

    @lru_cache(maxsize=None)
    def _minimax(self, state_key, player, maximizing, depth):

        """
        Internal recursive minimax.

        Parameters
        ----------
        state_key : hashable representation of the board (tuple of tuples)
        player    : player to move at this node ("X" or "O")
        maximizing: True if this node is a 'max' node (X to move),
                    False if this is a 'min' node (O to move)
        depth     : remaining search depth (not used for cutoffs in this
                    full-search Tic-Tac-Toe implementation, but kept for
                    didactic purposes and easy extension).
        """

        # Recover the NumPy board from the canonical state_key
        state = np.array(state_key)

        # Terminal test: win, loss, or draw
        if self.game.is_terminal(state):
            # Evaluation is always from X's perspective: +1, -1, or 0
            return self.game.evaluate(state), None

        moves = self.game.get_valid_moves(state)
        best_move = None

        if maximizing:
            # X to move: maximize the evaluation
            best_val = -math.inf
            for move in moves:
                st2 = self.game.make_move(state, move, player)
                val, _ = self._minimax(
                    canonical(st2),
                    self.game.get_opponent(player),
                    False,
                    depth - 1
                )
                if val > best_val:
                    best_val = val
                    best_move = move
            return best_val, best_move

        else:
            # O to move: minimize the evaluation (since evaluation is for X)
            best_val = math.inf
            for move in moves:
                st2 = self.game.make_move(state, move, player)
                val, _ = self._minimax(
                    canonical(st2),
                    self.game.get_opponent(player),
                    True,
                    depth - 1
                )
                if val < best_val:
                    best_val = val
                    best_move = move
            return best_val, best_move

Execution (2/2)

Code
import time

# Record the start time
start_time = time.perf_counter()
a = RandomSolver(13)
b = MinimaxSolver()

results = evaluate_solvers(game, a, b, num_games=1)
results
{'X_wins': 0, 'O_wins': 0, 'draws': 1}
Code
# Record the end time
end_time = time.perf_counter()

# Calculate the elapsed time
elapsed_time = end_time - start_time

# Print the elapsed time in seconds
print(f"Elapsed time: {elapsed_time:.6f} seconds!")

Execution (2/2)

Elapsed time: 0.109785 seconds!

Yet Another Digression

    def get_valid_moves(self, state):

        moves = [
            (i, j)
            for i in range(self.size)
            for j in range(self.size)
            if state[i, j] == " "
        ]

        # Shuffle in place to randomize the order of equally good moves;
        # random.shuffle returns None, so return the list afterwards.
        random.shuffle(moves)
        return moves

Exploration

  • Compare the reduction in execution time achieved through symmetry considerations versus caching techniques. Evaluate the combined effect of both approaches.

  • Develop a Connect Four game implementation employing a minimax search algorithm.

  • Connect Four is symmetric across its vertical axis. Develop a new implementation that leverages this symmetry.

Can You Guess the Outcome?

game = TicTacToe()

a = MinimaxSolver()
b = RandomSolver(7)

results = evaluate_solvers(game, a, b, num_games=1000)

results

Can You Guess the Outcome?

{'X_wins': 998, 'O_wins': 0, 'draws': 2}

Can You Guess the Outcome?

game = TicTacToe()

a = RandomSolver(7)
b = MinimaxSolver()

results = evaluate_solvers(game, a, b, num_games=1000)

results

Can You Guess the Outcome?

{'X_wins': 0, 'O_wins': 793, 'draws': 207}

Can You Guess the Outcome?

game = TicTacToe()

a = MinimaxSolver()
b = MinimaxSolver()

results = evaluate_solvers(game, a, b, num_games=1000)

results

Can You Guess the Outcome?

{'X_wins': 0, 'O_wins': 0, 'draws': 1000}

Summary

What have we obtained?

We derived a policy (a decision algorithm) that recommends actions leading to the game’s optimal solution, under the assumption that the opponent adopts the same policy.

Remark

The number of valid sequences of actions grows factorially, with particularly large growth observed in games like chess and Go.
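For tic-tac-toe, the bound is easy to check: there are at most \(9! = 362{,}880\) orderings of the nine squares, and the exact count of 255,168 computed earlier is smaller because games that end with an early win cut many orderings short.

print(f"{math.factorial(9):,}")  # 362,880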

Alpha-beta pruning

Pruning

To enhance the efficiency of the minimax algorithm, one could possibly prune certain parts of the search tree, thereby avoiding the exploration of descendant nodes.

Pruning

How would you implement this modification? What factors would you take into account?

Pruning

Tree pruning should be performed only when it can be demonstrated that those subtrees cannot yield better solutions.

Criteria for Pruning

Alpha-Beta Pruning

Alpha-beta pruning is an optimization technique for the minimax algorithm that reduces the number of nodes evaluated in the search tree.

Alpha-Beta Pruning

It achieves this by eliminating branches that cannot possibly influence the final decision, using two parameters:

  • alpha, the maximum score that the maximizing player is assured, and

  • beta, the minimum score that the minimizing player is assured.

Maximizing Player’s Perspective

At a maximizing node:

  • The maximizer aims to maximize the score.

  • Alpha (\(\alpha\)) is updated to the highest value found so far among child nodes.

  • Process:

    • Initialize \(\alpha = -\infty\).

    • For each child node:

      • Compute the evaluation score.

      • Update \(\alpha = \max(\alpha, \mathrm{child\_score})\).

Minimizing Player’s Perspective

At a minimizing node:

  • The minimizer aims to minimize the score.

  • Beta (\(\beta\)) is updated to the lowest value found so far among child nodes.

  • Process:

    • Initialize \(\beta = \infty\).

    • For each child node:

      • Compute the evaluation score.

      • Update \(\beta = \min(\beta, \mathrm{child\_score})\).

Alpha-Beta Pruning

When a node’s evaluation proves it cannot improve on the current alpha or beta, further exploration of that branch is halted, thereby enhancing computational efficiency without affecting the outcome.

Role of Alpha and Beta in Pruning

Pruning Condition:

  • If \(\beta \leq \alpha\), further exploration of the current node’s siblings is unnecessary.

  • Rationale:

    • The maximizer has a guaranteed score of at least \(\alpha\).

    • The minimizer can ensure that the maximizer cannot get a better score than \(\beta\).

    • If \(\beta \leq \alpha\), the maximizer won’t find a better option in this branch (a numeric illustration follows).
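A small numeric illustration (the values are hypothetical): suppose the maximizer has already secured \(\alpha = 3\) on an earlier branch, and the first reply examined at the current minimizing node evaluates to 2.

alpha, beta = 3, math.inf      # the maximizer is already guaranteed 3

first_child_value = 2          # hypothetical evaluation of the first reply
beta = min(beta, first_child_value)

if beta <= alpha:
    # This min node is worth at most 2 < 3; the maximizer will never
    # choose it, so its remaining children need not be explored.
    print("prune")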

Walkthrough (6:21 to 8:10)

Node Order

  • The effectiveness of pruning is influenced by the order in which nodes are evaluated.

  • Greater pruning is achieved if nodes are ordered from most to least promising (a simple ordering heuristic is sketched below).
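For tic-tac-toe, a simple static ordering (a sketch; the solvers in this lecture do not use it) is to try the centre first, then the corners, then the edges:

def ordered_moves(game, state):
    """Return legal moves ordered from most to least promising:
    centre, then corners, then edges."""
    def priority(move):
        i, j = move
        if (i, j) == (1, 1):
            return 0                        # centre
        if i in (0, 2) and j in (0, 2):
            return 1                        # corner
        return 2                            # edge
    return sorted(game.get_valid_moves(state), key=priority)

Passing such an ordering to _alphabeta typically increases the number of cutoffs without changing the value returned.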

MinimaxAlphaBetaSolverV1

Code
class MinimaxAlphaBetaSolverV1(Solver):

    """
    A classical Minimax solver enhanced with Alpha–Beta pruning.

    - Assumes "X" is the maximizing player.
    - Performs a full search of the Tic–Tac–Toe game tree.
    """

    # ------------------------------------------------------------
    # Solver interface
    # ------------------------------------------------------------

    def select_move(self, game, state, player):

        """
        Choose the best move for `player` using Minimax with
        Alpha–Beta pruning.

        For Tic–Tac–Toe, depth=9 suffices to search the entire game.
        """

        self.game = game
        maximizing = (player == "X")

        value, move = self._alphabeta(
            state=state,
            player=player,
            maximizing=maximizing,
            depth=9,
            alpha=-math.inf,
            beta=math.inf
        )

        return move

    # ------------------------------------------------------------
    # CORE MINIMAX WITH ALPHA-BETA PRUNING
    # ------------------------------------------------------------

    def _alphabeta(self, state, player, maximizing, depth, alpha, beta):

        """
        Internal recursive minimax search with alpha–beta pruning.

        Parameters
        ----------
        state      : NumPy array, current board
        player     : "X" or "O", the player to move
        maximizing : True if this is a maximizing node (X to move)
        depth      : remaining search depth
        alpha      : best value found so far for the maximizer
        beta       : best value found so far for the minimizer

        Returns
        -------
        (value, move)
            value : evaluation of the state from X's perspective (+1/-1/0)
            move  : the best move found at this node
        """

        # Terminal test: win/loss/draw
        if self.game.is_terminal(state) or depth == 0:
            return self.game.evaluate(state), None

        moves = self.game.get_valid_moves(state)
        best_move = None

        # ------------------------------------------------------------
        # Maximizing node (X)
        # ------------------------------------------------------------

        if maximizing:
            value = -math.inf

            for move in moves:
                next_state = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    next_state,
                    self.game.get_opponent(player),
                    False,            # next is minimizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val > value:
                    value = child_val
                    best_move = move

                # Update alpha
                alpha = max(alpha, value)

                # Prune
                if beta <= alpha:
                    break

            return value, best_move

        # ------------------------------------------------------------
        # Minimizing node (O)
        # ------------------------------------------------------------

        else:
            value = math.inf

            for move in moves:
                next_state = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    next_state,
                    self.game.get_opponent(player),
                    True,             # next is maximizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val < value:
                    value = child_val
                    best_move = move

                # Update beta
                beta = min(beta, value)

                # Prune
                if beta <= alpha:
                    break

            return value, best_move

MinimaxAlphaBetaSolverV1

class MinimaxAlphaBetaSolverV1(Solver):

    def select_move(self, game, state, player):

        self.game = game
        maximizing = (player == "X")

        value, move = self._alphabeta(
            state=state,
            player=player,
            maximizing=maximizing,
            depth=9,
            alpha=-math.inf,
            beta=math.inf
        )

        return move

_alphabeta

    def _alphabeta(self, state, player, maximizing, depth, alpha, beta):

        if self.game.is_terminal(state) or depth == 0:
            return self.game.evaluate(state), None

        moves = self.game.get_valid_moves(state)
        best_move = None

_alphabeta: if maximizing

        if maximizing:
            value = -math.inf

            for move in moves:
                next_state = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    next_state,
                    self.game.get_opponent(player),
                    False,            # next is minimizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val > value:
                    value = child_val
                    best_move = move

                # Update alpha
                alpha = max(alpha, value)

                # Prune
                if beta <= alpha:
                    break

            return value, best_move

_alphabeta: if minimizing

        else:
            value = math.inf

            for move in moves:
                next_state = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    next_state,
                    self.game.get_opponent(player),
                    True,             # next is maximizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val < value:
                    value = child_val
                    best_move = move

                # Update beta
                beta = min(beta, value)

                # Prune
                if beta <= alpha:
                    break

            return value, best_move

Walkthrough

Sanity Check

game = TicTacToe()

a = MinimaxAlphaBetaSolverV1()
b = RandomSolver(7)

results = evaluate_solvers(game, a, b, num_games=100)

results
{'X_wins': 100, 'O_wins': 0, 'draws': 0}

Sanity Check

game = TicTacToe()

a = RandomSolver(7)
b = MinimaxAlphaBetaSolverV1()

results = evaluate_solvers(game, a, b, num_games=100)

results
{'X_wins': 0, 'O_wins': 82, 'draws': 18}

Sanity Check

game = TicTacToe()

a = MinimaxAlphaBetaSolverV1()
b = MinimaxAlphaBetaSolverV1()

results = evaluate_solvers(game, a, b, num_games=100)

results
{'X_wins': 0, 'O_wins': 0, 'draws': 100}

Summary

  • Alpha Cutoff: Occurs at minimizer nodes when \(\beta \le \alpha\).
  • Beta Cutoff: Occurs at maximizer nodes when \(\alpha \ge \beta\).

Discussion

  • Grasping why alpha-beta pruning boosts minimax efficiency without altering outcomes requires careful thought.

  • The algorithm changes are minimal.

  • Is the enhancement justified?

MinimaxSolverV2 (INSTRUMENTED)

Code
class MinimaxSolverV2(Solver):

    """
    A classic, exact Minimax solver for Tic-Tac-Toe — *instrumented version*.

    Differences from V1
    --------------------
    - Adds a counter `self.nodes_visited` that counts each recursive call.
    - Uses `reset()` so the GameRunner or TournamentRunner can correctly
      prepare the solver before each game.
    - Still performs a full game-tree search without memoization.

    Characteristics
    ---------------
    - Assumes "X" is the maximizing player.
    - Performs a complete minimax search (Tic-Tac-Toe is small enough).
    """

    def __init__(self):

        # Count how many nodes have been visited in the current run
        self.nodes_visited = 0

    # ------------------------------------------------------------
    # Solver interface
    # ------------------------------------------------------------
    
    def select_move(self, game, state, player):

        """
        Choose the optimal move for `player` using a full minimax search.
        """

        maximizing = (player == "X")
        value, move = self._minimax(game, state, player, maximizing)
        return move
        
    def reset(self):

        """
        Reset instrumentation counters at the start of a game (or tournament run).

        GameRunner / TournamentRunner should call solver.reset() before
        starting a new game.
        """

        self.nodes_visited = 0

    def get_name(self):

        """
        Override of Solver.get_name() that appends instrumentation
        (the node count for the current run).
        """

        return f"{self.__class__.__name__} (nodes={self.nodes_visited})"

    # ------------------------------------------------------------
    # Core recursive minimax
    # ------------------------------------------------------------

    def _minimax(self, game, state, player, maximizing):

        """
        The core recursive minimax computation.

        Parameters
        ----------
        game       : TicTacToe instance
        state      : NumPy array of current board position
        player     : "X" or "O" — the player whose turn it is at this node
        maximizing : True if this node corresponds to X; False if O

        Returns
        -------
        (value, move)
            value : +1 for X win, -1 for O win, 0 otherwise
            move  : best move selected at this node
        """

        # Instrumentation
        self.nodes_visited += 1

        # ------------------------
        # TERMINAL NODE?
        # ------------------------
        if game.is_terminal(state):
            return game.evaluate(state), None

        # ------------------------
        # GENERATE ALL MOVES
        # ------------------------
        moves = game.get_valid_moves(state)

        # ------------------------
        # MAX NODE (X plays)
        # ------------------------
        if maximizing:
            best_value = -math.inf
            best_move = None

            for move in moves:
                next_state = game.make_move(state, move, player)
                val, _ = self._minimax(
                    game,
                    next_state,
                    game.get_opponent(player),
                    maximizing=False
                )
                if val > best_value:
                    best_value = val
                    best_move = move

            return best_value, best_move

        # ------------------------
        # MIN NODE (O plays)
        # ------------------------

        else:
            best_value = math.inf
            best_move = None

            for move in moves:
                next_state = game.make_move(state, move, player)
                val, _ = self._minimax(
                    game,
                    next_state,
                    game.get_opponent(player),
                    maximizing=True
                )
                if val < best_value:
                    best_value = val
                    best_move = move

            return best_value, best_move

MinimaxAlphaBetaSolverV2 (INSTRUMENTED)

Code
class MinimaxAlphaBetaSolverV2(Solver):

    """
    A classical Minimax solver enhanced with Alpha–Beta pruning,
    instrumented to count how many nodes are visited.

    - Assumes "X" is the maximizing player.
    - Performs a full search of the Tic–Tac–Toe game tree.
    - Alpha–Beta pruning reduces the number of explored states
      without changing the final result.

    Instrumentation
    ---------------
    - self.nodes_visited counts how many times _alphabeta() is called.
    """

    def __init__(self):

        # Count how many nodes have been visited in the current run
        self.nodes_visited = 0
 
    # ------------------------------------------------------------
    # Solver interface
    # ------------------------------------------------------------

    def select_move(self, game, state, player):

        """
        Choose the best move for `player` using Minimax with
        Alpha–Beta pruning.

        For Tic–Tac–Toe, depth=9 suffices to search the entire game.
        """

        self.game = game
        maximizing = (player == "X")

        value, move = self._alphabeta(
            state=state,
            player=player,
            maximizing=maximizing,
            depth=9,
            alpha=-math.inf,
            beta=math.inf
        )

        return move

    def reset(self):

        """
        Reset any per-game state.

        Called by GameRunner (or similar) at the start of a new game.
        """

        self.nodes_visited = 0

    # ------------------------------------------------------------
    # Private
    # ------------------------------------------------------------

    def _alphabeta(self, state, player, maximizing, depth, alpha, beta):

        """
        Internal recursive minimax search with alpha–beta pruning.

        Parameters
        ----------
        state      : NumPy array, current board
        player     : "X" or "O", the player to move
        maximizing : True if this is a maximizing node (X to move)
        depth      : remaining search depth
        alpha      : best value found so far for the maximizer
        beta       : best value found so far for the minimizer

        Returns
        -------
        (value, move)
            value : evaluation of the state from X's perspective (+1/-1/0)
            move  : the best move found at this node
        """

        # Instrumentation: count this node
        self.nodes_visited += 1

        # Terminal test: win/loss/draw or depth cutoff
        if self.game.is_terminal(state) or depth == 0:
            return self.game.evaluate(state), None

        moves = self.game.get_valid_moves(state)
        best_move = None

        # ------------------------------------------------------------
        # Maximizing node (X)
        # ------------------------------------------------------------
        if maximizing:
            value = -math.inf

            for move in moves:
                next_state = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    next_state,
                    self.game.get_opponent(player),
                    False,            # next is minimizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val > value:
                    value = child_val
                    best_move = move

                # Update alpha
                alpha = max(alpha, value)

                # Prune
                if beta <= alpha:
                    break

            return value, best_move

        # ------------------------------------------------------------
        # Minimizing node (O)
        # ------------------------------------------------------------

        else:
            value = math.inf

            for move in moves:
                next_state = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    next_state,
                    self.game.get_opponent(player),
                    True,             # next is maximizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val < value:
                    value = child_val
                    best_move = move

                # Update beta
                beta = min(beta, value)

                # Prune
                if beta <= alpha:
                    break

            return value, best_move

Can You Guess the Outcome?

runner = GameRunner(game)

mm = MinimaxSolverV2()
ab = MinimaxAlphaBetaSolverV2()

outcome = runner.play_game(mm, ab)

print("Nodes visited for MinimaxSolverV2:", mm.nodes_visited)
print("Nodes visited for MinimaxAlphaBetaSolverV2:", ab.nodes_visited)
print(f"Pruning efficiency: {(mm.nodes_visited - ab.nodes_visited) / mm.nodes_visited:.3f}")

Can You Guess the Outcome?

Nodes visited for MinimaxSolverV2: 557492
Nodes visited for MinimaxAlphaBetaSolverV2: 2435
Pruning efficiency: 0.996

Memoization

Code
class AlphaBetaSolver(Solver):
    
    """
    A classical Minimax solver enhanced with Alpha-Beta pruning.

    - Assumes "X" is the maximizing player.
    - Uses memoization (lru_cache) to avoid recomputing states.
    - Performs a *full* search of Tic-Tac-Toe (depth=9).
    - Returns the optimal move for the current player.
    """

    # ------------------------------------------------------------
    # Solver interface
    # ------------------------------------------------------------

    def select_move(self, game, state, player):

        """
        Public interface required by Solver.
        Runs Alpha-Beta search from the current state.
        """

        self.game = game

        maximizing = (player == "X")   # X maximizes, O minimizes

        # Reset cache between games to avoid storing millions of keys
        self._alphabeta.cache_clear()

        value, move = self._alphabeta(
            canonical(state),
            player,
            maximizing,
            9,               # full-depth search
            -math.inf,       # alpha
            math.inf         # beta
        )
        return move

    # ------------------------------------------------------------
    # Internal alpha-beta with memoization
    # ------------------------------------------------------------

    @lru_cache(maxsize=None)
    def _alphabeta(self, state_key, player, maximizing, depth, alpha, beta):

        """
        Parameters
        ----------
        state_key : tuple-of-tuples board
        player    : player whose turn it is ('X' or 'O')
        maximizing: True if this node is a maximizing node for X
        depth     : remaining depth
        alpha     : best guaranteed value for maximizer so far
        beta      : best guaranteed value for minimizer so far
        """

        state = np.array(state_key)

        # Terminal or horizon case
        if self.game.is_terminal(state) or depth == 0:
            return self.game.evaluate(state), None

        moves = self.game.get_valid_moves(state)
        best_move = None

        # --------------------------------------------------------
        # MAX (X)
        # --------------------------------------------------------

        if maximizing:
            value = -math.inf

            for move in moves:
                st2 = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    canonical(st2),
                    self.game.get_opponent(player),
                    False,              # now minimizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val > value:
                    value = child_val
                    best_move = move

                alpha = max(alpha, value)
                if beta <= alpha:
                    break  # β-cutoff

            return value, best_move

        # --------------------------------------------------------
        # MIN (O)
        # --------------------------------------------------------

        else:
            value = math.inf

            for move in moves:
                st2 = self.game.make_move(state, move, player)

                child_val, _ = self._alphabeta(
                    canonical(st2),
                    self.game.get_opponent(player),
                    True,               # now maximizing
                    depth - 1,
                    alpha,
                    beta
                )

                if child_val < value:
                    value = child_val
                    best_move = move

                beta = min(beta, value)
                if beta <= alpha:
                    break  # α-cutoff

            return value, best_move

Exploration

Implement a Connect Four game using the Alpha-Beta Search algorithm. Conduct a comparative analysis between the Minimax and Alpha-Beta Search implementations.

Prologue

Further exploration

  • Expectimax search: handling players that do not play perfectly;
  • Expectiminimax: handling chance in games such as backgammon (a chance-node sketch follows).
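As a pointer, the key difference is the chance node, whose value is a probability-weighted average of its children rather than a max or a min. A minimal sketch ('outcomes' and 'value' are assumptions of the sketch, not part of the Game interface used in this lecture):

def chance_value(outcomes, value):
    """Value of a chance node: the probability-weighted average of its
    children. 'outcomes' is an iterable of (probability, state) pairs and
    'value' a recursive evaluation function."""
    return sum(p * value(s) for p, s in outcomes)

# Toy check: a fair coin between a position worth +1 and one worth -1.
print(chance_value([(0.5, "s1"), (0.5, "s2")], {"s1": 1, "s2": -1}.get))  # 0.0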

Summary

  • Introduction to adversarial search
  • Zero-sum games
  • Introduction to the minimax search method
  • Role of alpha and beta pruning in minimax search

Next lecture

  • We will look at the Monte Carlo Tree Search (MCTS) algorithm


Marcel Turcotte

Marcel.Turcotte@uOttawa.ca

School of Electrical Engineering and Computer Science (EECS)

University of Ottawa