Introduction to machine learning

CSI 4106 - Fall 2024

Marcel Turcotte

Version: Sep 20, 2024 09:26

Preamble

Quote of the day

Quote of the day (continued)

Yoshua Bengio

Université de Montréal

François Chollet

Software engineer, Google

Sasha Luccioni

AI & Climate Lead, Hugging Face (Montréal)

Quote of the day (continued)

Ilya Sutskever

Co-founder, Safe Superintelligence

Andrej Karpathy

Founder, Eureka Labs

Remark

In the evolution of intelligence, learning was one of the first milestones to emerge. It is also one of the most thoroughly understood mechanisms in natural intelligence.

Fundamentals of machine learning

In this lecture, we will introduce concepts essential for understanding machine learning, including the types of problems (tasks).

General objective:

  • Describe the fundamental concepts of machine learning

Learning objectives

  • Summarize the various types and tasks in machine learning
  • Discuss the need for a training and test set

Readings

  • Russell and Norvig (2020), Chapter 19: Learning from examples.

Introduction

Rationale

Why should a computer program learn?

Definition

Mitchell (1997), page 2

A computer program is said to learn from experience \(E\) with respect to some class of tasks \(T\) and performance measure \(P\), if its performance at tasks in \(T\), as measured by \(P\), improves with experience \(E\).
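
For instance (an illustration, anticipating the iris example later in this lecture): \(T\) is classifying flowers by species, \(E\) is a collection of labelled flower measurements, and \(P\) is the proportion of flowers classified correctly.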

Concepts

See: images/svg/ml_concepts-00.svg

Types of problems

There are three (3) distinct types of feedback:

  1. Unsupervised Learning: No feedback is provided to the algorithm.
  2. Supervised Learning: Each example is accompanied by a label.
  3. Reinforcement Learning: The algorithm receives a reward or a punishment following each action.

Supervised learning is the most extensively studied and arguably the most intuitive type of learning. It is typically the first type of learning introduced in educational contexts.
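
A minimal sketch, not part of the original slides, contrasting the first two settings in scikit-learn; the toy points and labels are invented for illustration:

from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

X = [[1, 1], [1, 2], [8, 8], [9, 8]]

# Unsupervised: no feedback; the algorithm groups the points on its own
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)

# Supervised: each example is accompanied by a label
clf = DecisionTreeClassifier().fit(X, ['a', 'a', 'b', 'b'])
print(clf.predict([[2, 1]]))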

Two phases

  1. Learning (building a model)
  2. Inference (using the model)

Learning (building a model)

Inference (using a model)

Carp-e Diem! (example)

1. Problem: Will They Bite Today?

Objective: Develop a predictive model to classify the likelihood of a successful fishing day into three categories: ‘Poor’, ‘Average’, or ‘Excellent’.

2. Attributes (features)

Various sources, including The Old Farmer’s Almanac, suggest that the moon phase serves as a reliable predictor of fishing success.

  • Moon Phase (Categorical): ‘New Moon’, ‘First Quarter’, ‘Full Moon’, and ‘Last Quarter’.
  • Forecast (Categorical): ‘Rainy’, ‘Cloudy’, and ‘Sunny’.
  • Outdoor Temperature (Numerical): The outdoor temperature in degrees Celsius.
  • Water Temperature (Numerical): The water temperature of the lake or river.

3. Training data

Example  Moon Phase     Forecast  Outdoor Temperature (°C)  Water Temperature (°C)  Fishing Day Likelihood
1        Full Moon      Sunny     25                        22                      Excellent
2        New Moon       Cloudy    18                        19                      Average
3        First Quarter  Rainy     15                        17                      Poor
4        Last Quarter   Sunny     30                        24                      Excellent
5        Full Moon      Cloudy    20                        20                      Average
6        New Moon       Rainy     22                        21                      Poor

3. Training data (continued)

Attributes (features):

Moon Phase     Forecast  Outdoor Temperature (°C)  Water Temperature (°C)
Full Moon      Sunny     25                        22
New Moon       Cloudy    18                        19
First Quarter  Rainy     15                        17
Last Quarter   Sunny     30                        24
Full Moon      Cloudy    20                        20
New Moon       Rainy     22                        21

Label:

Fishing Day Likelihood
Excellent
Average
Poor
Excellent
Average
Poor

4. Model Training

Model training involves using labeled data to teach a machine learning algorithm how to make predictions. This process adjusts the model’s parameters to minimize the error between the predicted and actual outcomes.
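
As a minimal illustration of this idea (an invented one-parameter model \(y = w x\), not part of the fishing example), one can search for the parameter value that minimizes the squared error on the training data:

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

# Try candidate values of w and keep the one with the smallest
# squared error between predicted (w * x) and actual (y) outcomes
best_w = min((w / 100 for w in range(0, 501)),
             key=lambda w: sum((w * x - y) ** 2 for x, y in zip(xs, ys)))

print(best_w)  # a value close to 2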

4. Model Training (continued)

  • Excellent Fishing Day:
    • Moon Phase: Full Moon or New Moon
    • Forecast: Sunny
    • Outdoor Temperature: 20°C to 30°C
    • Water Temperature: 20°C to 25°C

\(\ldots\)

  • Poor Fishing Day:
    • Moon Phase: First Quarter or Last Quarter
    • Forecast: Rainy
    • Outdoor Temperature: < 20°C or > 30°C
    • Water Temperature: < 20°C or > 25°C

5. Prediction

Given new, unseen data, predict the likelihood of a successful fishing day; a code sketch follows the list below.

  • Moon Phase: New Moon
  • Forecast: Sunny
  • Outdoor Temperature: 24°C
  • Water Temperature: 21°C
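
A minimal sketch of steps 3 to 5 in scikit-learn; the choice of a decision tree and of one-hot encoding for the categorical attributes is an assumption made for illustration, not something prescribed by the example:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The six training examples from the table above
data = pd.DataFrame({
    'moon_phase': ['Full Moon', 'New Moon', 'First Quarter',
                   'Last Quarter', 'Full Moon', 'New Moon'],
    'forecast': ['Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Cloudy', 'Rainy'],
    'outdoor_temp': [25, 18, 15, 30, 20, 22],
    'water_temp': [22, 19, 17, 24, 20, 21]})
labels = ['Excellent', 'Average', 'Poor', 'Excellent', 'Average', 'Poor']

# One-hot encode the categorical attributes, then train
X = pd.get_dummies(data, columns=['moon_phase', 'forecast'])
clf = DecisionTreeClassifier().fit(X, labels)

# New, unseen example: New Moon, Sunny, 24°C outdoor, 21°C water
x_new = pd.DataFrame([{'moon_phase': 'New Moon', 'forecast': 'Sunny',
                       'outdoor_temp': 24, 'water_temp': 21}])
x_new = pd.get_dummies(x_new).reindex(columns=X.columns, fill_value=0)
print(clf.predict(x_new))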

Life cycle

  1. Data collection and preparation
  2. Feature engineering
  3. Training
  4. Model evaluation
  5. Model deployment
  6. Monitoring and maintenance

Formal definitions

Supervised learning (notation)

The data set (“experience”) is a collection of labelled examples.

  • \(\{(x_i, y_i)\}_{i=1}^N\)
    • Each \(x_i\) is a feature (attribute) vector with \(D\) dimensions.
    • \(x^{(j)}_i\) is the value of the feature \(j\) of the example \(i\), for \(j \in 1 \ldots D\) and \(i \in 1 \ldots N\).
    • The label \(y_i\) is either a class, taken from a finite list of classes, \(\{1, 2, \ldots, C\}\), or a real number, or a complex object (tree, graph, etc.).

Problem: Given the data set as input, create a model that can be used to predict the value of \(y\) for an unseen \(x\).
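
To make the notation concrete, a small sketch (toy values, for illustration only) with \(N = 4\) examples and \(D = 3\) features:

import numpy as np

# N = 4 examples, each a feature vector with D = 3 dimensions
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.7, 3.1, 4.4],
              [6.3, 3.3, 6.0]])
y = np.array([0, 0, 1, 2])  # one label per example

N, D = X.shape
print(X[1, 2])  # x_2^(3): feature 3 of example 2 (1-based), here 1.4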

Supervised learning (notation, continued)

  • When the label \(y_i\) is a class, taken from a finite list of classes, \(\{1, 2, \ldots, C\}\), we call the task a classification task.

  • When the label \(y_i\) is a real number, we call the task a regression task (both cases are illustrated in the sketch below).
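
The distinction in code, using two toy data sets invented for illustration:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0.0], [1.0], [2.0], [3.0]]

# Classification: each label is a class from a finite set
clf = DecisionTreeClassifier().fit(X, ['no', 'no', 'yes', 'yes'])
print(clf.predict([[2.5]]))  # a class

# Regression: each label is a real number
reg = DecisionTreeRegressor().fit(X, [0.1, 1.9, 4.2, 5.8])
print(reg.predict([[2.5]]))  # a real number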

Example with code

Scikit-learn

Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.

Scikit-learn provides dozens of built-in machine learning algorithms and models, called estimators.

Built on NumPy, SciPy, and matplotlib.
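
All estimators share the same basic interface: instantiate, fit on data, predict. A minimal sketch (the particular estimator and toy data are arbitrary choices for illustration):

from sklearn.neighbors import KNeighborsClassifier

est = KNeighborsClassifier(n_neighbors=3)    # an estimator
est.fit([[0], [1], [2], [3]], [0, 0, 1, 1])  # learn from data
print(est.predict([[1.6]]))                  # predict for new data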

Example: iris data set

Example: loading the data

from sklearn.datasets import load_iris

# Load the Iris dataset

iris = load_iris()

Example: Using a DecisionTree

from sklearn import tree

clf = tree.DecisionTreeClassifier()

Example: Training

# It is customary to use X and y for the data and labels

X, y = iris.data, iris.target

# Training

clf = clf.fit(X, y)

Example: Visualizing the tree (1/2)

import matplotlib.pyplot as plt

tree.plot_tree(clf)
plt.show()

Example: Visualizing the tree (2/2)

tree.plot_tree(clf, 
               feature_names=iris.feature_names, 
               class_names=iris.target_names,
               label='none',
               filled=True)
plt.show()

Example: Prediction

# Creating 2 test examples
# 'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'

X_test = [[5.1, 3.5, 1.4, 0.2],[6.7, 3.0, 5.2, 2.3]]

# Prediction

y_test = clf.predict(X_test)

# Printing the predicted labels for our two examples

print(iris.target_names[y_test])
['setosa' 'virginica']

Example: Complete

iris = load_iris()
clf = tree.DecisionTreeClassifier()
X, y = iris.data, iris.target
clf = clf.fit(X, y)
tree.plot_tree(clf)
X_test = [[5.1, 3.5, 1.4, 0.2],[6.7, 3.0, 5.2, 2.3]]
y_test = clf.predict(X_test)
print(iris.target_names[y_test])
['setosa' 'virginica']

Example: Performance

from sklearn.metrics import classification_report, accuracy_score

# Make predictions

y_pred = clf.predict(X)

# Evaluate the model

accuracy = accuracy_score(y, y_pred)
report = classification_report(y, y_pred, target_names=iris.target_names)

print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:')
print(report)

Example: Performance

Accuracy: 1.00
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        50
  versicolor       1.00      1.00      1.00        50
   virginica       1.00      1.00      1.00        50

    accuracy                           1.00       150
   macro avg       1.00      1.00      1.00       150
weighted avg       1.00      1.00      1.00       150

Example: Discussion

We have demonstrated a complete example:

  • Loading the data
  • Selecting a classifier
  • Training the model
  • Visualizing the model
  • Making a prediction

Example: Take 2

from sklearn.metrics import classification_report, accuracy_score

# Make predictions

y_pred = clf.predict(X)

# Evaluate the model

accuracy = accuracy_score(y, y_pred)
report = classification_report(y, y_pred, target_names=iris.target_names)

print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:')
print(report)

Important

This example is misleading, or even flawed: the model was evaluated on the very examples it was trained on, so perfect accuracy tells us nothing about performance on unseen data!

Example: Exploration

print(f'Dataset Description:\n{iris["DESCR"]}\n')
Dataset Description:
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, predictive attributes and the class
:Attribute Information:
    - sepal length in cm
    - sepal width in cm
    - petal length in cm
    - petal width in cm
    - class:
            - Iris-Setosa
            - Iris-Versicolour
            - Iris-Virginica

:Summary Statistics:

============== ==== ==== ======= ===== ====================
                Min  Max   Mean    SD   Class Correlation
============== ==== ==== ======= ===== ====================
sepal length:   4.3  7.9   5.84   0.83    0.7826
sepal width:    2.0  4.4   3.05   0.43   -0.4194
petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)
============== ==== ==== ======= ===== ====================

:Missing Attribute Values: None
:Class Distribution: 33.3% for each of 3 classes.
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
:Date: July, 1988

The famous Iris database, first used by Sir R.A. Fisher. The dataset is taken
from Fisher's paper. Note that it's the same as in R, but not as in the UCI
Machine Learning Repository, which has two wrong data points.

This is perhaps the best known database to be found in the
pattern recognition literature.  Fisher's paper is a classic in the field and
is referenced frequently to this day.  (See Duda & Hart, for example.)  The
data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant.  One class is linearly separable from the other 2; the
latter are NOT linearly separable from each other.

.. dropdown:: References

  - Fisher, R.A. "The use of multiple measurements in taxonomic problems"
    Annual Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to
    Mathematical Statistics" (John Wiley, NY, 1950).
  - Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis.
    (Q327.D83) John Wiley & Sons.  ISBN 0-471-22361-1.  See page 218.
  - Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System
    Structure and Classification Rule for Recognition in Partially Exposed
    Environments".  IEEE Transactions on Pattern Analysis and Machine
    Intelligence, Vol. PAMI-2, No. 1, 67-71.
  - Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule".  IEEE Transactions
    on Information Theory, May 1972, 431-433.
  - See also: 1988 MLC Proceedings, 54-64.  Cheeseman et al"s AUTOCLASS II
    conceptual clustering system finds 3 classes in the data.
  - Many, many more ...

Example: Exploration

print(f'Feature Names: {iris.feature_names}')
Feature Names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print(f'Target Names: {iris.target_names}')
Target Names: ['setosa' 'versicolor' 'virginica']
print(f'Data Shape: {iris.data.shape}')
Data Shape: (150, 4)
print(f'Target Shape: {iris.target.shape}')
Target Shape: (150,)

Example: Using Pandas

import pandas as pd

# Create a DataFrame

df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target

Example: Using Pandas (continued)

# Display the first few rows of the DataFrame

print(df.head())
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   species  
0        0  
1        0  
2        0  
3        0  
4        0  

Example: Using Pandas (continued)

# Summary statistics

print(df.describe())
       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.057333           3.758000   
std             0.828066          0.435866           1.765298   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000   
75%             6.400000          3.300000           5.100000   
max             7.900000          4.400000           6.900000   

       petal width (cm)     species  
count        150.000000  150.000000  
mean           1.199333    1.000000  
std            0.762238    0.819232  
min            0.100000    0.000000  
25%            0.300000    0.000000  
50%            1.300000    1.000000  
75%            1.800000    2.000000  
max            2.500000    2.000000  

Example: Using Seaborn

import seaborn as sns

# Map target values to species names

df['species'] = df['species'].map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})

# Pairplot using seaborn

sns.pairplot(df, hue='species', markers=["o", "s", "D"])
plt.suptitle("Pairwise Scatter Plots of Iris Features", y=1.02)
plt.show()

Example: Training and test set

from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
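
An aside, not in the original slides: for classification data, train_test_split also accepts a stratify argument, which preserves the class proportions in both subsets:

# Optional variant: keep the 3 iris classes equally represented
# in both the training and the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7, stratify=y)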

Example: Creating a new classifier

# Create a new, untrained classifier
clf = tree.DecisionTreeClassifier()

Example: Training the new classifier

# Train the model
clf.fit(X_train, y_train)
DecisionTreeClassifier()

Example: Making predictions

# Make predictions
y_pred = clf.predict(X_test)

Example: Measuring the performance

from sklearn.metrics import classification_report, accuracy_score

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=iris.target_names)

print(f'Accuracy: {accuracy:.2f}')
print('Classification Report:')
print(report)

Example: Measuring the performance

Accuracy: 0.87
Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00         7
  versicolor       0.83      0.83      0.83        12
   virginica       0.82      0.82      0.82        11

    accuracy                           0.87        30
   macro avg       0.88      0.88      0.88        30
weighted avg       0.87      0.87      0.87        30
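
To see which classes are being confused with one another, one can additionally print a confusion matrix (a small extension to the code above):

from sklearn.metrics import confusion_matrix

# Rows correspond to true classes, columns to predicted classes
print(confusion_matrix(y_test, y_pred))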

Summary

  • We introduced relevant terminology.
  • We examined a hypothetical example.
  • We then explored a complete example using scikit-learn.
  • We performed a detailed exploration of our data.
  • Finally, we recognized the necessity of an independent test set to accurately measure performance.

Prologue

Further readings (1/3)

  • The Hundred-Page Machine Learning Book (Burkov 2019) is a succinct and focused textbook that can feasibly be read in one week, making it an excellent introductory resource.
  • Available under a “read first, buy later” model, allowing readers to evaluate its content before purchasing.
  • Its author, Andriy Burkov, received his Ph.D. in AI from Université Laval.

Further readings (2/3)

Further readings (3/3)

  • Mathematics for Machine Learning (Deisenroth, Faisal, and Ong 2020) aims to provide the necessary mathematical skills to read machine learning books.
  • PDF of the book
  • “This book provides great coverage of all the basic mathematical concepts for machine learning. I’m looking forward to sharing it with students, colleagues, and anyone interested in building a solid understanding of the fundamentals.” Joelle Pineau, McGill University and Facebook

References

Burkov, Andriy. 2019. The Hundred-Page Machine Learning Book. Andriy Burkov.
Deisenroth, Marc Peter, A. Aldo Faisal, and Cheng Soon Ong. 2020. Mathematics for Machine Learning. Cambridge University Press. https://doi.org/10.1017/9781108679930.
Géron, Aurélien. 2022. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow. 3rd ed. O’Reilly Media, Inc.
Kingsford, C, and Steven L Salzberg. 2008. “What Are Decision Trees?” Nature Biotechnology 26 (9): 1011–13. https://doi.org/10.1038/nbt0908-1011.
Mitchell, Tom M. 1997. Machine Learning. New York: McGraw-Hill.
Russell, Stuart, and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach. 4th ed. Pearson. http://aima.cs.berkeley.edu/.

Next lecture

  • Linear regression
  • Gradient descent
  • Logistic regression

Marcel Turcotte

Marcel.Turcotte@uOttawa.ca

School of Electrical Engineering and Computer Science (EECS)

University of Ottawa