Introduction to Artificial Neural Networks

CSI 4106 - Fall 2025

Marcel Turcotte

Version: Jul 10, 2025 16:47

Preamble

Quote of the Day


The Nobel Prize in Physics 2024 was awarded to John J. Hopfield and Geoffrey E. Hinton “for foundational discoveries and inventions that enable machine learning with artificial neural networks”

Learning objectives

  • Explain perceptrons and MLPs: structure, function, history, and limitations.
  • Describe activation functions: their role in enabling complex pattern learning.
  • Implement a feedforward neural network with Keras on Fashion-MNIST.
  • Interpret neural network training and results: visualization and evaluation metrics.
  • Become familiar with deep learning frameworks: PyTorch, TensorFlow, and Keras for model building and deployment.

Introduction

Neural Networks (NN)

We now shift our focus to a family of machine learning models that draw inspiration from the structure and function of biological neural networks found in animals.

Machine Learning Problems

  • Supervised Learning: Classification, Regression

  • Unsupervised Learning: Autoencoders, Self-Supervised

  • Reinforcement Learning: Now an Integral Component

A neuron

Interconnected neurons

Connectionist

Hierarchy of concepts

Basics

Computations with neurodes

A neurode computes \(y = f(x_1 + x_2)\), where \(x_1, x_2 \in \{0,1\}\) and \(f(z)\) is an indicator function: \[ f(z)= \begin{cases}0, & z<\theta \\ 1, & z \geq \theta\end{cases} \]

Computations with neurodes

\[ y = f(x_1 + x_2)= \begin{cases}0, & x_1 + x_2 <\theta \\ 1, & x_1 + x_2 \geq \theta\end{cases} \]

  • With \(\theta = 2\), the neurode implements an AND logic gate.

  • With \(\theta = 1\), the neurode implements an OR logic gate.
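
A minimal sketch, assuming the four Boolean input pairs, that verifies both threshold settings:

def f(z, theta):
    # Indicator function: outputs 1 once the sum reaches the threshold
    return 1 if z >= theta else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, f(x1 + x2, theta=2), f(x1 + x2, theta=1))  # AND, OR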

Computations with neurodes

  • Digital computations can be broken down into a sequence of logical operations, enabling neurode networks to execute any computation.

  • McCulloch and Pitts (1943) did not focus on learning parameter \(\theta\).

  • They introduced a machine that computes any function but cannot learn.

Threshold logic unit

Simple Step Functions

\[ \text{heaviside}(t)= \begin{cases} 1, & t \geq 0 \\ 0, & t < 0\end{cases} \]

\[ \text{sign}(t)= \begin{cases} 1, & t > 0 \\ 0, & t = 0 \\ -1, & t < 0\end{cases} \]
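
As a quick sketch, both step functions are easy to express with NumPy; np.sign already follows the convention above:

import numpy as np

def heaviside(t):
    # 1 when t >= 0, 0 otherwise
    return np.where(t >= 0, 1, 0)

t = np.array([-2.0, 0.0, 3.0])
heaviside(t)   # array([0, 1, 1])
np.sign(t)     # array([-1.,  0.,  1.])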

Notation

Perceptron

Notation

  • \(X\) is the input data matrix where each row corresponds to an example and each column represents one of the \(D\) features.

  • \(W\) is the weight matrix, structured with one row per input (feature) and one column per neuron.

  • Bias terms can either be folded into the weight matrix (by adding a constant input of 1) or represented separately; both conventions appear in the literature. Here, \(b\) is a vector whose length equals the number of neurons.
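
A small shape check, with hypothetical sizes, illustrates this convention:

import numpy as np

N, D, n_neurons = 4, 3, 2                # hypothetical sizes for illustration
X = np.random.rand(N, D)                 # one row per example, one column per feature
W = np.random.rand(D, n_neurons)         # one row per feature, one column per neuron
b = np.random.rand(n_neurons)            # one bias per neuron

Z = X @ W + b                            # b is broadcast across the N rows
Z.shape                                  # (4, 2): one value per example and per neuron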

Discussion

  • The algorithm to train the perceptron closely resembles stochastic gradient descent.

    • In the interest of time and to avoid confusion, we will skip this algorithm and focus on the multilayer perceptron (MLP) and its training algorithm, backpropagation.

Historical Note and Justification

Multilayer Perceptron

XOR Classification problem

\(x^{(1)}\) \(x^{(2)}\) \(y\) \(o_1\) \(o_2\) \(o_3\)
1 0 1 0 1 1
0 1 1 0 1 1
0 0 0 0 0 0
1 1 0 1 1 0
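
The table can be reproduced with three threshold units: \(o_1\) acts as AND, \(o_2\) as OR, and \(o_3\) fires when \(o_2\) is on but \(o_1\) is off. The weights and thresholds below are one illustrative choice (they are not given in the table):

def step(z, theta=0):
    # Threshold unit: 1 once z reaches theta
    return 1 if z >= theta else 0

for x1, x2 in [(1, 0), (0, 1), (0, 0), (1, 1)]:
    o1 = step(x1 + x2, theta=2)   # AND
    o2 = step(x1 + x2, theta=1)   # OR
    o3 = step(o2 - o1 - 0.5)      # OR and not AND -> XOR
    print(x1, x2, o1, o2, o3)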

Feedforward Neural Network (FNN)

Forward Pass (Computation)

\(o_3 = \sigma(w_{13} x^{(1)} + w_{23} x^{(2)} + b_3)\)

\(o_4 = \sigma(w_{14} x^{(1)} + w_{24} x^{(2)} + b_4)\)

\(o_5 = \sigma(w_{15} x^{(1)} + w_{25} x^{(2)} + b_5)\)

\(o_6 = \sigma(w_{36} o_3 + w_{46} o_4 + w_{56} o_5 + b_6)\)

\(o_7 = \sigma(w_{37} o_3 + w_{47} o_4 + w_{57} o_5 + b_7)\)

Forward Pass (Computation)

import numpy as np

# Sigmoid function

def sigma(x):
    return 1 / (1 + np.exp(-x))

# Input (two features), one example from our training set

x1, x2 = (0.5, 0.9)

# Initializing the weights of layers 2 and 3 to random values

w13, w14, w15, w23, w24, w25 = np.random.uniform(low=-1, high=1, size=6)
w36, w46, w56, w37, w47, w57 = np.random.uniform(low=-1, high=1, size=6)

# Initializing all 5 bias terms to random values

b3, b4, b5, b6, b7 = np.random.uniform(low=-1, high=1, size=5)

o3 = sigma(w13 * x1 + w23 * x2 + b3)
o4 = sigma(w14 * x1 + w24 * x2 + b4)
o5 = sigma(w15 * x1 + w25 * x2 + b5)
o6 = sigma(w36 * o3 + w46 * o4 + w56 * o5 + b6)
o7 = sigma(w37 * o3 + w47 * o4 + w57 * o5 + b7)

(o6, o7)
(np.float64(0.46460973054399307), np.float64(0.24291381296138898))
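
The same forward pass can be written in matrix form, matching the notation above (\(X\), \(W\), \(b\)); this is a sketch with randomly initialized weights, so the numbers will differ from run to run:

import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

X = np.array([[0.5, 0.9]])                          # one example as a row vector

W_hidden = np.random.uniform(-1, 1, size=(2, 3))    # 2 inputs -> 3 hidden neurons
b_hidden = np.random.uniform(-1, 1, size=3)
W_output = np.random.uniform(-1, 1, size=(3, 2))    # 3 hidden -> 2 output neurons
b_output = np.random.uniform(-1, 1, size=2)

H = sigma(X @ W_hidden + b_hidden)                  # o3, o4, o5
Y = sigma(H @ W_output + b_output)                  # o6, o7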


Activation Function

  • As will be discussed later, the training algorithm, known as backpropagation, employs gradient descent, necessitating the calculation of the partial derivatives of the loss function.

  • The step function used in these units had to be replaced: it consists only of flat segments, and its derivative is zero almost everywhere, so gradient descent cannot make progress.

Activation Function

  • Nonlinear activation functions are paramount because, without them, multiple layers in the network would only compute a linear function of the inputs.

  • According to the Universal Approximation Theorem, sufficiently large deep networks with nonlinear activation functions can approximate any continuous function. See Universal Approximation Theorem.

Sigmoid

\[ \sigma(t) = \frac{1}{1 + e^{-t}} \]

Hyperbolic Tangent Function

\[ \tanh(t) = 2 \sigma(2t) - 1 \]

Rectified linear unit function (ReLU)

\[ \mathrm{ReLU}(t) = \max(0, t) \]
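
A quick numerical check of these definitions, including the identity relating tanh to the sigmoid (a sketch using NumPy):

import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

t = np.linspace(-5, 5, 11)

np.allclose(np.tanh(t), 2 * sigmoid(2 * t) - 1)     # True: tanh(t) = 2*sigma(2t) - 1
np.maximum(0, t)                                    # ReLU: negative values clipped to 0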

Common Activation Functions

Universal Approximation

Definition

The universal approximation theorem (UAT) states that a feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of \(\mathbb{R}^n\), given appropriate weights and activation functions.

Demonstration with code

import numpy as np

# Defining the function to be approximated

def f(x):
  return 2 * x**3 + 4 * x**2 - 5 * x + 1

# Generating a dataset, x in [-4,2), f(x) as above

X = 6 * np.random.rand(1000, 1) - 4

y = f(X.flatten())

Increasing the number of neurons

from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.1, random_state=42)

models = []

sizes = [1, 2, 5, 10, 100]

for i, n in enumerate(sizes):
  models.append(MLPRegressor(hidden_layer_sizes=[n], max_iter=5000, random_state=42))
  models[i].fit(X_train, y_train)
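
To quantify how the fit improves with width, one could compare validation scores (MLPRegressor.score returns \(R^2\)); a small sketch:

for n, m in zip(sizes, models):
    # Higher R^2 on the validation set indicates a closer approximation of f
    print(f"{n:4d} neurons: R^2 = {m.score(X_valid, y_valid):.3f}")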


Universal Approximation

Let’s code

Frameworks

PyTorch and TensorFlow are the leading platforms for deep learning.

  • PyTorch has gained considerable traction in the research community. Initially developed by Meta AI, it is now part of the Linux Foundation.

  • TensorFlow, created by Google, is widely adopted in industry for deploying models in production environments.

Keras

Keras is a high-level API designed to build, train, evaluate, and execute models across various backends, including PyTorch, TensorFlow, and JAX, Google’s high-performance platform.

Fashion-MNIST dataset

“Fashion-MNIST is a dataset of Zalando’s article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.”

Loading

import tensorflow as tf

fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()

(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist

X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]

Exploration

X_train.shape
(55000, 28, 28)
X_train.dtype
dtype('uint8')

Transforming the pixel intensities from integers in the range 0 to 255 to floats in the range 0 to 1.

X_train, X_valid, X_test = X_train / 255., X_valid / 255., X_test / 255.

What are these images anyway!

import matplotlib.pyplot as plt

plt.figure(figsize=(2, 2))
plt.imshow(X_train[0], cmap="binary")
plt.axis('off')
plt.show()

y_train
array([9, 0, 0, ..., 9, 0, 2], shape=(55000,), dtype=uint8)

Since the labels are integers from 0 to 9, class names will come in handy.

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

First 40 images

n_rows = 4
n_cols = 10
plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))
for row in range(n_rows):
    for col in range(n_cols):
        index = n_cols * row + col
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(X_train[index], cmap="binary", interpolation="nearest")
        plt.axis('off')
        plt.title(class_names[y_train[index]])
plt.subplots_adjust(wspace=0.2, hspace=0.5)
plt.show()


Creating a model

tf.random.set_seed(42)

model = tf.keras.Sequential()

model.add(tf.keras.layers.InputLayer(shape=[28, 28]))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(300, activation="relu"))
model.add(tf.keras.layers.Dense(100, activation="relu"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))

model.summary()

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ flatten (Flatten)               │ (None, 784)            │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 300)            │       235,500 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 100)            │        30,100 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 10)             │         1,010 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 266,610 (1.02 MB)
 Trainable params: 266,610 (1.02 MB)
 Non-trainable params: 0 (0.00 B)
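
The parameter counts follow directly from the layer sizes: a Dense layer has one weight per (input, unit) pair plus one bias per unit.

# Dense layer parameters = inputs * units + units (biases)
784 * 300 + 300   # 235,500 (first hidden layer)
300 * 100 + 100   #  30,100 (second hidden layer)
100 * 10 + 10     #   1,010 (output layer)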

Creating a model (alternative)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

model.summary()

Compiling the model

model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])

Training the model

history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))

Visualization

import pandas as pd 

pd.DataFrame(history.history).plot(
    figsize=(8, 5), xlim=[0, 29], ylim=[0, 1], grid=True, xlabel="Epoch",
    style=["r--", "r--.", "b-", "b-*"])
plt.legend(loc="lower left")  # place the legend away from the curves
plt.show()


Evaluating the model on the test set

model.evaluate(X_test, y_test)

Making predictions

X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)
y_pred = y_proba.argmax(axis=-1)
y_pred
y_new = y_test[:3]
y_new

Predicted vs Observed

np.array(class_names)[y_pred]

Test Set Performance

from sklearn.metrics import classification_report

y_proba = model.predict(X_test)
y_pred = y_proba.argmax(axis=-1)


print(classification_report(y_test, y_pred))
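
A confusion matrix complements the per-class report by showing which classes are confused with one another; a minimal sketch with scikit-learn:

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)   # 10 x 10: rows are true classes, columns are predicted classes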

Prologue

Summary

  • Introduction to Neural Networks and Connectionism
    • Shift from symbolic AI to connectionist approaches in artificial intelligence.
    • Inspiration from biological neural networks and the human brain’s structure.
  • Computations with Neurodes and Threshold Logic Units
    • Early models of neurons (neurodes) capable of performing logical operations (AND, OR, NOT).
    • Limitations of simple perceptrons in solving non-linearly separable problems like XOR.
  • Multilayer Perceptrons (MLPs) and Feedforward Neural Networks (FNNs)
    • Overcoming perceptron limitations by introducing hidden layers.
    • Structure and information flow in feedforward neural networks.
    • Explanation of forward pass computations in neural networks.
  • Activation Functions in Neural Networks
    • Importance of nonlinear activation functions (sigmoid, tanh, ReLU) for enabling learning of complex patterns.
    • Role of activation functions in backpropagation and gradient descent optimization.
    • Universal Approximation Theorem and its implications for neural networks.
  • Deep Learning Frameworks
    • Overview of PyTorch and TensorFlow as leading platforms for deep learning.
    • Introduction to Keras as a high-level API for building and training neural networks.
    • Discussion on the suitability of different frameworks for research and industry applications.
  • Hands-On Implementation with Keras
    • Loading and exploring the Fashion-MNIST dataset.
    • Building a neural network model using Keras’ Sequential API.
    • Compiling the model with appropriate loss functions and optimizers for multiclass classification.
    • Training the model and visualizing training and validation metrics over epochs.
    • Evaluating model performance on test data and interpreting results.
  • Making Predictions and Interpreting Results
    • Using the trained model to make predictions on new data.
    • Visualizing predictions alongside actual images and labels.
    • Understanding the output probabilities and class assignments in the context of the dataset.

Next lecture

  • We will discuss the training algorithm for artificial neural networks.

References

Cybenko, George V. 1989. “Approximation by Superpositions of a Sigmoidal Function.” Mathematics of Control, Signals and Systems 2: 303–14. https://api.semanticscholar.org/CorpusID:3958369.
Géron, Aurélien. 2022. Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow. 3rd ed. O’Reilly Media, Inc.
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. Adaptive Computation and Machine Learning. MIT Press. https://dblp.org/rec/books/daglib/0040158.
Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989. “Multilayer Feedforward Networks Are Universal Approximators.” Neural Networks 2 (5): 359–66. https://doi.org/10.1016/0893-6080(89)90020-8.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. https://doi.org/10.1038/nature14539.
LeNail, Alexander. 2019. “NN-SVG: Publication-Ready Neural Network Architecture Schematics.” Journal of Open Source Software 4 (33): 747. https://doi.org/10.21105/joss.00747.
McCulloch, Warren S, and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” The Bulletin of Mathematical Biophysics 5 (4): 115–33. https://doi.org/10.1007/bf02478259.
Minsky, Marvin, and Seymour Papert. 1969. Perceptrons: An Introduction to Computational Geometry. Cambridge, MA, USA: MIT Press.
Rosenblatt, F. 1958. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65 (6): 386–408. https://doi.org/10.1037/h0042519.
Russell, Stuart, and Peter Norvig. 2020. Artificial Intelligence: A Modern Approach. 4th ed. Pearson. http://aima.cs.berkeley.edu/.

Marcel Turcotte

Marcel.Turcotte@uOttawa.ca

School of Electrical Engineering and Computer Science (EECS)

University of Ottawa