CSI 5180 - Machine Learning for Bioinformatics
Version: Feb 19, 2025 13:30
In this lecture, we will cover the foundational concepts of linear regression, and gradient descent.
You will gain a deeper understanding of these essential machine learning techniques, enabling you to apply them effectively in your work.
Linear regression is introduced as a convenient setting in which to present a well-known training algorithm, gradient descent. It also serves as a foundation for introducing logistic regression, a classification algorithm, which in turn facilitates the discussion of artificial neural networks.
Gene regulation at the National Human Genome Research Institute
Centers for Disease Control and Prevention 2024-04-20
Your genes play an important role in your health, but so do your behaviors and environment, such as what you eat and how physically active you are. Epigenetics is the study of how your behaviors and environment can cause changes that affect the way your genes work. Unlike genetic changes, epigenetic changes are reversible and do not change your DNA sequence, but they can change how your body reads a DNA sequence.
Gene expression refers to how often or when proteins are created from the instructions within your genes. While genetic changes can alter which protein is made, epigenetic changes affect gene expression to turn genes “on” and “off.” Since your environment and behaviors, such as diet and exercise, can result in epigenetic changes, it is easy to see the connection between your genes and your behaviors and environment.
“Epigenetics is the study of heritable changes in gene expression (active versus inactive gene) that do not involve changes to the underlying DNA sequence — a change in phenotype without a change in genotype — which in turn affects how cells read the genes.”
“Epigenetic change is a regular and natural occurrence but can also be influenced by several factors including age, the environment/lifestyle, and disease state.”
“Epigenetic modifications can manifest as commonly as the manner in which cells terminally differentiate to end up as skin cells, liver cells, brain cells, etc.”
“At least three systems, including DNA methylation, histone modification and non-coding RNA (ncRNA)-associated gene silencing, are currently considered to initiate and sustain epigenetic change.”
\(y_i\) is the expression level of gene \(i\).
\(x_i = \{x_{i1},\ldots,x_{iP}\}\) is a vector with \(P\) regulatory signals for gene \(i\).
where \(i=1,\ldots,N\).
The learned coefficients identify whether a signal functions as an activator (positive), a repressor (negative), or is irrelevant (zero).
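As a toy illustration of this interpretation (not taken from the paper), the sketch below fits a sparse linear model to synthetic regulatory signals whose true roles are known; the data, the coefficient values, and the choice of Lasso are assumptions made for the example.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: N genes, P = 3 hypothetical regulatory signals
N, P = 200, 3
X = rng.normal(size=(N, P))

# Assumed ground truth: signal 0 activates (+2), signal 1 represses (-3), signal 2 is irrelevant (0)
true_coef = np.array([2.0, -3.0, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=N)

# A sparse linear model drives coefficients of irrelevant signals toward zero
model = Lasso(alpha=0.1)
model.fit(X, y)
print(model.coef_)  # expected pattern: positive (activator), negative (repressor), ~0 (irrelevant)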
The paper introduces a methodology for predicting gene expression based on an extensive array of regulatory signals, including transcription factor binding affinities and histone modification profiles.
The approach autonomously determines the optimal number of regression models required for accurate predictions.
It also automatically selects pertinent features, enhancing model precision.
A typical learning algorithm comprises the following components:
Problem: find values for all the model parameters so that the model “best fits” the training data.
\[ \sqrt{\frac{1}{N}\sum_{i=1}^N [h(x_i) - y_i]^2} \]
Until some termination criterion is met:
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# Define the variable and function
t = sp.symbols('t')
f = t**2 + 4*t + 7
# Compute the derivative
f_prime = sp.diff(f, t)
# Lambdify the functions for numerical plotting
f_func = sp.lambdify(t, f, "numpy")
f_prime_func = sp.lambdify(t, f_prime, "numpy")
# Generate t values for plotting
t_vals = np.linspace(-5, 2, 400)
# Get y values for the function and its derivative
f_vals = f_func(t_vals)
f_prime_vals = f_prime_func(t_vals)
# Plot the function and its derivative
plt.plot(t_vals, f_vals, label=r'$f(t) = t^2 + 4t + 7$', color='blue')
plt.plot(t_vals, f_prime_vals, label=r"$f'(t) = 2t + 4$", color='red')
# Add labels and legend
plt.axhline(0, color='black',linewidth=1)
plt.axvline(0, color='black',linewidth=1)
plt.title('Function and Derivative')
plt.xlabel('t')
plt.ylabel(r'$f(t)$')
plt.legend()
# Show the plot
plt.grid(True)
plt.show()
The graph of the derivative, \(f'(t)\), is depicted in red.
The derivative indicates how changes in the input affect the output, \(f(t)\).
The magnitude of the derivative at \(t = -2\) is \(0\).
This point corresponds to the minimum of our function.
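As a quick check (a small self-contained sketch, independent of the plotting code above), sympy confirms that the derivative vanishes at \(t = -2\):
import sympy as sp

t = sp.symbols('t')
f_prime = sp.diff(t**2 + 4*t + 7, t)  # 2*t + 4

print(sp.solve(f_prime, t))   # [-2]
print(f_prime.subs(t, -2))    # 0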
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# Define the variable and function
t = sp.symbols('t')
f = t**2 + 4*t + 7
# Compute the derivative
f_prime = sp.diff(f, t)
# Lambdify the functions for numerical plotting
f_func = sp.lambdify(t, f, "numpy")
f_prime_func = sp.lambdify(t, f_prime, "numpy")
# Generate t values for plotting
t_vals = np.linspace(-5, 2, 400)
# Get y values for the function and its derivative
f_vals = f_func(t_vals)
f_prime_vals = f_prime_func(t_vals)
# Plot the function and its derivative
plt.plot(t_vals, f_vals, label=r'$f(t) = t^2 + 4t + 7$', color='blue')
plt.plot(t_vals, f_prime_vals, label=r"$f'(t) = 2t + 4$", color='red')
# Shade the area between the derivative and the x-axis where the derivative is positive
plt.fill_between(t_vals, f_prime_vals, where=(f_prime_vals > 0), color='red', alpha=0.3)
# Add labels and legend
plt.axhline(0, color='black',linewidth=1)
plt.axvline(0, color='black',linewidth=1)
plt.title('Function and Derivative')
plt.xlabel('t')
plt.ylabel('y')
plt.legend()
# Show the plot
plt.grid(True)
plt.show()
A positive derivative indicates that increasing the input variable will increase the output value.
Additionally, the magnitude of the derivative quantifies how rapidly the output changes.
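The following small check approximates the derivative numerically with finite differences (the step size and evaluation points are arbitrary choices for illustration):
# Finite-difference approximation of f'(t) for f(t) = t**2 + 4*t + 7
def f(t):
    return t**2 + 4*t + 7

h = 1e-6
for t0 in [-1.0, 1.0]:
    slope = (f(t0 + h) - f(t0)) / h
    print(f"t = {t0}: approximate slope = {slope:.3f}")

# Both slopes are positive (about 2 at t = -1 and 6 at t = 1): increasing t increases f(t),
# and the larger magnitude at t = 1 means the output changes more rapidly there.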
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# Define the variable and function
t = sp.symbols('t')
f = t**2 + 4*t + 7
# Compute the derivative
f_prime = sp.diff(f, t)
# Lambdify the functions for numerical plotting
f_func = sp.lambdify(t, f, "numpy")
f_prime_func = sp.lambdify(t, f_prime, "numpy")
# Generate t values for plotting
t_vals = np.linspace(-5, 2, 400)
# Get y values for the function and its derivative
f_vals = f_func(t_vals)
f_prime_vals = f_prime_func(t_vals)
# Plot the function and its derivative
plt.plot(t_vals, f_vals, label=r'$f(t) = t^2 + 4t + 7$', color='blue')
plt.plot(t_vals, f_prime_vals, label=r"$f'(t) = 2t + 4$", color='red')
# Shade the area between the derivative and the x-axis where the derivative is negative
plt.fill_between(t_vals, f_prime_vals, where=(f_prime_vals < 0), color='red', alpha=0.3)
# Add labels and legend
plt.axhline(0, color='black',linewidth=1)
plt.axvline(0, color='black',linewidth=1)
plt.title('Function and Derivative')
plt.xlabel('t')
plt.ylabel('y')
plt.legend()
# Show the plot
plt.grid(True)
plt.show()
A negative derivative indicates that increasing the input variable will decrease the output value.
Additionally, the magnitude of the derivative quantifies how rapidly the output changes.
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# Define the variable and function
t = sp.symbols('t')
f = t**2 + 4*t + 7
# Compute the derivative
f_prime = sp.diff(f, t)
# Lambdify the functions for numerical plotting
f_func = sp.lambdify(t, f, "numpy")
f_prime_func = sp.lambdify(t, f_prime, "numpy")
# Generate t values for plotting
t_vals = np.linspace(-5, 2, 400)
# Get y values for the function and its derivative
f_vals = f_func(t_vals)
f_prime_vals = f_prime_func(t_vals)
# Plot the function and its derivative
plt.plot(t_vals, f_vals, label=r'$J$', color='blue')
plt.plot(t_vals, f_prime_vals, label=r"$\frac {\partial}{\partial \theta_j}J(\theta)$", color='red')
# Add labels and legend
plt.axhline(0, color='black',linewidth=1)
plt.axvline(0, color='black',linewidth=1)
plt.title('Function and Derivative')
plt.xlabel(r'$\theta_j$')
plt.ylabel(r'$J$')
plt.legend()
# Show the plot
plt.grid(True)
plt.show()
When the value of \(\theta_j\) is in the range \((-\infty, -2)\), \(\frac {\partial}{\partial \theta_j}J(\theta)\) has a negative value.
Therefore, \(- \alpha \frac {\partial}{\partial \theta_j}J(\theta)\) is positive.
Accordingly, the value of \(\theta_j\) is increased.
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
# Define the variable and function
t = sp.symbols('t')
f = t**2 + 4*t + 7
# Compute the derivative
f_prime = sp.diff(f, t)
# Lambdify the functions for numerical plotting
f_func = sp.lambdify(t, f, "numpy")
f_prime_func = sp.lambdify(t, f_prime, "numpy")
# Generate t values for plotting
t_vals = np.linspace(-5, 2, 400)
# Get y values for the function and its derivative
f_vals = f_func(t_vals)
f_prime_vals = f_prime_func(t_vals)
# Plot the function and its derivative
plt.plot(t_vals, f_vals, label=r'$J$', color='blue')
plt.plot(t_vals, f_prime_vals, label=r"$\frac {\partial}{\partial \theta_j}J(\theta)$", color='red')
# Add labels and legend
plt.axhline(0, color='black',linewidth=1)
plt.axvline(0, color='black',linewidth=1)
plt.title('Function and Derivative')
plt.xlabel(r'$\theta_j$')
plt.ylabel(r'$J$')
plt.legend()
# Show the plot
plt.grid(True)
plt.show()
When the value of \(\theta_j\) is in the range \((-2, \infty)\), \(\frac {\partial}{\partial \theta_j}J(\theta)\) has a positive value.
Therefore, \(- \alpha \frac {\partial}{\partial \theta_j}J(\theta)\) is negative.
Accordingly, the value of \(\theta_j\) is decreased.
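To make both cases concrete, here is a small worked example using the same quadratic from the plots, \(J(\theta_j) = \theta_j^2 + 4\theta_j + 7\), with an assumed learning rate \(\alpha = 0.1\):
# One gradient-descent step on J(theta_j) = theta_j**2 + 4*theta_j + 7 (derivative: 2*theta_j + 4)
alpha = 0.1

for theta_j in [-4.0, 1.0]:            # one point left of the minimum (-2), one to its right
    grad = 2 * theta_j + 4
    theta_new = theta_j - alpha * grad
    print(f"theta_j = {theta_j}: gradient = {grad}, updated theta_j = {theta_new}")

# theta_j = -4 (gradient -4): the update adds 0.4, moving toward -2
# theta_j =  1 (gradient  6): the update subtracts 0.6, also moving toward -2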
Given
\[ J(\theta_0, \theta_1) = \frac{1}{N}\sum_{i=1}^N [h(x_i) - y_i]^2 = \frac{1}{N}\sum_{i=1}^N [\theta_0 + \theta_1 x_i - y_i]^2 \]
We have
\[ \frac {\partial}{\partial \theta_0}J(\theta_0, \theta_1) = \frac{2}{N} \sum\limits_{i=1}^{N} \left(\theta_0 + \theta_1 x_i - y_{i}\right) \]
and
\[ \frac {\partial}{\partial \theta_1}J(\theta_0, \theta_1) = \frac{2}{N} \sum\limits_{i=1}^{N} x_{i} \left(\theta_0 + \theta_1 x_i - y_{i}\right) \]
from IPython.display import Math, display
from sympy import *
# Define the symbols
theta_0, theta_1, x_i, y_i = symbols('theta_0 theta_1 x_i y_i')
# Define the hypothesis function:
h = theta_0 + theta_1 * x_i
print("Hypothesis function:")
display(Math('h(x) = ' + latex(h)))
Hypothesis function:
\(\displaystyle h(x) = \theta_{0} + \theta_{1} x_{i}\)
# Define the loss function (mean squared error over the N training examples)
N = symbols('N', positive=True, integer=True)
J = Sum((h - y_i)**2, (x_i, 1, N)) / N

# Calculate the partial derivative with respect to theta_0
partial_derivative_theta_0 = diff(J, theta_0)
print("Partial derivative with respect to theta_0:")
display(Math(latex(partial_derivative_theta_0)))
Partial derivative with respect to theta_0:
\(\displaystyle \frac{\sum_{x_{i}=1}^{N} \left(2 \theta_{0} + 2 \theta_{1} x_{i} - 2 y_{i}\right)}{N}\)
# Calculate the partial derivative with respect to theta_1
partial_derivative_theta_1 = diff(J, theta_1)
print("\nPartial derivative with respect to theta_1:")
display(Math(latex(partial_derivative_theta_1)))
Partial derivative with respect to theta_1:
\(\displaystyle \frac{\sum_{x_{i}=1}^{N} 2 x_{i} \left(\theta_{0} + \theta_{1} x_{i} - y_{i}\right)}{N}\)
\[ h (x_i) = \theta_0 + \theta_1 x_i^{(1)} + \theta_2 x_i^{(2)} + \theta_3 x_i^{(3)} + \cdots + \theta_D x_i^{(D)} \]
\[ \begin{align*} x_i^{(j)} &= \text{value of the feature } j \text{ in the } i \text{th example} \\ D &= \text{the number of features} \end{align*} \]
The new loss function is
\[ J(\theta_0, \theta_1,\ldots,\theta_D) = \dfrac {1}{N} \displaystyle \sum _{i=1}^N \left (h(x_{i}) - y_i \right)^2 \]
Its partial derivative:
\[ \frac {\partial}{\partial \theta_j}J(\theta) = \frac{2}{N} \sum\limits_{i=1}^N x_i^{(j)} \left( \theta \cdot x_i - y_i \right) \]
where \(\theta\) and \(x_i\) are vectors (using the convention \(x_i^{(0)} = 1\)), \(y_i\) is a scalar, and \(\theta \cdot x_i\) denotes their dot product.
The vector containing the partial derivatives of \(J\) (with respect to \(\theta_j\), for \(j \in \{0, 1, \ldots, D\}\)) is called the gradient vector.
\[ \nabla_\theta J(\theta) = \begin{pmatrix} \frac {\partial}{\partial \theta_0}J(\theta) \\ \frac {\partial}{\partial \theta_1}J(\theta) \\ \vdots \\ \frac {\partial}{\partial \theta_D}J(\theta)\\ \end{pmatrix} \]
\[ \theta' = \theta - \alpha \nabla_\theta J(\theta) \]
The gradient descent algorithm becomes:
Repeat until convergence:
\[ \begin{aligned} \{ & \\ \theta_j := & \theta_j - \alpha \frac {\partial}{\partial \theta_j}J(\theta_0, \theta_1, \ldots, \theta_D) \\ &\text{for } j \in [0, \ldots, D] \textbf{ (update simultaneously)} \\ \} & \end{aligned} \]
Repeat until convergence:
\[ \begin{aligned} \; \{ & \\ \; & \theta_0 := \theta_0 - \alpha \frac{2}{N} \sum\limits_{i=1}^{N} x_i^{(0)}(h(x_i) - y_i) \\ \; & \theta_1 := \theta_1 - \alpha \frac{2}{N} \sum\limits_{i=1}^{N} x_i^{(1)}(h(x_i) - y_i) \\ \; & \theta_2 := \theta_2 - \alpha \frac{2}{N} \sum\limits_{i=1}^{N} x_i^{(2)}(h(x_i) - y_i) \\ & \cdots \\ \} & \end{aligned} \]
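The sketch below implements these simultaneous updates for simple linear regression with NumPy; the synthetic data, learning rate, and number of iterations are assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: y = 3 + 2x + noise
N = 100
x = rng.uniform(-3, 3, size=N)
y = 3 + 2 * x + rng.normal(scale=0.5, size=N)

theta_0, theta_1 = 0.0, 0.0   # initial parameter values
alpha = 0.05                  # learning rate
num_iterations = 1000

for _ in range(num_iterations):
    residuals = theta_0 + theta_1 * x - y        # h(x_i) - y_i
    grad_0 = (2 / N) * np.sum(residuals)         # partial derivative w.r.t. theta_0
    grad_1 = (2 / N) * np.sum(x * residuals)     # partial derivative w.r.t. theta_1
    # Update both parameters simultaneously
    theta_0, theta_1 = theta_0 - alpha * grad_0, theta_1 - alpha * grad_1

print(theta_0, theta_1)   # expected to be close to 3 and 2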
What were our assumptions?
A function is convex if for any pair of points on the graph of the function, the line connecting these two points lies above or on the graph.
For functions that are not convex, the gradient descent algorithm may converge to a local minimum rather than the global minimum.
The loss functions generally used with linear regression, logistic regression, and Support Vector Machines (SVMs) are convex; those used with artificial neural networks generally are not.
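For a twice-differentiable function of one variable, convexity can also be checked by verifying that the second derivative is non-negative everywhere; the short sketch below applies this test to the quadratic used earlier and to the cubic plotted next.
import sympy as sp

t = sp.symbols('t', real=True)
quadratic = t**2 + 4*t + 7
cubic = 2*t**3 + 4*t**2 - 5*t + 1

print(sp.diff(quadratic, t, 2))   # 2 -> non-negative everywhere, so convex
print(sp.diff(cubic, t, 2))       # 12*t + 8 -> negative for t < -2/3, so not convex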
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt

# 1. Define the symbolic variable and the function
x = sp.Symbol('x', real=True)
f_expr = 2*x**3 + 4*x**2 - 5*x + 1
# 2. Compute the derivative of f
f_prime_expr = sp.diff(f_expr, x)
# 3. Convert symbolic expressions to Python functions
f = sp.lambdify(x, f_expr, 'numpy')
f_prime = sp.lambdify(x, f_prime_expr, 'numpy')
# 4. Generate a range of x-values
x_vals = np.linspace(-4, 2, 1000)
# 5. Compute f and f' over this range
y_vals = f(x_vals)
y_prime_vals = f_prime(x_vals)
# 6. Prepare LaTeX strings for legend
f_label = rf'$f(x) = {sp.latex(f_expr)}$'
f_prime_label = rf'$f^\prime(x) = {sp.latex(f_prime_expr)}$'
# 7. Plot f and f', with equations in the legend
plt.figure(figsize=(8, 4))
plt.plot(x_vals, y_vals, label=f_label)
plt.plot(x_vals, y_prime_vals, label=f_prime_label)
# 8. Shade the region between x-axis and f'(x) for the entire domain
plt.fill_between(x_vals, y_prime_vals, 0, color='gray', alpha=0.2, interpolate=True,
label='Region between 0 and f\'(x)')
# 9. Add reference line, labels, legend, etc.
plt.axhline(0, color='black', linewidth=0.5)
plt.title(rf'Function and its Derivative with Shading for $f^\prime(x)$')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()
import numpy as np
import matplotlib.pyplot as plt
def f(x):
return x**2
def grad_f(x):
return 2*x
# Initial guess, learning rate, and number of gradient-descent steps
x_current = 2.0
learning_rate = 1.1 # Too large => divergence
num_iterations = 5 # We'll do five updates
# Store each x value in a list (trajectory) for plotting
trajectory = [x_current]
# Perform gradient descent
for _ in range(num_iterations):
g = grad_f(x_current)
x_current = x_current - learning_rate * g
trajectory.append(x_current)
# Prepare data for plotting
x_vals = np.linspace(-5, 5, 1000)
y_vals = f(x_vals)
# Plot the function f(x)
plt.figure(figsize=(6, 5))
plt.plot(x_vals, y_vals, label=r"$f(x) = x^2$")
plt.axhline(0, color='black', linewidth=0.5)
# Plot the trajectory, labeling each iteration
for i, x_t in enumerate(trajectory):
y_t = f(x_t)
# Plot the point
plt.plot(x_t, y_t, 'ro')
# Label the iteration number
plt.text(x_t, y_t, f" {i}", color='red')
# Connect consecutive points
if i > 0:
x_prev = trajectory[i - 1]
y_prev = f(x_prev)
plt.plot([x_prev, x_t], [y_prev, y_t], 'r--')
# Final touches
plt.title("Gradient Descent Divergence with a Large Learning Rate")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.grid(True)
plt.show()
The stochastic gradient descent algorithm randomly selects one training instance to calculate its gradient.
epochs = 10
for epoch in range(epochs):
for i in range(N):
selection = np.random.randint(N)
# Calculate the gradient using selection
# Update the weights
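A runnable version of this sketch for simple linear regression is given below; the synthetic data, learning rate, and number of epochs are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3 + 2x + noise
N = 100
x = rng.uniform(-3, 3, size=N)
y = 3 + 2 * x + rng.normal(scale=0.5, size=N)

theta_0, theta_1 = 0.0, 0.0
alpha = 0.01
epochs = 10

for epoch in range(epochs):
    for i in range(N):
        # Randomly select one training instance
        selection = np.random.randint(N)
        x_s, y_s = x[selection], y[selection]
        # Gradient of the squared error for this single example
        residual = theta_0 + theta_1 * x_s - y_s
        grad_0 = 2 * residual
        grad_1 = 2 * x_s * residual
        # Update the weights
        theta_0, theta_1 = theta_0 - alpha * grad_0, theta_1 - alpha * grad_1

print(theta_0, theta_1)   # roughly 3 and 2, up to the noise of the stochastic updates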
Batch gradient descent is slow on very large training sets and offers no out-of-core support, although it scales well to a large number of features.
Stochastic gradient descent is fast and well-suited for processing a large volume of examples efficiently.
Mini-batch gradient descent combines the benefits of both batch and stochastic methods; it is fast, capable of managing large datasets, and leverages hardware acceleration, particularly with GPUs.
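A minimal sketch of the mini-batch variant for the same simple linear regression setting (the batch size, learning rate, and synthetic data are assumptions):
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: y = 3 + 2x + noise
N = 100
x = rng.uniform(-3, 3, size=N)
y = 3 + 2 * x + rng.normal(scale=0.5, size=N)

X = np.column_stack([np.ones(N), x])   # design matrix with a column of ones for theta_0
theta = np.zeros(2)                    # [theta_0, theta_1]
alpha, batch_size, epochs = 0.05, 20, 50

for epoch in range(epochs):
    order = rng.permutation(N)
    for start in range(0, N, batch_size):
        batch = order[start:start + batch_size]
        residuals = X[batch] @ theta - y[batch]
        grad = (2 / batch_size) * X[batch].T @ residuals
        theta = theta - alpha * grad   # vectorized simultaneous update

print(theta)   # approximately [3, 2]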
We will briefly revisit the subject when discussing deep artificial neural networks, for which specialized optimization algorithms exist.
The closed-form analytical solution to the linear regression problem is known as the normal equation.
Computational Complexity: Calculating the inverse of \(X^T X\), where \(n\) is the number of features, has a time complexity of \(O(n^3)\) (web.cs.ucla.edu).
Numerical Stability: If \(X^T X\) is nearly singular or ill-conditioned, inversion can lead to significant numerical errors (cs.cornell.edu).
Memory Requirements: The matrix \(X^T X\) has dimensions \(n \times n\), which can be prohibitive in terms of memory usage when \(n\) is large (web.cs.ucla.edu).
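For comparison with gradient descent, here is a minimal sketch of the normal equation on synthetic data; solving the linear system with np.linalg.solve, rather than explicitly inverting \(X^T X\), is a choice made for numerical stability in this example.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3 + 2x + noise
N = 100
x = rng.uniform(-3, 3, size=(N, 1))
y = 3 + 2 * x + rng.normal(scale=0.5, size=(N, 1))

X = np.hstack([np.ones((N, 1)), x])   # design matrix with a bias column

# Normal equation: solve (X^T X) theta = X^T y
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta.ravel())   # approximately [3, 2]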
import numpy as np
X = 6 * np.random.rand(100, 1) - 4
y = X ** 2 - 4 * X + 5 + np.random.randn(100, 1)
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)
X_new = np.array([[-4], [2]])
y_pred = lin_reg.predict(X_new)
import matplotlib as mpl
import matplotlib.pyplot as plt
plt.plot(X, y, "b.")
plt.plot(X_new, y_pred, "r-")
plt.xlabel("$x$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.axis([-4, 2, -1, 35])
# save_fig("regression_linear-01")
plt.show()
User: Explain the code on the previous slide line by line.
AI: Certainly! Here is a line-by-line explanation of the provided code:
X = 6 * np.random.rand(100, 1) - 4: creates an array X containing 100 rows and 1 column of random numbers. The np.random.rand(100, 1) function generates uniformly distributed random numbers between 0 and 1. Multiplying by 6 scales these numbers to the range [0, 6), and subtracting 4 shifts the range to [-4, 2).
y = X ** 2 - 4 * X + 5 + np.random.randn(100, 1): creates the target variable y from a quadratic equation with added noise. Specifically, X ** 2 computes the square of each element in X, - 4 * X subtracts four times each element, + 5 adds a constant term of 5, and + np.random.randn(100, 1) adds Gaussian noise with a mean of 0 and a standard deviation of 1 to each element, making the data more realistic by introducing some randomness.
from sklearn.linear_model import LinearRegression: imports the LinearRegression class from the sklearn.linear_model module, which is part of the Scikit-Learn library. Scikit-Learn is widely used for machine learning in Python.
lin_reg = LinearRegression(): creates an instance of the LinearRegression class and assigns it to the variable lin_reg. This object will be used to fit the linear regression model to the data.
lin_reg.fit(X, y): fits the linear regression model to the data (X, y). The fit method trains the model using the provided features X and target variable y.
X_new = np.array([[-4], [2]]): creates an array X_new with two values, -4 and 2. These values represent new data points for which we want to predict the target variable using the trained model.
y_pred = lin_reg.predict(X_new): computes the predictions y_pred for the new data points in X_new. The predict method returns the predicted values based on the fitted model.
In summary, this script generates synthetic data, fits a linear regression model to it, and then uses the model to make predictions on new data points.
Marcel Turcotte
School of Electrical Engineering and Computer Science (EECS)
University of Ottawa