A notebook is a shareable document that combines computer code, plain language descriptions, data, rich visualizations like 3D models, charts, graphs and figures, and interactive controls. A notebook, along with an editor (like JupyterLab), provides a fast interactive environment for prototyping and explaining code, exploring and visualizing data, and sharing ideas with others.
Quick Start
Running Jupyter on your computer
Assuming the notebook is in the current directory, execute the following command from the terminal.
Ease of Use: The interface is intuitive and conducive to exploratory analysis.
Visualization: The capability to embed rich, interactive visualizations directly within the notebook enhances its utility for data analysis and presentation.
Reproducibility: Jupyter Notebooks have become the de facto standard in many domains for demonstrating code functionality and ensuring reproducibility.
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid function
def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# Generate x values
t = np.linspace(-6, 6, 400)

# Compute y values for the sigmoid function
y = sigmoid(t)

# Create a figure and plot the curve with a visible grid
fig, ax = plt.subplots()
ax.plot(t, y, color='black', linewidth=2)  # Keep the curve opaque
plt.grid(True)

# Set a transparent background for the figure
fig.patch.set_alpha(0)

plt.show()
Lecture Notes
Prologue
Summary
Introducing the tools, specifically Jupyter Notebooks and Google Colab.
By default, Jupyter Notebooks store the outputs of code cells, including media objects.
Jupyter Notebooks are JSON documents, and images within them are stored as base64-encoded text, typically in PNG format.
This encoding can lead to several issues when using version control systems, such as GitHub.
Large File Sizes: Jupyter Notebooks can become quite large due to embedded images and outputs, leading to prolonged upload times and potential storage constraints.
Incompatibility with Text-Based Version Control: Git, and therefore GitHub, is optimized for line-oriented text files. Embedded binary data, such as base64-encoded images, complicates tracking changes and resolving conflicts, because traditional diff and merge operations are not well suited to these formats.
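To make the size issue concrete, the following sketch builds a minimal notebook-like JSON structure by hand and compares its serialized size with and without an embedded image output. The field names follow the nbformat version 4 layout as an assumption, and the payload is random bytes standing in for a real PNG, so this is an illustration rather than a notebook produced by Jupyter.

```python
import base64
import json
import os

# Stand-in for a rendered figure: 50 kB of random bytes (not a real PNG).
fake_png = os.urandom(50_000)

# Base64 maps every 3 bytes to 4 ASCII characters, so the text is about 33% larger.
encoded = base64.b64encode(fake_png).decode("ascii")


def make_notebook(outputs):
    """Return a minimal notebook-like dict (assumed nbformat 4 layout)."""
    return {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": 1 if outputs else None,
                "metadata": {},
                "source": ["plt.show()"],
                "outputs": outputs,
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }


# One display output carrying the image as a base64 string.
image_output = {
    "output_type": "display_data",
    "data": {"image/png": encoded},
    "metadata": {},
}

with_image = json.dumps(make_notebook([image_output]))
without_image = json.dumps(make_notebook([]))

print(f"raw image bytes:         {len(fake_png)}")
print(f"base64 characters:       {len(encoded)}")
print(f"notebook with output:    {len(with_image)} characters")
print(f"notebook without output: {len(without_image)} characters")
```

Because the whole image sits on a single very long line of the JSON file, regenerating a figure replaces that line entirely, which is why diffs of notebooks with saved outputs are hard to read.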
Version Control (GitHub) - solutions
In JupyterLab or Notebook, Edit \(\rightarrow\) Clear Outputs of All Cells, then save.
On the command line, run jupyter nbconvert --clear-output followed by the notebook's filename.
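Since a notebook is a JSON document, clearing outputs can also be sketched with the Python standard library alone. The helper names below (clear_outputs, clear_outputs_in_file) are hypothetical, and the code assumes the nbformat version 4 layout, in which each code cell carries an outputs list and an execution_count field. It is a rough illustration of what the two options above do, not a replacement for them.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Empty the outputs and reset the execution count of every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Read a notebook file, clear its outputs, and overwrite the file."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    clear_outputs(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")


# Small in-memory demo: one markdown cell and one code cell with a stored output.
demo = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

cleaned = clear_outputs(demo)
print(cleaned["cells"][1]["outputs"])
print(cleaned["cells"][1]["execution_count"])
```

Markdown cells are left untouched, so only the generated content is removed before the notebook is committed.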
Do not attempt to install these tools unless you are confident in your technical skills. An incorrect installation could waste significant time or even render your environment unusable. There is nothing wrong with using pip or Google Colab for your coursework. You can develop these installation skills later without impacting your grades.
Package management
Managing package dependencies can be complex.
A package manager addresses these challenges.
Different projects may require different versions of the same libraries.
Package management tools, such as conda, facilitate the creation of virtual environments tailored to specific projects.
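When working with several environments, it helps to confirm which one a notebook or script is actually running in. The sketch below uses only the standard library; the function name describe_environment is hypothetical, and it relies on the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated (it will be absent otherwise).

```python
import os
import sys


def describe_environment() -> dict:
    """Collect basic facts about the Python environment running this code."""
    return {
        "python_version": sys.version.split()[0],
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # True inside a venv/virtualenv environment; conda environments
        # are not detected by this comparison.
        "in_venv": sys.prefix != sys.base_prefix,
        # Name of the active conda environment, or None if not set.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in two different projects should show two different prefix paths if each project has its own environment.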
Anaconda
Anaconda is a comprehensive package management platform for Python and R. It utilizes Conda to manage packages, dependencies, and environments.
Anaconda is advantageous as it comes pre-installed with over 250 popular packages, providing a robust starting point for users.
However, this extensive distribution results in a large file size, which can be a drawback.
Additionally, since Anaconda relies on conda, it also inherits the limitations and issues associated with conda (see subsequent slides).
Miniconda
Miniconda is a minimal version of Anaconda that includes only conda, Python, their dependencies, and a small selection of essential packages.
Conda
Conda is an open-source package and environment management system for Python and R. It facilitates the installation and management of software packages and the creation of isolated virtual environments.
Dependency conflicts caused by complex package interdependencies can force the user to reinstall Anaconda/Conda.
Plagued by large storage requirements and performance issues during package resolution.
Mamba
Mamba is a reimplementation of the conda package manager in C++.
It is significantly faster than conda.
It consumes fewer computational resources.
It provides clearer and more informative error messages.
It is fully compatible with conda, making it a viable replacement.
Micromamba is a fully statically-linked, self-contained executable. Its empty base environment ensures that the base is never corrupted, eliminating the need for reinstallation.