Machine Learning Engineering

CSI 5180 - Machine Learning for Bioinformatics

Important

The description for Assignment 2 is now published on the course website. You can access it through the following link:

Prepare

  • TensorFlow Playground
    • Dataset Options: Users can choose from four types of datasets: circular, XOR, Gaussian, and spiral.
    • Feature Engineering: Lets you select derived input features, such as \(x_1^2\), \(x_2^2\), and \(x_1 x_2\), in addition to the original \(x_1\) and \(x_2\), to improve model performance.
    • Model Architecture: Allows customization of neural network architecture, including varying the number of layers and neurons per layer.
    • Hyperparameter Tuning: Provides options to adjust the learning rate, activation function, regularization (type and rate), and problem type (classification or regression) so you can observe their effects on model training.
    • Suggestion 1: For the Gaussian dataset, which is linearly separable, configure a network with no hidden layers and a single output neuron that uses the sigmoid activation function. This setup is effectively a logistic regression model.
    • Suggestion 2: The circular dataset is not linearly separable using only the original features \(x_1\) and \(x_2\). However, by creating new features, \(x_1^2\) and \(x_2^2\), the problem becomes linearly separable in the transformed feature space. A network with no hidden layers and a single output node is sufficient for this task.
  • Consult Zou et al. (2019) and its accompanying tutorial on Google Colab.
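
The two suggestions above can also be tried outside the Playground. The following is a minimal sketch, not the Playground's own implementation: it trains a logistic regression model (no hidden layers, one sigmoid output) by batch gradient descent on a synthetic circular dataset, once with the raw features \(x_1, x_2\) and once with the engineered features \(x_1^2, x_2^2\). The data generator, the radii, the learning rate, the number of epochs, and all function names are assumptions chosen for illustration.

```python
import math
import random


def make_circle_data(n=200, seed=0):
    """Hypothetical stand-in for a circular dataset: class 1 lies inside
    radius 1, class 0 lies in a ring between radius 2 and radius 3."""
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n):
        inner = i % 2 == 0
        r = rng.uniform(0.0, 1.0) if inner else rng.uniform(2.0, 3.0)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((r * math.cos(theta), r * math.sin(theta)))
        labels.append(1 if inner else 0)
    return points, labels


def sigmoid(z):
    # Numerically stable form for large negative or positive z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(x, w, b):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(features, labels, lr=0.1, epochs=1000):
    """Logistic regression: a single sigmoid unit with no hidden layer,
    fitted by batch gradient descent on the cross-entropy loss."""
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = predict(x, w, b) - y  # gradient of the loss w.r.t. z
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(features, labels, w, b):
    hits = sum(int((predict(x, w, b) >= 0.5) == (y == 1))
               for x, y in zip(features, labels))
    return hits / len(labels)


points, labels = make_circle_data()
raw = [(x1, x2) for x1, x2 in points]            # original features
squared = [(x1 ** 2, x2 ** 2) for x1, x2 in points]  # engineered features

w_raw, b_raw = train_logistic(raw, labels)
w_sq, b_sq = train_logistic(squared, labels)
acc_raw = accuracy(raw, labels, w_raw, b_raw)
acc_sq = accuracy(squared, labels, w_sq, b_sq)

print(f"accuracy with x1, x2:     {acc_raw:.2f}")
print(f"accuracy with x1^2, x2^2: {acc_sq:.2f}")
```

With the squared features, the decision boundary \(w_1 x_1^2 + w_2 x_2^2 + b = 0\) is a straight line in the transformed space and an ellipse around the origin in the original space, which is why a single output unit can separate the two classes. With the raw features, no straight line can isolate the inner disk from the surrounding ring, so the same model should perform noticeably worse.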

Participate

Further Readings

Zou, James, Mikael Huss, Abubakar Abid, Pejman Mohammadi, Ali Torkamani, and Amalio Telenti. 2019. “A Primer on Deep Learning in Genomics.” Nature Genetics 51 (1): 12–18. https://doi.org/10.1038/s41588-018-0295-5.