Intro to Applied Machine Learning

Overview

Location: E14-525
Instructors

Office: E14-548 (Responsive Environments Group)
Office Hours: Thursdays 4-5PM in E14-548
Important Links

Useful Information

While we will introduce several machine learning toolkits in a variety of languages, the lectures and assignments will primarily use Python, specifically the IPython notebook. If you want to get a development environment up on your machine, you’ll need to install:

  • python
  • ipython
  • matplotlib
  • scipy
  • scikit-learn

For those who don’t want to install the dependencies locally (or have trouble doing so), we’ve set up a shared IPython notebook server available at https://mas500.media.mit.edu. Contact Spencer for the super-secret password. Be aware that this notebook server is shared, so make sure to create your own notebooks so as not to clobber each other’s work.

Schedule

Class 1 – Feb 5 (intro pdf)

Class 2 – Feb 7 (Weka and MATLAB pdf)

  • Python / scikit-learn / matplotlib / numpy / etc.
  • MATLAB
  • WEKA
  • Perceptron
  • Linear SVM

Class 3 – Feb 12 

Class 4 – Feb 14 (Naive Bayes pdf)

  • Naive Bayes
  • kNN
  • K-means

Class 5 – Feb 19

  • Gaussian Mixture Models
  • 1hr Guest – Brad Knox
  • Project and Reading Discussion

Class 6 – Feb 21 (Decision trees pdf)

  • Bagging
  • Boosting
  • Decision Trees and Random Forests
  • Project and Reading Discussion
  • Assign Project and Reading

Class 7 – Feb 26 (Evaluation Metrics pdf)

  • Machine Learning Metrics
  • Model Selection
  • Guest speakers – Affective Computing

Class 8 – Feb 28 (Regression and Applying ML pdf)

  • Project and Reading Discussion
  • Wrapup / Summary / Discussion
  • Linear, Polynomial, Logistic Regression

Assignment 1 (Due February 11 at 11:59PM)

Reading: Wired – The AI Revolution Is On
Read the linked article and come up with 1 thoughtful discussion question. Submit via email by the due date.
Mini Project: scikit-learn comes with several data sets to try out. Load the iris data set and use SVMs to classify each class against the other two. That is, you’ll run 3 separate 2-class classification tasks: 1 against 2+3, 2 against 1+3, and 3 against 1+2. For each task, hold out 10% of the data from training to use as test data. When trained on the remaining 90% (training data), how does the classifier do on the test data? What if you train on only 50%? What do you notice about the 3 classes? Submit your notebook either via email (if you’re using a local IPython installation) or by emailing the name of your notebook on the shared server. Please send the emails to Spencer and Artem.
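A minimal sketch of one of the three 2-class tasks, using scikit-learn’s bundled iris data (this assumes a recent scikit-learn, where train_test_split lives in sklearn.model_selection; older versions import it from sklearn.cross_validation):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

# One of the three 2-class tasks: class 0 against classes 1 + 2.
y_binary = (y == 0).astype(int)

# Hold out 10% of the data as a test set; train on the remaining 90%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y_binary, test_size=0.1, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print("test accuracy:", test_acc)
```

Repeat with y == 1 and y == 2 for the other two tasks, and change test_size to 0.5 to see the effect of training on less data.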

Assignment 2 (Due February 18 at 11:59PM)

Reading – Pedro Domingos: A Few Useful Things to Know about Machine Learning. As before, submit a discussion question for class on Wednesday with your homework.
Mini Project: Download the bike sharing dataset from https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset. You can use the Python package pandas to load the csv (pro-tip: the function is “read_csv”).

  • Train an SVC using an RBF kernel with default values, using the atemp, hum, windspeed, and cnt fields as your training data and the season field as the target class (see the download page for definitions of these fields). In pandas you can refer to column “date” in a DataFrame with data[“date”], and to multiple columns with data[[“date”, “temp”]]. Also note that scikit-learn wants raw data matrices rather than pandas objects, so you can use data.values to access the underlying array. If you’re using the shared IPython notebook, you can load the data from “data/bikeshare/car.csv”.
  • Use the scikit-learn function train_test_split to split your data into 80% training data and 20% test data.
  • What accuracy does the default SVC give you?
  • What is the accuracy if you “test” using the training data?
  • What does the relationship between your test error and training error tell you about your classifier?
  • What can you try to improve the test accuracy?
  • How does a linear kernel perform? How does varying C affect the accuracy?
  • How does a Naive Bayes classifier perform? (Try the GaussianNB class in sklearn)
  • How badly would you expect a really bad (but not malicious) classifier to perform?
  • Explain in your own words the relationship between alpha and theta in the two representations of the perceptron algorithm (note that the same relationship holds for SVMs).
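The first few steps above might look like the sketch below. Since the UCI csv isn’t bundled here, a small random DataFrame stands in for the real data — in the assignment you’d replace it with pd.read_csv on the downloaded file (column names follow the UCI field descriptions):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# In the assignment: data = pd.read_csv("day.csv")  (path is up to you).
# Random stand-in data with the same column names keeps this sketch runnable.
rng = np.random.RandomState(0)
data = pd.DataFrame({
    "atemp": rng.rand(200),
    "hum": rng.rand(200),
    "windspeed": rng.rand(200),
    "cnt": rng.randint(0, 1000, 200),
    "season": rng.randint(1, 5, 200),
})

# scikit-learn wants raw matrices, hence .values.
X = data[["atemp", "hum", "windspeed", "cnt"]].values
y = data["season"].values

# 80% training data, 20% test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

rbf = SVC(kernel="rbf").fit(X_train, y_train)  # default RBF kernel
rbf_test = rbf.score(X_test, y_test)
rbf_train = rbf.score(X_train, y_train)
print("test accuracy: ", rbf_test)
print("train accuracy:", rbf_train)

# For comparison: a linear kernel (try varying C) and Gaussian Naive Bayes.
linear = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
nb = GaussianNB().fit(X_train, y_train)
print("linear:", linear.score(X_test, y_test), "NB:", nb.score(X_test, y_test))
```

Comparing rbf_test against rbf_train is the quickest way to start answering the over/underfitting questions above.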

Assignment 3 (Due February 27 at 11:59PM)

This assignment is a little more open-ended than the previous two. Find a data set on the internet or from your own research; see below for some ideas on where to get data. Now that you’ve had some experience with simple classification, this is an opportunity to try more sophisticated data pre-processing. Pick a dataset that requires some pre-processing of the features (for example, in the BCI data you might slice the time-series data into frames and take the FFT to use spectral features).
Use cross-validation to evaluate your classifier and generate a confusion matrix to visualize your errors. Each student will present their process and findings to the class.
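The evaluation step can be sketched as follows, using iris as a stand-in for whatever dataset you pick (cross_val_predict collects out-of-fold predictions so a single confusion matrix covers all the data; this assumes a recent scikit-learn where these helpers live in sklearn.model_selection):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

# Stand-in data; substitute your own pre-processed feature matrix and labels.
X, y = load_iris(return_X_y=True)

clf = SVC(kernel="linear")

# 5-fold cross-validation accuracy.
scores = cross_val_score(clf, X, y, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Out-of-fold predictions -> one confusion matrix over the whole dataset.
y_pred = cross_val_predict(clf, X, y, cv=5)
cm = confusion_matrix(y, y_pred)
print(cm)
```

Rows of the confusion matrix are true classes and columns are predicted classes, so off-diagonal entries show exactly which classes your classifier confuses.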
Note that the due date is next Thursday, not Tuesday as it was for the previous two assignments.
Sample Datasets:
http://www.kaggle.com/competitions
https://archive.ics.uci.edu/ml/datasets.html
https://www.bbci.de/competition/iv/
Reading: ElectriSense: Single-Point Sensing Using EMI for Electrical Event Detection and Classification in the Home
As before, please prepare 1 discussion question based on the reading for the in-class discussion.