Recitation 3
Homework Tips
- Be familiar with the What Can I Ask On Diderot? policy
- Talk to other students taking the course – they can help you and you can help them.
- Feel free to meet other students during the Collaboration Space – every Tuesday in GHC 4303.
- Look for the “Common Problems in Homework x” post on Diderot before asking questions online.
TA Hours
- Come this week! Don’t wait until the last week.
- Construct a minimal counter-example : a simplest test case that fails.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
Linear Classification
Usually we talk about linear regression: inferring a continuous value as a linear function of input variables. In this homework, we’re using linear regression to predict a category variable. Let’s see what this looks like on a simple test dataset:
n=50
data = np.concatenate((np.random.normal(size=(n, 2)), np.random.normal(size=(n, 2)) + 2.5))
cls = np.array([-1]*n + [1]*n)
plt.scatter(data[cls==-1,0], data[cls==-1,1])
plt.scatter(data[cls== 1,0], data[cls== 1,1])
We fit a simple linear regression model to this data and examine the output. Notice that the output is continuous, not discrete.
model = LinearRegression()
model.fit(data, cls)
pred = model.predict(data)
pred[:5]
Let’s draw the scatter data and highlight misclassified points:
plt.scatter(data[pred<0,0], data[pred<0,1])
plt.scatter(data[pred>0,0], data[pred>0,1])
# Plot misclassified points:
errs = ((cls==-1) & (pred>0)) | ((cls==1) & (pred<0))
plt.scatter(data[errs,0], data[errs,1])
Lets visualize the classifier using regularly-spaced points:
xx, yy = np.meshgrid(np.linspace(-3, 6, 101), np.linspace(-3, 6, 101))
grid = np.vstack([xx.ravel(), yy.ravel()]).T
ccls = model.predict(grid)
plt.scatter(grid[ccls<0,0], grid[ccls<0,1])
plt.scatter(grid[ccls>0,0], grid[ccls>0,1])
# Post the original data division as well:
plt.scatter(data[cls==-1,0], data[cls==-1,1])
plt.scatter(data[cls== 1,0], data[cls== 1,1])
Notice that the class boundary is a line; this is characteristic of a linear classifier. We cannot separate classes like these using a linear classifier:
plt.subplot(1, 2, 1)
plt.scatter([-1, 1], [0, 0])
plt.scatter([0, 0], [-1, 1])
plt.subplot(1, 2, 2)
plt.yticks([])
plt.scatter([-1, 1], [-1, 1])
plt.scatter([0], [0])
F1 vs Accuracy
In hw3_text
, we introduce the $F_1$-score to measure the performance of a binary classifier. We define this in terms of the number of true and false positives and negatives in our classifier.
Predicted | Actual | |
---|---|---|
true positive | T | T |
false positive | T | F |
false negative | F | T |
true negative | F | F |
To calculate the $F_1$ score, we can calculate: \(\begin{align*} \text{precision} & \gets \frac{\text{true positive}}{\text{true positive} + \text{true negative}} \\ \text{recall} & \gets \frac{\text{true positive}}{\text{true positive} + \text{false positive}} \\ F_1\text{ score} & \gets 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} \end{align*}\)
Why?
We care about this because it works when you want to measure classifier performance on rare events.
For example, lets look at a pair of ficticious medical test in a population where 1% of people have some disorder.
When Zico’s Magic Classifier (ZMC) and the much simpler Always False Classifier (AFC) are run on 1000 people, they both obtain 99% accuracy. Here are the confusion matrices for both:
Always False Classifier | Actual True | Actual False |
---|---|---|
Predicted True | 0 | 0 |
Predicted False | 10 | 990 |
Zico’s Magic Classifier | Actual True | Actual False |
---|---|---|
Predicted True | 10 | 10 |
Predicted False | 0 | 980 |
The accuracy for both classifiers is 99%, despite AFC not depending on the data (and consequently being useless). In this example, accuracy does not tell us which classifier is better.
Let’s calculate the $F_1$ score for both classifiers:
Precision | Recall | $F_1$ score | |
---|---|---|---|
ATC | $\frac{0}{0}$ | $\frac{0}{10}$ | $0$ or NaN |
ZMC | $\frac{10}{20}$ | $\frac{10}{10}$ | $\frac{2}{3}$ |
The $F_1$ score is good when you want measure your classifier’s performance on catching rare events.