On linear classifiers
Linear Classifier
This is a step up in tackling the problem of image classification and acts as a precursor to neural networks.
Let's define:
- Score function: Maps raw data to class scores
- Loss function: Quantifies how well predicted scores match ground-truth labels
This is the core of supervised learning: we will later minimize the loss with respect to the parameters of the score function.
Score Function
If you remember from school, the linear function is:

$$y = mx + b$$

where
- $m$ controls the slope
- $b$ controls the intercept
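As a quick numeric sanity check of the scalar case (the slope, intercept, and input values below are made up for illustration):

```python
m, b = 2.0, 1.0   # slope and intercept (arbitrary values)
x = 3.0           # a single scalar input
y = m * x + b     # the linear function
print(y)          # 7.0
```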
This simple linear equation is the building block. In image classification, instead of predicting one value, we extend it to predict one score per class simultaneously. Therefore, instead of the slope being a scalar, it becomes a matrix $W$, and the intercept becomes a vector $b$:

$$f(x_i, W, b) = W x_i + b$$

where:
- $x_i$ are the flattened training images, i.e. each $x_i$ is a floating-point vector of shape [D,1], with D features per image (width × height × channels)
- $y_i$ are the labels
- There are $N$ total images and $K$ total classes
- Each row of $W$ can be interpreted as a detector for one class.
The score function maps pixels to raw class scores: $f: \mathbb{R}^D \to \mathbb{R}^K$.
The score function's variable dimensions are:

| Variable | Shape | Meaning |
|---|---|---|
| $x_i$ | [D,1] | Flattened image |
| $W$ | [K,D] | One row per class classifier |
| $b$ | [K,1] | Class bias |
| $f(x_i)$ | [K,1] | Class scores |
Tracking dimensions is extremely useful when implementing models.
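As a minimal sketch of the score function with NumPy (the sizes D = 4 and K = 3 are made up here just to keep the shapes visible):

```python
import numpy as np

D, K = 4, 3                 # hypothetical: 4 features per image, 3 classes

x = np.random.randn(D, 1)   # flattened image, shape [D,1]
W = np.random.randn(K, D)   # one row per class, shape [K,D]
b = np.random.randn(K, 1)   # per-class bias, shape [K,1]

scores = W @ x + b          # raw class scores, shape [K,1]
print(scores.shape)         # (3, 1)
```

Note how the shape of each array matches the table above: the matrix-vector product [K,D] @ [D,1] collapses the D dimension and leaves one score per class.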
Class prediction is obtained by

$$\hat{y}_i = \arg\max_k \; s_k, \quad \text{where } s = f(x_i; W, b)$$

It means:
- $f(x_i; W, b)$ produces a vector of class scores $s$ of shape [K,1].
- $s_k$ is the score for class $k$ (the $k$-th element of the score vector).
- $\arg\max$ returns the argument (index) of the maximum value.
- That index corresponds to the predicted class.
- In this notation, the hat on $\hat{y}_i$ means predicted.
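The argmax step above can be sketched like this (assuming NumPy; the score values are made up for illustration):

```python
import numpy as np

# Raw scores for K = 3 classes (made-up values for illustration)
scores = np.array([[2.0], [-1.0], [5.0]])  # shape [K,1]

# The index of the maximum score is the predicted class
predicted_class = int(np.argmax(scores))
print(predicted_class)  # 2
```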
If you are uncomfortable with the math notation, I encourage you to go over it again and familiarise yourself with the jargon here; this will become increasingly important. Rest assured, I will introduce new notation as we go.