On linear classifiers

Linear Classifier

This is a step up from the problem of image classification and acts as a precursor to neural networks.

Let's define:

  • Score function: Maps raw data to class scores
  • Loss function: Quantifies how well predicted scores match ground-truth labels

This is the core of supervised learning: we will later minimize the loss with respect to the parameters of the score function.

Score Function


If you remember from school, the linear function is:

f(x)=ax+bf(x)=ax+b

where

  • aa controls the slope
  • bb controls the intercept

This simple linear equation is the building block. In image classification, instead of predicting one value, we extend it to predict one score per class simultaneously. Therefore, instead of aa being a scalar, it becomes a matrix WW, and bb becomes a vector:

f(xi,W,b)=Wxi+bf(x_i, W, b) = W x_i + b

where:

  • xiRDx_i \in \mathbb{R}^D are flattened training images, i.e. xix_i is a floating point vector shape of [D,1]. D features per image (width × height × channels)
  • yi1Ky_i \in {1 \dots K} are the labels.
  • There are NN total images and KK total classes
  • Each row of WW can be interpreted as a detector for one class.

The score function maps pixels to raw class scores: f:RDRKf: \mathbb{R}^D \mapsto \mathbb{R}^K.


The function score variable dimensions are:

VariableShapeMeaning
xix_i[D,1]Flattened image
WW[K,D]One row per class classifier
bb[K,1]Class bias
f(xi)f(x_i)[K,1]Class scores

Tracking dimensions is extremely useful when implementing models.


Class prediction is obtained by

y^=argmaxkfk(xi)\hat{y} = \arg\max_k f_k(x_i)

It means:

  • f(xi)f(x_i) produces a vector of class scores [K,1].
  • fk(xi)f_k(x_i) is the score for class k (k-th element of the score vector)
  • argmax\arg\max returns the argument (index) of maximum value.
  • That index corresponds to the predicted class
  • In this notation, ^\hat{} on y^\hat{y} means predicted

If you are uncomfortable with the math notation, I encourage you to go over it again and familiarise with the jargon here, this will become increasingly important. Rest assured I will make sure to introduce new notations as we go.

← Back to blog