On nearest neighbor

This is part #1 of my series of notes on CS231n. The course material is open: video lectures and assignments. If you plan to go through it, I highly recommend checking the schedule; it helps you pace the lectures against the assignments, which you should definitely complete if you want to learn. Lastly, you will be better off completing the CS231n course itself than just reading my notes.

The course starts by defining the problem of classifying images. It introduces the nearest neighbor (NN) classifier first, then moves to a linear classifier, then to a fully connected neural network, and later to CNNs. After that, more prominent CV problems are introduced. Here NN appears as an image classification approach, but it is much more broadly useful and generalizes to many other problem sets. Despite not being used at all as an image classifier in practice, NN is very useful to understand: it powers many use cases you rely on daily, as it is the way to compare high-dimensional vectors that might encode semantic meaning.

Image Classification

Image classification is the task of assigning a label from a fixed set to an input image. From a CV algorithm's perspective, the challenges are:

  • Viewpoint variation: the object's orientation with respect to the camera can vary.
  • Scale variation: Size of the object is another factor.
  • Deformation: non-rigid objects can be deformed.
  • Occlusion: sometimes only a small portion of an object is visible.
  • Illumination conditions: lighting changes can alter pixel values drastically.
  • Background clutter: Objects of interest may blend into the environment, making it hard to identify.
  • Intra-class variation: e.g. think of how many different appearances of chairs exist.

Nearest Neighbor Classifier

The nearest neighbor classifier takes a test image, compares it to every single training image, and predicts the label of the closest one. This is not used for image classification in practice, but it is a good starting point to introduce a few distance metrics.

The way to compare two images with nearest neighbor is to compute a pixel-wise distance. We treat each image as a vector (i.e., a point in a high-dimensional space) and then use distance metrics such as:

  • L1 distance. Also known as Manhattan distance because it represents the distance to travel in a grid-like city layout like Manhattan, where you can only follow horizontal and vertical movements.

    d_1(I_1, I_2) = \sum_{p} \left| I^p_1 - I^p_2 \right|

  • L2 distance. Also known as Euclidean distance.

    d_2(I_1, I_2) = \sqrt{\sum_{p} \left( I^p_1 - I^p_2 \right)^2}

    L2 is much less forgiving of large discrepancies, since the pixel-wise differences are squared before summing.
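For two images flattened into vectors, both distances can be computed directly with NumPy. A minimal sketch, using made-up pixel values for two hypothetical 4-pixel images:

```python
import numpy as np

# Two hypothetical images flattened into 1-D vectors of raw pixel values.
img1 = np.array([12, 200, 45, 7], dtype=np.float64)
img2 = np.array([10, 190, 60, 7], dtype=np.float64)

# L1 (Manhattan) distance: sum of absolute pixel-wise differences.
d1 = np.sum(np.abs(img1 - img2))

# L2 (Euclidean) distance: square root of the sum of squared differences.
d2 = np.sqrt(np.sum((img1 - img2) ** 2))

print(d1)  # 27.0
print(d2)  # sqrt(329), approximately 18.14
```

Note how the single largest per-pixel gap (15) dominates d2 after squaring, while d1 weighs every gap linearly.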

This generalizes to k-Nearest Neighbor, where the label is determined by a majority vote of the k nearest neighbors, instead of the closest neighbor (k=1).
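The majority vote can be sketched in a few lines of NumPy. This is a toy implementation over a made-up 2-pixel dataset; the name `knn_predict` and the data are my own, not from the course:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the label of x_test by majority vote among its k
    nearest training examples under L1 distance."""
    dists = np.sum(np.abs(X_train - x_test), axis=1)  # L1 to every train image
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Tiny hypothetical dataset: 4 "images" of 2 pixels each.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["cat", "cat", "dog", "dog"])
print(knn_predict(X_train, y_train, np.array([0.05, 0.1]), k=3))  # cat
```

With k=1 this reduces to the plain nearest neighbor classifier; larger k smooths out the effect of noisy individual training points.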

Choosing the best distance metric and the best value of k is called hyperparameter selection, and it will be your bread and butter from now on.

In ML, the dataset is split into a training set and a validation set (or test set). The golden rule is to never use your test set for hyperparameter tuning: if you tune on the test set, you overfit to it and have no idea how the model generalizes to unseen data. By keeping training and test data separate, you can use the test set as unseen data and get a proxy for how well your model might generalize. If you don't have much data to work with, you can use cross-validation to tune hyperparameters.
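Cross-validation for choosing k can be sketched as follows. This is a self-contained toy: the fold splitting, the `cross_validate` helper, and the inlined L1 kNN predictor are my own, not the course's code.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    # Majority vote among the k nearest training points under L1 distance.
    nearest = np.argsort(np.sum(np.abs(X_train - x), axis=1))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

def cross_validate(X, y, k_choices, num_folds=5):
    """Mean validation accuracy per k: each fold serves once as the
    validation set while the remaining folds act as training data."""
    X_folds = np.array_split(X, num_folds)
    y_folds = np.array_split(y, num_folds)
    results = {}
    for k in k_choices:
        accs = []
        for i in range(num_folds):
            X_tr = np.concatenate(X_folds[:i] + X_folds[i + 1:])
            y_tr = np.concatenate(y_folds[:i] + y_folds[i + 1:])
            preds = [knn_predict(X_tr, y_tr, x, k) for x in X_folds[i]]
            accs.append(float(np.mean(np.array(preds) == y_folds[i])))
        results[k] = float(np.mean(accs))
    return results
```

You would then pick the k with the highest mean accuracy and evaluate it exactly once on the held-out test set.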


You may have realized that kNN as described has several issues if, say, we want to search over internet-scale image collections. It's worth knowing that approximate nearest neighbor (ANN) methods, where you trade recall for space/time complexity during retrieval, exist and are an active area of research: FLANN, k-d tree indexing, k-means clustering, and more.

