Collaborative Filtering - a simple overview
Oct 19th 2011 (updated Apr 3rd 2023)
Collaborative filtering is a recommendation technique that uses existing data about what many people have chosen to predict what a given person is likely to want.
Introduction
The most typical application is product recommendation on ecommerce websites. There you are shown what you may also want to buy, based on what other people who bought the same things as you went on to buy.
The plain English explanation of the algorithm is as follows: Find a group of people who bought the same things that you did, and see what they bought that you didn't.
Collaborative filtering applies to more than just ecommerce: the people can be treated as generic 'objects', and the purchases as any attribute that can be represented numerically.
The Algorithm
Collaborative Filtering consists of three key stages:
Representation
The first stage of collaborative filtering considers how the 'real world' data is represented and turned into a usable format. This involves taking information from a data source (such as files or a relational database) and representing it as a table (or multidimensional array), with one row per object and one column per attribute. The representation may be an interface over the original data rather than an actual conversion.
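As a minimal sketch of this stage, the snippet below turns a list of (object, attribute) pairs into a 0/1 table. The purchase records, the names in them, and the build_table helper are all made up for illustration; a real system would read the pairs from its own data source.

```python
# Hypothetical purchase records, as they might come back from a file or a database query.
purchases = [
    ("alice", "book"),
    ("alice", "lamp"),
    ("bob", "book"),
    ("bob", "lamp"),
    ("bob", "desk"),
    ("carol", "pen"),
]


def build_table(records):
    """Turn (object, attribute) pairs into a 0/1 table.

    Returns the sorted attribute names (the columns) and a dict mapping
    each object to its row, where 1 means the object has that attribute.
    """
    objects = sorted({obj for obj, _ in records})
    attributes = sorted({attr for _, attr in records})
    seen = set(records)
    table = {
        obj: [1 if (obj, attr) in seen else 0 for attr in attributes]
        for obj in objects
    }
    return attributes, table


attributes, table = build_table(purchases)
for name, row in table.items():
    print(name, row)
```

A dict of lists keeps the example dependency-free; a multidimensional array would serve equally well when the data set is large.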
Neighbourhood Formation
Neighbourhood formation consists of finding the objects that are most similar to the given object. To do this we need some measure of similarity, which can be any way of scoring how closely related two rows of data are.
Whichever similarity measure is used (each has its own statistical and mathematical pros and cons), this stage results in a smaller data set consisting of the objects most similar to the given object.
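The sketch below shows one way this stage might look. Cosine similarity is used only as an example of a common measure; nothing above prescribes it. The table contents, the neighbourhood function, and the neighbourhood size k are illustrative assumptions.

```python
import math

# Hypothetical 0/1 table: one row per object, one column per attribute.
table = {
    "alice": [1, 0, 1, 0],
    "bob":   [1, 1, 1, 0],
    "carol": [0, 0, 0, 1],
    "dave":  [1, 0, 0, 0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two rows; 0.0 if either row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neighbourhood(table, target, k):
    """Return the names of the k objects most similar to target."""
    scores = [
        (cosine_similarity(table[target], row), name)
        for name, row in table.items()
        if name != target
    ]
    # Highest similarity first; ties are broken alphabetically by name.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scores[:k]]


print(neighbourhood(table, "alice", 2))
```

Swapping in a different measure only means replacing cosine_similarity; the selection of the top k stays the same.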
Recommendation Generation
This stage involves assessing the group of most similar objects (the neighbourhood) and finding the attributes that are most popular within the neighbourhood but that the given object doesn't have. These selected attributes become the recommended items.
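A minimal sketch of this final stage follows, assuming the neighbourhood has already been chosen. The data, the recommend function, and the choice to rank purely by how many neighbours have each attribute are illustrative assumptions; other schemes could weight each neighbour by its similarity score.

```python
from collections import Counter

# Hypothetical table and column names; "bob" and "dave" are assumed to be alice's neighbourhood.
attributes = ["book", "desk", "lamp", "pen"]
table = {
    "alice": [1, 0, 1, 0],
    "bob":   [1, 1, 1, 0],
    "dave":  [1, 1, 0, 1],
}


def recommend(table, attributes, target, neighbours, n):
    """Return up to n attributes that are popular among neighbours but missing from target."""
    counts = Counter()
    for name in neighbours:
        for attr, has_it, target_has in zip(attributes, table[name], table[target]):
            if has_it and not target_has:
                counts[attr] += 1
    # Most popular first; ties are broken alphabetically by attribute name.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [attr for attr, _ in ranked[:n]]


print(recommend(table, attributes, "alice", ["bob", "dave"], 2))
```

Here both neighbours have a desk and only one has a pen, so the desk ranks first in the recommendations for alice.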
The best report I have found for explaining Collaborative Filtering with sufficient depth