Collaborative filtering for product recommendations tends to focus on user-item recommendations – given a user, try to find the products they might like based on the products they have already interacted with. While personalisation like this is often preferable, sometimes we would just like to know what products to recommend if a user lands on a given product page. There is far less written about this in the literature, so I would like to share some thoughts (and maths).

Assume we have $m$ users and $n$ products. We can build an $m \times n$ ratings matrix $R$, where $R_{ij}$ is the rating given by user $i$ to product $j$. This can either be made up of ratings entered explicitly by a user, or inferred implicitly from the way they interact with the product. This matrix will generally be large and sparse, since most users will only interact with a small proportion of the products available.
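As a concrete sketch (with made-up interaction triples), such a sparse ratings matrix can be assembled with scipy:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (user index, product index, implicit rating) triples.
interactions = [
    (0, 0, 5.0), (0, 2, 3.0),
    (1, 1, 4.0), (1, 2, 1.0),
    (2, 0, 2.0), (2, 3, 5.0),
]
m, n = 3, 4  # m users, n products
rows, cols, vals = zip(*interactions)

# CSR stores only the observed entries, so memory scales with the
# number of interactions rather than with m * n.
R = csr_matrix((vals, (rows, cols)), shape=(m, n))
```

Only the six observed entries are stored; everything else is implicitly zero/unknown.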

A standard approach to fill in unknown data in the matrix is to use matrix factorisation. Briefly, this assumes that there are only a small number of latent factors for users and products, which we can use to infer an unknown rating via a linear product. That is, we assume

$$R \approx UP,$$

where $U$ and $P$ are $m \times k$ and $k \times n$ matrices (resp.), for some small rank $k$. This gives us a latent vector for each product, given by the corresponding column of the matrix $P$. It also allows us to estimate ratings for user-product pairs that have not been rated. This is a standard technique, and for large sparse $R$, the factorisation can be approximated using the alternating least squares algorithm, e.g. as implemented in Spark.
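A minimal, non-distributed sketch of alternating least squares, assuming a factorisation $R \approx UP$ with $U$ of shape $m \times k$ and $P$ of shape $k \times n$, run on synthetic rank-$k$ data (in practice you would use a library implementation such as Spark's ALS, with regularisation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 5, 2

# Synthetic exactly-rank-k ratings, with a few entries held out as "unknown".
U_true = rng.normal(size=(m, k))
P_true = rng.normal(size=(k, n))
R = U_true @ P_true
mask = np.ones((m, n), dtype=bool)
mask[0, 1] = mask[2, 3] = mask[4, 0] = False  # unobserved entries

U = rng.normal(size=(m, k))
P = rng.normal(size=(k, n))
for _ in range(100):
    # Fix P; solve a small least-squares problem per user,
    # using only that user's observed ratings.
    for i in range(m):
        obs = mask[i]
        U[i] = np.linalg.lstsq(P[:, obs].T, R[i, obs], rcond=None)[0]
    # Fix U; solve per product.
    for j in range(n):
        obs = mask[:, j]
        P[:, j] = np.linalg.lstsq(U[obs], R[obs, j], rcond=None)[0]

# Maximum error on the observed entries after fitting.
err = np.max(np.abs((U @ P - R)[mask]))
```

Each half-step is a convex least-squares solve, which is why alternating the two directions converges quickly on the observed entries; the products $U \cdot P$ at unobserved positions are then the estimated ratings.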

If our aim is item-item recommendations, another standard approach is to put the items in a latent space, and measure the cosine distance between them. So if we have vectors $u$, $v$, normalised to have norm 1, then we take

$$\mathrm{similarity}(u, v) = u \cdot v.$$
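As a tiny numpy sketch of this similarity:

```python
import numpy as np

def cosine_similarity(u, v):
    # Normalise both vectors to unit norm, then take the dot product.
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    return float(u @ v)

cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0]))  # ≈ 0.7071
```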

So a natural thing to consider is just taking the cosine distance between the columns of $P$ (this is what was suggested in this Stackoverflow answer). This seems problematic: for example, the product $UP$ is invariant under scaling column $i$ of $U$ by $\lambda \neq 0$ and scaling the corresponding row $i$ of $P$ by $1/\lambda$, but clearly the dot products between the columns of $P$ will generally change if we do this. Really we would like to take the dot products between the columns of $R = UP$ itself. Unfortunately, this matrix can be very large and dense – imagine if we had millions of products and millions of users. Even if we are doing this in a distributed computation, calculating even a subset of the dot products between its columns will be expensive.
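This invariance is easy to demonstrate numerically: rescaling a column of $U$ and the corresponding row of $P$ reciprocally leaves the product unchanged but alters the cosine similarities between the columns of $P$ (a sketch with made-up matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(4, 2))  # 4 users, 2 latent factors
P = rng.normal(size=(2, 5))  # 2 latent factors, 5 products

def col_cosine(M, i, j):
    a, b = M[:, i], M[:, j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rescale latent factor 0: column 0 of U by lam, row 0 of P by 1/lam.
lam = 10.0
U2 = U.copy(); U2[:, 0] *= lam
P2 = P.copy(); P2[0, :] /= lam

print(np.allclose(U @ P, U2 @ P2))                # the product is unchanged
print(col_cosine(P, 0, 1), col_cosine(P2, 0, 1))  # but the similarities differ
```

So similarities computed directly on the columns of $P$ depend on an arbitrary choice of scaling in the factorisation, which is the problem.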

Since $U$ and $P$ have low rank, we might hope that we can find some matrix with the same dimensions as $P$ that has the same pairwise dot products between its columns as $R = UP$. Fortunately, this can be done with a little linear algebra.

Suppose we want to take the dot product between column $i$ and column $j$ of $R = UP$. This is

$$(UP)_i \cdot (UP)_j = P_i^T U^T U P_j,$$

where $P_i$ denotes column $i$ of $P$. Now $U^T U$ is a $k \times k$ matrix, so it would be awesome if we could find a $k \times k$ matrix $S$ such that $S^T S = U^T U$, since then the dot product between column $i$ and column $j$ of $R$ is the dot product between column $i$ and column $j$ of $SP$, which is a much smaller calculation. We now turn to the singular value decomposition of $U$. Assuming that $U$ has rank $k$, we can write $U = X \Sigma Y^T$, where $X$ is an $m \times m$ orthogonal matrix of eigenvectors of $UU^T$, $Y$ is a $k \times k$ orthogonal matrix of eigenvectors of $U^T U$, and $\Sigma$ is the $m \times k$ diagonal matrix (not square, but having zeros off the leading diagonal), whose entries are the square roots of the eigenvalues of $U^T U$ or $UU^T$. Exercise for the reader: show that $\Sigma$ is a real matrix, i.e. that for any $A$, $A^T A$ has real non-negative eigenvalues. N.B. this decomposition has an implementation in Spark.
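A quick numerical check of this decomposition, assuming numpy's SVD convention (for an $m \times k$ matrix $U$, `np.linalg.svd` returns $X$, the singular values, and $Y^T$):

```python
import numpy as np

rng = np.random.default_rng(2)
m, k = 100, 5
U = rng.normal(size=(m, k))

# Full SVD: U = X @ Sigma @ Y.T, with X (m x m), Sigma (m x k), Y (k x k).
X, s, Yt = np.linalg.svd(U, full_matrices=True)
Sigma = np.zeros((m, k))
Sigma[:k, :k] = np.diag(s)
assert np.allclose(U, X @ Sigma @ Yt)

# The singular values are the square roots of the eigenvalues of U.T @ U.
eigvals = np.sort(np.linalg.eigvalsh(U.T @ U))[::-1]  # descending, like s
assert np.allclose(s ** 2, eigvals)
```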

Then note that

$$U^T U = (X \Sigma Y^T)^T (X \Sigma Y^T) = Y \Sigma^T X^T X \Sigma Y^T = Y \Sigma^T \Sigma Y^T = M^T M,$$

where $M = \Sigma Y^T$. We aren’t quite there, because $M$ is an $m \times k$ matrix, but note that because the lower $m - k$ rows of $\Sigma$ are all zero, if we take $\tilde{\Sigma}$ to be the upper $k \times k$ square block of $\Sigma$, and take $S = \tilde{\Sigma} Y^T$ (a $k \times k$ matrix), then it’s easy to check that $S^T S = M^T M = U^T U$, and we are done. So to get our item-item similarities we can look at the dot products between columns in the matrix $SP = \tilde{\Sigma} Y^T P$.
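Putting the pieces together, a sketch that builds $S = \tilde{\Sigma} Y^T$ from the thin SVD of $U$ (which hands us $\tilde{\Sigma}$ directly as the vector of singular values) and checks that the columns of $SP$ have the same pairwise dot products as the columns of $UP$:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 50, 8, 3
U = rng.normal(size=(m, k))
P = rng.normal(size=(k, n))

# Thin SVD: s has length k and Yt is k x k, so diag(s) plays the
# role of the upper square block of Sigma.
_, s, Yt = np.linalg.svd(U, full_matrices=False)
S = np.diag(s) @ Yt          # k x k
assert np.allclose(S.T @ S, U.T @ U)

# Pairwise dot products between columns, computed two ways.
small = (S @ P).T @ (S @ P)  # Gram matrix from a k x n matrix
big = (U @ P).T @ (U @ P)    # Gram matrix from an m x n matrix
assert np.allclose(small, big)
```

The Gram matrix of item similarities is now computed from a $k \times n$ matrix rather than an $m \times n$ one, which is where the saving comes from when the number of users is large.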

Thanks to Jeremy Mitchell, a colleague at Qubit, for an enlightening discussion on the content of this post.