On Cosine Similarity

Cosine similarity is commonly used in data mining and information retrieval as a measure of the resemblance between data sets, i.e. of how alike they are. It is an important concept in vector space theory and affine models.

While many tools and tutorials cover the subject, they often omit a clear explanation of the underlying meaning and nature of the variables involved.

Did you know that centering data sets by subtracting the corresponding variable means can change the angle between them and, therefore, the corresponding cosine similarity? Did you know that this change can be used to assess whether the variables are orthogonal, uncorrelated, both, or neither? Do you know what a cosine similarity of zero actually means?
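To make these questions concrete, here is a minimal Python sketch. It is not the code behind our tool. The function names and sample data sets are made up for illustration. It relies on one standard fact: the cosine similarity of mean-centered data equals the Pearson correlation coefficient, so a raw cosine of zero indicates orthogonality, while a centered cosine of zero indicates uncorrelated variables.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length data sets seen as vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def center(x):
    """Subtract the mean of the data set from each of its values."""
    mean = sum(x) / len(x)
    return [a - mean for a in x]

# Illustrative data (assumed, not taken from the tutorial).
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]
raw_xy = cosine_similarity(x, y)                       # about 0.636
centered_xy = cosine_similarity(center(x), center(y))  # about -1

# Orthogonal (raw cosine is 0) yet perfectly negatively correlated.
u = [1, 0]
v = [0, 1]
raw_uv = cosine_similarity(u, v)
centered_uv = cosine_similarity(center(u), center(v))

# Uncorrelated (centered cosine is 0) yet not orthogonal.
p = [1, 2, 3]
q = [1, 0, 1]
raw_pq = cosine_similarity(p, q)
centered_pq = cosine_similarity(center(p), center(q))

for label, raw, centered in [("x, y", raw_xy, centered_xy),
                             ("u, v", raw_uv, centered_uv),
                             ("p, q", raw_pq, centered_pq)]:
    print(f"{label}: raw = {raw:.3f}, centered = {centered:.3f}")
```

In the first pair, centering moves the cosine from a moderately positive value to about -1, which shows how strongly the angle can depend on the means. The other two pairs show that orthogonal and uncorrelated are different properties: a pair can have one without the other.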

All these and similar questions are addressed with our cosine similarity tool and companion tutorial. Access them now at



To use the tool, simply enter two data sets and select how they are delimited. Then choose whether to compute their cosine similarity from the values as given (raw mode) or after subtracting each data set's mean (centered mode). To interpret the results from either mode, read the companion tutorial.
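For readers who prefer to experiment offline, the workflow above can be approximated with the following Python sketch. It is an assumption-laden stand-in, not the tool's actual implementation. The names parse_data_set and compute_similarity, the default comma delimiter, and the "raw" and "centered" mode strings are all hypothetical.

```python
import math

def parse_data_set(text, delimiter=","):
    """Split a delimited string into floats, skipping empty fields."""
    return [float(field) for field in text.split(delimiter) if field.strip()]

def compute_similarity(text_a, text_b, delimiter=",", mode="raw"):
    """Cosine similarity of two delimited data sets in raw or centered mode."""
    a = parse_data_set(text_a, delimiter)
    b = parse_data_set(text_b, delimiter)
    if len(a) != len(b):
        raise ValueError("both data sets must contain the same number of values")
    if mode == "centered":
        # Centered mode: subtract each data set's own mean first.
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [v - mean_a for v in a]
        b = [v - mean_b for v in b]
    elif mode != "raw":
        raise ValueError("mode must be 'raw' or 'centered'")
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    if norm == 0:
        # A zero vector has no direction, so the angle is undefined.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm

raw_result = compute_similarity("1,2,3,4,5", "5,4,3,2,1", mode="raw")
centered_result = compute_similarity("1 2 3 4 5", "5 4 3 2 1",
                                     delimiter=" ", mode="centered")
print(f"raw mode: {raw_result:.3f}")
print(f"centered mode: {centered_result:.3f}")
```

Note that centered mode fails for a constant data set: after subtracting the mean every value is zero, and the cosine is undefined. The sketch raises an error in that case rather than returning a number.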