A Google patent on user similarities (https://www.google.com/patents/US8458195) cites my old tutorial on cosine similarity. If you try to follow that link, you won’t be able to access it, as I removed it long ago along with all of my IR tutorials. These changes were part of the relaunching of http://www.miislita.com as a miner.
I have seen many web pages citing that tutorial, or reproducing that one and many more from the early 2000s. Those attempts are convincing me that I should restore them, perhaps in the tutorials section of Minerazzi?
Perhaps. In the meantime, check out this short one, http://www.minerazzi.com/tutorials/cosine-similarity-tutorial.pdf, where the connection between cosine similarity and Pearson’s Correlation Coefficient (r) is demonstrated.
Essentially, Pearson’s r is a cosine; i.e., the cosine of the angle between mean-centered paired variables. As a cosine, Pearson’s r is not additive, nor can it be arithmetically averaged, as many SEOs still wrongly think.
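
For readers who want to check the identity numerically, here is a minimal Python sketch. It is not taken from the tutorial: the function names (`cosine`, `center`, `pearson_r`) and the sample data are my own illustrative assumptions. Pearson’s r is computed with the usual textbook sum formula, then compared against the cosine of the mean-centered vectors.

```python
import math

def cosine(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def center(x):
    """Subtract the mean from every element (mean-centering)."""
    mean = sum(x) / len(x)
    return [a - mean for a in x]

def pearson_r(x, y):
    """Pearson's r from the textbook computational (raw sums) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Illustrative data only (hypothetical, not from the tutorial).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("Pearson r:               ", pearson_r(x, y))
print("cosine of centered data: ", cosine(center(x), center(y)))
print("cosine of raw data:      ", cosine(x, y))
```

The first two printed values should agree up to floating-point rounding, while the cosine of the raw, uncentered data is generally a different number. That difference is the whole point: r is a cosine only after mean-centering.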