For years many SEOs fooled their own peers with the assertion that LSI was something new that Google had implemented. Some have even claimed LSI was a proprietary algorithm from Google. I’ve spent sooooo many years debunking all this crap and a few other urban legends from unscrupulous SEOs.
On this Thanksgiving Day I am thankful that all these myths have been debunked to no end: LSI-rank correlations, LDA-rank correlations, KD-rank correlations, additiveness of correlation coefficients, blah, blah, blah… I am also thankful that along came this:
http://infolab.stanford.edu/~sergey/349/
LSI?
Known from the onset by Google.
A cost-effective implementation in a large-scale, dynamic environment such as the Web?
Nope.
Or, really, was there such a myth????
Another popular myth is that PageRank is a strong relevance factor! Fortunately, recent experiments with the ClueWeb09 collection showed that it is not true. On the other hand, a spam ranking is the next strongest factor after BM25 (or a BM25 analog such as a divergence-from-independence score).
Hi, Leo
Thank you for stopping by.
You would be surprised at how many myths, urban legends, and other crap are spread within the SEO community.
Thank you for the heads up on ClueWeb09 regarding this other PageRank myth. I checked their wiki site. Good stuff and input from your end, as usual.
Have a nice weekend.
Actually, Leo, one of the biggest fallacies promoted in the early 2000s by SEOs (and, unfortunately, by some IR scholars with a vested interest in seeing Google succeed) was the Link Citation-Literature Citation Analogy a la Garfield’s Impact Factors. I covered this a few years ago at the end of this tutorial on linear algebra:
http://www.miislita.com/information-retrieval-tutorial/matrix-tutorial-3-eigenvalues-eigenvectors.html
Here is a direct quote:
“Literature citation and Impact Factors are driven by editorial policies and peer reviews. On the Web anyone can add/remove/exchange links at any time for any reason whatever. Anyone can buy/sell/trade links for any sort of vested interest or overwrite links at will. In such noisy environment, far from the controlled conditions observed in a computer lab, peer review and citation policies are almost absent or at best contaminated by commercialization. Evidently under such circumstances the link citation-literature citation analogy or the notion that a link is a vote of citation importance for the content of a document cannot be sustained.”
I fought against that nonsense in the early 2000s. It was later realized that all of it was a good sell for vested IR scholars, Stanford University, and SEOs.
You are right: what worked for a controlled set stopped working in a noisy environment. I would rather say that the number of queries where PageRank substantially improves search quality is small. It is also possible that a modified version of PageRank (such as TrustRank) can do a better job.