I’m still trying to understand why so many SEOs have LSI backward and why others insist on promoting or explaining something that is not LSI as LSI. Some even repeat fallacies they have previously heard across the Web or from contaminated pools of knowledge like Wikipedia.
To top it off, I have received emails from SEOs who are angry at having been misled by other SEO “experts” about what LSI is or how it works.
In LSI the singular value decomposition (SVD) algorithm must be used, yet those who claim to know LSI don’t even know how SVD works; otherwise they would realize why their explanations are nonsense.
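For readers who have never seen the SVD step itself, here is a minimal sketch in Python, assuming NumPy and a small hypothetical term-document count matrix. It factors A = UΣV^T and keeps only the k largest singular values to get the rank-k approximation A_k = U_kΣ_kV_k^T. The helper name truncated_svd is made up for illustration; real LSI systems also apply term weighting and work with much larger sparse matrices.

```python
import numpy as np

# Toy term-document count matrix (hypothetical data): rows are terms,
# columns are six documents d1..d6.
terms = ["ship", "boat", "ocean", "voyage", "trip"]
A = np.array([
    [1, 0, 1, 0, 0, 0],  # ship
    [0, 1, 0, 0, 0, 0],  # boat
    [1, 1, 0, 0, 0, 0],  # ocean
    [1, 0, 0, 1, 1, 0],  # voyage
    [0, 0, 0, 1, 0, 1],  # trip
], dtype=float)


def truncated_svd(A, k):
    """Return (U_k, s_k, Vt_k), the rank-k truncation of A = U diag(s) V^T."""
    # NumPy returns the singular values s in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


k = 2  # number of latent dimensions kept (a free parameter)
U_k, s_k, Vt_k = truncated_svd(A, k)

# Rank-k approximation A_k = U_k diag(s_k) V_k^T. By the Eckart-Young theorem it is
# the closest rank-k matrix to A in the Frobenius norm.
A_k = U_k @ np.diag(s_k) @ Vt_k

# Term coordinates in the latent space: rows of U_k scaled by the singular values.
term_vecs = U_k * s_k

print(np.round(A_k, 2))
```

The choice of k is left to whoever builds the model, and different values give different latent spaces; nothing in this computation involves query operators or synonym lists.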
In this post on Randfish’s SEOmoz blog, one poster came up with the almost standard hearsay that there is such a thing as an LSI tilde operator. Fortunately, Jose Nunez, PhD (from hirank.com), quickly refuted that claim. Judging by earlier posts in that thread, some posters don’t seem to grasp the concepts of LSI and co-occurrence either.
In The LSI Myth, Mike Duz (from seo-blog.com) dissects the “LSI Tilde Operator Myth” and explains some other LSI myths promoted by SEOs. Others, like David Petar (from dpn.name), who is working on a master’s thesis on LSA, spotted the same LSI-tilde claims, this time on this blog by “Clasione” (from Searchen.com). I emailed Clasione asking him to rectify this, since what he explains is not LSI, not even close.
There is no doubt that many SEOs have LSI completely backward.
Some, like Aaron Wall (from seobook), seem to brush things off, claiming that it doesn’t matter whether what SEOs perceive as LSI actually is LSI. Both Mike Duz and David Petar strongly disagree, as can be seen from their replies to Aaron Wall in that thread. See also this other post by David Petar.
I have given my response to Mr. Wall in When SEOs are caught in Lies.
Bill Slawski (from cre8asiteforums.com), apparently trying to be polite, has claimed on Wall’s blog that nobody is harmed by such hearsay. I strongly disagree.
Bill, what makes you think that? Readers are harmed by buying into a wrong concept. Prospective consumers are also harmed. Misrepresenting products or services is a questionable trade practice. You know better than this. I understand why you want to be polite, but please don’t lose credibility in the process.
Bill, it is one thing to spend years promoting specific products and services as “LSI-like” and writing articles claiming to explain how LSI works, only to brush things off later (for example, by saying “sorry, I mistook this or that for LSI, but it doesn’t matter to SEOs”). Duz and Petar correctly asserted that this is typical of someone caught with their pants down. How honest is that?
Now compare that with waking up one day, realizing one was wrong about how search engines, LSI, or Term Vector Models work, and correcting the record, as Peter Nisbet (from article-services.com), Randfish, and others have done.
As for my research, Bill and Mr. Wall, I stand by everything I have stated about PageRank in the past. If you or anyone else has peer-reviewed research to the contrary, I would be happy to read it.
Time has shown that Bob Massa was quite right and ahead of his time. The original claim that PageRank was an objective metric that is hard to manipulate has turned out to be yet another lie, not just another “moan”. The analogy between link citations and literature citations from the 2000s, à la Garfield’s Impact Factor, was debunked long ago and reduced to just another fallacy as well.
PS. For IR researchers, graduate students, and search marketers interested in having advanced resources on LSA (LSI) all in one place, check Benoit Lemaire and Philippe Dessus’s resource page, Readings in Latent Semantic Analysis for Cognitive Science and Education. The entries lead to a vast number of other resources.
I received an email from Bill Slawski in which he clarified what he meant. I must admit that, to a certain extent, I misunderstood him. I have no problem apologizing to you, Bill.
As for the rest of the above, I stand by every single letter. Those are not accusations; those are facts.
The most depressing thing about the Searchen link is that I found it on the Wikipedia page for LSA. I promptly edited it out. Not sure how it managed to get on there (actually I’m pretty sure I know how, but no sense in pointing fingers).
Good catch, triplah!
From http://www.marketingpilgrim.com/2007/06/smx-notes-you-a-with-matt-cutts.html, regarding an SMX conference:
Matt Cutts sat down with Danny Sullivan and the SMX Advanced attendees for a Q&A session. At one point the following surfaced:
“Q: What’s your progress on LSI and should you theme a website or will it dilute your rank?”
“A: We neither confirm nor deny that we use this. You try it and see.”
“Also: Google does a lot of work behind the scenes to do good semantic matching. We know bio = biography, but apple doesn’t = apples. The ~ is for synonym search. We try to do it “under the hood” to bring better results.”
While the first part is not an answer (neither a yes nor a no), the second states how Google uses the tilde operator.
As we stated long ago, the tilde operator is a synonym-search feature designed to find similar or related terms.
If I have a search engine, I can use a string such as ~ (or any other string) to invoke a lookup list of terms and then map documents to these terms. I can make the entire process transparent to end users. I can also provide a visual representation of this using nodes and arcs (a graph). I can even use this to expand an answer set in the background and incorrectly call that “concept expansion” or “topic expansion”.
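A minimal Python sketch of the lookup-list approach just described may make the contrast concrete. SYNONYMS, INDEX, expand_query and search are hypothetical names made up for illustration, not Google’s or any other engine’s API. A leading tilde simply triggers a dictionary lookup, and the related terms are mapped to documents through a toy inverted index; there is no matrix and no SVD anywhere.

```python
# Hypothetical synonym/related-term table. A real engine might build one from
# thesauri, stemming rules, or query logs. Nothing here is LSI.
SYNONYMS = {
    "bio": ["biography"],
    "car": ["auto", "automobile"],
}

# Hypothetical inverted index: term -> set of document IDs.
INDEX = {
    "bio": {1},
    "biography": {2, 3},
    "car": {4},
    "auto": {5},
    "automobile": {4, 6},
    "apple": {7},
}


def expand_query(token):
    """Return the terms a single query token maps to.

    A leading '~' invokes the lookup list; otherwise the token is used as-is.
    """
    if token.startswith("~"):
        base = token[1:]
        return [base] + SYNONYMS.get(base, [])
    return [token]


def search(query):
    """Return sorted IDs of documents matching any expanded term (OR semantics)."""
    docs = set()
    for token in query.split():
        for term in expand_query(token):
            docs |= INDEX.get(term, set())
    return sorted(docs)


print(search("~bio"))  # [1, 2, 3]
print(search("bio"))   # [1]
```

Whether “apple” should also match “apples” is decided entirely by what someone puts in the table, which is why a mechanism like this says nothing about LSI.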
This does not mean that I’m using LSI.
In fact, my approach is far different from LSI.
LSI has no query operators, nor is it a synonym-search technique.
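In LSI a query is not parsed for operators; it is treated like a pseudo-document. One common formulation turns the query into a term vector q, folds it into the k-dimensional space with q_k = Σ_k^{-1}U_k^T q, and ranks documents by cosine similarity to q_k. The Python sketch below assumes NumPy, a hypothetical toy matrix, and illustrative helper names (fold_in, rank_documents); real systems also weight terms and use sparse solvers.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows: terms, columns: documents d1..d6).
terms = ["ship", "boat", "ocean", "voyage", "trip"]
A = np.array([
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Each document's k-dim coordinates are a column of V_k^T; transpose so rows are documents.
doc_vecs = Vt_k.T


def fold_in(query_terms):
    """Fold a bag-of-words query into the latent space: q_k = S_k^{-1} U_k^T q."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    return (U_k.T @ q) / s_k


def rank_documents(query_terms):
    """Return (document indices best-first, cosine similarities) for a query.

    Assumes at least one query term is in the vocabulary (else q_k is zero).
    """
    q_k = fold_in(query_terms)
    sims = doc_vecs @ q_k / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k))
    return np.argsort(-sims).tolist(), sims


order, sims = rank_documents({"ship", "ocean"})
print(order, np.round(sims, 3))
```

Folding a document’s own term vector in this way returns exactly its column of V_k^T, which is why queries and documents can be compared in the same space. The query is just a vector; there is no tilde or any other operator for the model to interpret.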
It appears SEOs are still quoting early and outdated LSI papers wherein the role of co-occurrence was not clearly understood.
Let’s hope that one day SEOs learn how SVD works so they can realize the limitations of LSI and how it is used. Maybe my hope is just a dream.