In A Complete Glossary of Essential SEO Jargon an SEOMOZ poster defines LSI as follows:
“LSI(Latent Semantic Indexing) This mouthful just means that the search engines index commonly associated groups of words in a document. SEOs refer to these same groups of words as “Long Tail”. The majority of searches consist of three or more words strung together. See also “long tail”. The significance is that it might be almost impossible to rank well for “mortgage”, but fairly easy to rank for “second mortgage to finance monster truck team””
I have been asked to comment about this.
To put the post in perspective, a jargon glossary is like a collection of expressions used within a specific group of individuals with similar interests. Normally jargon is not intended for outsiders.
Overall, the post makes for nice coffee-table reading. The title states that this is a complete glossary of essential SEO jargon. However, it is debatable whether the glossary is complete, or whether some of its entries are indeed essential to SEOs.
Within SEO circles, jargon connected to search engine technology often comes with two elements:
To the poster’s credit, not every entry in the glossary comes with (a), (b), or both; some are actually informative. Like some of the comments they generate, some are entertaining.
Unfortunately, the LSI entry comes with both (a) and (b). The last time I revisited the post, the LSI entry had been ignored by commenters. I could have posted these comments there and added content to their blog, but I decided at the last minute to add content to this blog instead.
Now let’s comment on the substantive part.
Firstly, the poster all but conflates two different concepts: LSI and the so-called “long tail”. The former is based on SVD (singular value decomposition), and the latter is an expression that describes a distribution. Research on long tail-shaped distributions is found in Mandelbrot’s early work from the 1950s and 1960s, and even before Mandelbrot. Page 84 of James Gleick’s best-seller, Chaos (1987), also mentions a long tail distribution Mandelbrot came across.
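To make the distinction concrete, here is a minimal sketch of what “long tail” refers to: the shape of a frequency distribution, with no matrix decomposition involved. The Zipf-like form, the function names, and the numbers are assumptions chosen for illustration only; nothing here is taken from the glossary or from any search engine’s data.

```python
# Illustrative only: a hypothetical Zipf-like popularity distribution.
# "Long tail" describes this shape; it has nothing to do with SVD or LSI.

def zipf_frequencies(n_items, exponent=1.0):
    """Relative frequency of the item at each rank (1 = most popular)."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def tail_share(freqs, head_size):
    """Fraction of total volume held by everything after the top head_size items."""
    return sum(freqs[head_size:])

# Assumed toy setting: 10,000 distinct queries, of which the top 100 are the "head".
freqs = zipf_frequencies(10_000)
print(f"share of volume in the tail: {tail_share(freqs, 100):.2f}")
```

In a distribution of this kind, the many rarely used items together account for a large part of the total, which is all the expression “long tail” says.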
Secondly, LSI is not exactly document indexing, as some may loosely infer from reading the LSI entry and as many SEOs have claimed in the past. LSI is applied to already indexed documents, from which terms have been extracted and already scored with a particular term weight model. Thus, before LSI is applied, terms and docs have already been identified and indexed. Using LSI to cluster terms and documents and then reclassifying these is a different thing. Sometimes this is called reindexing and is loosely referred to as “indexing” by a few folks.
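The order of operations described above can be sketched as follows. This is a toy illustration, not any search engine’s implementation: the term list, the counts, the tf-idf style weight, and the names weight_terms, lsi and cosine are all assumptions made for the example. The point is only that the term-document matrix exists, and is weighted, before the SVD step is ever applied.

```python
import numpy as np

# Step 0 (already done before LSI): terms and docs are identified and indexed.
# Hypothetical toy index; rows = terms, columns = documents d1..d4.
terms = ["mortgage", "loan", "rate", "truck", "engine"]
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

def weight_terms(counts):
    """Step 1: score terms with a weight model (here a simple tf-idf, one of many)."""
    n_docs = counts.shape[1]
    doc_freq = np.count_nonzero(counts, axis=1)
    idf = np.log(n_docs / doc_freq)
    return counts * idf[:, None]

def lsi(weighted, k):
    """Step 2: apply SVD to the weighted matrix and keep the top k dimensions."""
    U, s, Vt = np.linalg.svd(weighted, full_matrices=False)
    term_coords = U[:, :k] * s[:k]    # one row per term
    doc_coords = Vt[:k, :].T * s[:k]  # one row per document
    return term_coords, doc_coords

def cosine(a, b):
    """Cosine similarity between two coordinate vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

term_coords, doc_coords = lsi(weight_terms(counts), k=2)
print("d1 vs d2:", cosine(doc_coords[0], doc_coords[1]))
print("d1 vs d3:", cosine(doc_coords[0], doc_coords[2]))
```

Comparing documents in the reduced space is the clustering step; nothing in it crawls, parses, or indexes a document.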
The initial statement of the LSI entry is simply sloppy, hearsay, and made out of thin air: “LSI(Latent Semantic Indexing) This mouthful just means that the search engines index commonly associated groups of words in a document”.
The other problem with this statement is the informational disservice it does to the casual reader, who might believe such a notion of LSI and repeat it across the Web. Besides, LSI is not essential to SEOs.