“LSI-Friendly” Documents: No Such Thing

Indeed, this was the topic of a post I made in this Cre8asiteForums thread.

Quoting myself in part:

“When LSI is applied to a term-document matrix representing a collection of documents numbering in the zillions, the co-occurrence phenomenon that affects the LSI scores becomes a global effect, occurring between documents in the collection.

Thus, the only way that end users (e.g. SEOs) could influence the LSI scores is if they could access and control the content of all the documents in the matrix, or launch a coordinated spam attack on the entire collection. The latter would be the case of a spammer trying to make an LSI-based search engine index billions of documents (to name a quantity) that he/she has created.

If an end user or researcher wants to understand and manipulate the effect of co-occurrence in a single document, he/she would need to deconstruct that document, build a term-passage matrix for it, apply LSI to that matrix, and then play with it by manipulating single terms. Whatever the results, they will be valid only for the universe represented by that matrix, that is, for that and only that document.

If such a document is then submitted to the LSI-based search engine, that local effect simply vanishes and global co-occurrence “takes control” and spreads throughout the collection, forming the corresponding connectivity paths that eventually force a redistribution of term weights.

Consequently, SEOs that sell this idea of making documents “LSI-friendly”, like some firms sending emails reading “is your site LSI optimized?” or “we can make your documents LSI-valid!”, or those that promote the notion of “LSI and Link Popularity”, end up exposed for what they are and for how much they know about search engines. The sad thing is that these find their way via search engine conferences (SES), blogs and forums to deceive the industry with such blogonomies.”
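
The local-versus-global point in the quoted passage can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the four-term vocabulary, the raw 0/1 counts with no term weighting, the choice of k=2 dimensions, and the helper name lsi_term_cosines. It is not how any particular search engine implements LSI; it only shows that a term relationship tuned inside one document's term-passage matrix need not survive once that document is a few columns of a larger collection matrix.

```python
import numpy as np

TERMS = ["lsi", "svd", "seo", "links"]  # hypothetical vocabulary


def lsi_term_cosines(matrix, k):
    """Cosine similarity between term vectors in a rank-k LSI space."""
    u, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    term_vecs = u[:, :k] * s[:k]          # rows = terms in the reduced space
    norms = np.linalg.norm(term_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0               # guard against all-zero term vectors
    unit = term_vecs / norms
    return unit @ unit.T


# Term-passage matrix for ONE document (rows = TERMS, columns = passages).
# Inside this document, "lsi" and "seo" always appear together.
single_doc = np.array([
    [1, 1, 0],   # lsi
    [0, 0, 1],   # svd
    [1, 1, 0],   # seo
    [0, 0, 1],   # links
])

# Made-up remainder of the collection, where "lsi" co-occurs with "svd"
# and "seo" co-occurs with "links" instead.
rest_of_collection = np.array([
    [1, 1, 1, 0, 0, 0],   # lsi
    [1, 1, 1, 0, 0, 0],   # svd
    [0, 0, 0, 1, 1, 1],   # seo
    [0, 0, 0, 1, 1, 1],   # links
])
collection = np.hstack([single_doc, rest_of_collection])

i, j = TERMS.index("lsi"), TERMS.index("seo")
local_sim = lsi_term_cosines(single_doc, k=2)[i, j]
global_sim = lsi_term_cosines(collection, k=2)[i, j]

print(f"lsi~seo within the single document: {local_sim:.2f}")
print(f"lsi~seo within the whole collection: {global_sim:.2f}")
```

In this toy setup the similarity between "lsi" and "seo" is essentially perfect in the single-document matrix and drops sharply in the collection matrix, even though the document's own passages are unchanged. With only six other columns the effect is already visible; in a real collection the document's columns would be a vanishing fraction of the matrix.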

This is a legacy post originally published on 10/20/2006.
