Dictionary definition lovers, here is one for you: misnomer.

According to Dictionary.com, a misnomer is:

1. a misapplied or inappropriate name or designation.
2. an error in naming a person or thing.

i.e., an inappropriate designation.

Once in a while, misnomers turn up in IR. Here are at least two:

1. Binary Independence Model
2. Latent Semantic Indexing (LSI)

Here is a quick overview:

1. Binary Independence IR Model

Back in 1991, Cooper wrote “Some Inconsistencies and Misnomers in Probabilistic Information Retrieval”.

His article’s abstract states:

“The probabilistic theory of information retrieval involves
the construction of mathematical models based
on statistical assumptions of various sorts. One of the
hazards inherent in this kind of theory construction
is that the assumptions laid down may be inconsistent
with the data to which they are applied. Another
hazard is that the stated assumptions may not
be the real assumptions on which the derived modelling
equations or resulting experiments are actually
based. Both kinds of error have been made repeatedly
in research on probabilistic information retrieval.
One consequence of these lapses is that the statistical
character of certain probabilistic IR models, including
the so-called ‘binary independence’ model, has been
seriously misapprehended.”

In that article Cooper wrote:

“Since one can derive all possible assertions from an inconsistent
theory, such a theory must be meaningless –
entirely lacking in significance or predictive power. It
makes no sense that good experimental results could
come out of an inconsistent theory.”

“It is tempting to explain this conundrum by suggesting
that the inconsistencies in question were only minor
ones. However, there is no such thing as a theory
that is just ‘a little bit’ inconsistent. A theory cannot
be just a little bit inconsistent, any more than the
scientist proposing it can be just a little bit pregnant.
A logical inconsistency, if its implications are followed
out, destroys a theory utterly. It is a disaster, and is
totally unacceptable. If rationality is to be preserved,
inconsistencies simply cannot be tolerated.”

2. Latent Semantic Indexing

Contrary to SEO myths, LSI is not an “indexing” model but an application of SVD, a matrix decomposition technique. Despite what was written in the early literature on LSI, today we know that LSI does not assess semantics; it gets its power from high-order co-occurrence relationships. Furthermore, one can uncover the latent (i.e., hidden) structures present in a system with techniques other than LSI. The early papers on LSI used structured collections of journal articles and abstracts about specific topics (medicine, science, and computers). These were rich in synonyms. Evidently, the latent structures found in those collections took the form of clusters of synonyms and related terms.
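To make the co-occurrence point concrete, here is a minimal sketch in Python with NumPy. The five terms, the three documents, and the 0/1 weights are made up for illustration and do not come from any of the LSI papers. “brake” and “piston” never appear in the same document, but both co-occur with “engine”, so a rank-2 SVD pulls them together.

```python
import numpy as np

# Hypothetical toy term-doc matrix: rows are terms, columns are documents.
terms = ["brake", "piston", "engine", "banana", "fruit"]
A = np.array([
    [1, 0, 0],  # brake:  doc 0 only
    [0, 1, 0],  # piston: doc 1 only
    [1, 1, 0],  # engine: docs 0 and 1 (the link between brake and piston)
    [0, 0, 1],  # banana: doc 2 only
    [0, 0, 1],  # fruit:  doc 2 only
], dtype=float)


def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


# Thin SVD, then keep the k largest singular values (the "LSI" step).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_coords = U[:, :k] * s[:k]  # term vectors in the reduced space

i, j = terms.index("brake"), terms.index("piston")
raw_sim = cosine(A[i], A[j])                      # no shared document
lsi_sim = cosine(term_coords[i], term_coords[j])  # after rank-k reduction

print(f"raw cosine:    {raw_sim:.3f}")
print(f"rank-{k} cosine: {lsi_sim:.3f}")
```

On this toy matrix the raw cosine is 0, while the rank-2 cosine comes out at essentially 1. Nothing in the computation knows what a brake or a piston is; the association comes entirely from the shared co-occurrence with “engine”.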

Soon after, SEOs who misread those papers developed a synonymity myth around LSI. The fact is that in LSI we arrive at the so-called “LSI latent clusters” regardless of whether the terms are synonyms. This myth is easy to debunk. Once the clusters have been obtained, take one term in the term-doc matrix that is known to belong to a cluster and replace it with an arbitrary new term not present in the collection, keeping the weights intact. Run the SVD algorithm again. You should arrive at the same latent structure, yet the new term may not be semantically related at all to the other terms in the cluster, since the algorithm only processes numbers, not semantics. In short: garbage in, garbage out. This is a great topic for another ‘SEO Myth Debunking’ IRW article.
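The replacement experiment can be sketched in a few lines of Python with NumPy. Everything here is an assumption made for illustration: the toy weights, the term labels, the nonsense token "zxqv", and the helper names `term_coordinates` and `nearest_terms`. Cosine similarity in the reduced space stands in for a "cluster".

```python
import numpy as np

# Hypothetical toy collection: term -> weights across three documents.
weights = {
    "aspirin":   [1, 0, 0],
    "ibuprofen": [0, 1, 0],
    "analgesic": [1, 1, 0],
    "router":    [0, 0, 1],
    "modem":     [0, 0, 1],
}


def term_coordinates(term_weights, k=2):
    """Return (labels, rank-k term vectors) from an SVD of the term-doc matrix."""
    labels = list(term_weights)
    matrix = np.array([term_weights[t] for t in labels], dtype=float)
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return labels, u[:, :k] * s[:k]


def nearest_terms(labels, coords, target, top_n=2):
    """Labels of the top_n terms closest to target by cosine similarity."""
    i = labels.index(target)
    norms = np.linalg.norm(coords, axis=1)
    sims = coords @ coords[i] / (norms * norms[i])
    order = [j for j in np.argsort(-sims) if j != i]
    return [labels[j] for j in order[:top_n]]


# Swap one clustered term for a nonsense token, keeping its weights intact.
swapped = {("zxqv" if t == "ibuprofen" else t): w for t, w in weights.items()}

labels_before, coords_before = term_coordinates(weights)
labels_after, coords_after = term_coordinates(swapped)

same_structure = bool(np.allclose(coords_before, coords_after))
neighbors_before = nearest_terms(labels_before, coords_before, "aspirin")
neighbors_after = nearest_terms(labels_after, coords_after, "aspirin")

print("same latent structure:", same_structure)
print("neighbors of aspirin before:", sorted(neighbors_before))
print("neighbors of aspirin after: ", sorted(neighbors_after))
```

The labels never enter the decomposition, so the coordinates are identical in both runs and "zxqv" simply inherits the place "ibuprofen" held next to "aspirin".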

For those SEOs, marketers, and spammers who claim to know what LSI is, like Aaron Wall and the like, read: http://irthoughts.wordpress.com/2007/05/01/irwatch-may-issue-demystifying-lsi/


Cooper, W. S. (1991). Some inconsistencies and misnomers in probabilistic information retrieval. In A. Bookstein, Y. Chiaramella, G. Salton, & V. V. Raghavan (Eds.), Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM, SIGIR ’91) (pp. 57-61). Chicago, Illinois: ACM.

Mike Grehan
Lies, Lies, and LSI by Mike Grehan

Bill Slawski
Personalization Through Tracking Triplets of Users, Queries, and Web Pages

Rand Fishkin
InfoSearch Media & ContentLogic – Purveyors of Falsehoods

Lee Odden
5 Myths about SEO

Marios Alexandrou
The History of Latent Semantic Indexing

Web content and LSI mega-rant. Part Two…






