Dictionary definition lovers, here is one for you: misnomer.

According to Dictionary.com, a misnomer is:

1. a misapplied or inappropriate name or designation.

2. an error in naming a person or thing.

i.e., an inappropriate designation.

Once in a while, misnomers are found in IR. Here are at least two:

1. Binary Independence Model

2. Latent Semantic Indexing (LSI)

Here is a quick overview of each.

**1. Binary Independence IR Model**

Back in 1991, Cooper wrote “Some Inconsistencies and Misnomers in Probabilistic Information Retrieval.”

His article’s abstract states:

“The probabilistic theory of information retrieval involves the construction of mathematical models based on statistical assumptions of various sorts. One of the hazards inherent in this kind of theory construction is that the assumptions laid down may be inconsistent with the data to which they are applied. Another hazard is that the stated assumptions may not be the real assumptions on which the derived modelling equations or resulting experiments are actually based. Both kinds of error have been made repeatedly in research on probabilistic information retrieval. One consequence of these lapses is that the statistical character of certain probabilistic IR models, including the so-called ‘binary independence’ model, has been seriously misapprehended.”

In that article Cooper wrote:

“Since one can derive all possible assertions from an inconsistent theory, such a theory must be meaningless – entirely lacking in significance or predictive power. It makes no sense that good experimental results could come out of an inconsistent theory.”

“It is tempting to explain this conundrum by suggesting that the inconsistencies in question were only minor ones. However, there is no such thing as a theory that is just ‘a little bit’ inconsistent. A theory cannot be just a little bit inconsistent, any more than the scientist proposing it can be just a little bit pregnant. A logical inconsistency, if its implications are followed out, destroys a theory utterly. It is a disaster, and is totally unacceptable. If rationality is to be preserved, inconsistencies simply cannot be tolerated.”

**2. Latent Semantic Indexing**

Contrary to SEO myths, LSI is not an “indexing” model, but an SVD matrix decomposition technique. Despite what was written in the early literature on LSI, today we know that LSI does not assess semantics, but gets its power from high-order co-occurrence relationships. Furthermore, one can grasp latent (i.e., hidden) structures present in a system by using diverse techniques other than LSI. The early papers on LSI used structured collections of journal articles and abstracts about specific topics (medicine, science, and computers). These were rich in synonyms. Evidently, the latent semantic structures identified these by forming clusters of synonyms and related terms.
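The high-order co-occurrence point can be illustrated with a minimal sketch. The three-document collection, the raw term counts, and the choice of k = 2 below are all assumptions made up for illustration; they do not come from the LSI papers. In this toy matrix, “car” and “automobile” never appear in the same document, but both co-occur with “engine”.

```python
import numpy as np

# Hypothetical toy collection (an assumption, not data from the LSI papers):
#   d1: car engine    d2: automobile engine    d3: banana fruit
terms = ["car", "automobile", "engine", "banana", "fruit"]
A = np.array([
    [1, 0, 0],  # car
    [0, 1, 0],  # automobile
    [1, 1, 0],  # engine
    [0, 0, 1],  # banana
    [0, 0, 1],  # fruit
], dtype=float)

def cosine(x, y):
    """Cosine similarity between two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Truncated SVD: keep only the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
term_vecs = U[:, :k] * s[:k]  # one k-dimensional row per term

car, auto = terms.index("car"), terms.index("automobile")
raw_sim = cosine(A[car], A[auto])                  # rows of the original matrix
lsi_sim = cosine(term_vecs[car], term_vecs[auto])  # rows in the reduced space
print(f"raw cosine: {raw_sim:.2f}, rank-{k} cosine: {lsi_sim:.2f}")
```

In the original matrix the two rows are orthogonal, while in the rank-2 space they end up pointing in essentially the same direction. Nothing in the computation knows that the two words are synonyms; the shared co-occurrence with “engine” is what pulls them together.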

Soon after, SEOs who misread those papers developed a synonymity myth around LSI. The fact is that in LSI we can arrive at the so-called “LSI latent clusters” regardless of whether the terms are synonyms or not. This myth is easy to debunk. Once the clusters have been obtained, take a term in the term-doc matrix that is known to belong to a cluster and replace it with a new arbitrary term not present in the collection, keeping the weights intact. Run the SVD algorithm again. You should arrive at the same latent structure, but the new term might not be semantically related at all to the terms in the cluster, since the algorithm only understands and processes numbers, not semantics. Thus, it processes *a la* garbage in, garbage out. This is a great topic for another ‘SEO Myth Debunking’ IRW article.
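The term-swap test described above can be sketched in a few lines. The toy matrix, the helper name `lsi_term_vectors`, and the replacement label `xqzzy` are hypothetical and chosen only for illustration. The point to notice is that the term labels are never passed to the SVD routine, so renaming a row cannot change the result.

```python
import numpy as np

def lsi_term_vectors(matrix, k):
    """Return rank-k term coordinates (U_k scaled by the singular values)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]

# Hypothetical term-doc weights; rows follow the order of `terms`.
terms = ["car", "automobile", "engine", "banana", "fruit"]
weights = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

# First run: original labels.
before = dict(zip(terms, lsi_term_vectors(weights, k=2)))

# Swap one label for an arbitrary string, keeping the weights intact.
swapped_terms = ["xqzzy" if t == "automobile" else t for t in terms]

# Second run: same numbers, different label.
after = dict(zip(swapped_terms, lsi_term_vectors(weights, k=2)))

# The arbitrary term lands exactly where "automobile" used to be.
same_structure = bool(np.allclose(before["automobile"], after["xqzzy"]))
print("same latent position after the swap:", same_structure)
```

Whatever cluster “automobile” belonged to in the first run, `xqzzy` belongs to it in the second, even though it means nothing.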

For those SEOs, marketers, and spammers who claim to know what LSI is, like Aaron Wall and the like, read: http://irthoughts.wordpress.com/2007/05/01/irwatch-may-issue-demystifying-lsi/

**References**

Cooper, W. S. (1991). Some inconsistencies and misnomers in probabilistic information retrieval. In A. Bookstein, Y. Chiaramella, G. Salton, & V. V. Raghavan (Eds.), Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’91) (pp. 57-61). Chicago, Illinois: ACM.

Mike Grehan

Lies, Lies, and LSI

http://www.clickz.com/showPage.html?page=3623571

Bill Slawski

Personalization Through Tracking Triplets of Users, Queries, and Web Pages

http://www.seobythesea.com/?p=535

Rand Fishkin

InfoSearch Media & ContentLogic – Purveyors of Falsehoods

http://www.seomoz.org/blog/infosearch-media-contentlogic-purveyors-of-falsehoods

Lee Odden

5 Myths about SEO

http://www.toprankblog.com/2006/12/5-myths-about-seo/

Marios Alexandrou

The History of Latent Semantic Indexing

http://www.searchgrit.com/history-of-latent-semantic-indexing/

Carson

Web content and LSI mega-rant. Part Two…

http://contentdonebetter.com/2007/03/30/web-content-and-lsi-mega-rant-part-two/

http://irthoughts.wordpress.com/2007/12/11/perpetuating-lsi-misconceptions/

http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comments

http://irthoughts.wordpress.com/2008/07/14/claps-and-slaps/

http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/

http://irthoughts.wordpress.com/2007/07/19/seos-and-still-their-lsi-misconceptions/

http://irthoughts.wordpress.com/2007/05/03/latest-seo-incoherences-lsi/