Now that the semester is over we can take on other projects. After a little break from the blog, it is good to be back. We are putting the final touches to this month issue of IR Watch - The Newsletter. During the break dozen of new subscribers signed.
The piece takes on several IDF myths and misconceptions promoted by SEOs and on what IDF is/is not. Here is an excerpt:
One recurrent misconception found across online media channels (search marketing blogs, forums, etc) is the assertion that IDF can be used to assess how important or relevant a term might be to the content of a document. This claim has no basis.
It should be stressed that as a measure of term specificity over N, IDF is not a local, but a global measure. IDF evaluates the discriminating power of a term within a collection of documents. A term ti might be relevant or important to the content of a document. However, if this document is part of a collection wherein all documents repeat ti, the term loses its discriminating power since N = ni and IDFi = log(N/ni) = 0.
Somehow, these marketers are mistaking IDF for the RSJ model or who knows what to possibly, as is often the case, promote themselves or whatever they sell.
June 18, 2008 at 12:52 pm
An example of such non sense is given in http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom
July 3, 2008 at 9:24 am
[...] and their IDF Myths: Part 2 This post is a continuation of a previous one on the topic of SEO non sense in relation with inverse document frequency (IDF). IDF has a long [...]