Sneak Preview of IRWatch: Understanding IDF

“IDF is simply neither a pure heuristic, nor the
theoretical mystery many have made it out to be.
We have a pretty good idea why it works as well
as it does.” –Stephen E. Robertson

Here is a sneak preview of IR Watch for June 2008. It should reach subscribers' inboxes today or tomorrow at the latest.

This issue discusses IDF within the context of co-occurrence theory and term independence/dependence assumptions. Issues and misconceptions related to this measure are addressed. We initially planned to include ongoing work we are conducting on specificity measures, but chose not to since this is not the appropriate forum.

IRW-2008-06: Understanding Inverse Document Frequency (IDF)

In this issue:

Introduction
Robertson-Sparck Jones Early Work on IDF
What IDF Is Not
What IDF Really Is
On Terms Independence
On Terms Dependence
Few Examples
Estimating the IDF of a Phrase
Conclusion
References
News, Research, and Events
Terms of Use and Copyright
