I’m putting together a piece on several local term weighting models. It should be ready in a few weeks.
It is a research paper that can also be used as a tutorial. It describes a systematic approach to deriving any kind of local term weighting model, so students can use it as a recipe for proposing their own candidate models.
The article touches on some aspects of the problem of trusting models that lack attenuation. Here is one snippet on the subject:
<last nail in KD coffin style="intensity:100%;">
“It should be stressed that term repetition does not necessarily satisfy users’ queries, nor is it evidence of:
Pertinence (P); i.e., that a term repeated x times is x times more pertinent to the document.
Aboutness (A); i.e., that the document is x times more about the term.
Importance (I); i.e., that there is a term-document relationship of pertinence and aboutness.
Relevance (R); i.e., that a document repeating a term x times is x times more relevant.
Accordingly, consistently fulfilling these ‘PAIR criteria’ is hard to accomplish with any model that lacks attenuation.”
</last nail in KD coffin>
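To make “attenuation” concrete, here is a minimal sketch comparing a linear local weight (raw term frequency, no attenuation) with two attenuated ones. The specific functions, 1 + ln(tf) and a BM25-style saturation with an assumed k = 1.2, are common textbook choices picked only for illustration; they are not necessarily the models derived in the forthcoming paper.

```python
import math


def raw_tf(tf: int) -> float:
    """Linear local weight: no attenuation, so x repetitions give x times the weight."""
    return float(tf)


def log_tf(tf: int) -> float:
    """Logarithmic local weight 1 + ln(tf) for tf > 0 (0 when the term is absent)."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def saturating_tf(tf: int, k: float = 1.2) -> float:
    """BM25-style saturation tf * (k + 1) / (tf + k); bounded above by k + 1.

    k = 1.2 is an assumed, commonly used default, not a value from the paper.
    """
    return tf * (k + 1) / (tf + k)


# Compare how each weight responds as a term is repeated more often.
for tf in (1, 2, 4, 8, 16):
    print(f"tf={tf:>2}  raw={raw_tf(tf):5.1f}  "
          f"log={log_tf(tf):5.2f}  sat={saturating_tf(tf):4.2f}")
```

With the linear weight, 16 repetitions count exactly 16 times as much as one, which is the proportionality the PAIR criteria above reject. The logarithmic weight grows by only ln 2 (about 0.69) per doubling, and the saturating weight never exceeds k + 1 however often the term repeats.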