I’m putting together a piece on several local term weight models. It should be ready in a few weeks.

It is a research paper that can be used as a tutorial. It describes a systematic approach for the derivation of any kind of local term weighting model. Students can use it as a recipe for proposing their own candidate models.

The article touches on some aspects of the problem of trusting models that lack attenuation. Here is one snippet on the subject:

“It should be stressed that term repetition does not necessarily satisfy users’ queries, nor is it evidence of:

Pertinence (P); e.g., that a term repeated x times is x times more pertinent to the document.

Aboutness (A); e.g., that the document is x times more about the term.

Importance (I); i.e., that there is a term-document relationship of pertinence and aboutness.

Relevance (R); i.e., that a document repeating a term x times is x times more relevant.

Accordingly, fulfilling such ‘PAIR criteria’ on a regular basis is hard to accomplish with any model that lacks attenuation.”
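
To make the idea of attenuation concrete, here is a minimal sketch in Python. It is not taken from the article: it assumes logarithmic damping, 1 + log(tf), as one common form of attenuation, and the function names raw_weight and log_weight are hypothetical, chosen only for illustration.

```python
import math


def raw_weight(tf: int) -> float:
    """Unattenuated local weight: grows linearly with the term count."""
    return float(tf)


def log_weight(tf: int) -> float:
    """Attenuated local weight (assumed form): 1 + ln(tf), or 0 if the term is absent."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


if __name__ == "__main__":
    # Compare how each weight reacts as a term is repeated more often.
    for tf in (1, 2, 10, 100):
        print(tf, raw_weight(tf), round(log_weight(tf), 3))
```

With raw counts, a term repeated 100 times receives 100 times the weight of a single occurrence. Under the assumed log damping it receives roughly 5.6 times that weight, so repetition still counts but no longer scales one-to-one.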

