Just a reminder of how we can model good keywords:

Poisson Mixtures: mixtures of Poissons fit word-frequency data better than a single standard Poisson.

ftp://ftp.cis.upenn.edu/pub/datamining/public_html/ReadingGroup/papers/church-poisson.pdf
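To see why a mixture helps, here is a minimal sketch that fits a single Poisson and a negative binomial (a Gamma mixture of Poissons) to one word's per-document counts and compares how well each predicts the fraction of documents where the word never appears. The method-of-moments fit, the function names, and the sample counts are illustrative assumptions, not taken from the paper.

```python
import math
from statistics import mean, pvariance


def observed_p0(counts):
    """Fraction of documents in which the word does not occur."""
    return sum(1 for c in counts if c == 0) / len(counts)


def poisson_p0(counts):
    """P(count = 0) under a single Poisson whose rate is the sample mean."""
    mu = float(mean(counts))
    return math.exp(-mu)


def negbin_p0(counts):
    """P(count = 0) under a negative binomial fitted by method of moments.

    The negative binomial is a Gamma mixture of Poissons. It needs
    variance > mean (overdispersion); otherwise we fall back to the Poisson.
    """
    mu = float(mean(counts))
    var = float(pvariance(counts))
    if var <= mu:
        return math.exp(-mu)
    p = mu / var               # success probability
    r = mu * mu / (var - mu)   # shape (dispersion) parameter
    return p ** r              # NB pmf at k = 0 is p^r


# Hypothetical bursty word: absent from 8 of 10 documents, 5 hits in each of the other 2.
counts = [0] * 8 + [5, 5]
print(observed_p0(counts), poisson_p0(counts), negbin_p0(counts))
# observed 0.8, Poisson about 0.368, negative binomial about 0.630
```

The single Poisson badly underestimates how many documents miss the word, because it assumes occurrences are spread evenly. The extra dispersion parameter lets the mixture capture burstiness: once a content word appears in a document, it tends to appear again.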

Inverse Document Frequency: A Measure of Deviation from Poisson. Low-frequency words tend to be rich in content, and high-frequency words tend not to be. But not all equally frequent words are equally meaningful.

https://www.aclweb.org/anthology/W/W95/W95-0110.pdf
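One way to make "not all equally frequent words are equally meaningful" concrete is to compare a word's observed IDF with the IDF a Poisson would predict from its total frequency alone. The difference is often called residual IDF. The sketch below assumes IDF = -log2(df / N), where df is the number of documents containing the word, cf is its total count, and N is the number of documents. The two example words and their counts are hypothetical.

```python
import math


def idf(df, n_docs):
    """Observed IDF: -log2 of the fraction of documents containing the word."""
    return -math.log2(df / n_docs)


def poisson_predicted_idf(cf, n_docs):
    """IDF expected if the word's cf occurrences were scattered by a Poisson.

    With rate theta = cf / n_docs per document, the Poisson gives
    P(word appears at least once in a document) = 1 - exp(-theta).
    """
    theta = cf / n_docs
    return -math.log2(1.0 - math.exp(-theta))


def residual_idf(cf, df, n_docs):
    """How much more concentrated the word is than a Poisson predicts."""
    return idf(df, n_docs) - poisson_predicted_idf(cf, n_docs)


N = 1000
# Two hypothetical words with the same total frequency (cf = 100):
spread = residual_idf(cf=100, df=95, n_docs=N)  # scattered across 95 documents
bursty = residual_idf(cf=100, df=20, n_docs=N)  # clumped into 20 documents
print(f"spread: {spread:.3f}, bursty: {bursty:.3f}")
```

Both words have the same frequency, but the spread-out word sits almost exactly where the Poisson predicts (residual near 0), while the clumped word's residual is about 2.25 bits. That concentration is what marks a likely content word or keyword.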