Answers to the IR Quiz are given below:
Term Independence Assumption:
If k1 and k2 are statistically independent, they co-occur only by chance, in
(100)(200)/500 = 40 documents.
In that case the number of documents containing the k1 k2 sequence is unknown, but it cannot be greater than 40, since every such document must contain both terms.
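The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch: the function name and its parameters are illustrative, and the values 100, 200 and 500 are read off the expression above as the document counts for k1, k2 and the whole collection.

```python
def expected_cooccurrence(n1, n2, n_docs):
    """Documents expected to contain both terms if the terms are independent.

    Under independence P(k1 AND k2) = P(k1) * P(k2), so the expected count is
    (n1 / n_docs) * (n2 / n_docs) * n_docs = n1 * n2 / n_docs.
    """
    return n1 * n2 / n_docs


# Values taken from the expression (100)(200)/500 in the text.
chance = expected_cooccurrence(n1=100, n2=200, n_docs=500)
print(chance)  # 40.0
```

The result is also an upper bound on the sequence count under this assumption, because a document containing the sequence necessarily contains both terms.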
Term Dependence Assumption:
If the terms actually co-occur in 70 documents, they are co-occurring more often than by chance (70 > 40). So the terms are statistically dependent and positively correlated. It is given that the k1 k2 terms sequence is present in 25 of the 70 documents in which the terms co-occur.
Results are given below, rounded to two decimal places. In each pair, the first value is for the terms independence assumption and the second is for the terms dependence assumption. You should be able to double-check these results.
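The comparison of the observed count with the chance count can be sketched as follows. This is only the simple greater-than check used in the answer, not a significance test, and the function name and return labels are illustrative assumptions.

```python
def association(observed, n1, n2, n_docs):
    """Compare the observed co-occurrence count with the chance expectation."""
    expected = n1 * n2 / n_docs  # count expected under independence
    if observed > expected:
        label = "positive"       # co-occur more often than by chance
    elif observed < expected:
        label = "negative"       # co-occur less often than by chance
    else:
        label = "independent"    # exactly the chance count
    return label, observed / expected


# 70 observed co-occurrences, with counts 100, 200 and 500 as in the text.
label, ratio = association(observed=70, n1=100, n2=200, n_docs=500)
print(label, ratio)  # positive 1.75
```

A ratio above 1 says the terms co-occur more often than chance predicts; here they co-occur 1.75 times as often.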
1. k1 NOT k2: 60, 30
2. k2 NOT k1: 160, 130
3. k1 OR k2 (unconditional OR): 260, 230
4. k1 OR k2 (conditional OR): 220, 160
5. NOT k1: 400, 400
6. NOT k2: 300, 300
7. NOT (k1 AND k2): 460, 430
8. k1 AND k2 NOT (k1 k2): NC, 45
9. EF-Ratio of the k1 k2 terms sequence: NC, 0.36
10. c12-index of the k1 k2 terms sequence: NC, 0.11
11. c12-index of k1 AND k2: 0.15, 0.30
12. IDF of k1: 0.70, 0.70
13. IDF of k2: 0.40, 0.40
14. IDF of k1 AND k2: 1.10, 0.85
15. IDF of k1 k2 terms sequence: NC, 1.30
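One way to double-check the list is to recompute it. The sketch below rests on assumptions inferred from the numbers rather than stated in the text: a collection of 500 documents with 100 containing k1 and 200 containing k2, IDF taken as log10(N/n), the EF-Ratio taken as sequence count over co-occurrence count, and the c12-index taken as a count divided by the unconditional OR count. These readings reproduce every value listed. Entries marked NC are returned as None.

```python
import math

N, N1, N2 = 500, 100, 200  # collection size, docs with k1, docs with k2 (inferred)


def idf(n):
    # Assumed form: base-10 log of collection size over document count.
    return round(math.log10(N / n), 2)


def quiz_results(n12, n_seq=None):
    """Recompute the 15 results for a co-occurrence count n12.

    n_seq is the number of documents containing the k1 k2 sequence,
    or None when it is unknown.
    """
    either = N1 + N2 - n12  # documents containing k1, k2, or both
    known = n_seq is not None
    return {
        1: N1 - n12,                 # k1 NOT k2
        2: N2 - n12,                 # k2 NOT k1
        3: either,                   # unconditional OR
        4: either - n12,             # conditional OR: one term but not both
        5: N - N1,                   # NOT k1
        6: N - N2,                   # NOT k2
        7: N - n12,                  # NOT (k1 AND k2)
        8: n12 - n_seq if known else None,                   # AND without the sequence
        9: round(n_seq / n12, 2) if known else None,         # EF-Ratio (assumed form)
        10: round(n_seq / either, 2) if known else None,     # c12-index of the sequence
        11: round(n12 / either, 2),  # c12-index of k1 AND k2
        12: idf(N1),
        13: idf(N2),
        14: idf(n12),
        15: idf(n_seq) if known else None,
    }


independent = quiz_results(n12=N1 * N2 // N)   # 40 co-occurrences by chance
dependent = quiz_results(n12=70, n_seq=25)     # observed counts from the text

for item in independent:
    print(item, independent[item], dependent[item])
```

Each printed row gives the item number followed by the independence and dependence values, in the same order as the list.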
Additional exercises open to discussion:
Calculate the associated odds, odds ratios, and logits.
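As a possible starting point for the discussion, the sketch below uses the standard definitions: odds as documents with a property over documents without it, logit as the natural log of the odds, and the odds ratio taken from the 2x2 table of k1 against k2. Which quantities the exercise intends is left open in the text, so the choice of inputs here is an assumption, as are the function names and the counts 500, 100 and 200.

```python
import math

N, N1, N2 = 500, 100, 200  # collection size, docs with k1, docs with k2 (inferred)


def odds(n, n_docs=N):
    """Odds that a document has the property: n documents for, n_docs - n against."""
    return n / (n_docs - n)


def logit(n, n_docs=N):
    """Natural log of the odds."""
    return math.log(odds(n, n_docs))


def odds_ratio(n12):
    """Odds ratio of the 2x2 table of k1 against k2 for co-occurrence count n12."""
    both = n12
    only_k1 = N1 - n12
    only_k2 = N2 - n12
    neither = N - (N1 + N2 - n12)
    return (both * neither) / (only_k1 * only_k2)


print(odds(N1), logit(N1))  # odds and logit of k1
print(odds_ratio(40))       # independence assumption
print(odds_ratio(70))       # dependence assumption
```

Under the independence assumption the odds ratio comes out as exactly 1, and under the dependence assumption it is greater than 1, which agrees with the positive correlation found above.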