Answers to the IR Quiz are given below:

Term Independence Assumption:

If k1 and k2 are statistically independent, they co-occur only by chance, and the expected number of documents containing both is

(100)(200)/500 = 40 documents.

Thus, if the terms co-occur by chance, the number of documents containing the k1 k2 sequence is unknown, but it is certainly no greater than 40.
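
The expected count can be reproduced with a short Python sketch. The collection size (500 documents) and the document frequencies (100 for k1, 200 for k2) are read off the calculation above; the variable names are only illustrative.

```python
# Assumed inputs, taken from the (100)(200)/500 calculation above.
N = 500    # documents in the collection
n1 = 100   # documents containing k1
n2 = 200   # documents containing k2

# Under independence, P(k1 AND k2) = P(k1) * P(k2) = (n1/N) * (n2/N).
# Multiplying by N gives the expected number of co-occurrence documents.
expected_cooccurrence = n1 * n2 / N

print(expected_cooccurrence)
```

The sequence count is bounded above by this value because every document containing the k1 k2 sequence also contains both terms.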

Term Dependence Assumption:

If the terms actually co-occur in 70 documents, they are co-occurring more often than expected by chance (70 > 40). The terms are therefore statistically dependent and positively correlated. It is given that the k1 k2 terms sequence is present in 25 of the 70 documents in which the terms co-occur.

Detailed Results:

Results are given below, rounded to two decimal places. In each pair, the first value is for the term independence assumption and the second is for the term dependence assumption. You should be able to double-check these results.

1. k1 NOT k2: 60, 30

2. k2 NOT k1: 160, 130

3. k1 OR k2 (unconditional OR): 260, 230

4. k1 OR k2 (conditional OR): 220, 160

5. NOT k1: 400, 400

6. NOT k2: 300, 300

7. NOT (k1 AND k2): 460, 430

8. k1 AND k2 NOT (k1 k2): NC, 45

9. EF-Ratio of the k1 k2 terms sequence: NC, 0.36

10. c12-index of the k1 k2 terms sequence: NC, 0.11

11. c12-index of k1 AND k2: 0.15, 0.30

12. IDF of k1: 0.70, 0.70

13. IDF of k2: 0.40, 0.40

14. IDF of k1 AND k2: 1.10, 0.85

15. IDF of k1 k2 terms sequence: NC, 1.30
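
One way to double-check the list is the Python sketch below. It is not the quiz's official method: the inputs (500 documents, 100 with k1, 200 with k2) and the formulas for IDF (base-10 logarithm), the c12-index (co-occurrence count over the unconditional OR count) and the EF-Ratio (sequence count over co-occurrence count) are assumptions inferred from the listed figures, which they reproduce. The function and key names are hypothetical.

```python
from math import log10

# Assumed inputs, inferred from the figures in this post.
N = 500             # documents in the collection
n1, n2 = 100, 200   # documents containing k1 and k2, respectively
n_seq = 25          # documents containing the k1 k2 sequence (dependence case)

def idf(n):
    # Assumed definition: base-10 log of N over the document count n.
    return log10(N / n)

def quiz_results(n12, n_seq=None):
    """Recompute the listed quantities for a co-occurrence count n12."""
    union = n1 + n2 - n12                      # k1 OR k2 (unconditional)
    out = {
        "k1 NOT k2": n1 - n12,
        "k2 NOT k1": n2 - n12,
        "k1 OR k2 (unconditional)": union,
        "k1 OR k2 (conditional)": union - n12,  # exactly one of the two terms
        "NOT k1": N - n1,
        "NOT k2": N - n2,
        "NOT (k1 AND k2)": N - n12,
        "c12-index of k1 AND k2": n12 / union,  # assumed formula
        "IDF of k1": idf(n1),
        "IDF of k2": idf(n2),
        "IDF of k1 AND k2": idf(n12),
    }
    if n_seq is not None:                      # needs an observed sequence count
        out["k1 AND k2 NOT (k1 k2)"] = n12 - n_seq
        out["EF-Ratio of sequence"] = n_seq / n12      # assumed formula
        out["c12-index of sequence"] = n_seq / union   # assumed formula
        out["IDF of sequence"] = idf(n_seq)
    return {name: round(value, 2) for name, value in out.items()}

independent = quiz_results(n1 * n2 / N)   # expected co-occurrence count
dependent = quiz_results(70, n_seq)       # observed co-occurrence count

# Print both columns; sequence figures are shown as NC under independence.
for name in dependent:
    print(name, independent.get(name, "NC"), dependent[name])
```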

Additional exercises open to discussion:

Calculate the associated odds, odds ratios, and logits.
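
As a starting point for that discussion, here is a minimal Python sketch using the standard definitions: odds = p/(1 - p), logit = natural log of the odds, and the odds ratio of the 2x2 table for k1 and k2. Which events the exercise intends is open to interpretation; this sketch assumes the event "a document contains both terms", and the inputs (500 documents, 100 with k1, 200 with k2) are inferred from the figures in this post. The function names are hypothetical.

```python
from math import log

# Assumed inputs, inferred from the figures in this post.
N, n1, n2 = 500, 100, 200

def odds(p):
    """Odds in favour of an event with probability p."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds."""
    return log(odds(p))

def odds_ratio(n12):
    """Odds ratio of the 2x2 table for k1 and k2, given co-occurrence count n12."""
    n10 = n1 - n12               # k1 NOT k2
    n01 = n2 - n12               # k2 NOT k1
    n00 = N - (n1 + n2 - n12)    # neither term
    return (n12 * n00) / (n10 * n01)

# Compare the expected (independence) and observed (dependence) co-occurrence counts.
for label, n12 in [("independence", n1 * n2 / N), ("dependence", 70)]:
    p = n12 / N
    print(label, round(odds(p), 2), round(logit(p), 2), round(odds_ratio(n12), 2))
```

An odds ratio of 1 corresponds to independence, and a value above 1 indicates positive association between the terms.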

 
