Independent events are often mistaken for exclusive (disjoint) events. These are two different animals.

Consider two events, A and B. Let p(A OR B) be their union probability and p(A AND B) their joint probability. The general addition law for probabilities states that for any two events A and B

p(A OR B) = p(A) + p(B) – p(A AND B)

If events are independent

p(A AND B) = p(A)p(B)

Thus,

p(A OR B) = p(A) + p(B) – p(A)p(B)

Whereas if these are exclusive

p(A AND B) = 0

Therefore,

p(A OR B) = p(A) + p(B)
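As a quick check of the two cases above, here is a minimal sketch that enumerates the 36 outcomes of rolling two fair dice. The dice, the four example events, and the helper names (prob, p_and, p_or) are illustrative assumptions, not part of the original argument.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first, second) outcomes of two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate over one outcome."""
    favorable = sum(1 for o in OUTCOMES if event(o))
    return Fraction(favorable, len(OUTCOMES))

def p_and(a, b):
    """Joint probability p(A AND B)."""
    return prob(lambda o: a(o) and b(o))

def p_or(a, b):
    """Union probability p(A OR B)."""
    return prob(lambda o: a(o) or b(o))

# Independent pair: each event concerns a different die.
def first_even(o):
    return o[0] % 2 == 0

def second_even(o):
    return o[1] % 2 == 0

# Exclusive pair: the first die cannot show 1 and 2 at once.
def first_is_one(o):
    return o[0] == 1

def first_is_two(o):
    return o[0] == 2

# Independent: the joint probability equals the product of the marginals.
print(p_and(first_even, second_even), prob(first_even) * prob(second_even))
print(p_or(first_even, second_even))

# Exclusive: the joint probability is zero, so the union is a plain sum.
print(p_and(first_is_one, first_is_two))
print(p_or(first_is_one, first_is_two), prob(first_is_one) + prob(first_is_two))
```

Using Fraction keeps the arithmetic exact, so the equalities can be compared directly instead of within a floating-point tolerance.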

Furthermore,

if p(A AND B) = p(A)p(B), the events are independent: they occur together exactly as often as chance alone would predict.
if p(A AND B) > p(A)p(B), the events are positively correlated: they occur together more often than chance alone would predict.
if p(A AND B) < p(A)p(B), the events are negatively correlated: they occur together less often than chance alone would predict.
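The three cases can be sketched as a small function. The name classify and the sample probabilities are hypothetical, and the exact equality test assumes exact rational inputs; with floating-point estimates a tolerance would be needed.

```python
from fractions import Fraction

def classify(p_a, p_b, p_joint):
    """Compare the joint probability with the product of the marginals."""
    if not 0 <= p_joint <= min(p_a, p_b):
        raise ValueError("joint probability must lie between 0 and min(p_a, p_b)")
    expected = p_a * p_b  # joint probability if the events were independent
    if p_joint == expected:
        return "independent"
    if p_joint > expected:
        return "positively correlated"
    return "negatively correlated"

print(classify(Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)))
print(classify(Fraction(1, 2), Fraction(1, 2), Fraction(3, 8)))
print(classify(Fraction(1, 6), Fraction(1, 6), Fraction(0)))
```

The last call is the exclusive case: when both events have nonzero probability and p(A AND B) = 0, the joint probability is below p(A)p(B), so exclusive events are negatively correlated, not independent.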

Talking in “rice and beans” (Hablando en “arroz con habichuelas”):

Exclusive events have no outcomes in common: the occurrence of one excludes the occurrence of the other. By contrast, independent events with nonzero probabilities do have outcomes in common, but the occurrence of one does not change the probability of the other.

Independence and disjointness are very different things.

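In IR, taking the IDF of a combination of terms to be the sum of the individual term IDF values presumes that the terms are independent, regardless of the actual data. IDF is the negative logarithm of a term's document probability, so adding IDF values amounts to multiplying probabilities, which is valid only when p(A AND B) = p(A)p(B).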
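To make the IDF point concrete, here is a minimal sketch on a made-up eight-document collection. The documents, the terms, and the helpers doc_fraction and idf are hypothetical, and IDF is assumed to be the plain -log(n/N) variant, where n is the number of documents containing all the given terms and N is the collection size.

```python
import math

# Hypothetical toy collection: each document is a set of terms.
DOCS = [
    {"new", "york", "city"},
    {"new", "york", "times"},
    {"new", "york"},
    {"new", "car"},
    {"york", "minster"},
    {"old", "car"},
    {"city", "times"},
    {"old", "city"},
]

def doc_fraction(*terms):
    """Fraction of documents that contain every one of the given terms."""
    hits = sum(1 for d in DOCS if all(t in d for t in terms))
    return hits / len(DOCS)

def idf(*terms):
    """Assumed IDF variant: -log of the document fraction.

    Raises ValueError if no document contains all the terms.
    """
    return -math.log(doc_fraction(*terms))

# Sum of individual IDFs: equals -log(p(new) * p(york)).
idf_sum = idf("new") + idf("york")

# IDF measured from the documents where both terms actually co-occur.
idf_joint = idf("new", "york")

print(f"sum of IDFs: {idf_sum:.3f}")
print(f"joint IDF:   {idf_joint:.3f}")
```

In this toy collection the two terms co-occur more often than independence predicts (3/8 against 1/4), so the summed IDF is larger than the IDF computed from the observed co-occurrence.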

Arbitrarily assuming event independence, ignoring the experimental evidence, is one of the main sources of inaccuracies and flaws in many IR models (Cooper, 1991). However, excluding independence altogether is also unreasonable (Sparck Jones, Walker, and Robertson, 1998).

References

Cooper, W. S. (1991). Some inconsistencies and misnomers in probabilistic information retrieval. In A. Bookstein, Y. Chiaramella, G. Salton, & V. V. Raghavan (Eds.), Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR ’91) (pp. 57-61). Chicago, Illinois: ACM.

Sparck Jones, K., Walker, S., & Robertson, S. E. (1998). A probabilistic model of information retrieval: development and status. TR 446, September. Computer Laboratory, University of Cambridge.
