A few months ago I reviewed a research manuscript and a thesis in which the same author used the expression “a co-occurrence matrix” indiscriminately. The author, a graduate student and friend, allowed me to post this, since we think it may benefit other graduate students.

**Co-Weight Matrices**

Let **A** be a term-document matrix populated with term weights, aij, where aij is the weight of term i in document j, and defined as follows:

aij = Lij*Gi*Nj

Lij = a local weight

Gi = a global weight

Nj = a normalization weight
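As a minimal sketch of how such a matrix might be populated, the snippet below builds **A** from raw counts. The post does not prescribe any particular weights, so the choices here are assumptions made only for illustration: raw frequency as the local weight, an IDF-style global weight, and cosine (unit-length) normalization per document. The counts in `f` and the helper name `weight_matrix` are hypothetical.

```python
import math

# Hypothetical raw counts f[i][j]: rows are terms, columns are documents.
f = [
    [2, 0, 1],
    [1, 1, 0],
    [0, 3, 1],
]

def weight_matrix(f):
    """Return A with a_ij = L_ij * G_i * N_j under one assumed weighting model.

    Assumes every term occurs in at least one document.
    """
    n_terms, n_docs = len(f), len(f[0])
    # Local weight (assumed): raw frequency, L_ij = f_ij.
    L = f
    # Global weight (assumed): IDF, G_i = log(D / d_i),
    # where d_i is the number of documents containing term i.
    G = [math.log(n_docs / sum(1 for x in row if x > 0)) for row in f]
    # Weights before normalization: L_ij * G_i.
    raw = [[L[i][j] * G[i] for j in range(n_docs)] for i in range(n_terms)]
    # Normalization weight (assumed): N_j = 1 / Euclidean length of column j.
    N = []
    for j in range(n_docs):
        length = math.sqrt(sum(raw[i][j] ** 2 for i in range(n_terms)))
        N.append(1.0 / length if length > 0 else 0.0)
    return [[raw[i][j] * N[j] for j in range(n_docs)] for i in range(n_terms)]

A = weight_matrix(f)
```

With this particular choice of Nj, every document column of `A` has unit length. Swapping in a different local, global, or normalization weight changes the entries of **A** but not the form aij = Lij*Gi*Nj.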

Let **Aᵀ** be the transpose of **A**. Consequently, an unnormalized co-weight matrix, **Cu**, is defined as

**Cu = AAᵀ**
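The product can be sketched in a few lines of plain Python. The weights in `A` below are made-up values for two terms and three documents, and the helper names `transpose` and `matmul` are hypothetical; in practice a numerical library would normally do this.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]

# Hypothetical term-document weights: 2 terms (rows) x 3 documents (columns).
A = [
    [0.5, 0.0, 1.0],
    [1.0, 2.0, 0.0],
]

# Unnormalized co-weight matrix: terms x terms.
C_u = matmul(A, transpose(A))
print(C_u)  # [[1.25, 0.5], [0.5, 5.0]]
```

The result is square and symmetric, with one row and one column per term. Each entry is the sum, over documents, of the products of two terms' weights.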

**Cu** can be normalized by restating its elements as Jaccard’s Coefficients, in which case a normalized co-weight matrix, **Cn**, is obtained. If Jaccard’s Coefficients are taken as similarity measures, then **Cn** is a normalized *similarity matrix*.
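The post does not spell out the formula, so the sketch below assumes one common reading: the extended Jaccard (Tanimoto) form, in which each element cij of **Cu** is divided by cii + cjj - cij. The function name `jaccard_normalize` and the sample matrix are hypothetical.

```python
def jaccard_normalize(c_u):
    """Restate the elements of a square co-weight matrix as
    extended Jaccard coefficients (assumed form):
    c_ij / (c_ii + c_jj - c_ij)."""
    n = len(c_u)
    c_n = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = c_u[i][i] + c_u[j][j] - c_u[i][j]
            # Guard against an all-zero pair of terms.
            c_n[i][j] = c_u[i][j] / denom if denom != 0 else 0.0
    return c_n

# Hypothetical unnormalized co-weight matrix for two terms.
C_u = [
    [1.25, 0.5],
    [0.5, 5.0],
]
C_n = jaccard_normalize(C_u)
```

Under this assumed form the diagonal of `C_n` is 1 (each term is fully similar to itself) and the matrix stays symmetric.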

**Co-Occurrence Matrices**

An unnormalized and a normalized **co-occurrence** matrix are obtained from **Cu** and **Cn**, respectively. This is accomplished by first setting Nj = 1, Gi = 1, and Lij = fij, where fij is the number of occurrences of term i in document j.

This means that term weights reduce to mere local weights based on raw term occurrences in documents:

aij = fij

All these matrices can be transformed into binary matrices by setting aij values to 1 or 0. These values indicate the presence (1) or absence (0) of term i in document j, regardless of how many times the term occurs in the document. Thus, binary co-occurrence matrices (and therefore binary co-weight matrices) are particular cases.
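To make the two special cases concrete, here is a small sketch with made-up counts for two terms and three documents. The helper name `co_matrix` is hypothetical; it multiplies a term-document matrix by its own transpose.

```python
def co_matrix(a):
    """Return a * a^T for a term-document matrix stored as a list of rows."""
    return [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]

# Hypothetical raw counts: f[i][j] = occurrences of term i in document j.
f = [
    [2, 0, 1],
    [1, 1, 0],
]

# Binary version: 1 if the term is present in the document, 0 otherwise.
b = [[1 if x > 0 else 0 for x in row] for row in f]

C_occ = co_matrix(f)  # co-occurrence matrix from raw counts
C_bin = co_matrix(b)  # binary co-occurrence matrix

print(C_occ)  # [[5, 2], [2, 2]]
print(C_bin)  # [[2, 1], [1, 2]]
```

In the binary case each off-diagonal entry counts the documents in which both terms appear, and each diagonal entry is the number of documents containing that term. With raw counts the entries are sums of products of frequencies, so repeated occurrences weigh more.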

To conclude, a co-occurrence matrix, normalized or not, or binary or not, is just a particular case of a co-weight matrix.

The indiscriminate use of the term “co-occurrence matrix” should be avoided, since the expression implies that term weights are defined as occurrences, aij = fij. This is not always the case, though.

All co-occurrence matrices are co-weight matrices, but the reverse is not necessarily true; not all co-weight matrices are co-occurrence matrices. Calling something “co-occurrence” when it is not is risky.

Unfortunately, we frequently read research papers, including LSI papers, wherein authors and reviewers fail to recognize this generalization.

I advise graduate students and readers (i.e., SEOs, IR friends, colleagues) to avoid this indiscriminate usage.

E. Garcia said:

I’ve been asked if I could summarize the gist of this post, so here it goes:

Calling **AAᵀ** a co-occurrence matrix when the matrices being multiplied are not populated with occurrences is misleading. If you don’t know which weight model was used to populate such a product, just call it a co-weight matrix.