The QA column of the current issue of IR Watch – The Newsletter features the following question:

Question: In Excel, how do you convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?

Let A be a term-document matrix populated with term occurrences (frequencies), with one row per term and one column per document.
Let A^T be its transpose.

Then, T = AA^T is a term-term co-occurrence matrix, and D = A^T A is a document-document co-occurrence matrix.

The following table emulates an Excel spreadsheet.

      A            B     C     D
 1    A =          d1    d2    d3
 2    t1           0     1     0
 3    t2           0     0     1
 4    t3           1     1     1
 5
 6    T = AA^T     t1    t2    t3
 7    t1           1     0     1
 8    t2           0     1     1
 9    t3           1     1     3
10
11    D = A^T A    d1    d2    d3
12    d1           1     1     1
13    d2           1     2     1
14    d3           1     1     2

In the table, T was computed by selecting the destination array (B7:D9), entering in its first cell (B7) the formula =MMULT(B2:D4,TRANSPOSE(B2:D4)), pressing the F2 key and then Ctrl+Shift+Enter.

Similarly, D was computed by selecting the destination array (B12:D14), entering in its first cell (B12) the formula =MMULT(TRANSPOSE(B2:D4),B2:D4), pressing the F2 key and then Ctrl+Shift+Enter.
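
For readers who want to double-check the spreadsheet outside Excel, here is a minimal Python sketch of the same two products. It is only an illustration: the helper names transpose and matmul are made up for this example (they stand in for Excel's TRANSPOSE and MMULT), and the matrix A is copied by hand from cells B2:D4.

```python
# Term-document matrix copied from cells B2:D4 of the spreadsheet:
# rows are terms t1..t3, columns are documents d1..d3.
A = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]


def transpose(m):
    """Swap rows and columns (hypothetical stand-in for Excel's TRANSPOSE)."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product (hypothetical stand-in for Excel's MMULT)."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


T = matmul(A, transpose(A))  # term-term co-occurrence, A times A^T
D = matmul(transpose(A), A)  # document-document co-occurrence, A^T times A

# Print both matrices row by row for comparison with the spreadsheet.
for name, matrix in (("T", T), ("D", D)):
    print(name, "=")
    for row in matrix:
        print(" ", row)
```

The printed rows should agree with cells B7:D9 and B12:D14 in the table above.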

That was easy!

Note that neither of these is a similarity matrix. Can you tell why?