The QA column of the current issue of IR Watch – The Newsletter features the following question:
Question: In Excel, how do you convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?
Answer:
Let A be a term-document matrix, with one row per term and one column per document, populated with term occurrences (frequencies).
Let A^T be its transpose.
Then T = AA^T is a term-term co-occurrence matrix, and D = A^T A is a document-document co-occurrence matrix.
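Written entry by entry, the two products look as follows. This is only a sketch of the standard matrix product; the symbol a_{ik} (the frequency of term i in document k) and the index names are notation introduced here, not taken from the newsletter.

```latex
T_{ij} = \sum_{k} a_{ik}\, a_{jk}
\qquad\text{and}\qquad
D_{kl} = \sum_{i} a_{ik}\, a_{il}
```

So each entry of T is the dot product of two term rows of A, and each entry of D is the dot product of two document columns of A.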
The following table emulates an Excel spreadsheet.
|    | A        | B  | C  | D  |
|----|----------|----|----|----|
| 1  | A =      | d1 | d2 | d3 |
| 2  | t1       | 0  | 1  | 0  |
| 3  | t2       | 0  | 0  | 1  |
| 4  | t3       | 1  | 1  | 1  |
| 5  |          |    |    |    |
| 6  | T = AA^T | t1 | t2 | t3 |
| 7  | t1       | 1  | 0  | 1  |
| 8  | t2       | 0  | 1  | 1  |
| 9  | t3       | 1  | 1  | 3  |
| 10 |          |    |    |    |
| 11 | D = A^T A | d1 | d2 | d3 |
| 12 | d1       | 1  | 1  | 1  |
| 13 | d2       | 1  | 2  | 1  |
| 14 | d3       | 1  | 1  | 2  |
In the table, T was computed by selecting the 3×3 destination range (B7:D9), entering the formula =MMULT(B2:D4,TRANSPOSE(B2:D4)) in its first empty cell (B7), pressing the F2 key, and then pressing Ctrl+Shift+Enter to commit it as an array formula.
Similarly, D was computed by selecting the destination range (B12:D14), entering the formula =MMULT(TRANSPOSE(B2:D4),B2:D4) in its first empty cell (B12), pressing the F2 key, and then pressing Ctrl+Shift+Enter.
That was easy!
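For readers who want to check the spreadsheet outside Excel, here is a minimal sketch in plain Python. The helper names `transpose` and `matmul` are made up for this illustration and simply mirror what TRANSPOSE and MMULT do on the small matrix above; this is not part of the newsletter's answer.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


# Term-document matrix from cells B2:D4 (rows t1..t3, columns d1..d3).
A = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

T = matmul(A, transpose(A))  # term-term co-occurrence, like MMULT(A, TRANSPOSE(A))
D = matmul(transpose(A), A)  # document-document co-occurrence, like MMULT(TRANSPOSE(A), A)

print(T)  # [[1, 0, 1], [0, 1, 1], [1, 1, 3]]
print(D)  # [[1, 1, 1], [1, 2, 1], [1, 1, 2]]
```

The printed values should agree with cells B7:D9 and B12:D14 of the table.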
Note that neither T nor D is a similarity matrix. Can you tell why?