The QA column of the current issue of IR Watch – The Newsletter features the following question:

Question: In Excel, how do you convert a term-document occurrence matrix into a term-term or document-document co-occurrence matrix?

Answer:

Let A be a matrix populated with term occurrences (frequencies), with one row per term and one column per document.
Let A^T be its transpose.

Then T = AA^T is a term-term co-occurrence matrix, and D = A^T A is a document-document co-occurrence matrix.

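The following table emulates an Excel spreadsheet.

|    | A         | B  | C  | D  |
|----|-----------|----|----|----|
| 1  | A =       | d1 | d2 | d3 |
| 2  | t1        | 0  | 1  | 0  |
| 3  | t2        | 0  | 0  | 1  |
| 4  | t3        | 1  | 1  | 1  |
| 5  |           |    |    |    |
| 6  | T = AA^T  | t1 | t2 | t3 |
| 7  | t1        | 1  | 0  | 1  |
| 8  | t2        | 0  | 1  | 1  |
| 9  | t3        | 1  | 1  | 3  |
| 10 |           |    |    |    |
| 11 | D = A^T A | d1 | d2 | d3 |
| 12 | d1        | 1  | 1  | 1  |
| 13 | d2        | 1  | 2  | 1  |
| 14 | d3        | 1  | 1  | 2  |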

In the table, T was computed by selecting the 3×3 destination array B7:D9, entering the formula =MMULT(B2:D4,TRANSPOSE(B2:D4)) in its first empty cell (B7), pressing the F2 key, and then pressing Ctrl+Shift+Enter to commit it as an array formula.

Similarly, D was computed by selecting the destination array B12:D14, entering the formula =MMULT(TRANSPOSE(B2:D4),B2:D4) in its first empty cell (B12), pressing the F2 key, and then pressing Ctrl+Shift+Enter.
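As a cross-check outside Excel, the same two products can be reproduced in a few lines of plain Python. This is only a minimal sketch: the helper names transpose and matmul are made up for illustration and are not part of the original answer, and the matrix A is copied from cells B2:D4 of the table.

```python
def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two matrices given as lists of rows (x is n-by-k, y is k-by-m)."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


# Term-document matrix from the table: rows are t1..t3, columns are d1..d3.
A = [
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]

A_T = transpose(A)
T = matmul(A, A_T)  # term-term co-occurrence, plays the role of MMULT(A, TRANSPOSE(A))
D = matmul(A_T, A)  # document-document co-occurrence, plays the role of MMULT(TRANSPOSE(A), A)

for name, m in (("T", T), ("D", D)):
    print(name)
    for row in m:
        print(row)
```

Run as a script, this prints the same values that appear in rows 7 to 9 (for T) and rows 12 to 14 (for D) of the spreadsheet.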

That was easy!

Note that neither of these is a similarity matrix. Can you tell why?
