A reader asked me an interesting question: Without using LSI, how do you represent documents, terms, and queries in the same space?

The answer is simple:

  1. Instead of a term space, construct a document space and treat documents as the basis vectors of this space. Project terms and queries into this space as regular vectors.
  2. Compute term-doc and query-doc cosine similarities.
  3. To rank documents, sort them by their query-doc similarity values.
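
Here is a minimal sketch of those three steps in Python. Several things in it are my assumptions for illustration, not part of the recipe itself: the toy term-document counts in `A`, the use of raw counts as weights, and the choice to project a query as the sum of its term vectors. The helper names (`doc_basis`, `term_vector`, `query_vector`, `rank`) are hypothetical.

```python
import math

# Hypothetical toy data: rows are terms, columns are documents (raw counts).
terms = ["gold", "silver", "truck"]
docs = ["d1", "d2", "d3"]
A = [
    [1, 0, 1],  # gold
    [0, 2, 1],  # silver
    [1, 1, 1],  # truck
]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def doc_basis(j, n):
    """Document j as a basis vector of the n-dimensional document space."""
    return [1.0 if k == j else 0.0 for k in range(n)]

def term_vector(term):
    """A term in document space: its row of the term-document matrix."""
    return [float(x) for x in A[terms.index(term)]]

def query_vector(query_terms):
    """Assumption: a query is the sum of the vectors of its known terms."""
    q = [0.0] * len(docs)
    for t in query_terms:
        if t in terms:
            for k, x in enumerate(term_vector(t)):
                q[k] += x
    return q

def rank(query_terms):
    """Sort documents by query-doc cosine similarity, highest first."""
    q = query_vector(query_terms)
    n = len(docs)
    scores = [(docs[j], cosine(q, doc_basis(j, n))) for j in range(n)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = rank(["silver", "truck"])
for name, score in ranking:
    print(name, round(score, 3))
```

Because the documents are the basis vectors, the cosine between any vector and document j reduces to that vector's j-th coordinate divided by its length, so the ranking simply follows the coordinates of the projected query.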

This is a legacy post originally published on 9/15/2006.
