Representing Documents, Terms, and Queries in the Same Space
A reader asked me an interesting question: Without using LSI, how do you represent documents, terms, and queries in the same space?
The answer is simple:
- Instead of a term space, construct a document space and treat documents as the basis vectors of this space. Project terms and queries into this space as regular vectors.
- Compute term-doc and query-doc cosine similarities.
- To rank documents, sort them by their query-doc similarity values, highest first.
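The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a definitive implementation: the three toy documents, the use of raw term counts as weights, and the choice to build a query as the sum of its term vectors are all assumptions made here, and the names `term_vectors`, `query_vector`, `basis`, and `rank` are hypothetical helpers.

```python
import math

# Hypothetical toy collection; each document becomes one axis of the space.
docs = [
    "shipment of gold damaged in a fire",
    "delivery of silver arrived in a silver truck",
    "shipment of gold arrived in a truck",
]

def term_vectors(docs):
    """Map each term to its coordinates in document space (raw counts, an assumption)."""
    vectors = {}
    for j, text in enumerate(docs):
        for term in text.split():
            vectors.setdefault(term, [0.0] * len(docs))[j] += 1.0
    return vectors

def query_vector(query, vectors, n_docs):
    """Assumption: a query is the sum of its term vectors; unknown terms are skipped."""
    q = [0.0] * n_docs
    for term in query.split():
        for j, weight in enumerate(vectors.get(term, [])):
            q[j] += weight
    return q

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def basis(j, n_docs):
    """Document j is the j-th basis vector of the document space."""
    return [1.0 if k == j else 0.0 for k in range(n_docs)]

def rank(query, docs):
    """Return (document index, query-doc cosine) pairs, most similar first."""
    vectors = term_vectors(docs)
    q = query_vector(query, vectors, len(docs))
    scores = [(j, cosine(q, basis(j, len(docs)))) for j in range(len(docs))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

ranking = rank("gold silver truck", docs)
```

Term-doc similarities come from the same `cosine` function, for example `cosine(term_vectors(docs)["silver"], basis(1, len(docs)))`. Because raw counts are used as weights in this sketch, longer documents tend to score higher; a different weighting scheme could be substituted in `term_vectors` without changing the rest.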
This is a legacy post originally published on 9/15/2006.