Two important concepts for estimating the retrieval performance of search systems are recall (R) and precision (P). In layman's terms, picture two partially overlapping circles, A and B, representing answer sets (groups of documents). Let C be the region where A and B overlap, where
A = relevant documents
B = retrieved documents
C = relevant retrieved documents
then we define the following ratios:
Recall, R = C/A = number of relevant documents retrieved / number of relevant documents
Precision, P = C/B = number of relevant documents retrieved / number of retrieved documents
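The two ratios map directly onto set operations. The following Python sketch treats A and B as sets of document IDs. The function name `recall_precision` and the sample IDs are hypothetical and used only for illustration. Returning 0.0 for an empty set is an assumed convention, since both ratios are undefined in that case.

```python
def recall_precision(relevant, retrieved):
    """Return (recall, precision) for two collections of document IDs.

    relevant  -- set A, the documents judged relevant
    retrieved -- set B, the documents the system returned
    Assumed convention: a ratio with an empty denominator is reported as 0.0.
    """
    a = set(relevant)
    b = set(retrieved)
    c = a & b  # C: relevant documents that were retrieved (the overlap)
    recall = len(c) / len(a) if a else 0.0      # R = C/A
    precision = len(c) / len(b) if b else 0.0   # P = C/B
    return recall, precision


# Hypothetical document IDs, for illustration only.
A = {"d1", "d2", "d3", "d4"}
B = {"d3", "d4", "d5"}
r, p = recall_precision(A, B)
print(f"R = {r:.2f}, P = {p:.2f}")  # R = 0.50, P = 0.67
```

Here C = {d3, d4}, so R = 2/4 and P = 2/3.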
As a memory aid for these definitions, keep in mind that C is the common numerator. To remember which denominator goes with recall, use the following mnemonic: “recall” ends with an “l” and “relevance” has an “l”, so recall is taken over the relevant documents, A.
Check the figure above, in which I have colored that string in red (students love mnemonics!).
For 100% overlap, A = B = C, and thus R = P = 1. These are ideal conditions. In practice, perfect precision and recall, where a system retrieves all relevant documents without introducing any irrelevant ones into the answer set, is impossible to achieve. More likely, an implementation has to work within a tradeoff between precision and recall, and between retrieval effectiveness and computing cost.
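The precision-recall tradeoff is easiest to see on a ranked answer list. The sketch below, a minimal illustration assuming a hypothetical ranking and a hypothetical set of relevant documents, computes R and P after each of the top k results. As k grows, recall can only rise, while precision tends to fall as more irrelevant documents enter the answer set. The function name `pr_at_cutoffs` is likewise hypothetical.

```python
def pr_at_cutoffs(ranking, relevant):
    """Yield (k, recall, precision) after each of the top-k ranked results.

    ranking  -- document IDs in the order the system returned them
    relevant -- collection of document IDs judged relevant (set A)
    """
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    hits = 0  # relevant documents seen so far, i.e. |C| at cutoff k
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        yield k, hits / len(relevant), hits / k


# Hypothetical ranked answer list and relevance judgments.
ranking = ["d3", "d7", "d1", "d8", "d9", "d2", "d5", "d4"]
relevant = {"d1", "d2", "d3", "d4"}
for k, r, p in pr_at_cutoffs(ranking, relevant):
    print(f"top {k}: R = {r:.2f}, P = {p:.2f}")
```

With these made-up data, recall climbs from 0.25 at the top result to 1.00 at the eighth, while precision drops from 1.00 to 0.50. Plotting such (R, P) pairs is the starting point for a precision-recall curve.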
On pages 74-84 of Modern Information Retrieval, Baeza-Yates and Ribeiro-Neto take an in-depth look at precision-recall curves and other combined measures for evaluating retrieval performance.
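One widely used combined measure is the harmonic mean of recall and precision, commonly called the F measure: F = 2 / (1/R + 1/P). The sketch below assumes this unweighted form (weighted variants also exist), and the function name `f_measure` is hypothetical. Unlike an arithmetic mean, the harmonic mean stays low unless both R and P are reasonably high.

```python
def f_measure(recall: float, precision: float) -> float:
    """Unweighted harmonic mean of recall and precision.

    Returns 0.0 when either input is 0 (assumed convention, since
    1/0 is undefined and the harmonic mean tends to 0 in that limit).
    """
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)


# Hypothetical (R, P) pairs: a balanced system vs. one that returns almost everything.
for r, p in [(0.5, 2 / 3), (1.0, 0.1)]:
    arithmetic = (r + p) / 2
    print(f"R={r:.2f} P={p:.2f}  F={f_measure(r, p):.3f}  mean={arithmetic:.3f}")
```

For R = 1.00 and P = 0.10, F is about 0.18 while the arithmetic mean is 0.55, so a system cannot score well on F simply by retrieving everything.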
As Grossman and Frieder note on pages 2-8 of Information Retrieval: Algorithms and Heuristics:
“Computing the total number of relevant documents is non-trivial. The only sure means of doing this is to read the entire document collection.”
To top it off, there is an inherent gap between how humans and search systems perceive relevance.
The June 2007 issue of IR Watch – The Newsletter discusses this perception divide.
Our newsletter reaches computer scientists, CS departments, graduate students, and search marketers interested in information retrieval (IR) and search engine technologies.
Subscribe to IRW today. It’s free.