SEOmoz has a great discussion on why search engines at times don’t return relevant results; that is, why results that users perceive as irrelevant to their information needs (queries) are nonetheless ranked high by search engines.

Some bloggers at SEOmoz attribute this in part to precision and recall issues. We have covered these topics on different occasions, so let’s revisit some points along those lines.

First, there is a relevance perception mismatch between end users and search systems that is not necessarily due to precision and recall issues. We have discussed the nature of this relevance divide in the June 2007 issue of IRW – The Newsletter. This post gives a little preview.

Second, many factors force a tradeoff between precision and recall. We have discussed precision vs recall before.

One source of this tradeoff is the expansion of answer sets with an incorrect lookup list or thesaurus. We have also discussed this in the post Subsumption vs Synonyms. In that post we referred readers to the research work of William Woods, published a decade ago (1997): Conceptual Indexing: A Better Way to Organize Knowledge. I prefer to quote Woods’s work to clarify things and to show why one must pay attention to the nature and quality of the thesaurus used for expanding answer sets (emphasis added):

The reaction of many people who are familiar with information retrieval techniques, when confronted with the problems described above, is to propose the use of a synonym thesaurus to expand a request by adding terms that are synonyms of the terms requested. Such a thesaurus is sometimes used by full-text search and retrieval systems, most of which employ search algorithms that ignore word order and simply count the number of terms of the (perhaps expanded) query that occur in the target text. Some such systems allow for different weighting of different terms rather than simply counting them (in order to capture the fact that some terms of a request may be more important than others). Unfortunately, attempts to automatically expand queries using a synonym thesaurus often fail to improve the retrieval effectiveness of these systems and frequently actually degrade performance. Part of the reason is that there are very few true synonyms in English (or any other language for that matter), and members of the “synonym” sets in such thesauri often differ significantly in meaning. Typically there is a difference in generality, as in the set: {automobile, car, truck, bus, taxi, motor vehicle}, where the last is clearly much more general than any of the others, and in this case effectively summarizes the others.

Thus, the criteria used to classify terms as synonyms or related terms do matter. The type of thesaurus used to expand an answer set and classify terms as synonyms matters as well. For instance, a similarity thesaurus is not the same as a hierarchical or statistical thesaurus; a similarity thesaurus is not based on co-occurrence data at all.
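To make the failure mode concrete, below is a minimal Python sketch of the kind of naive expansion Woods describes: a flat “synonym” set whose members differ in generality, and a scoring function that simply counts (optionally weighted) query terms found in a document, ignoring word order. The synonym set, documents, and query are fabricated for illustration only; they are not taken from Woods’s work or from any system discussed here.

```python
# Naive thesaurus-based query expansion with count/weight scoring (illustrative only).

# A flat "synonym" set of the kind Woods criticizes: its members differ in generality.
SYNONYM_SETS = [
    {"automobile", "car", "truck", "bus", "taxi", "motor vehicle"},
]

def expand(query_terms):
    """Add every term that shares a synonym set with a query term."""
    expanded = set(query_terms)
    for term in query_terms:
        for syn_set in SYNONYM_SETS:
            if term in syn_set:
                expanded |= syn_set
    return expanded

def score(doc_terms, query_terms, weights=None):
    """Count the query terms present in the document, with optional per-term weights."""
    weights = weights or {}
    return sum(weights.get(t, 1.0) for t in query_terms if t in doc_terms)

docs = {
    "d1": {"used", "car", "dealer"},     # presumably relevant to an automobile query
    "d2": {"bus", "schedule", "city"},   # presumably not
    "d3": {"truck", "rental", "rates"},  # presumably not
}

query = {"automobile"}
print({d: score(t, query) for d, t in docs.items()})
# {'d1': 0, 'd2': 0, 'd3': 0} -- the relevant car page is missed entirely
print({d: score(t, expand(query)) for d, t in docs.items()})
# {'d1': 1.0, 'd2': 1.0, 'd3': 1.0} -- recall is gained on d1, but trucks and buses now score just as high
```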

Regarding improperly classifying terms as synonyms, Woods adds:

Choosing to treat terms as if they were synonyms, when they really aren’t fully synonymous, introduces a level of granularity in the retrieval process that trades off precision against recall. Treating such terms as synonymous amounts to generalizing the query to the level of abstraction at which the differences among the individual “synonyms” does not matter. Unfortunately, there is no a priori level of generality that is correct for all information needs. For example, if you were to ask for “motor vehicle” then you would probably expect to pick up hits for “car,” “truck,” and “bus,” but if you asked for “automobile” then you would presumably not want to get “truck” and “bus.” If requests are automatically expanded by the addition of terms drawn from a fixed synonym thesaurus, then for any large number of queries, it is often the case that the queries whose recall results are improved by the query expansion are offset by other queries in which the expansion produces unwanted noise that dilutes the precision of the result. In a system where the results are ordered and at most a specified maximum number of items are retrieved, this can result in a decrease in recall as well as precision, because additional “noise” items can displace correct items.
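The displacement effect mentioned at the end of that passage is easy to see with a toy ranked list and a fixed retrieval cutoff. The document labels and rankings below are fabricated purely for illustration.

```python
# Precision and recall at a fixed cutoff k, before and after expansion noise (illustrative only).

def precision_recall_at_k(ranked, relevant, k):
    retrieved = ranked[:k]
    hits = sum(1 for d in retrieved if d in relevant)
    return hits / k, hits / len(relevant)

relevant = {"car1", "car2", "car3"}

# Original query "automobile": the relevant car pages fill the top of the ranking.
original = ["car1", "car2", "car3", "misc1", "misc2"]

# Expanded query: truck and bus pages score as well as car pages and crowd them out.
expanded = ["truck1", "bus1", "car1", "truck2", "car2"]

print(precision_recall_at_k(original, relevant, k=3))  # (1.0, 1.0)
print(precision_recall_at_k(expanded, relevant, k=3))  # (0.333..., 0.333...)
```

At the cutoff, both precision and recall drop, which is exactly the double penalty Woods describes.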

How could we improve this? Woods continues:

One can improve this situation by matching requests to targets using a relationship of generality rather than synonymy. In this case, a term in a request will match any corresponding term in a target that is at least as specific as the requested term. Thus, a request for “motor vehicle” would retrieve all kinds of motor vehicles, while a request for “automobile” would retrieve cars and taxis but not trucks and buses. We can express this notion of generality as a relationship of conceptual subsumption, where a more general term is said to “subsume” a more specific term. Formally, a term also subsumes itself and subsumes any true synonyms that it may have. Thus, subsumption is more general than synonymy (i.e., subsumption subsumes synonymy). True synonymy is equivalent to mutual subsumption. If a retrieval system is designed to retrieve all items that are subsumed by a request, then the information seeker has a way of controlling the level of generality of the search by choosing the level of generality of the query terms, thus avoiding a major source of precision/recall trade-off.
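For readers who prefer code to prose, here is a minimal sketch of retrieval by subsumption along the lines of the passage above: a query term matches any target term it subsumes, that is, itself, its true synonyms, or anything reachable downward through a narrower-term relation. The tiny taxonomy and synonym table are illustrative assumptions, not Woods’s actual knowledge base.

```python
# Matching by conceptual subsumption rather than flat synonymy (illustrative only).

# Broader term -> set of immediately narrower terms.
NARROWER = {
    "motor vehicle": {"automobile", "truck", "bus"},
    "automobile": {"taxi"},
}

# True synonyms (mutual subsumption).
SYNONYMS = {
    "automobile": {"car"},
    "car": {"automobile"},
}

def subsumes(general, specific):
    """A term subsumes itself, its true synonyms, and anything below it in the hierarchy."""
    if general == specific or specific in SYNONYMS.get(general, set()):
        return True
    return any(subsumes(child, specific) for child in NARROWER.get(general, set()))

def matches(query_term, doc_terms):
    """A document matches if the query term subsumes some term it contains."""
    return any(subsumes(query_term, t) for t in doc_terms)

docs = {
    "d1": {"used", "car", "dealer"},
    "d2": {"bus", "schedule"},
    "d3": {"truck", "rental"},
}

print([d for d, t in docs.items() if matches("motor vehicle", t)])  # ['d1', 'd2', 'd3']
print([d for d, t in docs.items() if matches("automobile", t)])     # ['d1']
```

As in Woods’s example, “motor vehicle” retrieves all kinds of motor vehicles, while “automobile” retrieves the car page but leaves trucks and buses behind; the searcher controls the level of generality by choosing the query term.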

In 2004, in the seminal paper On-Topic Analysis – Online Discovery of On-Topic Terms, we described a procedure for the online discovery of on-topic terms using a scope-based subsumption relationship, organized according to a data structure of the form

broader terms > narrower terms > specific terms

The gist of that article was that this notion of generality is not just of the semantic type, but can also be of the hierarchical and statistical type. Although it is not the only way, such data structures can be discovered through co-occurrence data, even from noisy universes like web collections contaminated with artificially induced co-retrieval.
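As a rough illustration of that last point, the sketch below mines broader > narrower candidates from a toy document collection using a simple conditional co-occurrence heuristic familiar from the IR literature: a term is treated as broader than another when the narrower term’s documents are largely a subset of the broader term’s. This heuristic, the threshold, and the collection are assumptions made for demonstration purposes only; this is not the procedure described in the On-Topic Analysis paper.

```python
# Discovering broader > narrower candidates from document co-occurrence (illustrative only).
from collections import defaultdict

docs = [
    {"motor vehicle", "car", "insurance"},
    {"motor vehicle", "car", "taxi"},
    {"motor vehicle", "truck", "fleet"},
    {"motor vehicle", "bus", "routes"},
    {"motor vehicle", "car", "dealer"},
    {"car", "loan"},
]

# Postings: term -> set of document ids containing it.
postings = defaultdict(set)
for i, d in enumerate(docs):
    for t in d:
        postings[t].add(i)

def broader_than(x, y, threshold=0.7):
    """Heuristic: x subsumes y when P(x | y) >= threshold and P(y | x) < threshold."""
    dx, dy = postings[x], postings[y]
    both = len(dx & dy)
    return both / len(dy) >= threshold and both / len(dx) < threshold

terms = ["motor vehicle", "car", "taxi", "truck", "bus"]
pairs = [(x, y) for x in terms for y in terms if x != y and broader_than(x, y)]
print(pairs)
# [('motor vehicle', 'car'), ('motor vehicle', 'taxi'), ('motor vehicle', 'truck'),
#  ('motor vehicle', 'bus'), ('car', 'taxi')]
```

On real web collections the same idea needs considerably more care since, as noted above, artificially induced co-retrieval can inject spurious broader/narrower pairs.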