In IRW-2007-8 I introduced a row-pruning algorithm (RPA) and its unconditional version (URPA).

Let C be a collection of documents and let T = {t1, t2, …, tm} be the set of m unique terms extracted from C. Suppose that term i is combined with all other terms, so that a family of term sequences starting with term i is obtained, and that these term sequences are hidden (latent) in C. The purpose of RPA and URPA is to identify the composition of these term sequences and to find those that occur most frequently in C.
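To make the problem statement concrete, here is a minimal brute-force sketch. It is not RPA or URPA, which are described in IRW-2007-8; it only illustrates what "finding the term sequences that start with term i and occur frequently in C" can mean. It assumes that a term sequence can be treated as an unordered set of terms co-occurring in a document, which is a simplification. The toy documents and the function name count_term_sets are invented for illustration; the five terms are those of the tutorial example.

```python
from itertools import combinations

# Toy collection C (hypothetical documents, invented for illustration).
docs = [
    "mortgage rates rise as loan demand falls",
    "refinance your mortgage loan at lower rates",
    "home equity loan rates",
    "mortgage refinance rates today",
]

# Term set T.
terms = ["mortgage", "loan", "refinance", "equity", "rates"]


def count_term_sets(docs, terms, start, max_len=3):
    """For every term set of size 2..max_len that contains `start`,
    count the documents in which all of its terms occur."""
    doc_terms = [set(d.split()) for d in docs]
    others = [t for t in terms if t != start]
    counts = {}
    for k in range(1, max_len):
        for combo in combinations(others, k):
            candidate = (start,) + combo
            counts[candidate] = sum(
                1 for dt in doc_terms if set(candidate) <= dt
            )
    return counts


counts = count_term_sets(docs, terms, "mortgage")

# Rank candidates from most to least frequent.
ranked = sorted(counts.items(), key=lambda kv: -kv[1])
for candidate, n in ranked:
    print(" ".join(candidate), n)
```

Enumerating every combination grows quickly with m, which is why a pruning strategy becomes attractive for larger term sets.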

I developed these algorithms to help me find hidden term sequences in non-commercial collections such as scientific and government repositories. But why not apply them to search engines?

I must admit that the example presented in the IRW newsletter was not the best. I have therefore written a basic tutorial using a more practical example: the term set consisting of mortgage, loan, refinance, equity, and rates. This way, readers who are search engine marketers can connect the dots between association rule mining and keyword research.

In addition to document collections, row-pruning algorithms can be applied to the analysis of search logs or to retail transactions wherein customers tend to buy items in a strict order. An example follows.

Suppose that a customer buys item a and, due to this transaction, he receives by mail a discount coupon referencing the transaction with an offer to buy item b. He goes back to the store, buys b, and also buys item c.

The purchases of a and b follow a strict sequence. Indeed, the purchase of a and the subsequent purchase of b are not entirely independent of one another; they can be tied to a common transaction ID, and so can the purchase of c. These purchases can be described by an association rule of the form

{a b}   →   {c}
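A rule of this form is usually judged by its support and confidence, the two standard measures in association rule mining. The sketch below computes both for the rule above over a small transaction log. The log is hypothetical, and the code ignores the order in which items were bought, so it only checks co-occurrence under a common transaction ID.

```python
# Hypothetical log: each set holds the items tied to one transaction ID.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    joint = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return joint / base if base else 0.0


rule_support = support({"a", "b", "c"}, transactions)
rule_confidence = confidence({"a", "b"}, {"c"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

With this toy log, two of the five transactions contain a, b, and c, and three contain a and b, so the rule holds in two out of three cases where its left-hand side applies.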