As mentioned, the July issue of IR Watch is running late due to the backend changes we made last week to our main site. If you are a subscriber, IRW should arrive in your inbox in a few days. This issue is dedicated to Market Basket Analysis and Keyword Research. Some portions are adapted from Tan, Steinbach, and Kumar's book “Introduction to Data Mining”.

Here is a sneak preview:

“Customer purchase data collected at checkout counters of grocery stores are commonly known as market basket transactions. The data are normally organized as a table of customer transactions versus items purchased. This is a market basket table or matrix, with rows corresponding to transactions and columns to items. Each transaction is considered a “bag of items” since the order in which customers place items in a basket or shopping cart, and pay for them at the checkout, normally does not matter.”
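A market basket table like the one described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each transaction is a Python set of item names; the function name `basket_matrix` and the five sample transactions are hypothetical toy data, not taken from IRW or the book.

```python
# Minimal sketch: build a binary transaction-by-item "market basket" matrix.
# basket_matrix and the sample transactions are hypothetical, for illustration only.

def basket_matrix(transactions):
    """Return (items, matrix); matrix[i][j] is 1 if transaction i contains items[j]."""
    # Sort item names so the column order is deterministic.
    items = sorted({item for basket in transactions for item in basket})
    # Each row is one transaction; the order of items inside a basket is ignored.
    matrix = [[1 if item in basket else 0 for item in items] for basket in transactions]
    return items, matrix


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

items, matrix = basket_matrix(transactions)
print(items)
for row in matrix:
    print(row)
```

Because each basket is a set, the matrix records only the presence or absence of an item, which matches the “bag of items” view: purchase order and quantity are discarded.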

“Retailers are interested in mining information from these transactions in order to understand the purchasing behavior of their customers. The extracted information can be used to support business-related services such as marketing promotions, inventory management, and customer relationship management (CRM) programs. One way of doing this is to develop association rules based on relationships like”

Customers buying X also buy Y

“where X and Y are itemsets, that is, disjoint sets consisting of one or more items.”

“This association rule can be expressed using the following implication expression:”

X → Y

“The implication does not mean that X and Y co-occur by chance, nor that X causes Y; rather, it expresses an association whose strength is derived from conditional probability…”
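In the textbook treatment, the strength of a rule X → Y is usually measured by its support, σ(X ∪ Y)/N, and its confidence, σ(X ∪ Y)/σ(X), where σ counts the transactions that contain an itemset and N is the total number of transactions. Confidence is an estimate of the conditional probability P(Y | X). The sketch below is a minimal illustration on hypothetical toy transactions; `support_count` and `rule_metrics` are illustrative names, not a library API.

```python
# Minimal sketch: support and confidence of an association rule X -> Y.
# support_count and rule_metrics are hypothetical helper names.

def support_count(itemset, transactions):
    """sigma(itemset): number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


def rule_metrics(x, y, transactions):
    """Return (support, confidence) for the rule x -> y (x and y are disjoint sets)."""
    both = support_count(x | y, transactions)
    antecedent = support_count(x, transactions)
    support = both / len(transactions)
    # Confidence estimates P(Y | X); guard against an antecedent that never occurs.
    confidence = both / antecedent if antecedent else 0.0
    return support, confidence


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

s, c = rule_metrics({"milk", "diapers"}, {"beer"}, transactions)
print(f"support={s:.2f} confidence={c:.2f}")  # support=0.40 confidence=0.67
```

A confidence of 0.67 says that two of the three baskets containing milk and diapers also contain beer. As the quoted passage stresses, it says nothing about why.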

“Retailers can use rules of this type to identify new ways of positioning (cross-selling) products to their customers.”

“Such market basket scenarios are frequently found in Web Mining studies since”

  • Words can be viewed as items.
  • A pool of candidate words, a wordset, is an itemset.
  • Smaller wordsets (subsets) can be extracted from a wordset.
  • A document can be thought of as a transaction or “bag of words”.
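The mapping from documents to transactions can be sketched as follows. This is an illustration only: it assumes a naive whitespace tokenizer with basic punctuation stripping (a real IR system would likely add stemming, stop-word removal, and so on), and the names `doc_to_transaction` and `candidate_wordsets` are hypothetical.

```python
# Minimal sketch: treat each document as a word-set transaction and
# enumerate the k-word subsets of a wordset. Helper names are hypothetical.
from itertools import combinations

PUNCTUATION = ".,;:!?\"'()"


def doc_to_transaction(text):
    """Return the set of lowercase words in text (word order and counts are dropped)."""
    words = (token.strip(PUNCTUATION).lower() for token in text.split())
    return {w for w in words if w}


def candidate_wordsets(wordset, k):
    """Return every k-word subset of wordset, in sorted order, as frozensets."""
    return [frozenset(combo) for combo in combinations(sorted(wordset), k)]


doc = "Real estate agents discuss mortgage rates, and mortgage insurance."
transaction = doc_to_transaction(doc)
print(sorted(transaction))
print([sorted(s) for s in candidate_wordsets({"real", "estate", "mortgage"}, 2)])
```

“mortgage” appears twice in the sample document but only once in its transaction: in this binary transaction view, as with a market basket, presence is recorded rather than frequency.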

“Thus, association rules of the form”

“Documents mentioning keywords k1 and k2 also mention keyword k3.”


{k1, k2} → {k3}

“can be investigated through market basket association models. ”
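To make this concrete, the sketch below mines keyword rules of the form {k1, k2} → {k3} from a tiny, made-up document collection. It uses an Apriori-style level-wise search, the classic frequent-itemset algorithm covered in Tan, Steinbach, and Kumar, but it is a simplified illustration rather than the method used in IRW; the documents, the thresholds (`min_support`, `min_confidence`), and the function names are all assumptions.

```python
# Minimal Apriori-style sketch: frequent wordsets, then keyword association rules.
# Documents, thresholds, and function names are hypothetical, for illustration only.
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support count} for all itemsets meeting min_support."""
    min_count = min_support * len(transactions)
    level = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(kept)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then prune any candidate
        # with an infrequent (k-1)-subset (the Apriori property).
        joined = {a | b for a in kept for b in kept if len(a | b) == k}
        level = [c for c in joined
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is in the dict.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


docs = [
    "real estate mortgage rates",
    "real estate listings mortgage",
    "mortgage rates insurance",
    "real estate mortgage broker",
    "company car insurance rates",
]
transactions = [set(doc.split()) for doc in docs]

frequent = frequent_itemsets(transactions, min_support=0.6)
rules = association_rules(frequent, min_confidence=0.8)
for lhs, rhs, conf in sorted(rules, key=lambda rule: (sorted(rule[0]), sorted(rule[1]))):
    print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

On this toy collection, {real, estate} → {mortgage} reaches a confidence of 1.00, while the reverse rule {mortgage} → {real, estate} only reaches 0.75 and is filtered out, which shows that association rules are not symmetric.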

“To sum up, for an IR or marketing specialist, questions like:”

  • “Why do documents that mention real estate also mention mortgage?”
  • “Why do users who search for divorce also search for lawyers?”
  • “Why are documents relevant to the topic of company car also relevant to the topic of insurance rates?”

“cannot be addressed with non-substantive arguments.”

“The next time you visit a grocery store, you might realize why certain items, like milk and bread, are placed close to one another, or why some grocery stores that sell diapers also sell beer.”

Once you understand market basket theory, the importance of co-occurrence and co-retrieval indices (c-indices) should become evident.
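As a hedged illustration of a co-occurrence index, the sketch below assumes a Jaccard-style ratio c = n12 / (n1 + n2 - n12), where n1 and n2 are the numbers of documents containing each term and n12 is the number containing both. This definition and the function name `c_index` are assumptions for illustration; IRW may define its c-indices differently.

```python
# Minimal sketch of a Jaccard-style co-occurrence ratio between two terms.
# The formula n12 / (n1 + n2 - n12) and the name c_index are assumptions.

def c_index(transactions, term1, term2):
    """Return n12 / (n1 + n2 - n12) over documents represented as sets of words."""
    n1 = sum(1 for doc in transactions if term1 in doc)
    n2 = sum(1 for doc in transactions if term2 in doc)
    n12 = sum(1 for doc in transactions if term1 in doc and term2 in doc)
    union = n1 + n2 - n12  # documents containing at least one of the two terms
    return n12 / union if union else 0.0


docs = [
    {"real", "estate", "mortgage"},
    {"mortgage", "rates"},
    {"real", "estate"},
]
print(round(c_index(docs, "estate", "mortgage"), 3))
```

Under this assumed definition, a value of 1 means the two terms always appear together and 0 means they never co-occur.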

Don't stay missing in action because of SEO myths or an “seo book” misinforming you. Subscribe to IRW today.