Archive for the ‘Queries’ Category

Search Smart with WCC’s ELISE

August 9, 2007

The list of subscribers to IRWatch is growing at a fast pace. One of our recent subscribers is a developer at WCC, makers of ELISE Smart Search & Match. This seems to be quite an interesting technology. I highly recommend that readers visit their site at http://wcc-group.com/

(more…)

Row-Pruning Algorithm Tutorial

August 8, 2007

In IRW-2007-8 I introduced a row-pruning algorithm (RPA) and its unconditional version (URPA).

Let C be a collection of documents and let T = {t1, t2, …, tm} be the m unique terms extracted from C. Assume that term i is combined with all other terms so that a family of term sequences starting with term i is obtained. Assume also that these term sequences are hidden (latent) in C. The purpose of RPA and URPA is to identify the composition of these term sequences and to find those that occur most frequently in C.

(more…)

Sneak Preview of IR Watch

July 31, 2007

The current issue, IRW-2007-08, should be out within the next few days. Association Rules Part 2 discusses how association rule mining techniques from market basket analysis can be applied to Web mining.

(more…)

Association Rule Mining Thesaurus

July 30, 2007

Here is a 2004 paper on Association Thesaurus Construction for Interactive Query Expansion based on Association Rule Mining.

The article discusses basic association rule mining concepts like support, confidence, and pruning as we described in Association Rules Part 1 (July issue of IR Watch - The Newsletter). BTW, read Part 2 in the August issue.
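For readers who want a quick refresher, here is a minimal sketch of support, confidence, and support-based pruning using their standard textbook definitions. The function names and the toy transactions are my own illustrative assumptions; they are not taken from the paper or from the newsletter.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and B) / support(A)."""
    base = support(antecedent, transactions)
    joint = support(set(antecedent) | set(consequent), transactions)
    return joint / base if base > 0 else 0.0


def prune(candidates, transactions, min_support):
    """Keep only the candidate itemsets whose support reaches min_support."""
    return [c for c in candidates if support(c, transactions) >= min_support]


# Hypothetical toy data: each transaction is a set of co-occurring terms.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

s = support({"bread", "milk"}, transactions)        # 2 of 4 transactions
c = confidence({"bread"}, {"milk"}, transactions)   # 0.5 / 0.75
kept = prune([{"bread"}, {"butter"}, {"milk", "butter"}], transactions, 0.5)
```

In this toy data the rule bread → milk has support 0.5 and confidence of about 0.67, and the candidate {milk, butter} is pruned because it appears in only one of the four transactions.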

(more…)

Revisiting EXACT Search Shortcuts in Google

July 24, 2007

I have discussed AND and EXACT searches many times, but did you know the following?

In addition to enclosing search terms with double quotes (“like this”), in some search engines one can invoke a shortcut to an EXACT search by using certain characters that serve as sequence connectors. These work in the same way double quotes work. The most common is the hyphen; e.g.

(more…)

The Risks of Thesaurus-based Expansions

July 23, 2007

SEOmoz has a great discussion on why at times search engines don’t return relevant results; that is, why some results perceived by users as being not relevant to their information needs (queries) are ranked high by search engines.

Some bloggers at SEOmoz attribute this in part to precision and recall issues. We have covered these topics on different occasions; so, let's revisit some points along those lines.

(more…)

Thesis on Redundant ITF

July 17, 2007

Thomas Richard Lynam has extensively researched a variant of ITF called Redundant ITF (RITF). His 2002 master's thesis, “Exploitation of Redundant Inverse Term Frequency”, is a must-read for anyone interested in the topic. His thesis is available in PDF and PostScript.

The justification for using RITF is as follows.

(more…)

What is a Similarity Thesaurus?

July 16, 2007

In my previous post I explained to a reader the difference between inverse term frequency (ITF) and inverse document frequency (IDF), but did not provide practical applications. This post is to explain what ITF is good for.

Like IDF, ITF is a global weight measure; i.e., Gi = ITF. Combined with a local weight measure (Lij), it can be used to compute an overall weight.

Local weights can be defined in many different ways. Here is one definition:

(more…)
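As a rough illustration of combining a local weight Lij with a global weight Gi, consider the sketch below. It rests on two assumptions of mine that the excerpt does not state: the local weight is the raw count of a term in a document, and the overall weight is the product Lij × Gi. The global weights are hypothetical placeholder numbers, not computed ITF values.

```python
from collections import Counter


def local_weights(doc_tokens):
    """Local weight L_ij, assumed here to be the raw count of term i in document j."""
    return Counter(doc_tokens)


def overall_weights(doc_tokens, global_weights):
    """Overall weight per term, assumed here to be the product L_ij * G_i."""
    local = local_weights(doc_tokens)
    return {
        term: count * global_weights.get(term, 0.0)
        for term, count in local.items()
    }


# Hypothetical global weights G_i (placeholders, not real ITF values).
G = {"search": 0.5, "engine": 1.0, "thesaurus": 2.0}

# Hypothetical document j, already tokenized.
doc = ["search", "engine", "search", "thesaurus"]

weights = overall_weights(doc, G)
```

Other local weight definitions (for example, binary or log-scaled counts) would slot into `local_weights` without changing the rest of the sketch.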

Query-Log-based Personalization

July 5, 2007

Data mining query logs? Then these research papers might interest you.

(more…)

Query Relevance Feedback Algorithms

July 4, 2007

Dependence or Independence Day? Ask regular citizens.

Meanwhile, how about some IR papers on the dependence/independence of query relevance feedback?

Here is a list of really interesting papers on the subject from Michael Ortega-Binderberger’s group:

(more…)