Archive for the ‘Queries’ Category
August 9, 2007
The list of subscribers to IRWatch is growing at a fast pace. One of our recent subscribers is a developer at WCC, makers of ELISE Smart Search & Match. This seems to be quite an interesting technology. I highly recommend that readers visit their site: http://wcc-group.com/
(more…)
Posted in Data Mining, Latent Semantic Indexing, Queries | No Comments »
August 8, 2007
In IRW-2007-8 I introduced a row-pruning algorithm (RPA) and its unconditional version (URPA).
Let C be a collection of documents and let T = {t1, t2, …, tm} be m unique terms extracted from C. Assume that term i is combined with all other terms such that, starting with term i, a family of term sequences is obtained. Assume that these term sequences are hidden (latent) in C. The purpose of RPA and URPA is to identify the composition of these term sequences and to find those that occur more frequently in C.
(more…)
Posted in Data Mining, Machine Learning, Queries | 1 Comment »
July 31, 2007

The current issue, IRW-2007-08, should be out within the next few days. Association Rules Part 2 discusses how association rule mining techniques from market basket analysis can be applied to Web mining.
(more…)
Posted in Data Mining, Queries | No Comments »
July 24, 2007
I have discussed AND and EXACT searches many times, but did you know the following?
In addition to enclosing search terms in double quotes (“like this”), in some search engines one can invoke a shortcut to an EXACT search by using certain characters that serve as sequence connectors. These work the same way double quotes do. The most common is the hyphen; e.g.
(more…)
Posted in Queries | 1 Comment »
July 23, 2007
SEOmoz has a great discussion on why search engines at times don’t return relevant results; that is, why some results that users perceive as not relevant to their information needs (queries) are ranked high by search engines.
Some bloggers at SEOmoz attribute this in part to precision and recall issues. We have covered these topics on different occasions, so let’s revisit some points along those lines.
(more…)
Posted in Machine Learning, Queries | No Comments »
July 17, 2007
Thomas Richard Lynam has extensively researched a variant of ITF called Redundant ITF (RITF). His 2002 master’s thesis, “Exploitation of Redundant Inverse Term Frequency”, is a must-read for anyone interested in the topic. The thesis is available in PDF and PostScript formats.
The justification for using RITF is as follows.
(more…)
Posted in Queries, Theses | No Comments »
July 16, 2007
In my previous post I explained to a reader the difference between inverse term frequency (ITF) and inverse document frequency (IDF), but did not provide practical applications. This post is to explain what ITF is good for. Like IDF, ITF is a global weight measure; i.e., Gi = ITF. Combined with a local weight measure (Lij), it can be used to compute an overall weight. Local weights can be defined in many different ways. Here is one definition:
(more…)
Posted in Latent Semantic Indexing, Queries, Vector Space Models | No Comments »
July 5, 2007
Data mining query logs? Then these research papers might interest you.
(more…)
Posted in Queries | No Comments »
July 4, 2007
Dependence or Independence Day? Ask regular citizens.
Meanwhile, how about some IR papers on the dependence/independence of query relevance feedback?
Here is a list of really interesting papers on the subject from Michael Ortega-Binderberger’s group:
(more…)
Posted in Queries | No Comments »