Archive for June, 2007

A Week before Greatness

June 29, 2007

I just came from ICANN. Yesterday I attended presentations by Paul Twomey and by Vint Cerf, Google’s Chief Evangelist, at the Law School of the University of Puerto Rico. Both talks were very inspiring. Many ICANN representatives were present.

(more…)

Research Channel, LRA, Microsoft, and more

June 26, 2007

I forgot to mention that I’m attending ICANN this week, so most posts will be legacy posts, straight from the conference.

The ResearchChannel is a research consortium dedicated to serving as an online channel for the dissemination of cutting-edge technologies. If you want to learn the real stuff under the hood of search engines, just do it through the ResearchChannel. Want to learn the difference between LSA (LSI) and LRA (Latent Relational Analysis)?

(more…)

On Recall, Precision, and Relevance

June 25, 2007

Recall, Precision, and Relevance

Two important concepts for estimating the retrieval performance of search systems are recall (R) and precision (P). In layman’s terms, picture two partially overlapping circles A and B representing answer sets (groups of documents). Let C be the overlapping region between A and B and wherein

(more…)
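As a rough illustration of the two-circle picture, here is a minimal Python sketch. Since the excerpt is cut off before the definitions, the sketch assumes the usual convention: A is the set of documents the system retrieved, B is the set of documents that are actually relevant, and C is their overlap. The function name and the document IDs are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two answer sets.

    Assumption: `retrieved` plays the role of circle A and
    `relevant` the role of circle B; C is their intersection.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    overlap = retrieved & relevant  # region C
    # Guard against empty sets to avoid dividing by zero.
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    recall = len(overlap) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
A = {"d1", "d2", "d3", "d4"}        # retrieved
B = {"d3", "d4", "d5", "d6", "d7"}  # relevant
p, r = precision_recall(A, B)
print(p, r)
```

With these sets the overlap holds two documents, so precision is 2 out of 4 retrieved and recall is 2 out of 5 relevant.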

ICANN, Vint Cerf, and Paul Twomey in Puerto Rico

June 22, 2007

Next week is ICANN’s 29th International Public Meeting, held 25-29 June 2007 here in beautiful San Juan, Puerto Rico.

As part of the occasion, I just received an invite from the Law School of the University of Puerto Rico to attend special presentations from two of my heroes: Vint Cerf and Paul Twomey.

(more…)

IR Relevance vs. Suggestion Task Relevance

June 21, 2007

This is a great topic for a graduate thesis: Traditional IR considers the problem of matching documents to a query as a single information need to be satisfied. However, since a system doesn’t know what is in the mind of users, the query itself can express multiple information needs.

When you think it through, this is why Web searching (how users search and reformulate queries on the Web) differs from, for example, IR searching, wherein users resort to query expansion and relevance feedback mechanisms.

(more…)

Closeness, Proximity, Similarity, and Distance

June 20, 2007

The distinction between the terms given in the title of this post is often unclear in the literature.

Closeness is a generic notion that can be expressed in terms of proximity, similarity or distance.

(more…)

Mining End User Locations

June 19, 2007

Ever wonder how to conduct data mining from end user locations? This is easier to do than you think.

At Mi Islita we have for a while been testing a redirection mechanism that collects directory and file path information from end users. Our goals are:

(a) to illustrate that on the Web privacy is an illusion.
(b) to conduct data mining from users’ behavior.

(more…)

Data Mining for All Disciplines

June 18, 2007

It looks like I’ll be teaching a graduate course on Data Mining (DM) for CS and Business students this Fall. I often find myself explaining across disciplines that DM is the Discipline of Knowledge (DK), and that there is nothing unusual about someone with a background in chemistry, biology, or business crossing departmental lines to reach computer engineering courses, looking for data mining or knowledge discovery in databases (KDD). This might explain why search engine companies hire PhDs from all disciplines.

(more…)

On n-Grams and IR Theses

June 15, 2007

A grad student asked me about n-grams in IR as a thesis topic.

What are n-grams?

Well, most of the modern work on n-grams stems from D’Amore and Mah (1985), One-Time Complete Indexing of Text: Theory and Practice.

Page 116 of Grossman and Frieder (Information Retrieval: Algorithms and Heuristics) has great introductory material.
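For readers new to the idea, here is a minimal Python sketch, assuming the common IR reading of an n-gram as an overlapping sequence of n characters slid across the text. The function name `char_ngrams` is made up for this example and is not taken from either reference.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of `text`, in order.

    Assumption: n-grams are taken over raw characters, with no
    normalization, padding, or word-boundary handling.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # A window of width n slides one character at a time;
    # texts shorter than n simply yield an empty list.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(char_ngrams("search", 3))  # ['sea', 'ear', 'arc', 'rch']
```

A real indexing system would also have to decide how to treat case, whitespace, and punctuation, which this sketch deliberately ignores.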

(more…)

Ontologies as Expectations of Co-Occurrence

June 14, 2007

Title: Altering Document Term Vectors for Classification - Ontologies as Expectations of Co-occurrence

Authors: Meenakshi Nagarajan, Amit Sheth; LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA, USA; Marcos Aguilera, Kimberly Keeton, Arif Merchant, Mustafa Uysal; HP Labs, Palo Alto, CA

This new study, presented at WWW2007, Banff, Canada, confirms the importance of co-occurrence, this time in relation to ontologies.

The abstract states:

(more…)

Is SEO Dead?

June 12, 2007

Under the catchy title SEO Is Dead. Long Live, er, the Other SEO, my friend Mike Grehan once again has a great ClickZ column, wherein he comments on Google’s and ASK’s new approaches to satisfying users’ information needs.

He ends the article as follows:

(more…)

Subsumptions vs Synonyms - Conceptual Indexing Revisited

June 11, 2007

Back in 1997, William Woods, Principal Scientist and Distinguished Engineer at Sun Microsystems Labs, wrote Conceptual Indexing: A Better Way to Organize Knowledge. Although the notion of conceptual indexing turned out to be a complex thing, his paper is still relevant these days, when many SEOs make incorrect claims about how search engines use Latent Semantic Indexing (LSI) and others are paying attention to synonymy and phrase processing patents. This post is based in part on Woods’s manuscript.

(more…)

Harvard Law School Internet News and Conferences

June 8, 2007

I have received the May issue of Internet News from the Berkman Center for Internet & Society at Harvard Law School. They have a great list of upcoming conferences, which I’m reproducing below. Some of these are relevant to IR, while others are at the intersection of search technologies and Internet Law.

(more…)

The Mathematics of Mind

June 7, 2007

IPAM is offering a graduate summer school program called “Probabilistic Models of Cognition: The Mathematics of Mind” from July 9 to 27, 2007. More information is available at http://ipam.ucla.edu/programs/gss2007/

According to that link, and I quote:

(more…)

LSI Blog Posts and SEOs

June 6, 2007

I’m still trying to understand why so many SEOs have LSI backward and why others insist on promoting or explaining something that is not LSI as LSI. Some even repeat previous fallacies they have heard across the Web or from contaminated pools of knowledge like Wikipedia.

To top it off, I have emails from SEOs so mad about being misled into error by other SEO “experts” regarding claims about what LSI is or how it works.

(more…)

Relevance Gap Analysis as Part of SEO Work

June 5, 2007

IR colleagues and marketers are now reading the IRW 2007-06 issue, wherein I elaborate on The User-Machine Relevance Perception Gap. So far, I have received some feedback from them. As part of the process, Ben Pfeiffer from Ranksmart asked me a valid question:

(more…)

Call for Papers to Knowledge Discovery Conference

June 4, 2007

Dr. Ellen Voorhees, Director of TREC, over at NIST.gov informed me by email of this Call for Papers. Over the years, I have received invitations to several TREC tracks, and there is no doubt that the groups that make up these tracks are a great place to be.

For those who want to submit a manuscript, here is the full Call:

(more…)

When SEOs are Caught in Lies

June 2, 2007

Mr. Aaron Wall is quoting me over at his blog.

Funny how SEOs that are caught in lies brush things off.

Mr. Wall: When people mistake concept A (e.g., LSI) for concept B (e.g., whatever) and spend a few years promoting the former for the latter across the Web, not only is it clear they don’t know what they are talking about, but they also spread fallacies and induce others into error. That is a fact.

(more…)

Sneak Preview of IR Watch 2007-6 Issue

June 1, 2007

Relevance Perception

Here is a sneak preview of the June issue of IR Watch. If you are a subscriber, it will arrive in your inbox over the weekend or by Monday at the latest.

Enjoy it!

(more…)