I just came back from ICANN. Yesterday I attended the presentations by Paul Twomey and Vint Cerf, Google’s Chief Internet Evangelist, at the Law School of the University of Puerto Rico. Very inspiring talks. Many representatives from ICANN were present.
I forgot to mention that I’m attending ICANN this week, so most of my posts will come straight from the conference.
The ResearchChannel is a research consortium dedicated to serving as an online channel for disseminating cutting-edge technologies. If you want to learn what really goes on under the hood of search engines, the ResearchChannel is the place to do it. Want to learn the difference between LSA (Latent Semantic Analysis, also known as LSI) and LRA (Latent Relational Analysis)?
Two important concepts for estimating the retrieval performance of search systems are recall (R) and precision (P). In layman’s terms, picture two partially overlapping circles A and B representing answer sets (groups of documents). Let C be the overlapping region between A and B and wherein
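The excerpt breaks off before assigning roles to the circles, so the following is only a minimal sketch under an assumption: A is taken to be the set of relevant documents and B the answer set the system returned, which makes C the relevant documents actually retrieved. Under that reading, recall is |C|/|A| and precision is |C|/|B|. The document IDs are hypothetical.

```python
def precision_recall(relevant: set, retrieved: set) -> tuple:
    """Return (precision, recall) for one query.

    Assumption: `relevant` plays the role of circle A and `retrieved`
    the role of circle B; their intersection is region C.
    """
    c = relevant & retrieved                                    # region C
    precision = len(c) / len(retrieved) if retrieved else 0.0   # |C| / |B|
    recall = len(c) / len(relevant) if relevant else 0.0        # |C| / |A|
    return precision, recall


A = {"d1", "d2", "d3", "d4"}   # hypothetical relevant documents
B = {"d3", "d4", "d5"}         # hypothetical answer set returned by the system
p, r = precision_recall(A, B)
print(f"P = {p:.3f}, R = {r:.3f}")  # P = 0.667, R = 0.500
```

Returning more documents can only keep or raise recall, but it tends to lower precision, which is why the two are usually reported together.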
Next week is ICANN’s 29th International Public Meeting, 25-29 June 2007, here in beautiful San Juan, Puerto Rico.
For the occasion, I just received an invitation from the Law School of the University of Puerto Rico to attend special presentations by two of my heroes: Vint Cerf and Paul Twomey.
This is a great topic for a graduate thesis: traditional IR treats the problem of matching documents to a query as a single information need to be satisfied. However, since a system doesn’t know what is in the user’s mind, the query itself may express multiple information needs.
When you think it through, this is why Web searching (how users search and reformulate queries on the Web) differs from, for example, traditional IR searching, in which users resort to query expansion and relevance feedback mechanisms.
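To make "relevance feedback" concrete, here is a minimal sketch of one classic mechanism, the Rocchio formula, swapped in as an illustration rather than as anything the post prescribes. Queries and documents are assumed to be term-weight dictionaries, and alpha, beta and gamma are commonly quoted illustrative values, not tuned parameters. The new query moves toward documents the user marked relevant and away from those marked non-relevant, which also expands it with new terms.

```python
def rocchio(query: dict, relevant: list, nonrelevant: list,
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> dict:
    """Rocchio update: q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)                      # vocabulary of query plus feedback docs
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        if w > 0:                              # negative weights are clipped (dropped)
            new_query[t] = w
    return new_query


q = rocchio({"jaguar": 1.0},
            relevant=[{"jaguar": 1.0, "car": 1.0}],    # hypothetical judged documents
            nonrelevant=[{"jaguar": 1.0, "cat": 1.0}])
print(sorted(q))  # ['car', 'jaguar']
```

Here the expanded query gains "car" while "cat" is pushed below zero and dropped, which is exactly the explicit feedback loop that most Web searchers skip in favor of simply retyping the query.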
The literature often fails to draw a clear distinction between the terms in the title of this post.
Closeness is a generic notion that can be expressed in terms of proximity, similarity or distance.
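As a rough illustration of how the same closeness can be expressed either way, the sketch below computes cosine similarity (higher means closer) and Euclidean distance (lower means closer) on two term-weight vectors. Both measures, the dictionary representation, and the 1/(1 + d) conversion from distance to similarity are assumptions chosen for illustration; they are common choices, not the only ones.

```python
import math


def cosine_similarity(x: dict, y: dict) -> float:
    # Dot product over shared terms divided by the product of vector lengths.
    dot = sum(w * y.get(t, 0.0) for t, w in x.items())
    nx = math.sqrt(sum(w * w for w in x.values()))
    ny = math.sqrt(sum(w * w for w in y.values()))
    return dot / (nx * ny) if nx and ny else 0.0


def euclidean_distance(x: dict, y: dict) -> float:
    # Straight-line distance over the union of terms (missing terms weigh 0).
    terms = set(x) | set(y)
    return math.sqrt(sum((x.get(t, 0.0) - y.get(t, 0.0)) ** 2 for t in terms))


def distance_to_similarity(d: float) -> float:
    # Hypothetical mapping: identical items (d = 0) get similarity 1.
    return 1.0 / (1.0 + d)


doc = {"search": 1.0, "engine": 1.0}
query = {"search": 1.0}
print(round(cosine_similarity(doc, query), 4))  # 0.7071
print(euclidean_distance(doc, query))           # 1.0
print(distance_to_similarity(1.0))              # 0.5
```

The point is that "close" flips direction depending on the measure: a similarity grows as items converge, while a distance shrinks, so results from the two should not be compared without a conversion.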
Ever wondered how to mine data from end users’ locations? It is easier than you think.
For a while now, we at Mi Islita have been testing a redirection mechanism that collects directory and file path information from end users. Our goals are:
(a) to illustrate that on the Web privacy is an illusion.
(b) to mine data on users’ behavior.
It looks like this Fall I’ll be teaching a graduate course on Data Mining (DM) for CS and Business students. I often find myself explaining across disciplines that DM is the Discipline of Knowledge (DK), and that there is nothing unusual about someone with a background in chemistry, biology, or business crossing departmental lines to take computer engineering courses in data mining or knowledge discovery in databases (KDD). This might explain why search engine companies hire PhDs from all disciplines.
A grad student asked me about n-grams in IR as a thesis topic.
What are n-grams
Well, most of the modern work on n-grams builds on D’Amore and Mah (1985), "One-Time Complete Indexing of Text: Theory and Practice."
Page 116 of Grossman and Frieder (Information Retrieval: Algorithms and Heuristics) has great introductory material.
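For a quick answer to "what are n-grams": a character n-gram is any run of n consecutive characters, obtained by sliding a window of width n along the text. The minimal sketch below extracts them and compares two strings with the Dice coefficient over their n-gram sets. Dice is one common choice of n-gram matching measure, assumed here for illustration; it is not necessarily the scheme used in the references above.

```python
def char_ngrams(text: str, n: int = 3) -> list:
    # Slide a window of width n over the string, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    # Dice coefficient over n-gram sets: 2|X & Y| / (|X| + |Y|).
    x, y = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


print(char_ngrams("search"))                  # ['sea', 'ear', 'arc', 'rch']
print(dice_similarity("search", "searches"))  # 0.8
```

Because "searches" shares all four trigrams of "search", the two score as highly similar without any stemming, which is one reason n-grams are attractive for misspellings and morphologically rich languages.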
Authors: Meenakshi Nagarajan, Amit Sheth (LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA, USA); Marcos Aguilera, Kimberly Keeton, Arif Merchant, Mustafa Uysal (HP Labs, Palo Alto, CA)
This new study, presented at WWW2007 in Banff, Canada, confirms the importance of co-occurrence, this time in relation to ontologies.
The abstract states: