Archive for June, 2007
June 29, 2007
I just came from ICANN. Yesterday I attended the presentations by Paul Twomey and Vint Cerf, Google’s Chief Evangelist, at the Law School of the University of Puerto Rico. They were very inspiring talks, and many ICANN representatives were present.
Posted in Conferences | 1 Comment »
June 26, 2007
I forgot to mention that I’m attending ICANN this week, so most posts will be legacy posts, straight from the conference.
The ResearchChannel is a research consortium dedicated to serving as an online channel for disseminating cutting-edge technologies. If you want to learn the real stuff under the hood of search engines, the ResearchChannel is the place to do it. Want to learn the difference between LSA (LSI) and LRA (Latent Relational Analysis)?
Posted in Latent Semantic Indexing, Legacy Posts, Vector Space Models | No Comments »
June 25, 2007

Two important concepts for estimating the retrieval performance of search systems are recall (R) and precision (P). In layman’s terms, picture two partially overlapping circles, A and B, representing answer sets (groups of documents). Let C be the overlapping region between A and B and wherein…
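In set terms, precision is the fraction of retrieved documents that are relevant, P = |C|/|B|, and recall is the fraction of relevant documents that were retrieved, R = |C|/|A|, taking A as the relevant set and B as the retrieved set. The excerpt is cut off before it assigns roles to A and B, so that assignment is an assumption here. A minimal Python sketch with hypothetical document IDs:

```python
def precision_recall(relevant: set, retrieved: set) -> tuple:
    """Return (precision, recall) for one query.

    Assumption: A = relevant documents, B = retrieved documents,
    and C = A & B is the overlap of the two circles.
    """
    overlap = relevant & retrieved  # region C
    precision = len(overlap) / len(retrieved) if retrieved else 0.0
    recall = len(overlap) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant docs, 5 retrieved, 3 in common.
A = {"d1", "d2", "d3", "d4"}
B = {"d2", "d3", "d4", "d7", "d9"}
p, r = precision_recall(A, B)
print(f"P = {p:.2f}, R = {r:.2f}")  # P = 0.60, R = 0.75
```

Retrieving more documents can only keep or raise recall, but it tends to lower precision, which is why the two are usually reported together.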
Posted in Machine Learning | No Comments »
June 22, 2007
ICANN’s 29th International Public Meeting takes place next week, 25-29 June 2007, here in beautiful San Juan, Puerto Rico.
For the occasion, I just received an invitation from the Law School of the University of Puerto Rico to attend special presentations by two of my heroes: Vint Cerf and Paul Twomey.
Posted in Conferences | No Comments »
June 21, 2007
This is a great topic for a graduate thesis: traditional IR treats matching documents to a query as satisfying a single information need. However, since a system doesn’t know what is in the minds of users, the query itself can express multiple information needs.
If you think it through, this is why Web searching (how users search and reformulate queries on the Web) differs from, for example, IR searching, wherein users resort to query expansion and relevance feedback mechanisms.
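Relevance feedback can be illustrated with the classic Rocchio method, named here only as a familiar example; the post does not say which feedback mechanism it has in mind. The sketch assumes the query and documents are already term-weight vectors over the same hypothetical vocabulary, and the weights alpha=1.0, beta=0.75, gamma=0.15 are illustrative values, not values from the post.

```python
import numpy as np


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reformulated query vector (Rocchio relevance feedback).

    query:       1-D array of term weights for the original query
    relevant:    2-D array, one row per document the user marked relevant
    nonrelevant: 2-D array, one row per document marked non-relevant
    """
    new_q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        new_q += beta * np.mean(relevant, axis=0)  # move toward relevant centroid
    if len(nonrelevant):
        new_q -= gamma * np.mean(nonrelevant, axis=0)  # move away from non-relevant centroid
    return np.clip(new_q, 0.0, None)  # drop negative term weights


# Hypothetical vocabulary: ["web", "search", "query", "feedback"]
q = [1, 1, 0, 0]
rel = [[1, 1, 1, 0], [0, 1, 1, 0]]
nonrel = [[1, 0, 0, 1]]
new_q = rocchio(q, rel, nonrel)
print(new_q)
```

The reformulated query picks up weight on "query", a term the user never typed, because both relevant documents contain it; this is the kind of expansion a system can do once the user tells it which results matched the information need.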
Posted in Legacy Posts, Machine Learning | 1 Comment »
June 20, 2007
The literature often fails to draw a clear distinction between the terms given in the title of this post.
Closeness is a generic notion that can be expressed in terms of proximity, similarity or distance.
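One way to see how these notions relate: for length-normalized vectors, Euclidean distance and cosine similarity carry the same information, since ||a - b||^2 = 2 - 2 cos(a, b). A larger similarity means a smaller distance. A minimal Python sketch with made-up vectors:

```python
import math


def cosine_similarity(a, b):
    """Similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Distance: 0.0 for identical vectors, growing as they move apart."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


a = normalize([3.0, 4.0, 0.0])
b = normalize([4.0, 3.0, 0.0])
cos = cosine_similarity(a, b)
dist = euclidean_distance(a, b)
# For unit vectors: dist**2 == 2 - 2 * cos (up to floating-point error)
print(round(cos, 4), round(dist ** 2, 4), round(2 - 2 * cos, 4))
```

Without normalization the two measures can rank documents differently, because distance is sensitive to vector length (e.g., document length) while cosine is not.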
Posted in Latent Semantic Indexing, Legacy Posts, Vector Space Models | No Comments »
June 19, 2007
Ever wonder how to conduct data mining from end user locations? This is easier to do than you think.
For a while now, we at Mi Islita have been testing a redirection mechanism that collects directory and file path information from end users. Our goals are:
(a) to illustrate that on the Web privacy is an illusion.
(b) to conduct data mining on users’ behavior.
Posted in Data Mining | No Comments »
June 18, 2007
It looks like this Fall I’ll be teaching a graduate course on Data Mining (DM) for CS and Business students. I often find myself explaining to people across disciplines that DM is the Discipline of Knowledge (DK), and that there is nothing unusual about someone with a background in chemistry, biology, or business crossing departmental lines to take computer engineering courses in data mining or knowledge discovery in databases (KDD). This might explain why search engine companies hire PhDs from all disciplines.
Posted in Data Mining | No Comments »
June 15, 2007
A grad student asked me about n-grams in IR as a thesis topic.
What are n-grams?
Well, most of the modern work on n-grams stems from D’Amore and Mah (1985), “One-Time Complete Indexing of Text: Theory and Practice.”
Page 116 of Grossman and Frieder (Information Retrieval: Algorithms and Heuristics) has great introductory material.
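To make the idea concrete, a character n-gram is a run of n consecutive characters, and two words can be compared by the n-grams they share. The sketch below uses space padding and the Dice coefficient, which are common choices assumed here for illustration; it is not D’Amore and Mah’s indexing scheme.

```python
def char_ngrams(text: str, n: int = 3) -> list:
    """Overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def dice_similarity(s: str, t: str, n: int = 3) -> float:
    """Dice coefficient on the sets of n-grams: 2|X & Y| / (|X| + |Y|)."""
    x, y = set(char_ngrams(s, n)), set(char_ngrams(t, n))
    if not x and not y:
        return 1.0  # treat two empty strings as identical
    return 2 * len(x & y) / (len(x) + len(y))


print(char_ngrams("index"))  # [' in', 'ind', 'nde', 'dex', 'ex ']
print(round(dice_similarity("retrieval", "retrieve"), 3))  # 0.706
```

Because n-grams match on word fragments, "retrieval" and "retrieve" score as close without any stemming, which is one reason n-grams are attractive for indexing text with spelling variants or errors.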
Posted in Machine Learning, Vector Space Models | No Comments »
June 14, 2007
Title: Altering Document Term Vectors for Classification - Ontologies as Expectations of Co-occurrence
Authors: Meenakshi Nagarajan, Amit Sheth; LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA, USA; Marcos Aguilera, Kimberly Keeton, Arif Merchant, Mustafa Uysal, HP Labs, Palo Alto, CA
This new study, presented at WWW2007 in Banff, Canada, confirms the importance of co-occurrence, this time in relation to ontologies.
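The paper’s method is not reproduced here. As a generic illustration of what co-occurrence means, a minimal Python sketch that counts, for each pair of terms, how many documents contain both; the document-level window and the toy documents are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered term pair, the number of documents containing both.

    Pairs are stored in alphabetical order, so look them up as sorted tuples.
    """
    counts = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))  # unique terms, alphabetical
        counts.update(combinations(terms, 2))
    return counts


# Hypothetical toy documents
docs = [
    "ontology terms expectations",
    "ontology terms classification",
    "classification vectors",
]
counts = cooccurrence_counts(docs)
print(counts[("ontology", "terms")])  # 2
```

Raw counts like these are usually normalized (for example, by how often each term occurs on its own) before being used as evidence that two terms are related.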
The abstract states:
Posted in Legacy Posts | No Comments »
June 12, 2007
Under the catchy title “SEO Is Dead. Long Live, er, the Other SEO,” my friend Mike Grehan once again has a great ClickZ column in which he comments on the new approaches Google and ASK are taking to satisfy users’ information needs.
He ends the article as follows:
Posted in Latent Semantic Indexing, SEO Myths | No Comments »
June 11, 2007
Back in 1997, William Woods, Principal Scientist and Distinguished Engineer at Sun Microsystems Labs, wrote “Conceptual Indexing: A Better Way to Organize Knowledge.” Although conceptual indexing turned out to be a complex notion, his paper is still relevant today, when many SEOs make incorrect claims about how search engines use Latent Semantic Indexing (LSI) and others are paying attention to synonymy and phrase-processing patents. This post is based in part on Woods’s manuscript.
Posted in Latent Semantic Indexing, Machine Learning, SEO Myths | 1 Comment »
June 8, 2007
I have received the May issue of Internet News from the Berkman Center for Internet & Society at Harvard Law School. It includes a great list of upcoming conferences, which I’m reproducing below. Some of these are relevant to IR, while others are at the intersection of search technologies and Internet Law.
Posted in Conferences | No Comments »
June 7, 2007
IPAM is offering a graduate summer school program, “Probabilistic Models of Cognition: The Mathematics of Mind,” from July 9 to 27, 2007. More information is available at http://ipam.ucla.edu/programs/gss2007/
According to that page:
Posted in Graduate Courses | No Comments »
June 6, 2007
I’m still trying to understand why so many SEOs have LSI backward and why others insist on promoting or explaining something that is not LSI as LSI. Some even repeat fallacies they have heard across the Web or from contaminated pools of knowledge like Wikipedia.
To top it off, I have received emails from SEOs who are furious at having been misled by other SEO “experts” about what LSI is or how it works.
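For readers wondering what LSI actually computes: in its standard formulation, LSI takes a truncated singular value decomposition (SVD) of a term-document matrix and compares queries and documents in the reduced space. The toy matrix, the terms, and the choice k = 2 below are hypothetical, and real systems normally apply term weighting (e.g., tf-idf) before the SVD. A minimal Python sketch using NumPy:

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
# Raw counts with no term weighting, purely for illustration.
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)

k = 2  # latent dimensions kept; an arbitrary choice for this toy example
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Documents in the latent space: one row of V_k per document.
doc_vecs = Vt_k.T

# Fold the query "car" into the same space: q_k = q^T U_k diag(1 / s_k).
q = np.array([1, 0, 0, 0, 0], dtype=float)
q_k = (q @ U_k) / s_k


def cosine(u, v):
    """Cosine similarity between two 1-D arrays."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


scores = [cosine(q_k, d) for d in doc_vecs]
keyword_hits = q @ A  # plain term matching: does each document contain "car"?
print("LSI scores:  ", np.round(scores, 3))
print("keyword hits:", keyword_hits)
```

In this toy run, document 1 contains only "automobile" and "engine", so plain keyword matching gives it no credit for the query "car"; its LSI score is nevertheless essentially 1.0, because the SVD places it on the same latent dimension as the other vehicle documents, while the flower document scores essentially 0.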
Posted in Latent Semantic Indexing, SEO Myths | 5 Comments »
June 5, 2007
IR colleagues and marketers are now reading the IRW 2007-06 issue, in which I elaborate on The User-Machine Relevance Perception Gap. So far, I have received some feedback from them. As part of the process, Ben Pfeiffer from Ranksmart asked me a valid question:
Posted in Machine Learning | No Comments »
June 4, 2007
Dr. Ellen Voorhees, Director of TREC, over at NIST.gov, informed me by email of this Call for Papers. Over the years, I have received invitations to several TREC tracks, and there is no doubt that the groups that make up these tracks are a great place to be.
For those who want to submit a manuscript, here is the full Call:
Posted in Conferences, Machine Learning | No Comments »
June 2, 2007
Mr. Aaron Wall is quoting me over at his blog.
Funny how SEOs that are caught in lies brush things off.
Mr. Wall: When people mistake concept A (e.g., LSI) for concept B (e.g., whatever) and spend a few years promoting the former as the latter across the Web, not only is it clear that they don’t know what they are talking about, but they also spread fallacies and lead others into error. That is a fact.
Posted in Latent Semantic Indexing | 3 Comments »
June 1, 2007

Here is a sneak preview of the June issue of IR Watch. If you are a subscriber, it will arrive in your inbox over the weekend or, at the latest, by Monday.
Enjoy it!
Posted in Machine Learning | No Comments »