Archive for the ‘Legacy Posts’ Category

Reviewing Papers: How-To

August 16, 2007

As a reviewer of journal manuscripts and conference papers, I normally check whether the piece before me answers the following questions:

1. WHAT-WHY: What is the scientific problem at hand and why is it important?
2. WHO-WHAT-WHY: Who proposed what previous solutions and why are these inadequate or incomplete?
3. WHAT-YOUR-WHY: What is your proposed solution and why is it better?
4. HOW-WHAT: How is the solution implemented and what are the benefits or practical applications?
5. PROS-CONS-WHAT: What are the possible pros and cons of your solution and what are the next areas of research?

(more…)

Research Channel, LRA, Microsoft, and more

June 26, 2007

I forgot to mention that I’m attending ICANN this week, so most posts will be legacy posts, straight from the conference.

The ResearchChannel is a research consortium dedicated to serving as an online channel for disseminating cutting-edge technologies. If you want to learn the real stuff under the hood of search engines, just do it through the ResearchChannel. Want to learn the difference between LSA (LSI) and LRA (Latent Relational Analysis)?

(more…)

IR Relevance vs. Suggestion Task Relevance

June 21, 2007

This is a great topic for a graduate thesis: traditional IR treats the problem of matching documents to a query as a single information need to be satisfied. However, since a system cannot know what is in a user's mind, the query itself can express multiple information needs.

Thought through carefully, this is why Web searching (how users search and reformulate queries on the Web) differs from, for example, traditional IR searching, wherein users resort to query expansion and relevance feedback mechanisms.

(more…)

Closeness, Proximity, Similarity, and Distance

June 20, 2007

The literature often fails to draw a clear distinction between the terms in the title of this post.

Closeness is a generic notion that can be expressed in terms of proximity, similarity or distance.
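To make the distinction concrete, here is a minimal Python sketch, assuming NumPy and a few made-up vectors. Cosine similarity grows as two vectors point more nearly the same way, Euclidean distance shrinks as they move together, and either can be used to express closeness. The helper names are illustrative, and the mapping `1 / (1 + d)` is just one common convention for turning a distance into a similarity, not necessarily the one discussed in the full post.

```python
import numpy as np


def cosine_similarity(x, y):
    """Similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))


def euclidean_distance(x, y):
    """Distance: 0.0 for identical vectors, growing as they move apart."""
    return float(np.linalg.norm(x - y))


def distance_to_similarity(d):
    """Assumed convention: map a distance d >= 0 to a similarity in (0, 1]."""
    return 1.0 / (1.0 + d)


a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])  # same direction as a, twice as long
c = np.array([0.0, 0.0, 3.0])  # orthogonal to a

# a and b are maximally similar by angle, yet not at zero distance.
print(cosine_similarity(a, b), euclidean_distance(a, b))
print(cosine_similarity(a, c), euclidean_distance(a, c))
print(distance_to_similarity(euclidean_distance(a, b)))
```

Vectors a and b have a cosine similarity of 1 but a Euclidean distance of √5, which is why a claim that two items are "close" should always name the measure behind it.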

(more…)

Ontologies as Expectations of Co-Occurrence

June 14, 2007

Title: Altering Document Term Vectors for Classification - Ontologies as Expectations of Co-occurrence

Authors: Meenakshi Nagarajan, Amit Sheth; LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA, USA; Marcos Aguilera, Kimberly Keeton, Arif Merchant, Mustafa Uysal, HP Labs, Palo Alto, CA

This new study, presented at WWW2007 in Banff, Canada, confirms the importance of co-occurrence, this time in relation to ontologies.

The abstract states:

(more…)

Eigenvectors and Reggaeton Music = Eiggaeton

May 21, 2007

Eigenvectors and eigenvalues come in pairs; that is why we use the term eigenpair. Some readers have asked me about practical applications of eigenpairs, so here goes.

Did you know about the connection between eigenvectors and Reggaeton music (or music in general)? How about the connection between eigenvectors and bridges, car designers, speakers, architecture, or oil companies?
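As a minimal illustration of what an eigenpair is (not of the applications covered in the full post), this Python sketch assumes NumPy and a small symmetric matrix chosen only for the example. It computes the eigenvalues and eigenvectors and checks that each pair (λ, v) satisfies A v = λ v.

```python
import numpy as np

# A small symmetric matrix chosen for illustration; its eigenvalues are 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order;
# column i of `vectors` is the eigenvector paired with values[i].
values, vectors = np.linalg.eigh(A)

for i, lam in enumerate(values):
    v = vectors[:, i]
    # Each (lam, v) is an eigenpair: multiplying by A only rescales v by lam.
    assert np.allclose(A @ v, lam * v)
    print(f"lambda = {lam:.1f}, v = {v}")
```

Neither half of a pair means much alone: the eigenvalue says how much the matrix stretches its eigenvector, and the eigenvector says which direction gets stretched.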

(more…)

Our Tutorials, Required Readings at University of Maryland

May 18, 2007

Yan Qu, of the College of Information Studies at the University of Maryland, teaches the graduate course

LBSC 670 Information Structure

For this course, Qu selected our tutorials as required readings:

(more…)

LSA: A Goldmine for Educators and Curriculum Developers

May 17, 2007

Marco Kalz, M.A., of the Educational Technology Expertise Centre at the Open University of the Netherlands, informed me months ago that the university was organizing the 1st European Workshop on LSA in Technology-Enhanced Learning. Marco is part of the Scientific Committee responsible for organizing the event and is a co-author of the workshop proceedings.

It is my pleasure to inform our readers that the event was a complete success. I will ask Marco for additional inside information, perhaps to include in the next issue of the IRW Newsletter.

(more…)

Thesis: A Hybrid Knowledge-based/Content-based Recommender

May 10, 2007

Here is some great news:

1. I am getting ready for my presentation at the Intektel International Conference and Expo. I am presenting on the second day of the conference, on “The Impact of Search Engines in the Internet”.

2. Next week we have the ARIN (American Registry of Internet Numbers) Conference in Puerto Rico, and in June San Juan, PR also hosts the 29th ICANN Conference. WOW!

3. Taschuk Morgan has written an excellent Honours Thesis that kindly references our tutorial on Cosine Similarity and Term Weights. Morgan writes:

(more…)

Thesis: Understanding LSI via the Term-Term Truncated Matrix

May 10, 2007

As we mentioned in IR Watch - The Newsletter (got a free subscription?), although LSI (LSA) itself is not first-order co-occurrence (see Prof. Tom Landauer: Introduction to Latent Semantic Analysis), a recent thesis by Regis Newo argues that high-order co-occurrence might be at the heart of LSI and is what makes the technique work. The abstract of this 2005 thesis, Understanding LSI via the Truncated Term-Term Matrix, states:
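Here is a toy Python sketch of why high-order co-occurrence can surface in LSI, assuming NumPy and a made-up three-term, two-document matrix. The terms "car" and "auto" never appear in the same document, so their first-order co-occurrence count in A Aᵀ is zero. In the truncated term-term matrix A_k A_kᵀ, however, they get a nonzero entry, because both co-occur with "engine". This only illustrates the idea; it does not reproduce Newo's analysis.

```python
import numpy as np

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
terms = ["car", "auto", "engine"]
A = np.array([[1.0, 0.0],   # "car" appears only in document 1
              [0.0, 1.0],   # "auto" appears only in document 2
              [1.0, 1.0]])  # "engine" appears in both documents

# First-order co-occurrence: entry (i, j) counts documents containing both terms.
C = A @ A.T

# Rank-k LSI approximation A_k = U_k S_k V_k^T and its term-term matrix A_k A_k^T.
k = 1
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
C_k = A_k @ A_k.T

print("car-auto, first order:", C[0, 1])
print("car-auto, truncated term-term:", round(C_k[0, 1], 3))
```

With k = 1 the car-auto entry moves from 0.0 to 0.5, a link that exists only through the shared neighbor "engine", that is, a second-order co-occurrence.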

(more…)

Thesis: Information Retrieval with Genetic Programming

May 10, 2007

Here is the 2002 master's thesis of Nir Oren, University of the Witwatersrand, Johannesburg:

Improving the effectiveness of information retrieval with genetic programming

where he proposes an interesting approach to IR using genetic programming. Part of his abstract states:

(more…)

Thesis: A Language-Based Approach to Categorical Analysis

May 10, 2007

I am finishing reading the 2001 master's thesis of Cameron Alexander Marlow, from MIT:
A Language-Based Approach to Categorical Analysis

where he proposed the use of Synchronic Imprints (SI) combined with LSI. Great thesis. Essentially, SI incorporates a spring model in which term frequencies are inversely proportional to their distances.

(more…)

Representing Documents, Terms, and Queries in the Same Space

May 6, 2007

A reader asked me an interesting question: Without using LSI, how do you represent documents, terms, and queries in the same space?

(more…)

PCA Is Not LSI

May 5, 2007

The fact that singular value decomposition (SVD) is used in both principal component analysis (PCA) and latent semantic indexing (LSI) has led some (even some “johnny-come-lately-to-IR” assistant professors) to think that PCA is LSI.
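A minimal Python sketch of one concrete difference, assuming NumPy and a made-up 3 x 2 term-document matrix. PCA as commonly defined subtracts each variable's mean before the SVD (equivalently, it diagonalizes the covariance matrix), while LSI applies a truncated SVD directly to the term-document matrix. It is the same SVD routine run on different inputs, and it gives different results.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are documents (as in LSI).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# LSI: SVD applied directly to the (possibly weighted) term-document matrix.
s_lsi = np.linalg.svd(A, compute_uv=False)

# PCA: treat documents as observations and terms as variables,
# subtract each variable's mean, then take the SVD of the centered data.
X = A.T                          # documents x terms
X_centered = X - X.mean(axis=0)
s_pca = np.linalg.svd(X_centered, compute_uv=False)

print("LSI singular values:", s_lsi)
print("PCA singular values:", s_pca)
```

Here LSI yields singular values √3 and 1, while PCA yields 1 and 0. Centering wipes out the third term entirely, because it has the same value in every document, so the two methods see different structure in the same matrix.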

(more…)

On SVD and PCA: Some Applications

May 5, 2007

Some readers have asked me to clarify the difference between SVD and PCA, since the two share much of their heritage. This was clarified at a TREC9 presentation. For those interested in a mathematical explanation or in ongoing research using these techniques, the following might help.

(more…)

Demystifying LSA, LSI, SVD, PCA, and IS ACRONYMS

May 3, 2007

If you are interested in learning what the LSA, LSI, SVD, and PCA acronyms mean, this post is for you.

(more…)

Two SEO Blogonomies

May 3, 2007

As I mentioned in The Myths and Maths of SEOs, a ClickZ column written by Mike Grehan, a blogonomy is the dissemination of false knowledge through electronic forums, especially through blogs. Today I want to comment on two LSI blogonomies promoted by several SEO firms.

(more…)

“LSI-Friendly” Documents: No Such Thing

May 3, 2007

Indeed, this was the topic of a post I made in this Cre8asiteForums thread.

Quoting myself in part:

“When LSI is applied to a term-document matrix representing a collection of documents in the zillions, the co-occurrence phenomenon that affects the LSI scores becomes a global effect, occurring between documents in the collection.

(more…)

Latest SEO Incoherences (LSI)

May 3, 2007

One of the reasons I started the SVD and LSI Tutorial series was to debunk the many myths about latent semantic indexing. These myths come mostly from one sector of the search engine marketing industry. In the 1800s and 1900s, when new drugs and medicines were being discovered, an interesting phenomenon took place in the Old West: unscrupulous marketers started to sell “amazing potions” and “miracle syrups”. These “snake oil sellers” are nothing new; every decade has its own version.

(more…)

SEO Blogonomies: The Search Engine Markov Chain

May 3, 2007

Note: I have added the content of this post to the Stochastic Matrix tutorial.

The spreading of incorrect knowledge, or at best inaccurate representations of concepts, is prevalent in circles associated with search engine optimization (SEO). This social phenomenon is most noticeable in the blogosphere and in public forums (sites and discussion forums). Because of this, we call instances of the phenomenon “blogonomies”.

(more…)

Some Definitions

May 3, 2007

While working on Part 1 of the math tutorial, I was asked to define “blogonomies”, a term I like to use for an interesting social behavior on blogs.

Well, I have other definitions, equally interesting and worth studying: “blogorrhea” and “linkphilis”.

I use “blogonomies” for the dissemination of false knowledge through blogs, and “blogorrhea” for the promotion of a false concept for profit.

A blogonomy can be the result of ignorance or speculation, and it is not something a damage-control campaign can fix to save face. I have seen many of these in blogs and discussion forums.

(more…)

IR Thoughts Legacy Posts

May 3, 2007

We have compiled a list of the legacy posts that were published at the old home of IR Thoughts. The IR Thoughts Legacy Posts List is available at:

http://www.miislita.com/blog/irthoughts-legacy-posts.html

The list was generated by converting the Access (.mdb) database into an XLS file with third-party convertware (software that converts documents between formats). After sorting and removing irrelevant columns, we saved the output as an HTML document. The entire process was pretty straightforward.

To get the full text of a particular post contact Dr. Garcia at admin@miislita.com.

(more…)