Archive for the ‘Latent Semantic Indexing’ Category

Thesis: Understanding LSI via the Term-Term Truncated Matrix

May 10, 2007

As we mentioned in IR Watch - The Newsletter (got a free subscription?), although LSI (LSA) itself is not first-order co-occurrence (see Prof. Tom Landauer: Introduction to Latent Semantic Analysis), a recent thesis by Regis Newo suggests that high-order co-occurrence might be at the heart of LSI and is what makes the technique work. The abstract of this 2005 thesis, Understanding LSI via the Truncated Term-Term Matrix, states:
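To make the idea of high-order co-occurrence concrete, here is a minimal sketch. It is not taken from the thesis: the three-term, two-document collection, the raw-count weighting, and the helper names `term_term` and `truncated` are all assumptions made for illustration. "car" and "vehicle" never appear in the same document, but both appear with "auto", so they are linked only through second-order co-occurrence.

```python
import numpy as np

# Hypothetical toy collection (rows = terms, columns = documents).
terms = ["car", "auto", "vehicle"]
A = np.array([
    [1.0, 0.0],  # "car" appears only in document 1
    [1.0, 1.0],  # "auto" appears in both documents
    [0.0, 1.0],  # "vehicle" appears only in document 2
])

def term_term(matrix):
    """Term-term matrix: entry (i, j) sums the products of the weights
    of terms i and j over all documents."""
    return matrix @ matrix.T

def truncated(matrix, k):
    """Rank-k approximation of `matrix` obtained by keeping the k
    largest singular values of its SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

full_tt = term_term(A)                   # from the original matrix
trunc_tt = term_term(truncated(A, k=1))  # from the rank-1 approximation

car, vehicle = terms.index("car"), terms.index("vehicle")
print("original  car-vehicle:", full_tt[car, vehicle])
print("truncated car-vehicle:", trunc_tt[car, vehicle])
```

In the original term-term matrix the car-vehicle entry is zero, because the two terms share no document. After truncation the entry becomes positive: the rank reduction has pushed the shared association with "auto" into a direct association between the two terms.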

(more…)

IR Watch - The Newsletter

May 8, 2007

The goal of IR Watch - The Newsletter is to disseminate recent advances, research, and news from the information retrieval world. The current issue (IRW-2007-5) is a summary of my presentation at the OJOBuscador Congress 2 (March 8-9, Madrid, Spain), Demystifying LSI for SEOs.

(more…)

SEOs Blogging LSI Nonsense

May 6, 2007

At this SEOMoz.org blog, posters are discussing the semantic capabilities of search engines, including LSI.

I stopped by to clarify several things, since many of the posters present hearsay as if it were fact.

(more…)

PCA Is Not LSI

May 5, 2007

The fact that singular value decomposition (SVD) is used in both principal component analysis (PCA) and latent semantic indexing (LSI) has led some (even some “Johnny-come-lately-to-IR” assistant professors) to think that PCA is LSI.
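One standard way to see the difference, sketched below, is that PCA works on mean-centered data (equivalently, on the covariance matrix), while LSI applies SVD to the term-document matrix as it stands. This is only an illustration and may not be the argument made in the full post; the small count matrix `X` is made up for the example.

```python
import numpy as np

# Hypothetical counts: rows = documents, columns = terms.
X = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 2.0],
])

# LSI-style: SVD of the matrix as populated, with no centering.
s_raw = np.linalg.svd(X, compute_uv=False)

# PCA-style: subtract each column's mean first, then take the SVD.
X_centered = X - X.mean(axis=0)
s_centered = np.linalg.svd(X_centered, compute_uv=False)

# PCA check: squared singular values of the centered matrix, divided
# by (n - 1), equal the eigenvalues of the sample covariance matrix.
pca_variances = s_centered ** 2 / (X.shape[0] - 1)
cov_eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

print("singular values, raw matrix:     ", s_raw)
print("singular values, centered matrix:", s_centered)
print("PCA variances match covariance eigenvalues:",
      np.allclose(pca_variances, cov_eigenvalues))
```

The two sets of singular values differ, so the two decompositions describe different things: the centered one describes variance around the mean, the uncentered one describes the matrix itself.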

(more…)

On SVD and PCA: Some Applications

May 5, 2007

Some readers have asked me to clarify the difference between SVD and PCA, since the two have overlapping heritages. This was clarified at a TREC9 presentation. For those interested in a mathematical explanation or in ongoing research using these techniques, the following might help.

(more…)

How to Populate a Matrix for SVD

May 4, 2007

SVD has been applied in many fields, such as IR, economics, computational chemistry, and biocomputation. In all these cases, one must pay attention to how one populates the initial matrix to be “SVDied”.
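For the IR case, populating the matrix might look like the sketch below. The three toy documents are invented, and filling the cells with raw term counts is an assumption made only to keep the example short; the excerpt does not say which weighting scheme the full post recommends, and other weightings are common.

```python
import numpy as np

# Hypothetical toy collection.
docs = [
    "latent semantic indexing",
    "semantic search engines",
    "indexing search results",
]

# One row per distinct term, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
A = np.zeros((len(vocab), len(docs)))

# Assumption: each cell holds the raw count of the term in the document.
for j, doc in enumerate(docs):
    for word in doc.split():
        A[vocab.index(word), j] += 1

# Decompose the populated matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(vocab)
print(A)
print("singular values:", s)
```

Whatever ends up in the cells is what the decomposition describes, so changing the weighting changes the singular values and vectors.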

(more…)

Demystifying LSA, LSI, SVD, PCA, and IS ACRONYMS

May 3, 2007

If you are interested in learning what the LSA, LSI, SVD, and PCA acronyms mean, this post is for you.

(more…)

Two SEO Blogonomies

May 3, 2007

As I mentioned in a ClickZ column written by Mike Grehan, The Myths and Maths of SEOs, a blogonomy is the dissemination of false knowledge through electronic forums, especially through blogs. Today I want to comment on two LSI blogonomies promoted by several SEO firms.

(more…)

“LSI-Friendly” Documents: No Such Thing

May 3, 2007

Indeed, this was the topic of a post I made at this Cre8asiteForums thread.

Quoting myself in part:

“When LSI is applied to a term-document matrix representing a collection of documents in the zillions, the co-occurrence phenomenon that affects the LSI scores becomes a global effect, occurring between documents in the collection.

(more…)

Latest SEO Incoherences (LSI)

May 3, 2007

One of the reasons I started the SVD and LSI Tutorial series was to debunk the many myths about latent semantic indexing. These myths come mostly from one sector of the search engine marketing industry. In the 1800s and 1900s, when new drugs and medicines were discovered, an interesting phenomenon took place in the old Wild West: unscrupulous marketers started to sell “amazing potions” and “miracle syrups”. These “snake oil sellers” are nothing new, since each decade has its own versions.

(more…)

IRWatch May Issue: Demystifying LSI

May 1, 2007

If you are a subscriber, the current issue of IRWatch - The Newsletter should arrive in your inbox during the day. If not, let me know.

The piece is a summary of my March 8-9, 2007 presentation at the OJOBuscador Congress 2 in Madrid, Spain. My topic was Demystifying LSI for SEOs.

The material presented at the conference has been heavily edited and adapted to the newsletter. It is presented using figures and side-by-side comparisons of myths and facts. This time I wanted to skip the math and lengthy explanations.

(more…)