Archive for the ‘Legacy Posts’ Category
August 16, 2007
As a reviewer of journal manuscripts and conference papers, I normally look to see if the piece before me answers the following questions:
1. WHAT-WHY: What is the scientific problem at hand and why is it important?
2. WHO-WHAT-WHY: Who proposed what previous solutions and why are these inadequate or incomplete?
3. WHAT-YOUR-WHY: What is your proposed solution and why is it better?
4. HOW-WHAT: How is the solution implemented and what are the benefits or practical applications?
5. PROS-CONS-WHAT: What are the possible pros and cons of your solution and what are the next areas of research?
(more…)
Posted in Conferences, Legacy Posts, Theses | Leave a Comment »
June 26, 2007
I forgot to mention that I’m attending ICANN this week, so most posts will be legacy posts, straight from the conference.
The ResearchChannel is a research consortium dedicated to serving as an online channel for the dissemination of cutting-edge technologies. If you want to learn the real stuff under the hood of search engines, just do it through the ResearchChannel. Want to learn the difference between LSA (LSI) and LRA (Latent Relational Analysis)?
(more…)
Posted in Latent Semantic Indexing, Legacy Posts, Vector Space Models | Leave a Comment »
June 21, 2007
This is a great topic for a graduate thesis: Traditional IR considers the problem of matching documents to a query as a single information need to be satisfied. However, since a system doesn’t know what is in the mind of users, the query itself can express multiple information needs.
When you think it through, this is why Web searching (how users search and reformulate queries on the Web) is different from, for example, IR searching, wherein users resort to query expansion and relevance feedback mechanisms.
(more…)
Posted in Legacy Posts, Machine Learning | 1 Comment »
June 20, 2007
Often a distinction between the terms given in the title of this post is not clear in the literature.
Closeness is a generic notion that can be expressed in terms of proximity, similarity or distance.
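To make the distinction concrete, here is a minimal sketch that contrasts a similarity measure with a distance measure. The choice of cosine similarity and Euclidean distance, the function names, and the toy vectors are assumptions made for illustration only; they are not taken from the full post.

```python
import math

def cosine_similarity(a, b):
    """Similarity: larger values mean the vectors point in closer directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Distance: smaller values mean the points are closer in space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical term-weight vectors, chosen only for illustration.
doc = [1.0, 2.0, 0.0]
scaled = [2.0, 4.0, 0.0]   # same direction as doc, twice the length
other = [0.0, 0.0, 3.0]    # orthogonal to doc

sim_scaled = cosine_similarity(doc, scaled)
dist_scaled = euclidean_distance(doc, scaled)
sim_other = cosine_similarity(doc, other)
```

In this toy case `doc` and `scaled` are as similar as two vectors can be under the cosine measure, yet their Euclidean distance is not zero, which is one reason the two notions should not be used interchangeably.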
(more…)
Posted in Latent Semantic Indexing, Legacy Posts, Vector Space Models | Leave a Comment »
June 14, 2007
Title: Altering Document Term Vectors for Classification – Ontologies as Expectations of Co-occurrence
Authors: Meenakshi Nagarajan, Amit Sheth; LSDIS Lab, Dept. of Computer Science, University of Georgia, Athens, GA, USA; Marcos Aguilera, Kimberly Keeton, Arif Merchant, Mustafa Uysal, HP Labs, Palo Alto, CA
This new study, presented at WWW2007, Banff, Canada, confirms the importance of co-occurrence, this time in relation to ontologies.
The abstract states:
(more…)
Posted in Legacy Posts | Leave a Comment »
May 21, 2007
Eigenvectors and eigenvalues come in pairs; that is why we use the term eigenpair. Some have asked me about practical applications of eigenpairs. So here goes.
Did you know the connection between eigenvectors and Reggaeton Music (or music in general)? How about eigenvectors and bridges, car designers, speakers, architecture, or oil companies?
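For readers who want to see what an eigenpair is before the applications, here is a minimal sketch. The 2x2 matrix, the helper names `mat_vec` and `is_eigenpair`, and the tolerance are all assumptions made for illustration. The sketch simply checks the defining relation A v = λ v for each candidate pair.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def is_eigenpair(matrix, value, vector, tol=1e-9):
    """Return True when matrix * vector equals value * vector within tol."""
    product = mat_vec(matrix, vector)
    return all(abs(p - value * v) <= tol for p, v in zip(product, vector))

# Hypothetical symmetric matrix, chosen only because its eigenpairs are easy to verify by hand.
A = [[2.0, 1.0],
     [1.0, 2.0]]

# Candidate eigenpairs: (eigenvalue, eigenvector).
candidates = [(3.0, [1.0, 1.0]), (1.0, [1.0, -1.0])]

results = [is_eigenpair(A, value, vector) for value, vector in candidates]
```

Each eigenvalue comes with its own eigenvector; pairing the value 2.0 with the vector [1.0, 0.0], for instance, fails the check.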
(more…)
Posted in Legacy Posts, Miscellaneous | Leave a Comment »
May 18, 2007
Yan Qu, over at the College of Information Studies, University of Maryland, teaches the graduate course
LBSC 670 Information Structure
For the course, Qu selected our tutorials as required readings:
(more…)
Posted in IR Tutorials, Legacy Posts, Vector Space Models | Leave a Comment »
May 17, 2007

Marco Kalz, M.A., over at the Educational Technology Expertise Centre, Open University of the Netherlands, informed me months ago that the university was organizing the 1st European Workshop on LSA in Technology-Enhanced Learning. Marco is part of the Scientific Committee responsible for organizing the event and a co-author of the workshop proceedings.
It is my pleasure to inform our readers that the event was a complete success. I will ask Marco for additional inside information, perhaps to include in the next issue of our IRW Newsletter.
(more…)
Posted in Latent Semantic Indexing, Legacy Posts | Leave a Comment »
May 10, 2007
Here is some great news:
1. I am getting ready for my presentation at the Intektel International Conference and Expo. I am presenting on the second day of the conference on “The Impact of Search Engines in the Internet”.
2. Next week we have the ARIN Conference (American Registry for Internet Numbers) in Puerto Rico, and in June we also have the 29th ICANN Conference in San Juan, PR. WOW!
3. Taschuk Morgan has written an excellent Honours thesis that kindly references our tutorial on Cosine Similarity and Term Weights. Morgan writes:
(more…)
Posted in Legacy Posts, Theses | Leave a Comment »
May 10, 2007
I am finishing reading the 2001 Master’s thesis of Cameron Alexander Marlow, from MIT:
A Language-Based Approach to Categorical Analysis
where he proposed the use of Synchronic Imprints (SI) combined with LSI. Great thesis. Essentially, SI incorporates a spring model in which term frequencies are inversely proportional to their distances.
(more…)
Posted in Legacy Posts, Theses | Leave a Comment »
May 6, 2007
A reader asked me an interesting question: Without using LSI, how do you represent documents, terms, and queries in the same space?
(more…)
Posted in Legacy Posts, Vector Space Models | Leave a Comment »
May 5, 2007
The fact that singular value decomposition (SVD) is used in principal component analysis (PCA) and in latent semantic indexing (LSI) has led some (even some “Johnny-come-lately-to-IR” assistant professors) to think that PCA is LSI.
(more…)
Posted in Latent Semantic Indexing, Legacy Posts | 1 Comment »
May 5, 2007
Some readers have asked me to clarify the difference between SVD and PCA, since these share much of their heritage. This was clarified at a TREC9 presentation. For those interested in a mathematical explanation or in ongoing research using these, the following might help.
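As a brief sketch of the standard textbook relation (not necessarily the explanation given in the presentation), assume X is an n-by-p data matrix whose columns have been mean-centered. The symbols X, U, Σ, V, and C are notation introduced here for illustration.

```latex
% Assumption: X is an n-by-p data matrix with mean-centered columns.
% SVD factors the data matrix itself:
X = U \Sigma V^{\mathsf T}

% PCA diagonalizes the sample covariance matrix of the same data:
C = \frac{1}{n-1} X^{\mathsf T} X
  = \frac{1}{n-1} V \Sigma^{\mathsf T} U^{\mathsf T} U \Sigma V^{\mathsf T}
  = V \left( \frac{\Sigma^{\mathsf T} \Sigma}{n-1} \right) V^{\mathsf T}

% Hence the principal directions are the columns of V, and the variance
% along the i-th direction is \sigma_i^2 / (n-1), with \sigma_i the i-th singular value.
```

Under that assumption, SVD is a factorization that applies to any matrix, while PCA is one particular use of it on centered data. LSI, by contrast, typically applies a truncated SVD to a term-document matrix without centering, so the two are related but not the same procedure.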
(more…)
Posted in Latent Semantic Indexing, Legacy Posts | 1 Comment »
May 3, 2007
If you are interested in learning what the LSA, LSI, SVD, and PCA acronyms mean, this post is for you.
(more…)
Posted in Latent Semantic Indexing, Legacy Posts | Leave a Comment »
May 3, 2007
As I mentioned in a ClickZ column written by Mike Grehan, The Myths and Maths of SEOs, a blogonomy is the dissemination of false knowledge through electronic forums, especially through blogs. Today I want to comment on two LSI blogonomies promoted by several SEO firms.
(more…)
Posted in Latent Semantic Indexing, Legacy Posts, SEO Myths | 1 Comment »
May 3, 2007
Indeed, this was the topic of a post I made at this Cre8asiteForums thread
Quoting myself in part:
“When LSI is applied to a term-document matrix representing a collection of documents in the zillions, the co-occurrence phenomenon that affects the LSI scores becomes a global effect, occurring between documents in the collection.
(more…)
Posted in Latent Semantic Indexing, Legacy Posts | Leave a Comment »
May 3, 2007
One of the reasons I started the SVD and LSI Tutorial series was to debunk the many myths about latent semantic indexing. These myths come mostly from a given sector of the search engine marketing industry. In the 1800s and 1900s, when new drugs and medicines were discovered, an interesting phenomenon took place in the old Wild West: unscrupulous marketers started to sell “amazing potions” and “miracle syrups”. These “snake oil sellers” are nothing new, since each decade has its own versions.
(more…)
Posted in Latent Semantic Indexing, Legacy Posts | 3 Comments »
May 3, 2007
Note: I added this post content to the Stochastic Matrix tutorial.
The spreading of incorrect knowledge, or at best inaccurate representations of concepts, is prevalent in circles associated with search engine optimization (SEO). This social phenomenon is most notorious in the blogosphere and in public forums (sites and discussion forums). Because of this, we call the phenomenon a bunch of “blogonomies”.
(more…)
Posted in Legacy Posts, SEO Myths | Leave a Comment »
May 3, 2007
While working on Part 1 of the math tutorial I was asked to define “blogonomies”, a term I like to use in reference to an interesting social blog behavior.
Well, I have other definitions, equally interesting and worth a study: “blogorrhea” and “linkphilis”.
I call “blogonomies” the dissemination of false knowledge through blogs and “blogorrhea” when a false concept is promoted for profit.
A blogonomy can be the result of ignorance or speculation; nothing that a damage control campaign can fix to save face. I have seen many of these in some blogs and discussion forums.
(more…)
Posted in Legacy Posts | Leave a Comment »
May 3, 2007
We have a list of legacy posts that were published at the old home of IR Thoughts. The IR Thoughts Legacy Posts List is available at
http://www.miislita.com/blog/irthoughts-legacy-posts.html
The list was generated by converting the mdb Access database into an XLS file with third-party convertware (software that converts documents into different formats). After sorting and removing irrelevant columns, the output was saved as an HTML document. The entire process was fairly straightforward.
To get the full text of a particular post contact Dr. Garcia at admin@miislita.com.
(more…)
Posted in Legacy Posts | Leave a Comment »