IR Thoughts

~ Thoughts on Information Retrieval, Data Mining, and Search Engines

Search results for: lsi

Google early years and LSI

24 Thursday Nov 2011

Posted by egarcia in Data Mining

≈ 4 Comments

For years many SEOs fooled their own peers with the assertion that LSI was something new that Google implemented. Some have even claimed LSI was a proprietary algorithm from Google. I’ve spent sooooo many years debunking all this crap and a few other urban legends from unscrupulous SEOs.

On this Thanksgiving Day I am thankful that all these myths have been debunked to no end: LSI-rank correlations, LDA-rank correlations, KD-rank correlations, additiveness of correlation coefficients, blah, blah, blah… I am also thankful that along came this:
http://infolab.stanford.edu/~sergey/349/

LSI?

Known from the onset by Google.

A cost-effective implementation in a large-scale, dynamic environment such as the Web?

Nope.

Finally SEOs are getting the LSI Myth!

09 Thursday Apr 2009

Posted by egarcia in Latent Semantic Indexing

≈ 10 Comments

If you search this blog (IRThoughts) for LSI or visit its Latent Semantic Indexing category you will find many posts wherein SEO LSI Myths are debunked. Prior to this WordPress blog I used to maintain a personal blog wherein SEO myths regarding LSI were also debunked.

Over the years, many realized they had been taken in by the usual agents of misinformation, at least when it comes to “SEO LSI” and “LSI-Friendly” documents.

Recently, I found traffic coming from a blog discussion about a video (http://www.stomperblog.com/warning-advanced-seo-technique-does-not-work/) wherein LSI in relation to Google is debunked.

The video also discusses one flavor of LSI; i.e., one wherein the weights are tf-IDF weights. Unlike other LSI variants, this flavor does not incorporate relevance information or entropy information.

The video does a good job at debunking LSI Myths. However, it contains at least one factually incorrect argument about how the SVD algorithm works.

The video gives an example implying that SVD works by reducing a large set of words to a few words, such that, for example, thousands of words are reduced to, let's say, 300 words. This is incorrect and certainly is not a trivial flaw.

SVD does not work by reducing a vocabulary, but by reducing dimensions, and there are as many dimensions as singular values. This is why it is called a dimensionality-reduction and not a vocabulary-reduction algorithm. I should stress that an LSI Space is not like a Term Space, wherein each term is a dimension such that there is a 1:1 correspondence.

In LSI, the SVD algorithm is used to reduce the dimensions of a matrix; that is, the number of singular values retained.

For instance in our SVD and LSI Tutorial series at


http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-5-lsi-keyword-research-co-occurrence.html

we present an LSI example problem consisting of many words and few initial dimensions, such that for the initial matrix

#words >> #initial dimensions

More specifically, we used 11 words and 3 dimensions.

After truncation, we ended up with 11 words and 2 dimensions.
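The point can be checked with a few lines of Python and NumPy. This is only a minimal sketch: the 11 x 3 matrix below holds made-up counts (it is not the matrix from the tutorial), and the variable names are my own. It shows that truncation drops singular values, not rows (words).

```python
import numpy as np

# Hypothetical term-document counts: 11 terms (rows) x 3 documents (columns).
rng = np.random.default_rng(0)
A = rng.integers(0, 3, size=(11, 3)).astype(float)

# Thin SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2  # number of dimensions (singular values) to keep
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

print("before truncation:", U.shape[0], "terms,", s.size, "dimensions")
print("after truncation: ", U_k.shape[0], "terms,", s_k.size, "dimensions")
```

The vocabulary is intact after truncation; only the number of retained dimensions went from 3 to 2.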

Other than this, the video is fun to watch, but it ends up as an introductory promotion for another SEO proposal.

PS.

After reviewing the video several times, I unfortunately found that it contains another incorrect argument.

In arguing that Google might not use LSI, the video claims that LSI has to return the same results when word variants such as plurals and tenses are used. This might be the case if stemming is heavily used in an LSI implementation, but stemming is not a requirement for implementing LSI at all.

When stemming is not implemented, the SVD reduction will certainly return different results, since the variants are entered as different tokens in the original term-doc matrix that undergoes decomposition.

The video also misses where the power of LSI comes from: higher-order co-occurrence connectivity paths hidden (latent) in the original matrix. Terms do not have to be synonyms, related terms, or even non-derivative forms for these hidden paths to be observed in LSI.

Terms do not need to be related terms either to end up clustered with LSI. It is the hidden co-occurrence patterns that are behind the clustering. For example, in our SVD and LSI tutorial above, we intentionally used stopwords and zero synonyms/related terms, and these ended up in their corresponding clusters without necessarily being semantically related. This simple example shows that in LSI the SVD algorithm produces an output based on crunching numbers, not on making sense out of meaning or intelligence, and contradicts the generalized opinion that LSI works at the level of meaning.

I have to conclude that while the video is intended to debunk LSI SEO myths (a noble effort), it uses incorrect arguments and hearsay lines from around the Web. Debunking hearsay with more hearsay: What a shame.

 

Vector Space, Probabilistic LSI, and LDA

03 Friday Apr 2009

Posted by egarcia in Latent Semantic Indexing, Vector Space Models

≈ 3 Comments

[Image: lda]
Source: http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf

There is some buzz about Probabilistic Latent Semantic Indexing, hence this post.

From VSM to LSI

Prior to 1988 the prevalent IR model was Salton’s Vector Space Model (VSM). This model treats documents and queries as vectors in a multidimensional space. In this space a query is treated as just another document. In this term space, it is not possible to assign a position to terms, simply because the terms are the dimensions of the space. Coordinate values assigned to document and query vectors are given by term weights computed using a particular weighting scheme.

VSM and its many variants are based on matching query terms to terms found in documents. These models assume term independence. However, we know this assumption is not necessarily correct, since terms can be dependent via (a) synonymy and (b) polysemy.
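To make the term-space idea concrete, here is a small Python sketch. The three "documents", the query, and the choice of a plain tf x log(N/df) weighting scheme are assumptions made for illustration only; many other weighting schemes exist. Note how the query is vectorized exactly like a document, and how a document sharing no terms with the query scores zero.

```python
import math
from collections import Counter

# Toy collection and query (made up for illustration).
docs = ["latent semantic indexing", "semantic search engines", "vector space model"]
query = "semantic indexing"

# Each distinct term is one dimension of the space.
vocab = sorted({t for d in docs for t in d.split()})
n_docs = len(docs)
df = {t: sum(t in d.split() for d in docs) for t in vocab}  # document frequency

def vectorize(text):
    """Map a text to term-space coordinates using tf * log(N / df) weights."""
    tf = Counter(text.split())
    return [tf[t] * math.log(n_docs / df[t]) for t in vocab]

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

doc_vecs = [vectorize(d) for d in docs]
q_vec = vectorize(query)  # the query is treated as just another document
scores = [cosine(q_vec, d) for d in doc_vecs]

for d, s in zip(docs, scores):
    print(f"{s:.3f}  {d}")
```

This is pure term matching: a document about the same topic that uses different words would get no credit, which is the limitation LSI set out to address.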

In 1988, Dumais and co-workers at Bellcore (now Telcordia) published two papers in which they applied Golub and Kahan’s 1965 SVD algorithm to “documents” exhibiting (a) and (b) and called that Latent Semantic Indexing (LSI).

LSI became an improvement over the simplistic point of view of term matching, accounting for term dependencies. The “documents” were not HTML Web documents (there were no Web documents back then), but just abstracts and memos from specific knowledge domains (HCI, scientific, med). As expected these consisted of synonyms and related terms used in these domains. Thus, clusters of these were obtained.

It was immediately claimed that LSI could be used to model aspects of basic linguistics, like synonymy and polysemy, and how the human mind associates words to concepts and concepts to meaning.

Moving twenty years forward, SEOs misread such outdated research and the synonym-stuffing myth was born.

There is now a crew of SEOs claiming that they can design “LSI-friendly” documents by making these rich in synonyms and related terms. We have demonstrated via our SVD and LSI tutorial series why this is not possible. These marketers are simply inventing LSI Myths out of thin air in order to better market whatever they sell or promote (often their own image as “experts”). The same goes for those that claim “PLSI-SEO” strategies.

Research findings suggest that what makes LSI work is first- and higher-order co-occurrence paths hidden in the term-term LSI matrix. These paths are responsible for how and why term weights are redistributed in a truncated term-document matrix. Altering terms (even a single term) of this matrix provokes a redistribution of term weights across the entire matrix, whose outcome cannot be predicted. This is why the idea of “LSI-friendly” documents is plain SEO Snakeoil. Again, the same goes for those that claim “PLSI-SEO” strategies. Keep reading.
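The redistribution effect is easy to observe numerically. The following Python/NumPy sketch uses a made-up 8 x 5 count matrix and a hypothetical helper, `rank_k`; it is an illustration on toy data, not a proof. One count is changed in one document, and the rank-2 approximations before and after the change are compared.

```python
import numpy as np

def rank_k(A, k):
    """Rank-k approximation of A obtained by truncating its SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
A = rng.integers(0, 4, size=(8, 5)).astype(float)  # made-up term-document counts

B = A.copy()
B[0, 0] += 1.0  # alter a single term count in a single document

# Count the entries that differ in the raw and in the truncated matrices.
changed_raw = int((np.abs(B - A) > 1e-9).sum())
changed_trunc = int((np.abs(rank_k(B, 2) - rank_k(A, 2)) > 1e-9).sum())

print("entries changed in the original matrix: ", changed_raw)
print("entries changed in the truncated matrix:", changed_trunc)
```

A single edit in the original matrix spreads to many entries of the truncated matrix, including weights of terms and documents that were never touched.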

Enter the Probabilistic Latent Semantic Indexing (PLSI) Model

In 1998 LSI was put into question. Given a generative model of text: why adopt LSI when one could use Bayesian or maximum likelihood methods and fit the model to data?

In 1999, Thomas Hofmann presented the Probabilistic Latent Semantic Indexing (PLSI) model, also known as the Aspect Model, as an alternative to LSI. PLSI (or PLSA) models each word in a document as a sample from a mixture model. The mixture components are multinomial random variables viewed as representations of topics.

Each word is generated from a single topic, and different words in a document can be generated from different topics. In this model each document is represented as a list of mixing proportions for these mixture components. Thus, documents are reduced to a probability distribution over a set of topics, which is the expected “reduced description” associated with the document.
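For reference, the aspect model described above is usually written as follows. The symbols are the conventional ones and are my notation here: d is a document, w a word, and z a latent topic.

```latex
P(d, w) = P(d) \sum_{z} P(w \mid z) \, P(z \mid d)
```

The values P(z | d) are the mixing proportions mentioned above: one list of numbers per document in the training set.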

But there is a problem.

Enter the Latent Dirichlet Allocation (LDA) Model

By 2003 Hofmann’s PLSI model was put into question, this time by David Blei, Andrew Ng and Michael Jordan, who that year proposed the Latent Dirichlet Allocation (LDA) model. As noted by Blei et al. (and I quote), PLSI “is incomplete in that it provides no probabilistic model at the level of documents. In pLSI, each document is represented as a list of numbers (the mixing proportions for topics), and there is no generative probabilistic model for these numbers.”

Blei and co-workers then stated that this leads to two problems:

1. the number of parameters in the model grows linearly with the size of the corpus, which leads to serious problems with overfitting

2. it is not clear how to assign probability to a document outside of the training set.
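The first problem can be put in numbers. Blei et al. give the parameter counts as kV + kM for a k-topic pLSI model and k + kV for a k-topic LDA model, where V is the vocabulary size and M the number of training documents. The short Python sketch below just evaluates those two expressions; the function names and the sample sizes are mine.

```python
def plsi_param_count(k, vocab_size, n_docs):
    """k topic-word distributions plus k mixing proportions per training document."""
    return k * vocab_size + k * n_docs

def lda_param_count(k, vocab_size):
    """k Dirichlet parameters plus k topic-word distributions."""
    return k + k * vocab_size

k, vocab_size = 100, 50_000
for n_docs in (1_000, 100_000, 10_000_000):
    print(n_docs, plsi_param_count(k, vocab_size, n_docs), lda_param_count(k, vocab_size))
```

The pLSI count keeps growing with the corpus while the LDA count stays fixed.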

Thus, it is not true that PLSI is the preferred model to work with in IR, as some have claimed. In addition, the model has non-trivial theoretical flaws and limitations.

In Salton’s Term Vector Model, as in the LSI and PLSI models, word order does not matter. Documents are simply considered a “bag of words”. However, common sense dictates that this is not a valid assumption, since word semantics is sensitive to word ordering. This explains why searches in Google for college junior or junior college produce far different results.

To underscore the importance of word ordering consider this: applying a similarity measure like a Jaccard Coefficient computed from a term-term matrix to the above two queries produces identical results, but again the computed similarity scores are disconnected from word semantics.
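A quick Python check of this order-blindness. As a simplifying assumption, the sketch computes the Jaccard Coefficient directly on sets of terms rather than from a term-term matrix, and the sample document is made up.

```python
def jaccard(a, b):
    """Jaccard Coefficient between the term sets of two texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

q1, q2 = "college junior", "junior college"
doc = "a junior at a local college"  # hypothetical document

print(jaccard(q1, q2))                      # the two queries look identical
print(jaccard(q1, doc), jaccard(q2, doc))   # and score the same against any text
```

Both queries reduce to the same set of terms, so any score derived from that set cannot tell them apart.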

Blei and co-workers have argued that if we want to consider exchangeable representations (those in which ordering is neglected) for documents and words, we need to consider mixture models that capture the exchangeability of both words and documents. This is why they proposed their LDA model.

In LDA documents are represented as random mixtures over latent topics, and each topic is characterized by a distribution over words.
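The generative story can be sketched in a few lines of Python with NumPy. Everything concrete here is an assumption for illustration: the six-word vocabulary, the two topics, the Dirichlet parameter values, and the fact that the topic-word distributions are themselves drawn at random just to have something to sample from. The sketch only generates text; it does no inference.

```python
import numpy as np

rng = np.random.default_rng(42)

vocab = ["search", "engine", "matrix", "vector", "topic", "word"]  # toy vocabulary
n_topics = 2
alpha = np.full(n_topics, 0.5)  # assumed Dirichlet parameter for topic mixtures

# Each topic is a distribution over words (one row per topic, rows sum to 1).
topics = rng.dirichlet(np.full(len(vocab), 0.5), size=n_topics)

def generate_document(n_words):
    """Draw a topic mixture for the document, then a topic and a word per position."""
    theta = rng.dirichlet(alpha)                  # document-level topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(n_topics, p=theta)         # topic for this word
        w = rng.choice(len(vocab), p=topics[z])   # word drawn from that topic
        words.append(vocab[w])
    return theta, words

theta, doc = generate_document(8)
print("topic mixture:", np.round(theta, 3))
print("document:", " ".join(doc))
```

Unlike in PLSI, the mixing proportions are drawn from a distribution, so a probability can be assigned to a document that was not in the training set.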

I believe we are moving toward a Unified IR Theory where Co-Occurrence, Probability and Geometry will converge. In this unified framework there is no room for the idea of term independence or of documents as mere “bags of words”. The former is IR’s Original Sin and the latter is its copycat.

The image above gives me a flashback to research work I conducted in the late ’80s on sequential simplex optimization methods.

More LSI Snakeoil

10 Wednesday Dec 2008

Posted by egarcia in Latent Semantic Indexing

≈ Leave a Comment

Here is another SEO resource (http://www.billhartzer.com/pages/latent-symantec-indexing-lsi-is-the-key-to-great-search-engine-rankings/) that, a la Aaron Wall, is still promoting LSI SEO nonsense in connection with ranking high in search engines. As if these marketers really knew what LSI is or how it works. If they did, they would never publish such crap.

There is no such thing as “LSA/LSI sites”, nor can SEOs manipulate LSI to influence ranking results. It is this type of snakeoil marketing that is a black eye on the face of the SEO industry.

Claps and Slaps, the LSI Way

04 Monday Aug 2008

Posted by egarcia in Latent Semantic Indexing, SEO Myths, Spam

≈ 2 Comments

Claps

We are happy to learn that Dr. Deepak Khemani from the Artificial Intelligence & Database Research Group at the Indian Institute of Technology in Madras, India is using our SVD LSI tutorial as lecture material for his course: CS625, Memory Based Reasoning in AI.
http://aidb.cs.iitm.ernet.in/cs625/11.SVD-LSI.pdf
 

Another investigator, this time from the cancer research field, congratulated us for the LSI tutorials. Jaime Fernandez Vera from Structural Biology and Biocomputing, Centro Nacional de Investigaciones Oncologicas, Madrid, Spain wrote (contact info removed):

Dear Dr. García:

Thank you very much for making your magnificent practical guides available to the community, especially the one on LSI, which is the one I have followed.

Warm regards,

Jaime Fernández Vera

Structural Biology and Biocomputing
Centro Nacional de Investigaciones Oncológicas

Our LSI/SVD tutorials are also listed in a huge repository of LSI research resources (http://www-timc.imag.fr/Benoit.Lemaire/lsa.html).

For additional IR resources quoting our tutorials, check the following link at http://www.miislita.com:

http://www.miislita.com/searchito/educational-links.html

Slaps

Talking about LSI…

Spammers disguised as ethical SEOs who promote LSI crap are now hiding. There is less talk in the blogosphere about “SEO LSI” and “LSI-friendly SEO Optimization” myths. As we always say, these crooks are a black eye to the ethical sector of the search marketing industry.

Their signature seems to be the promotion of crap tools and services like Keyword Density tools, Markov Chain generators (if you believe that crap), TFIDF rarity calculators, “semantic page strength” estimators, lookup lists based on “LSI operators”, etc. What will be their next effort at misleading the public? Latent Dirichlet Allocation (LDA) tools?

However, in an effort to save face, the usual suspects are still resorting to verbal gymnastics. They are desperate. It is clear that our efforts at exposing these crook marketers through IR knowledge are working.

Many are learning why they should stay away from the incorrect knowledge promoted by marketers that occasionally use IR jargon to pretend they know what they are talking about. They often make these attempts at IR-like talk to promote their image as “experts” before either naive or ignorant followers. We still cannot tell who is dumber, the snakeoil sellers or their groupies. They even game each other.

When we expose SEO myths from their competitors they praise us as long as the debunking works for them, but when their own myths are exposed they get angry at us. Ha, Ha.

 

Posts somehow related with this post


http://irthoughts.wordpress.com/2007/12/11/perpetuating-lsi-misconceptions/
 


http://irthoughts.wordpress.com/2008/07/21/seos-and-their-exhaustivity-search-myths/


http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comments


http://irthoughts.wordpress.com/2008/07/14/claps-and-slaps/


http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/


http://irthoughts.wordpress.com/2007/07/19/seos-and-still-their-lsi-misconceptions/


http://irthoughts.wordpress.com/2007/05/03/latest-seo-incoherences-lsi/

SEOs Scams: LSI, KW, and Markov Chains

03 Tuesday Jun 2008

Posted by egarcia in SEO Myths, Spam

≈ Leave a Comment

I’m happy to learn that Dr. Deepak Khemani from the Artificial Intelligence & Database Research Group at Indian Institute of Technology Madras, India is using my LSI and Term Vector tutorials for his graduate courses:


http://aidb.cs.iitm.ernet.in/cs625/11.SVD-LSI.pdf


http://aidb.cs.iitm.ernet.in/cs625/10.VectorSpace-model.pdf

It is great to see that more and more IR researchers and graduate students are realizing how certain SEOs have misled the public and their clients; that is, by selling their snakeoil in the form of “LSI optimization” and keyword density services. The most recent scam comes in the form of “markov chain” services. As if they really knew about matrix algebra and Markov chain processes. Same old tricks…

It is not surprising to hear colleagues referring to these SEOs as vulgar crooks and scammers.

Demystifying LSI Video

07 Monday Apr 2008

Posted by egarcia in Conferences, Latent Semantic Indexing, SEO Myths

≈ Leave a Comment

Here is a video of my presentation, Demystifying LSI, at the OJOBuscador Congress 2.0, Madrid, Spain, 2007. One year later, nothing has changed. Many of the same crook SEOs exposed during the congress are still deceiving the public about what LSI is.

Unfortunately, the video quality and lighting are not good enough to see the pdf slides, plus the presentation is in Spanish. Since attendees were not scientists, I talked very slowly for over an hour.

Want to get bored for the next hour? View the video.

Thanks to N. Valenzuela Alonso, Director of SEO and Search Engine Marketing of Media Bit, S.L. for the link (www.ithinksearch.com/2008/03/31/video-lsi-de-edel-garcia-desmitificando-lsi/).

Here is also the presentation of Carlos Castillo (Chato), from Yahoo! Research Spain:

Adversarial IR with Web Spam, parts 1 and 2 (http://www.ojobuscador.com/2007/06/14/ir-con-adversario-y-webspam-videopost/).

I had a great time talking with Carlos, a former grad student of Ricardo Baeza-Yates.

Baeza-Yates, Andrei Broder, Gerard Salton, Keith van Rijsbergen, and a few others have helped to shape what is today known as Information Retrieval Research.

Talking about Andrei Broder (one of the main researchers behind the old mighty Altavista), here is also a great interview, thanks to the ojobuscador site:

http://www.ojobuscador.com/2006/05/20/entrevista-a-andrei-broder/

 

Addressing Some LSI Questions

02 Wednesday Apr 2008

Posted by egarcia in Latent Semantic Indexing, Search Engines Architecture Course

≈ Leave a Comment

At the last Search Engines Architecture lecture we discussed LSI and Terrier. Great questions were raised. Some of them follow:

Q: How many dimensions to keep?
A: This is done by trial and error. I have a research project on the topic. None of the current ways of addressing this problem convinces me.

Q: How do we compute a truncated version of the initial matrix, A?
A: After SVDing A, truncate U, S, and V by retaining the first k columns of U and V (rows of V transpose) and the first k diagonal elements of S. Multiply these as discussed in class to get A truncated.
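The recipe in this answer can be sketched in Python with NumPy as follows. The 4 x 3 matrix is made up, and `truncate_svd` is a hypothetical helper name, not something from the lecture material.

```python
import numpy as np

def truncate_svd(A, k):
    """Return U_k, S_k, Vt_k and the rank-k matrix A_k = U_k S_k Vt_k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]          # first k columns of U
    S_k = np.diag(s[:k])    # first k diagonal elements of S
    Vt_k = Vt[:k, :]        # first k rows of V transpose
    return U_k, S_k, Vt_k, U_k @ S_k @ Vt_k

# Toy term-document matrix: 4 terms (rows) x 3 documents (columns).
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

U_k, S_k, Vt_k, A_k = truncate_svd(A, k=2)
print(np.round(A_k, 3))
```

A truncated has the same shape as A (same terms, same documents) but rank k.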

Q: To compute the query vector in the reduced space, do we need to compute A truncated for each query?
A: No. The new coordinates of this vector are defined as
$q_k = q^{T} U_k S_k^{-1}$
where $q_k$ denotes the query vector in the reduced space.
This means that A can be called from the cache. See the fast track tutorial

http://www.miislita.com/information-retrieval-tutorial/lsi-keyword-research-fast-track-tutorial.pdf

over at Mi Islita.com site.
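A minimal Python/NumPy sketch of this query projection, under the assumption that U_k and the inverse of S_k are computed once and then reused for every query. The toy matrix, the query, and the function name `fold_in_query` are made up.

```python
import numpy as np

# Toy term-document matrix: 4 terms (rows) x 3 documents (columns).
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k = U[:, :k]
S_k_inv = np.diag(1.0 / s[:k])  # computed once, reusable for every query

def fold_in_query(q):
    """Project a term-space query vector into the k-dimensional space: q^T U_k S_k^-1."""
    return q @ U_k @ S_k_inv

q = np.array([1.0, 0.0, 1.0, 0.0])  # query containing terms 0 and 2
q_k = fold_in_query(q)
print(np.round(q_k, 3))
```

A useful sanity check: projecting a column of A this way returns the coordinates that the same document already has in V_k.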

Q: Do I need to compute A truncated each time a new document is added or previous ones are modified?
A: For small matrices the answer is YES. However, for huge matrices we can resort to updating/appending techniques. Some of these add doc vectors without recomputing the previous matrix. There is a point wherein this can compromise orthogonality, though.
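One such appending technique is usually called folding-in: the new document is projected with the existing U_k and S_k and appended as a new row of V_k. The Python/NumPy sketch below uses a made-up matrix and a made-up new document, and the helper `orthogonality_error` is my own name; it shows both the shortcut and the loss of orthogonality it causes.

```python
import numpy as np

# Toy term-document matrix: 4 terms (rows) x 3 documents (columns).
A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, S_k_inv, V_k = U[:, :k], np.diag(1.0 / s[:k]), Vt[:k, :].T  # V_k: docs x k

# Fold in a new document without recomputing the SVD.
d_new = np.array([1.0, 1.0, 0.0, 1.0])  # term counts of the new document
d_k = d_new @ U_k @ S_k_inv             # its coordinates in the reduced space
V_k_updated = np.vstack([V_k, d_k])     # append as a new row of V_k

def orthogonality_error(V):
    """Frobenius distance between V^T V and the identity matrix."""
    return np.linalg.norm(V.T @ V - np.eye(V.shape[1]))

before = orthogonality_error(V_k)
after = orthogonality_error(V_k_updated)
print("before folding-in:", before)
print("after folding-in: ", after)
```

The columns of V_k are orthonormal before the append and no longer are afterwards, which is why enough folded-in documents eventually call for recomputing or properly updating the SVD.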

Q: How do I use Desktop Terrier?
A: Follow the instructions provided in the updated version of Lab Report 2.

LSI: How Many Dimensions to Keep?

13 Wednesday Feb 2008

Posted by egarcia in Latent Semantic Indexing

≈ Leave a Comment

In How to Populate a Matrix for SVD I referred readers to Igvita’s great blog posts on SVD. A recent visit to the blog shows it is still very much alive and equally interesting. The issues being discussed are not really new, though.

When we lecture on SVD, an issue that sooner or later arises is how many dimensions k to keep. A recent visitor to the aforementioned blog finally raised the same question.

Can you pls give me a clue as to how we decide how many dimensions to project our data onto when using SVD?

How many dimensions to keep is the so-called Rank k Approximation problem, which often leads to the dreaded dimensionality reduction curse in which performance can be compromised.

In the Latest SEO Incoherences (LSI) post we mentioned that this issue was already addressed by Dr. Susan Dumais, many times, throughout her first papers and talks on LSI. In that post we referred readers to the transcription of the Application presentation by Susan Dumais, Bellcore (now at Microsoft). That talk is now a classic in the history of LSI.

In those days Dumais’s approach was simply “by seat of the pants”:

Let me end, as my time is running out, with some of the statistical issues that we have encountered and that I hope you have some hints about. The first is how we choose the number of dimensions in our reduced representation. We have done it largely by seat of the pants. You know when it doesn’t work. You know when you have too few dimensions. We would like some better methods for doing this, things like the scree test don’t seem to correspond very well to behavioral data that we have.

Later, during the Q&A session, participants revisited this issue. Let us reproduce the exchange between the participants and Dumais:

PARTICIPANT: Thank you, Susan. Questions from the floor?

PARTICIPANT: I’m a little nervous that if someone was browsing the Web and we hoped to put some of this material in the Web, that we’re in trouble. We’re talking about seat of the pants and underwear models, that people are going to get the wrong context for why we’re here. But that is part of the big problem that Susan is talking about.

PARTICIPANT: I thought I would just mention an entirely different approach to this problem, with Joe (word lost) at EDS. What we’re doing is –

PARTICIPANT: Can you get to a mike?

PARTICIPANT: We are using a poisson model for the word counts. Then we’re interested in finding maximum likelihood estimates for the clustering, and we found various combinations of simulated annealing and markup chain Monte Carlo to work very well with funding these things.

One of the nice things in a model based approach is that you get natural measures of association rather than just SVD types of things, although it could be slower.

PARTICIPANT: I think one thing we will try to ask everyone after the conference is to send us electronically two or three references of relevant work that we can disseminate in this way, because we do hope to learn about new approaches and new methods and related work. So keep that in mind as the discussion progresses. We will send out E-mail requesting those in electronic form.

PARTICIPANT: (Comments off mike.)

DR. DUMAIS: It is first of all not clear that the 300 or 400 dimensions we have used for the trek databases is optimal. We find that performance is still increasing up to 400 dimensions; it may well increase beyond that.

In fact, I should mention that if you plot performance as a function of number of dimensions, what you get is an inverted U function that is heavily skewed. That is, performance increases dramatically as you go from 20 or 30 up to several hundred dimensions, and then it tails off gradually through the level of performance that you see with raw key word matching, which is the full dimensional solution.

We don’t know that we have reached the peak. In problems where we know what the optimal number of dimensions is, we have found that the peak is not so sharp.

Twenty years later (the first LSI papers saw the light in 1988, not in 1990 as some SEOs have incorrectly claimed), many research advances on SVD in relation to LSI have been published. Old IR ideas regarding LSI have been dropped and new ones have been adopted. That is what research is all about.

Still, how many dimensions to keep remains an open issue and a “by seat of the pants” one. All kinds of things and guidelines have been tried. But in the end we need to test and retest the system under examination.

I have even tested my own guideline: keep the top k singular values that amount to more than X percent of the trace of the S matrix, where

S is the matrix of singular values, and
X is a threshold value, usually 80-85%.
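The guideline translates into a few lines of Python with NumPy. This is a minimal sketch: the function name `choose_k` and the sample singular values are made up, the threshold is expressed as a fraction between 0 and 1, and the trace of S is simply the sum of the singular values.

```python
import numpy as np

def choose_k(singular_values, threshold=0.80):
    """Smallest k whose top-k singular values sum to more than `threshold` of trace(S)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be a fraction strictly between 0 and 1")
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]  # descending order
    cumulative_share = np.cumsum(s) / s.sum()
    return int(np.argmax(cumulative_share > threshold)) + 1

s = [6.0, 3.0, 0.5, 0.5]  # hypothetical singular values; trace(S) = 10
for x in (0.50, 0.80, 0.92):
    print(f"X = {x:.0%}: keep k = {choose_k(s, x)}")
```

The code only mechanizes the rule; it says nothing about which X is right for a given system.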

But, again, some would ask: why 80%? Why not 90%, 70%, 60%, etc?

While the above guideline works for many systems, I have stumbled upon some systems in which the above threshold is not good. So I always come back to “find X experimentally or by seat of the pants”.

We could treat this as an optimization problem and use Nelder-Mead multivariate sequential simplex optimization, but I haven’t tried this yet. I’m not sure if this is the way to go either, but it might be worth testing.

Another idea is to iteratively update-test-update-test the matrix using any of the current SVD updating methods for several X values. I need to set aside some time for this one to see what comes out.

I’m also open to suggestions.

For those interested, a 1.0 Mb download of Dumais’s 1995 presentation is available. If you have problems downloading it, let me know. I can send you a zip file.

April Kontostathis, from Ursinus College, in Essential Dimensions of Latent Semantic Indexing (LSI), proposes an interesting approach to address aspects of this problem. She illustrates her approach with a model wherein term weights are computed using a well known base-2 LOG model for local weights combined with the ENTROPY model for global weights.
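For readers unfamiliar with this weighting model, here is a Python/NumPy sketch of the commonly cited log-entropy formulation: local weight log2(1 + tf) and global weight 1 + sum_j(p_ij log2 p_ij) / log2(n), with p_ij = tf_ij / gf_i. Whether this matches the exact variant used in the paper is an assumption on my part, and the function name and toy counts are made up.

```python
import numpy as np

def log_entropy_weights(tf):
    """Log-entropy weights for a term-document count matrix (terms x documents)."""
    tf = np.asarray(tf, dtype=float)
    n_docs = tf.shape[1]  # must be greater than 1
    local = np.log2(1.0 + tf)

    gf = tf.sum(axis=1, keepdims=True)  # global frequency of each term
    gf[gf == 0] = 1.0                   # avoid dividing by zero for unused terms
    p = tf / gf

    # p * log2(p), taking 0 * log2(0) as 0
    plogp = np.zeros_like(p)
    mask = p > 0
    plogp[mask] = p[mask] * np.log2(p[mask])

    global_w = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log2(n_docs)
    return local * global_w

tf = [[1, 1, 1, 1],   # term spread evenly over all documents
      [4, 0, 0, 0]]   # term concentrated in a single document
W = log_entropy_weights(tf)
print(np.round(W, 3))
```

A term spread evenly across all documents gets a global weight of zero, while a term concentrated in one document keeps its full local weight.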

More work is still needed along these lines.

Perpetuating LSI Misconceptions

11 Tuesday Dec 2007

Posted by egarcia in Latent Semantic Indexing

≈ 3 Comments

Mr. Nick Yorchak from Fusionbox and an alleged SEO “expert” has written this Sitepronews.com article about LSI, which perpetuates myths and wrong statements about LSI, similar to those claimed by Mr. Aaron Wall at this SearchEngineJournal article, and by Valerie DiCarlo in this unfortunate article.

Mike Duz has written a quick rebuttal to Yorchak.

Wishful Thinking: Let us hope that in 2008 SEOs learn how SVD works so they stop spreading misinformation about what LSI is.

To learn about SEO misconceptions regarding LSI, check my tutorial series on the topic, starting with

Tutorial 1: Understanding SVD and LSI

Fortunately, more and more SEOs, like Andy Beal here (MarketingPilgrim.com) and Melissa Fach here (SEOAware.com), are realizing what LSI is not.

BTW, here is an “invitation” issued by Mike Grehan and me back in July, 2007: A Call to SEOs Claiming to Sell LSI.
