<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for IR Thoughts</title>
	<atom:link href="http://irthoughts.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://irthoughts.wordpress.com</link>
	<description>Thoughts on Information Retrieval &#38; Data Mining</description>
	<lastBuildDate>Tue, 13 Oct 2009 14:03:26 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Internet Engineering I: Course Lectures by DNS Intelligence &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/09/21/internet-engineering-i-course-lectures/#comment-820</link>
		<dc:creator>DNS Intelligence &#171; IR Thoughts</dc:creator>
		<pubDate>Tue, 13 Oct 2009 14:03:26 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1100#comment-820</guid>
		<description>[...] Please check Lecture 8 [...]</description>
		<content:encoded><![CDATA[<p>[...] Please check Lecture 8 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Internet Engineering I: Course Lectures by Email Protocols &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/09/21/internet-engineering-i-course-lectures/#comment-819</link>
		<dc:creator>Email Protocols &#171; IR Thoughts</dc:creator>
		<pubDate>Mon, 05 Oct 2009 11:31:15 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1100#comment-819</guid>
		<description>[...] E. Garcia  If you are a student enrolled in the Internet Engineering I graduate course, check the Lecture 7 [...]</description>
		<content:encoded><![CDATA[<p>[...] E. Garcia  If you are a student enrolled in the Internet Engineering I graduate course, check the Lecture 7 [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on New Graduate Courses by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/09/01/new-graduate-courses/#comment-812</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Thu, 10 Sep 2009 15:14:37 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1087#comment-812</guid>
		<description>Hi, Hassan:

Thank you for stopping by.

There are at least two ways of addressing OR query weights:

(a) Using the Boolean Model 
(b) At the level of the inverted index

(a) For this question, a brief tutorial on the Boolean Model is available at http://www.miislita.com. To find it, search the site for the keyword [boolean model].

(b) This question can be addressed at the level of the inverted index by identifying all posting lists containing at least one of the query terms and by not intersecting posting lists. The returned posting lists are used to construct the term-document matrix. Any flavor of scoring weights can be used to fill the cells. Vector analysis is then applied to the corresponding document vectors.

TF-IDF is used when we do not have relevance feedback from users. If we need relevance information, then we can use any flavor of the Robertson-Sparck Jones Probabilistic Model (RSJ-PM). In addition, you can tweak parameters using Okapi BM25. A tutorial on the RSJ-PM is also available at Mi Islita. To find it, search the site for the keyword [rsj-pm].

I hope this helps.</description>
		<content:encoded><![CDATA[<p>Hi, Hassan:</p>
<p>Thank you for stopping by.</p>
<p>There are at least two ways of addressing OR query weights:</p>
<p>(a) Using the Boolean Model<br />
(b) At the level of the inverted index</p>
<p>(a) For this question, a brief tutorial on the Boolean Model is available at <a href="http://www.miislita.com" rel="nofollow">http://www.miislita.com</a>. To find it, search the site for the keyword [boolean model].</p>
<p>(b) This question can be addressed at the level of the inverted index by identifying all posting lists containing at least one of the query terms and by not intersecting posting lists. The returned posting lists are used to construct the term-document matrix. Any flavor of scoring weights can be used to fill the cells. Vector analysis is then applied to the corresponding document vectors.</p>
<p>TF-IDF is used when we do not have relevance feedback from users. If we need relevance information, then we can use any flavor of the Robertson-Sparck Jones Probabilistic Model (RSJ-PM). In addition, you can tweak parameters using Okapi BM25. A tutorial on the RSJ-PM is also available at Mi Islita. To find it, search the site for the keyword [rsj-pm].</p>
<p>I hope this helps.</p>
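<p>Approach (b) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the toy documents, the helper names <code>build_index</code> and <code>or_query</code>, the choice of tf-idf as the cell weight, and the use of the idf as the query weight are all made up for the example, and the cosine is computed only over the query terms rather than over the full vocabulary.</p>

```python
import math
from collections import defaultdict

# Toy collection, invented for illustration only.
docs = {
    "d1": "boolean model tutorial",
    "d2": "vector space model",
    "d3": "okapi ranking function",
}

def build_index(docs):
    """Map each term to a posting list {doc_id: raw term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def or_query(query_terms, index, n_docs):
    """Rank every document that contains at least one query term."""
    # OR semantics: take the union of the posting lists, never the intersection.
    candidates = set()
    for term in query_terms:
        candidates.update(index.get(term, {}))

    # idf of each query term that occurs in the collection.
    idf = {t: math.log(n_docs / len(index[t])) for t in query_terms if t in index}
    q_vec = list(idf.values())  # assumption: query weight = idf

    scores = {}
    for doc_id in candidates:
        # One row of the term-document matrix, restricted to the query terms.
        d_vec = [index[t].get(doc_id, 0) * w for t, w in idf.items()]
        dot = sum(d * q for d, q in zip(d_vec, q_vec))
        norm = math.sqrt(sum(d * d for d in d_vec)) * math.sqrt(sum(q * q for q in q_vec))
        scores[doc_id] = dot / norm if norm else 0.0
    # Highest cosine similarity first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

<p>For example, <code>or_query(["model", "okapi"], build_index(docs), len(docs))</code> returns all three toy documents, with d3 ranked first because [okapi] is the rarer of the two terms.</p>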
]]></content:encoded>
	</item>
	<item>
		<title>Comment on New Graduate Courses by hassan1388</title>
		<link>http://irthoughts.wordpress.com/2009/09/01/new-graduate-courses/#comment-811</link>
		<dc:creator>hassan1388</dc:creator>
		<pubDate>Thu, 10 Sep 2009 13:57:54 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1087#comment-811</guid>
		<description>Hi there,
I have a few small questions, and I&#039;d be thankful if you could kindly answer them.
I&#039;d like to know your thoughts on a problem. In the vector model of IR, how can we process queries with OR or NOT logic operators between query words? I think that we give &quot;0&quot; as the weight of a term that should not appear in the document. But in this case, it is still possible to return a document that contains this word even though we have given 0 as the weight of the term in the query vector.
I&#039;d like to expand my question: what if we have some feedback from the user, for example that the returned documents should not contain such words, but it&#039;s not certain?
So how can we give more importance to some query words? I know that we do that with weighting methods like tf-idf, but they are not based on user feedback...

The last question: could you kindly explain what to do if the significant words of a document are not frequent in that document but are frequent in other ones (in contrast to the tf-idf hypothesis)? In that case they are not weighted as important by weighting methods based on term frequency. How can we weight them in a relevant way?
Many thanks.
Regards</description>
		<content:encoded><![CDATA[<p>Hi there,<br />
I have a few small questions, and I&#8217;d be thankful if you could kindly answer them.<br />
I&#8217;d like to know your thoughts on a problem. In the vector model of IR, how can we process queries with OR or NOT logic operators between query words? I think that we give &#8220;0&#8221; as the weight of a term that should not appear in the document. But in this case, it is still possible to return a document that contains this word even though we have given 0 as the weight of the term in the query vector.<br />
I&#8217;d like to expand my question: what if we have some feedback from the user, for example that the returned documents should not contain such words, but it&#8217;s not certain?<br />
So how can we give more importance to some query words? I know that we do that with weighting methods like tf-idf, but they are not based on user feedback&#8230;</p>
<p>The last question: could you kindly explain what to do if the significant words of a document are not frequent in that document but are frequent in other ones (in contrast to the tf-idf hypothesis)? In that case they are not weighted as important by weighting methods based on term frequency. How can we weight them in a relevant way?<br />
Many thanks.<br />
Regards</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/06/17/seos-and-their-idf-myths/#comment-810</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 26 Aug 2009 18:45:27 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=219#comment-810</guid>
		<description>Hi optimmysql:

Thank you for stopping by.

That was the perception back in the early days of IR, repeated often by some authors unaware of current research. Our understanding has advanced considerably since then. You might want to check:

http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/

A term repeated x times is not necessarily x times more relevant to a document, nor does it mean that the document is x times more pertinent to the term in question.

IDF is a crude measure of the discriminatory power of a term and is used in the absence of relevance information. 

Cheers</description>
		<content:encoded><![CDATA[<p>Hi optimmysql:</p>
<p>Thank you for stopping by.</p>
<p>That was the perception back in the early days of IR, repeated often by some authors unaware of current research. Our understanding has advanced considerably since then. You might want to check:</p>
<p><a href="http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/" rel="nofollow">http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/</a></p>
<p>A term repeated x times is not necessarily x times more relevant to a document, nor does it mean that the document is x times more pertinent to the term in question.</p>
<p>IDF is a crude measure of the discriminatory power of a term and is used in the absence of relevance information. </p>
<p>Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths by optimmysql</title>
		<link>http://irthoughts.wordpress.com/2008/06/17/seos-and-their-idf-myths/#comment-809</link>
		<dc:creator>optimmysql</dc:creator>
		<pubDate>Wed, 26 Aug 2009 07:22:21 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=219#comment-809</guid>
		<description>I recently came across the TF-IDF concept, and this is how I understand it in simple English.

 - TF is a measure of how relevant a document is to a term. The assumption here is that more relevant documents will have the term repeated often.

 - IDF measures how important (or how specialized a subject) the term itself is. The assumption here is that terms that occur too frequently among documents constitute less specialized subjects.

I would appreciate it if someone could verify my understanding.</description>
		<content:encoded><![CDATA[<p>I recently came across the TF-IDF concept, and this is how I understand it in simple English.</p>
<p> &#8211; TF is a measure of how relevant a document is to a term. The assumption here is that more relevant documents will have the term repeated often.</p>
<p> &#8211; IDF measures how important (or how specialized a subject) the term itself is. The assumption here is that terms that occur too frequently among documents constitute less specialized subjects.</p>
<p>I would appreciate it if someone could verify my understanding.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Thesaurus as a Complex Network by marianasoffer</title>
		<link>http://irthoughts.wordpress.com/2009/08/06/thesaurus-as-a-complex-network/#comment-808</link>
		<dc:creator>marianasoffer</dc:creator>
		<pubDate>Mon, 17 Aug 2009 03:59:49 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1070#comment-808</guid>
		<description>Thanks a lot</description>
		<content:encoded><![CDATA[<p>Thanks a lot</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Thesaurus as a Complex Network by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/08/06/thesaurus-as-a-complex-network/#comment-807</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 14 Aug 2009 13:47:06 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1070#comment-807</guid>
		<description>I assume you wanted to post this in the Vector Notation post, but I will answer now.

Determinants are normally specified by delimiting the corresponding matrix  symbol with pipes (vertical lines) like this: 

&lt;strong&gt;&#124;A&#124;&lt;/strong&gt;. This was apparently adopted by Cayley in 1841.

For a brief history of early matrix notation check here:
 
&lt;a href=&quot;http://jeff560.tripod.com/matrices.html&quot; rel=&quot;nofollow&quot;&gt;Earliest Uses of Symbols for Matrices and Vectors&lt;/a&gt; 

&lt;a href=&quot;http://www-gap.dcs.st-and.ac.uk/~history/HistTopics/Matrices_and_determinants.html&quot; rel=&quot;nofollow&quot;&gt;Matrices and determinants&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>I assume you wanted to post this in the Vector Notation post, but I will answer now.</p>
<p>Determinants are normally specified by delimiting the corresponding matrix  symbol with pipes (vertical lines) like this: </p>
<p><strong>|A|</strong>. This was apparently adopted by Cayley in 1841.</p>
<p>For a brief history of early matrix notation check here:</p>
<p><a href="http://jeff560.tripod.com/matrices.html" rel="nofollow">Earliest Uses of Symbols for Matrices and Vectors</a> </p>
<p><a href="http://www-gap.dcs.st-and.ac.uk/~history/HistTopics/Matrices_and_determinants.html" rel="nofollow">Matrices and determinants</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Notation by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/08/10/vector-notation/#comment-806</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 14 Aug 2009 13:36:10 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1079#comment-806</guid>
		<description>Different authors use different notations for sparse matrices. Most prefer to use the APS notation for matrices and then specify in the text the type of matrix used.</description>
		<content:encoded><![CDATA[<p>Different authors use different notations for sparse matrices. Most prefer to use the APS notation for matrices and then specify in the text the type of matrix used.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Thesaurus as a Complex Network by marianasoffer</title>
		<link>http://irthoughts.wordpress.com/2009/08/06/thesaurus-as-a-complex-network/#comment-805</link>
		<dc:creator>marianasoffer</dc:creator>
		<pubDate>Tue, 11 Aug 2009 06:48:22 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1070#comment-805</guid>
		<description>I am working with opinion mining and NLP, and I always have trouble with matrices and vectors because I do not have a strong mathematical background in that area. One question I want to ask you: are there several different kinds of notation for handling these data? If so, how many are there, what are they, and which are the most common? For example, I always have trouble understanding whether an equation is asking me to compute the determinant or the other operation, the sum over columns whose name I do not remember now, depending on whether one straight line or two is used on each side of the matrix being processed.</description>
		<content:encoded><![CDATA[<p>I am working with opinion mining and NLP, and I always have trouble with matrices and vectors because I do not have a strong mathematical background in that area. One question I want to ask you: are there several different kinds of notation for handling these data? If so, how many are there, what are they, and which are the most common? For example, I always have trouble understanding whether an equation is asking me to compute the determinant or the other operation, the sum over columns whose name I do not remember now, depending on whether one straight line or two is used on each side of the matrix being processed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Notation by marianasoffer</title>
		<link>http://irthoughts.wordpress.com/2009/08/10/vector-notation/#comment-804</link>
		<dc:creator>marianasoffer</dc:creator>
		<pubDate>Tue, 11 Aug 2009 06:44:50 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1079#comment-804</guid>
		<description>This is great, thanks a lot. I am working with opinion mining and NLP, and I always have trouble with matrices and vectors because I do not have a strong mathematical background in that area. I have one question, which might be a little stupid, but here it goes: how do you handle the notation of a sparse matrix while performing an operation that smooths the terms, for example SVD?</description>
		<content:encoded><![CDATA[<p>This is great, thanks a lot. I am working with opinion mining and NLP, and I always have trouble with matrices and vectors because I do not have a strong mathematical background in that area. I have one question, which might be a little stupid, but here it goes: how do you handle the notation of a sparse matrix while performing an operation that smooths the terms, for example SVD?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on PCA Is Not LSI by Thesaurus as a Complex Network &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2007/05/05/pca-is-not-lsi/#comment-803</link>
		<dc:creator>Thesaurus as a Complex Network &#171; IR Thoughts</dc:creator>
		<pubDate>Thu, 06 Aug 2009 17:27:02 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2007/05/05/pca-is-not-lsi/#comment-803</guid>
		<description>[...] PCA is not LSI. See http://irthoughts.wordpress.com/2007/05/05/pca-is-not-lsi/ Possibly related posts: (automatically generated)LSA: A Goldmine for Educators and Curriculum [...]</description>
		<content:encoded><![CDATA[<p>[...] PCA is not LSI. See <a href="http://irthoughts.wordpress.com/2007/05/05/pca-is-not-lsi/" rel="nofollow">http://irthoughts.wordpress.com/2007/05/05/pca-is-not-lsi/</a> Possibly related posts: (automatically generated)LSA: A Goldmine for Educators and Curriculum [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Most Influential Paper that Gerard Salton Never Wrote by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/07/15/the-most-influential-paper-that-gerard-salton-never-wrote/#comment-802</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 05 Aug 2009 11:32:59 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1026#comment-802</guid>
		<description>Hi, William:

Thank you for stopping by. 

You blogged an interesting discussion on the topic. The points you raised are quite valid.

Students beware: if the raw data is flawed, so is the analysis. I believe this applies to citation analysis based on wrong citations, or to any analysis for that matter.</description>
		<content:encoded><![CDATA[<p>Hi, William:</p>
<p>Thank you for stopping by. </p>
<p>You blogged an interesting discussion on the topic. The points you raised are quite valid.</p>
<p>Students beware: if the raw data is flawed, so is the analysis. I believe this applies to citation analysis based on wrong citations, or to any analysis for that matter.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Most Influential Paper that Gerard Salton Never Wrote by Citing papers that you&#8217;ve never read &#8212; or that were never written &#171; IREvalEtAl</title>
		<link>http://irthoughts.wordpress.com/2009/07/15/the-most-influential-paper-that-gerard-salton-never-wrote/#comment-801</link>
		<dc:creator>Citing papers that you&#8217;ve never read &#8212; or that were never written &#171; IREvalEtAl</dc:creator>
		<pubDate>Wed, 05 Aug 2009 05:39:00 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1026#comment-801</guid>
		<description>[...] citer has made a mistake, and this has been blindly carried forward by their imitators. Via Edel Garcia comes The Most Influential Paper Gerard Salton Never Wrote, an article by David Dubin tracing the [...]</description>
		<content:encoded><![CDATA[<p>[...] citer has made a mistake, and this has been blindly carried forward by their imitators. Via Edel Garcia comes The Most Influential Paper Gerard Salton Never Wrote, an article by David Dubin tracing the [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Random Notes Before School Starts by cre8pc</title>
		<link>http://irthoughts.wordpress.com/2009/08/03/random-notes-before-school-starts/#comment-800</link>
		<dc:creator>cre8pc</dc:creator>
		<pubDate>Mon, 03 Aug 2009 18:23:06 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1066#comment-800</guid>
		<description>Many thanks for that.  I couldn&#039;t let it roll off because it seemed that so many very good people were taking a serious hit.  I know Rand and feel quite sure he was unhappy at how his community treated me personally. Since he is also one of the A-listers they were so angry with, talk about being rude to your host!</description>
		<content:encoded><![CDATA[<p>Many thanks for that.  I couldn&#8217;t let it roll off because it seemed that so many very good people were taking a serious hit.  I know Rand and feel quite sure he was unhappy at how his community treated me personally. Since he is also one of the A-listers they were so angry with, talk about being rude to your host!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Searchmageddon: Microsoft to Buy Yahoo! by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/02/01/searchmaggedon-microsofts-to-buy-yahoo/#comment-799</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sat, 01 Aug 2009 12:25:42 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=161#comment-799</guid>
		<description>Well, a lot of things happened.

1. Yahoo! rejected Microsoft&#039;s acquisition proposal.

2. Many departed from Yahoo!

3. Yang was forced to leave his post at Yahoo!

4. More departures at Yahoo!

5. Microsoft was &quot;no longer interested&quot; in Yahoo!

6. More brains leaving Yahoo!

7. Finally, Microsoft acquired only the search side of Yahoo!

It was a matter of time. I told you so.

I don&#039;t think this will end here. Although this is not limited to &quot;Microhoo&quot;, there is still room for additional consolidation of services gravitating around search as a user activity. There are still plenty of search companies to buy or kill from the inside out.

It is a matter of time.</description>
		<content:encoded><![CDATA[<p>Well, a lot of things happened.</p>
<p>1. Yahoo! rejected Microsoft&#8217;s acquisition proposal.</p>
<p>2. Many departed from Yahoo!</p>
<p>3. Yang was forced to leave his post at Yahoo!</p>
<p>4. More departures at Yahoo!</p>
<p>5. Microsoft was &#8220;no longer interested&#8221; in Yahoo!</p>
<p>6. More brains leaving Yahoo!</p>
<p>7. Finally, Microsoft acquired only the search side of Yahoo!</p>
<p>It was a matter of time. I told you so.</p>
<p>I don&#8217;t think this will end here. Although this is not limited to &#8220;Microhoo&#8221;, there is still room for additional consolidation of services gravitating around search as a user activity. There are still plenty of search companies to buy or kill from the inside out.</p>
<p>It is a matter of time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Most Influential Paper that Gerard Salton Never Wrote by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/07/15/the-most-influential-paper-that-gerard-salton-never-wrote/#comment-798</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Thu, 23 Jul 2009 00:51:05 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1026#comment-798</guid>
		<description>Hi, Dave:

Thank you for stopping by and commenting. I understand what you&#039;re saying. 

Regardless, sloppy referencing/reviewing of papers is what it is --no more, no less.

Between A and B above, I prefer A. I never like long sentences, particularly with three &quot;my&quot;s.</description>
		<content:encoded><![CDATA[<p>Hi, Dave:</p>
<p>Thank you for stopping by and commenting. I understand what you&#8217;re saying. </p>
<p>Regardless, sloppy referencing/reviewing of papers is what it is &#8211;no more, no less.</p>
<p>Between A and B above, I prefer A. I never like long sentences, particularly with three &#8220;my&#8221;s.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on The Most Influential Paper that Gerard Salton Never Wrote by ddubin1</title>
		<link>http://irthoughts.wordpress.com/2009/07/15/the-most-influential-paper-that-gerard-salton-never-wrote/#comment-797</link>
		<dc:creator>ddubin1</dc:creator>
		<pubDate>Wed, 22 Jul 2009 23:57:04 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1026#comment-797</guid>
		<description>Thanks for your kind words about my article; I&#039;m glad you found it interesting. But you know, it really wasn&#039;t my intention to wag my finger at my colleagues&#039; sloppy citation practices. I think references to Salton&#039;s phantom paper point to a basic theoretical problem in what the VSM actually models, and how we understand what IR models do for us.

It&#039;s the difference between saying something like:

A) I wrote some software, and here&#039;s a geometric model to help you understand exactly what it does.

as opposed to...

B) I want to advance a substantive claim about the nature of documents and queries. My model makes some simplifying assumptions, but its purpose is to represent something real about the world in which my software functions -- something that&#039;s independent of my system design decisions.

I see a basic confusion between approaches A and B that shows up surprisingly often in information science research.

Dave Dubin</description>
		<content:encoded><![CDATA[<p>Thanks for your kind words about my article; I&#8217;m glad you found it interesting. But you know, it really wasn&#8217;t my intention to wag my finger at my colleagues&#8217; sloppy citation practices. I think references to Salton&#8217;s phantom paper point to a basic theoretical problem in what the VSM actually models, and how we understand what IR models do for us.</p>
<p>It&#8217;s the difference between saying something like:</p>
<p>A) I wrote some software, and here&#8217;s a geometric model to help you understand exactly what it does.</p>
<p>as opposed to&#8230;</p>
<p>B) I want to advance a substantive claim about the nature of documents and queries. My model makes some simplifying assumptions, but its purpose is to represent something real about the world in which my software functions &#8212; something that&#8217;s independent of my system design decisions.</p>
<p>I see a basic confusion between approaches A and B that shows up surprisingly often in information science research.</p>
<p>Dave Dubin</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Microsoft, Inter-Metro to Co-Launch a MIC by Official: MIC Puerto Rico &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/04/29/microsoft-inter-metro-to-co-launch-a-mic/#comment-781</link>
		<dc:creator>Official: MIC Puerto Rico &#171; IR Thoughts</dc:creator>
		<pubDate>Tue, 23 Jun 2009 15:32:21 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=900#comment-781</guid>
		<description>[...] will be co-launching with Interamerican University of Puerto Rico, Metropolitan Campus the Microsoft Innovation Center (MIC) of Puerto [...]</description>
		<content:encoded><![CDATA[<p>[...] will be co-launching with Interamerican University of Puerto Rico, Metropolitan Campus the Microsoft Innovation Center (MIC) of Puerto [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on IR Quiz: Matrices by What is a Similarity Matrix? &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/05/13/ir-quiz-matrices/#comment-779</link>
		<dc:creator>What is a Similarity Matrix? &#171; IR Thoughts</dc:creator>
		<pubDate>Tue, 16 Jun 2009 14:24:02 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=933#comment-779</guid>
		<description>[...] This information will help those that took the IR Quiz on Matrices to realize how well they [...]</description>
		<content:encoded><![CDATA[<p>[...] This information will help those that took the IR Quiz on Matrices to realize how well they [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on IDF and Vector Space Models by SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/03/11/idf-and-vector-space-models/#comment-774</link>
		<dc:creator>SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</dc:creator>
		<pubDate>Thu, 11 Jun 2009 18:00:55 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=780#comment-774</guid>
		<description>[...] p.s. Read also Dr Garcia&#8217;s post recapping the SIDIM conference and sharing an explanation of IDF and &#8220;vector space [...]</description>
		<content:encoded><![CDATA[<p>[...] p.s. Read also Dr Garcia&#8217;s post recapping the SIDIM conference and sharing an explanation of IDF and &#8220;vector space [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SIDIM XXIV Conference by SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/03/05/sidim-xxiv-conference/#comment-773</link>
		<dc:creator>SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</dc:creator>
		<pubDate>Thu, 11 Jun 2009 18:00:48 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=768#comment-773</guid>
		<description>[...]  http://irthoughts.wordpress.com/2009/03/05/sidim-xxiv-conference/ [...]</description>
		<content:encoded><![CDATA[<p>[...]  <a href="http://irthoughts.wordpress.com/2009/03/05/sidim-xxiv-conference/" rel="nofollow">http://irthoughts.wordpress.com/2009/03/05/sidim-xxiv-conference/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-772</link>
		<dc:creator>SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</dc:creator>
		<pubDate>Thu, 11 Jun 2009 18:00:40 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-772</guid>
		<description>[...] Check here:  http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/ [...]</description>
		<content:encoded><![CDATA[<p>[...] Check here:  <a href="http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/" rel="nofollow">http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on AIRWeb Course Announcement by SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/#comment-771</link>
		<dc:creator>SEO ROI &#187; Inverse Document Frequency In Plain English - Dr E. Garcia</dc:creator>
		<pubDate>Thu, 11 Jun 2009 18:00:34 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=814#comment-771</guid>
		<description>[...] BTW: I am teaching a new graduate course on Web Spam wherein SEO Snakeoil Myths and case studies of these will be dissected. The syllabus is announced at http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/ [...]</description>
		<content:encoded><![CDATA[<p>[...] BTW: I am teaching a new graduate course on Web Spam wherein SEO Snakeoil Myths and case studies of these will be dissected. The syllabus is announced at <a href="http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/" rel="nofollow">http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On Term Repetition and Local Models by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/05/27/on-term-repetition-and-local-models/#comment-769</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sat, 30 May 2009 12:51:52 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=968#comment-769</guid>
		<description>You&#039;re welcome.

Examples of implementations that take plain raw frequencies as local term weights without incorporating attenuation transformations, and are thus susceptible to keyword repetition, include some latent semantic indexing (LSI) and vector space models, as well as the readability metric known as keyword density (KD).</description>
		<content:encoded><![CDATA[<p>You&#8217;re welcome.</p>
<p>Examples of implementations that take plain raw frequencies as local term weights without incorporating attenuation transformations, and are thus susceptible to keyword repetition, include some latent semantic indexing (LSI) and vector space models, as well as the readability metric known as keyword density (KD).</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On Term Repetition and Local Models by marianasoffer</title>
		<link>http://irthoughts.wordpress.com/2009/05/27/on-term-repetition-and-local-models/#comment-768</link>
		<dc:creator>marianasoffer</dc:creator>
		<pubDate>Sat, 30 May 2009 08:18:20 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=968#comment-768</guid>
		<description>I really appreciate that you spent your time answering my questions. Now I completely get it. Thanks.</description>
		<content:encoded><![CDATA[<p>I really appreciate that you spent your time answering my questions. Now I completely get it. Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On Term Repetition and Local Models by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/05/27/on-term-repetition-and-local-models/#comment-767</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sat, 30 May 2009 00:04:11 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=968#comment-767</guid>
		<description>It would be hard to address PAIR with VSMs for at least two reasons:

(a) most VSMs assume term independence
(b) most are bag-of-words models.

The point to be made is that local models that take term repetition (raw frequencies) as local weights to be incorporated in Vector Space Models are susceptible to manipulations and thus are not reliable. The article touches on some of these issues.</description>
		<content:encoded><![CDATA[<p>It would be hard to address PAIR with VSMs for at least two reasons:</p>
<p>(a) most VSMs assume term independence<br />
(b) most are bag-of-words models.</p>
<p>The point to be made is that local models that take term repetition (raw frequencies) as local weights to be incorporated in Vector Space Models are susceptible to manipulations and thus are not reliable. The article touches on some of these issues.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On Term Repetition and Local Models by marianasoffer</title>
		<link>http://irthoughts.wordpress.com/2009/05/27/on-term-repetition-and-local-models/#comment-766</link>
		<dc:creator>marianasoffer</dc:creator>
		<pubDate>Fri, 29 May 2009 19:42:06 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=968#comment-766</guid>
		<description>I was asking whether you use this in a vector space model, and what you use it for.</description>
		<content:encoded><![CDATA[<p>I was asking whether you use this in a vector space model, and what you use it for.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On Term Repetition and Local Models by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/05/27/on-term-repetition-and-local-models/#comment-765</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 29 May 2009 19:29:02 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=968#comment-765</guid>
		<description>I&#039;m not sure what you are asking. If you are asking whether I use KD, the answer is no.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not sure what you are asking. If you are asking whether I use KD, the answer is no.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On Term Repetition and Local Models by marianasoffer</title>
		<link>http://irthoughts.wordpress.com/2009/05/27/on-term-repetition-and-local-models/#comment-764</link>
		<dc:creator>marianasoffer</dc:creator>
		<pubDate>Fri, 29 May 2009 03:22:47 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=968#comment-764</guid>
		<description>What does it have to do with VSMs? Do you use them to calculate these metrics?</description>
		<content:encoded><![CDATA[<p>What does it have to do with VSMs? Do you use them to calculate these metrics?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Why IDF is Expressed Using Logs by Understanding TFIDF &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/04/15/why-idf-is-expressed-using-logs/#comment-763</link>
		<dc:creator>Understanding TFIDF &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 15 May 2009 00:35:33 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=865#comment-763</guid>
		<description>[...] Why IDF is Expressed using Logs? [...]</description>
		<content:encoded><![CDATA[<p>[...] Why IDF is Expressed using Logs? [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on RSJ-PM: Probabilistic Model Tutorial by Understanding TFIDF &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/03/30/rsj-pm-probabilistic-model-tutorial/#comment-762</link>
		<dc:creator>Understanding TFIDF &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 15 May 2009 00:35:31 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=811#comment-762</guid>
		<description>[...] RSJ-PM: Probabilistic Model Tutorial [...]</description>
		<content:encoded><![CDATA[<p>[...] RSJ-PM: Probabilistic Model Tutorial [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Normalization with Excel by Vector Normalization with Excel - Part II &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2009/03/04/vector-normalization-with-excel/#comment-758</link>
		<dc:creator>Vector Normalization with Excel - Part II &#171; IR Thoughts</dc:creator>
		<pubDate>Thu, 07 May 2009 12:07:00 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=764#comment-758</guid>
		<description>[...] Normalization with Excel - Part&#160;II By E. Garcia  Back in March, we explained how to normalize column vectors with Excel. But, what about normalizing row vectors in Excel? This question is addressed in the current QA [...]</description>
		<content:encoded><![CDATA[<p>[...] Normalization with Excel &#8211; Part&nbsp;II By E. Garcia  Back in March, we explained how to normalize column vectors with Excel. But, what about normalizing row vectors in Excel? This question is addressed in the current QA [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-754</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Mon, 20 Apr 2009 14:02:55 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-754</guid>
		<description>There is no need to blame Google for SEO hearsay and propaganda. SEOs are the ones who have claimed that Google uses LSI, not Google. Applying LSI to the Web would be hard, if not impossible. As said before, you don&#039;t have to take my word for it; just check here:

http://www.seo-blog.com/latent-semantic-index-lsi-myth.php

The fact that you claimed to use LSI without knowing the IR facts behind it reinforces my perception that you don&#039;t know what LSI is or how it works at all.

You are probably forming an opinion of what you think LSI is, or simply repeating what you have heard from other SEOs. You are now blaming others for leading you into, as you put it, &#039;parroting&#039;. It is easy to blame others for repeating urban legends. That&#039;s the easy way out.

When it comes to SEO claims about LSI, there is one SEO side that claims (like you and ThemeZoom) to use LSI without really understanding what LSI is. And there is another side that uses incorrect arguments to debunk LSI (like Stompernet). So indeed both sides are getting the LSI Myth. The title of this post is therefore more than appropriate. You can nitpick all you want on this to save some face, and that&#039;s understandable. I don&#039;t expect less from marketers.

This blog is about debunking SEO/IR myths through information retrieval knowledge. Some of these are then used as case studies to be tested and dissected in my IR graduate courses. Check here:
http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/</description>
		<content:encoded><![CDATA[<p>There is no need to blame Google for SEO hearsay and propaganda. SEOs are the ones who have claimed that Google uses LSI, not Google. Applying LSI to the Web would be hard, if not impossible. As said before, you don&#8217;t have to take my word for it; just check here:</p>
<p><a href="http://www.seo-blog.com/latent-semantic-index-lsi-myth.php" rel="nofollow">http://www.seo-blog.com/latent-semantic-index-lsi-myth.php</a></p>
<p>The fact that you claimed to use LSI without knowing the IR facts behind it reinforces my perception that you don&#8217;t know what LSI is or how it works at all.</p>
<p>You are probably forming an opinion of what you think LSI is, or simply repeating what you have heard from other SEOs. You are now blaming others for leading you into, as you put it, &#8216;parroting&#8217;. It is easy to blame others for repeating urban legends. That&#8217;s the easy way out.</p>
<p>When it comes to SEO claims about LSI, there is one SEO side that claims (like you and ThemeZoom) to use LSI without really understanding what LSI is. And there is another side that uses incorrect arguments to debunk LSI (like Stompernet). So indeed both sides are getting the LSI Myth. The title of this post is therefore more than appropriate. You can nitpick all you want on this to save some face, and that&#8217;s understandable. I don&#8217;t expect less from marketers.</p>
<p>This blog is about debunking SEO/IR myths through information retrieval knowledge. Some of these are then used as case studies to be tested and dissected in my IR graduate courses. Check here:<br />
<a href="http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/" rel="nofollow">http://irthoughts.wordpress.com/2009/04/02/airweb-course-announcement/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by bnweb</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-753</link>
		<dc:creator>bnweb</dc:creator>
		<pubDate>Mon, 20 Apr 2009 08:08:16 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-753</guid>
		<description>That may well be true, but you&#039;d have to blame Google for that! What we implement is some form of Latent Indexed Semantics. Our implementation of what I call LSI is based on what Google have said about the subject, so in honesty, yes, I may be wrong, but I&#039;m just parroting Google.

Technically and theoretically it may well be true that this is not LSI, but as I say, your fight is therefore with them. I only understand IR up to a point and I wouldn&#039;t claim to be an IR expert. In fact, from my point of view I don&#039;t really care about IR; my main concern is with SEO and getting rankings for our clients.

But the title of your post has nothing to do with LSI in the IR world. To be frank, unless you are actually a practising expert in the seosphere, as you call it, should you be commenting at all? As an SEO expert I find your comment highly misleading, and my question to you is: should you really be talking at all about how some form of LSI is applied by Google? So your title &quot;Finally SEOs are getting the LSI Myth!&quot; is factually totally incorrect from a seosphere point of view, as is the Stomper video, which I have to say is by far the most inaccurate video they&#039;ve ever brought out.</description>
		<content:encoded><![CDATA[<p>That may well be true, but you&#8217;d have to blame Google for that! What we implement is some form of Latent Indexed Semantics. Our implementation of what I call LSI is based on what Google have said about the subject, so in honesty, yes, I may be wrong, but I&#8217;m just parroting Google.</p>
<p>Technically and theoretically it may well be true that this is not LSI, but as I say, your fight is therefore with them. I only understand IR up to a point and I wouldn&#8217;t claim to be an IR expert. In fact, from my point of view I don&#8217;t really care about IR; my main concern is with SEO and getting rankings for our clients.</p>
<p>But the title of your post has nothing to do with LSI in the IR world. To be frank, unless you are actually a practising expert in the seosphere, as you call it, should you be commenting at all? As an SEO expert I find your comment highly misleading, and my question to you is: should you really be talking at all about how some form of LSI is applied by Google? So your title &#8220;Finally SEOs are getting the LSI Myth!&#8221; is factually totally incorrect from a seosphere point of view, as is the Stomper video, which I have to say is by far the most inaccurate video they&#8217;ve ever brought out.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-752</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sat, 18 Apr 2009 03:19:48 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-752</guid>
		<description>Whether you might be getting good results or not is totally meaningless to the discussion when it comes to explaining what LSI is or how it works.

The problem with your stand is that you keep making references to the theory of LSI as according to &#039;Google, me, you and others&#039; when in fact, generally speaking, there is only one theory and implementation of LSI, and that is the one described in the Information Retrieval literature and in the Bellcore (now Telcordia) patent. You might find some variants of implementing LSI in the IR literature or at the USPTO (United States Patent and Trademark Office), but all are based on applying the SVD algorithm as described in my previous reply.

You claimed to use LSI. Well, to implement LSI you MUST use the SVD algorithm to reconstruct an initial term-document matrix as a reduced representation of the original matrix. There is no way around this. If you don’t do that, then you ARE NOT using LSI, period. If you don’t grasp this simple concept, then I&#039;m afraid to say that chances are you don’t know what LSI is after all.

I don’t know what SEO strategies you are using to get good results, and good for you, but whatever it is or whatever you might want to call it, it is not LSI. Many SEOs use some particular techniques for related terms and synonyms in their optimization strategies and dare to call that “latent semantic indexing”.

Like ThemeZoom and others, you can call these SEO strategies whatever you want to call them for marketing purposes, self-promotion, or to extract money from naive clients, but they aren’t LSI. At the end of the day, miscalling a valid concept (C1) as a phony one (C2) or vice versa is pure SEO propaganda.

You don’t have to take my word for it. For a second opinion, read a tutorial on LSI from any IR colleague, but stay away from misleading SEO “LSI tutorials and videos”.</description>
		<content:encoded><![CDATA[<p>Whether you might be getting good results or not is totally meaningless to the discussion when it comes to explaining what LSI is or how it works.</p>
<p>The problem with your stand is that you keep making references to the theory of LSI as according to &#8216;Google, me, you and others&#8217; when in fact, generally speaking, there is only one theory and implementation of LSI, and that is the one described in the Information Retrieval literature and in the Bellcore (now Telcordia) patent. You might find some variants of implementing LSI in the IR literature or at the USPTO (United States Patent and Trademark Office), but all are based on applying the SVD algorithm as described in my previous reply.</p>
<p>You claimed to use LSI. Well, to implement LSI you MUST use the SVD algorithm to reconstruct an initial term-document matrix as a reduced representation of the original matrix. There is no way around this. If you don’t do that, then you ARE NOT using LSI, period. If you don’t grasp this simple concept, then I&#8217;m afraid to say that chances are you don’t know what LSI is after all.</p>
<p>I don’t know what SEO strategies you are using to get good results, and good for you, but whatever it is or whatever you might want to call it, it is not LSI. Many SEOs use some particular techniques for related terms and synonyms in their optimization strategies and dare to call that “latent semantic indexing”.</p>
<p>Like ThemeZoom and others, you can call these SEO strategies whatever you want to call them for marketing purposes, self-promotion, or to extract money from naive clients, but they aren’t LSI. At the end of the day, miscalling a valid concept (C1) as a phony one (C2) or vice versa is pure SEO propaganda.</p>
<p>You don’t have to take my word for it. For a second opinion, read a tutorial on LSI from any IR colleague, but stay away from misleading SEO “LSI tutorials and videos”.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by bnweb</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-750</link>
		<dc:creator>bnweb</dc:creator>
		<pubDate>Fri, 17 Apr 2009 22:06:03 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-750</guid>
		<description>My main point was that the theory is fine, but what about the practice? As I&#039;ve stated previously in my post, we&#039;ve tried to understand the theory of LSI. Then we&#039;ve tried to understand how Google may possibly apply what we&#039;ve learned, then practically apply it and test it on dozens of projects, and we&#039;ve had positive results every time. We carefully and deliberately isolate what we do with any potential LSI algorithm to ensure Google are altering rankings based on what we are testing. I could understand that we may fluke a portion of our results, so are you suggesting we fluke them every time? If not, how would you explain our results?

Let me clarify a previous point I was alluding to: technically, when I&#039;m talking about LSI, Google do not actually use LSI, at least not what you theoretically describe as LSI. Moreover, because of the limitations of its operating environment, this would be impossible. What Google use is some of your theory fitted into the environmental limitations, which provides a best-fit version of LSI. So ultimately, when you&#039;re saying Google don&#039;t implement LSI at all because this is your theory of how LSI works, then your theory works within my experience, because I&#039;m saying that&#039;s not how Google does it. The same goes for the Stompernet video: they&#039;re saying Google doesn&#039;t apply LSI because it couldn&#039;t possibly do this. And I&#039;m agreeing with them that it doesn&#039;t do that, but what I&#039;m saying, particularly in relation to the Stompernet video, is that in my experience Google is doing something radically different anyway, so the theory of what both you and they are saying backs up my experience.</description>
		<content:encoded><![CDATA[<p>My main point was that the theory is fine, but what about the practice? As I&#8217;ve stated previously in my post, we&#8217;ve tried to understand the theory of LSI. Then we&#8217;ve tried to understand how Google may possibly apply what we&#8217;ve learned, then practically apply it and test it on dozens of projects, and we&#8217;ve had positive results every time. We carefully and deliberately isolate what we do with any potential LSI algorithm to ensure Google are altering rankings based on what we are testing. I could understand that we may fluke a portion of our results, so are you suggesting we fluke them every time? If not, how would you explain our results?</p>
<p>Let me clarify a previous point I was alluding to: technically, when I&#8217;m talking about LSI, Google do not actually use LSI, at least not what you theoretically describe as LSI. Moreover, because of the limitations of its operating environment, this would be impossible. What Google use is some of your theory fitted into the environmental limitations, which provides a best-fit version of LSI. So ultimately, when you&#8217;re saying Google don&#8217;t implement LSI at all because this is your theory of how LSI works, then your theory works within my experience, because I&#8217;m saying that&#8217;s not how Google does it. The same goes for the Stompernet video: they&#8217;re saying Google doesn&#8217;t apply LSI because it couldn&#8217;t possibly do this. And I&#8217;m agreeing with them that it doesn&#8217;t do that, but what I&#8217;m saying, particularly in relation to the Stompernet video, is that in my experience Google is doing something radically different anyway, so the theory of what both you and they are saying backs up my experience.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-749</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 17 Apr 2009 19:29:55 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-749</guid>
		<description>&lt;strong&gt;About LSI&lt;/strong&gt;

To use/apply LSI on a database collection of documents of size &lt;strong&gt;D&lt;/strong&gt;, you MUST:

1. Have access to every single document of the collection to be analyzed (something quite hard to do at the scale of the Web).

2. Construct a term-document matrix &lt;strong&gt;A&lt;/strong&gt; and populate this with term weights &lt;strong&gt;w&lt;/strong&gt; precomputed according to a particular term weight framework. Said framework can be tf*idf-based or based on many variants of this, or it can be based on entropy or relevance information.

3. Decide how many dimensions &lt;strong&gt;k&lt;/strong&gt; (singular values) to keep. The optimum number of dimensions to keep is obtained by trial and error and is valid only for the collection under inspection.

4. Apply the Singular Value Decomposition (SVD) algorithm in order to decompose the original matrix.

5. From the resultant left and right eigenvector matrices, apply clustering techniques to identify clusters of both documents and terms.

6. If the goal is to rank documents, sort these using query-document vector cosine similarity values in the LSI space.

Any LSI outcome is valid only for the collection of size &lt;strong&gt;D&lt;/strong&gt; used, the original matrix &lt;strong&gt;A&lt;/strong&gt; used, the term weight &lt;strong&gt;w&lt;/strong&gt; scoring framework used, and the number of dimensions &lt;strong&gt;k&lt;/strong&gt; used when applying the SVD algorithm.

If these initial conditions change in time, the outcome will also change. In addition, any change in the values of the cells of the original matrix &lt;strong&gt;A&lt;/strong&gt; (even a single change) will provoke a redistribution of weights in the reconstructed matrix obtained from the SVD algorithm. The outcome of said redistribution cannot be predicted, as all weights in the truncated matrix will be interrelated. Said weights determine the creation and strength of the co-occurrence connectivity paths observed. And we still don’t know when or how many times a particular document of the collection has been altered by its owners at any given point in time.

Because of all of the above, any attempt at designing “LSI-Friendly” documents is plain snakeoil. Additional information is provided at:

http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/


&lt;strong&gt;About the video&lt;/strong&gt;

With regard to Stompernet’s video, the only thing I share is the author’s position that LSI cannot be used/manipulated by SEOs to optimize web pages. However, the arguments used are plain wrong and it would be hard for others to learn any valid knowledge out of these.

The problem with the video is that, as mentioned above, before trying to debunk something we at least need to know and understand exactly what we are trying to debunk or refute. It appears to me now that Rohde does not really understand what LSI is or how it works at all, and neither do many of the posters following him.

Some of the arguments used in his video regarding what LSI is or how it works are false and unnecessarily add to the confusion many SEOs have regarding what is/is not LSI. For example, that SVD works by reducing a vocabulary and that LSI works at the level of semantics, meaning, etc. The latter were ideas stretched in the early LSI literature, dating back 20 years or so. Today we know exactly what the LSI algorithm does and why.

I cannot prejudge at this point whether Stompernet put out a confusing message to make some noise. If that was the intention, it is just marketing 101. It is not the first time some of their “faculty” have come up with nonsense or misquotes from IR papers, like many other SEOs looking to put out a product or a service at the expense of trying to debunk the competition’s hearsay. Debunking incorrect ideas with incorrect ideas is fraudulent teaching, simply because two wrongs don’t make a right.

Adding to the confusion is the fact that many SEOs, in an effort to market whatever they sell, call something LSI that is not. This is a standard propaganda practice and works as follows:

Let’s say we have two concepts, C1 and C2. C1 is a legit, proven concept. C2 is bogus or made out of thin air. To promote C2, rename it as C1. Any argument against C2 is diverted by invoking the underlying true facts of C1. Then, profit out of the phony ideas and the easy-to-impress followers and naive “students”. For the parasites of the truth, this is a sinister way of doing marketing and is snakeoil at its best.</description>
		<content:encoded><![CDATA[<p><strong>About LSI</strong></p>
<p>To use/apply LSI on a database collection of documents of size <strong>D</strong>, you MUST:</p>
<p>1. Have access to every single document of the collection to be analyzed (something quite hard to do at the scale of the Web).</p>
<p>2. Construct a term-document matrix <strong>A</strong> and populate this with term weights <strong>w</strong> precomputed according to a particular term weight framework. Said framework can be tf*idf-based or based on many variants of this, or it can be based on entropy or relevance information.</p>
<p>3. Decide how many dimensions <strong>k</strong> (singular values) to keep. The optimum number of dimensions to keep is obtained by trial and error and is valid only for the collection under inspection.</p>
<p>4. Apply the Singular Value Decomposition (SVD) algorithm in order to decompose the original matrix.</p>
<p>5. From the resultant left and right eigenvector matrices, apply clustering techniques to identify clusters of both documents and terms.</p>
<p>6. If the goal is to rank documents, sort these using query-document vector cosine similarity values in the LSI space.</p>
<p>Any LSI outcome is valid only for the collection of size <strong>D</strong> used, the original matrix <strong>A</strong> used, the term weight <strong>w</strong> scoring framework used, and the number of dimensions <strong>k</strong> used when applying the SVD algorithm.</p>
<p>If these initial conditions change in time, the outcome will also change. In addition, any change in the values of the cells of the original matrix <strong>A</strong> (even a single change) will provoke a redistribution of weights in the reconstructed matrix obtained from the SVD algorithm. The outcome of said redistribution cannot be predicted, as all weights in the truncated matrix will be interrelated. Said weights determine the creation and strength of the co-occurrence connectivity paths observed. And we still don’t know when or how many times a particular document of the collection has been altered by its owners at any given point in time.</p>
<p>Because of all of the above, any attempt at designing “LSI-Friendly” documents is plain snakeoil. Additional information is provided at:</p>
<p><a href="http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/" rel="nofollow">http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/</a></p>
<p><strong>About the video</strong></p>
<p>With regard to Stompernet’s video, the only thing I share is the author’s position that LSI cannot be used/manipulated by SEOs to optimize web pages. However, the arguments used are plain wrong and it would be hard for others to learn any valid knowledge out of these.</p>
<p>The problem with the video is that, as mentioned above, before trying to debunk something we at least need to know and understand exactly what we are trying to debunk or refute. It appears to me now that Rohde does not really understand what LSI is or how it works at all, and neither do many of the posters following him.</p>
<p>Some of the arguments used in his video regarding what LSI is or how it works are false and unnecessarily add to the confusion many SEOs have regarding what is/is not LSI. For example, that SVD works by reducing a vocabulary and that LSI works at the level of semantics, meaning, etc. The latter were ideas stretched in the early LSI literature, dating back 20 years or so. Today we know exactly what the LSI algorithm does and why.</p>
<p>I cannot prejudge at this point whether Stompernet put out a confusing message to make some noise. If that was the intention, it is just marketing 101. It is not the first time some of their “faculty” have come up with nonsense or misquotes from IR papers, like many other SEOs looking to put out a product or a service at the expense of trying to debunk the competition’s hearsay. Debunking incorrect ideas with incorrect ideas is fraudulent teaching, simply because two wrongs don’t make a right.</p>
<p>Adding to the confusion is the fact that many SEOs, in an effort to market whatever they sell, call something LSI that is not. This is a standard propaganda practice and works as follows:</p>
<p>Let’s say we have two concepts, C1 and C2. C1 is a legit, proven concept. C2 is bogus or made out of thin air. To promote C2, rename it as C1. Any argument against C2 is diverted by invoking the underlying true facts of C1. Then, profit out of the phony ideas and the easy-to-impress followers and naive “students”. For the parasites of the truth, this is a sinister way of doing marketing and is snakeoil at its best.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by bnweb</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-746</link>
		<dc:creator>bnweb</dc:creator>
		<pubDate>Fri, 17 Apr 2009 13:10:43 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-746</guid>
		<description>I&#039;d like to take the third way and disagree with both yourself and Leslie Rohde, though not entirely.  I have my own SEO company, and although we don&#039;t make particular claims about LSI, we have used LSI on all our projects in the last 18 months.  Although the use of LSI has been in differing amounts, particularly as we&#039;ve developed a stronger strategy to gain better results, what we have seen is that our strategy has worked on every single project.  I don&#039;t claim that our implementation is perfect, but it does work.  As I say, I don&#039;t totally disagree with what you&#039;re saying, but I do disagree with the conclusion that LSI is a myth.

At this juncture I&#039;d like to point out that I&#039;m familiar with both your work in IR and Stompernet&#039;s work in the internet marketing arena, both of which offer great insight and which I&#039;d highly recommend to anyone.

It&#039;s easy to talk about the theories of LSI and IR in general. Some SEOs don&#039;t even seem to understand the basic concept that Google is in essence an IR system, and the seosphere is a lot better off for your work.  On the other side of the fence, all good SEOs understand that any implementation of IR by Google is limited by the mechanics of the internet itself. Therefore, when IR people like your good self talk about Google&#039;s implementation of IR with, to me anyway, no apparent understanding of what those limitations are, you don&#039;t have a proper picture of the full story.

I&#039;m sure you will both disagree with my conclusion. However, talking about the theory of LSI and what Google is actually doing is one thing; explaining why we&#039;ve seen positive results every time we&#039;ve applied what we think Google is in part doing in the physical world is something else entirely.</description>
		<content:encoded><![CDATA[<p>I&#8217;d like to take the third way and disagree with both yourself and Leslie Rohde, though not entirely.  I have my own SEO company, and although we don&#8217;t make particular claims about LSI, we have used LSI on all our projects in the last 18 months.  Although the use of LSI has been in differing amounts, particularly as we&#8217;ve developed a stronger strategy to gain better results, what we have seen is that our strategy has worked on every single project.  I don&#8217;t claim that our implementation is perfect, but it does work.  As I say, I don&#8217;t totally disagree with what you&#8217;re saying, but I do disagree with the conclusion that LSI is a myth.</p>
<p>At this juncture I&#8217;d like to point out that I&#8217;m familiar with both your work in IR and Stompernet&#8217;s work in the internet marketing arena, both of which offer great insight and which I&#8217;d highly recommend to anyone.</p>
<p>It&#8217;s easy to talk about the theories of LSI and IR in general. Some SEOs don&#8217;t even seem to understand the basic concept that Google is in essence an IR system, and the seosphere is a lot better off for your work.  On the other side of the fence, all good SEOs understand that any implementation of IR by Google is limited by the mechanics of the internet itself. Therefore, when IR people like your good self talk about Google&#8217;s implementation of IR with, to me anyway, no apparent understanding of what those limitations are, you don&#8217;t have a proper picture of the full story.</p>
<p>I&#8217;m sure you will both disagree with my conclusion. However, talking about the theory of LSI and what Google is actually doing is one thing; explaining why we&#8217;ve seen positive results every time we&#8217;ve applied what we think Google is in part doing in the physical world is something else entirely.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Call to SEOs Claiming to Sell LSI by The Google Myth - LSI Revealed</title>
		<link>http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/#comment-744</link>
		<dc:creator>The Google Myth - LSI Revealed</dc:creator>
		<pubDate>Thu, 16 Apr 2009 16:50:44 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/#comment-744</guid>
		<description>[...] Re: The Google Myth - LSI Revealed      Referential integrity is likely a name Leslie decided to use. After all, the guy is a software engineer and so that&#039;s the way he thinks.  As far as LSI, I have heavily researched it and come to the conclusion that it is not part of Google&#039;s algorithm. The reason why one would think that siloing works is because of the fact that it involves radical changes in PageRank flow and makes use of anchor text optimization. It&#039;s the &quot;cross-theming&quot; part that has no real basis as fact.  If you want more information on the nuts and bolts of LSI in search, you can start with this - A Call to SEOs Claiming to Sell LSI IR Thoughts [...]</description>
		<content:encoded><![CDATA[<p>[...] Re: The Google Myth &#8211; LSI Revealed      Referential integrity is likely a name Leslie decided to use. After all, the guy is a software engineer and so that&#8217;s the way he thinks.  As far as LSI, I have heavily researched it and come to the conclusion that it is not part of Google&#8217;s algorithm. The reason why one would think that siloing works is because of the fact that it involves radical changes in PageRank flow and makes use of anchor text optimization. It&#8217;s the &quot;cross-theming&quot; part that has no real basis as fact.  If you want more information on the nuts and bolts of LSI in search, you can start with this &#8211; A Call to SEOs Claiming to Sell LSI IR Thoughts [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-740</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 10 Apr 2009 18:42:33 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-740</guid>
		<description>Thank you for stopping by. My reply follows. 

First, your effort of putting out a video to debunk SEO myths in relation to LSI is, as mentioned before, a noble one and should be pursued by others. I see we have some points we can agree on, so I will highlight those throughout this reply, when necessary. 

&lt;blockquote&gt;
To explain the essentials of LSI and describe how it should appear in search results in just 13 minutes and 39 seconds did require some &quot;short cuts&quot; as you have pointed out.
&lt;/blockquote&gt;

Indeed, it is not possible. Students who took my graduate course on Search Engine Architectures came from different backgrounds (computer science engineering, information security, programming, and one or two from marketing/web development). I spent a few lectures and computer lab sessions teaching them the theory behind LSI and how to do SVD runs on small cases. Getting one&#039;s feet wet with SVD and LSI is not something that can be done in a few minutes from a video.

&lt;blockquote&gt;
But our goal is not the creation of scientists, but instead the development of successful business people, and to accomplish that we need to provide primarily two things: (1) practical skills that do work and (2) enough theory to protect our students from untruths.  
&lt;/blockquote&gt;

I cannot agree more with (1) and that your goal should not be the creation of scientists. Often those creatures are &quot;created&quot; in real universities and colleges. No offense intended to the organization you belong to.

With respect to (2), it would more likely be difficult to protect &quot;students from untruths&quot; by resorting to theoretical inaccuracies, particularly using flawed arguments. 

Before debunking something we need to know at least what exactly we are trying to debunk. A critical-thinking student will challenge any teacher on this one, regardless of the teacher&#039;s tenure, stature, or reputation. In fact, I encourage my students to do that, so they are not blind followers.

&lt;blockquote&gt;
...I stand by the conclusion that, as regards the importance of LSI in the ranking algorithm, the existence of significant differences in search results for singular and plural forms is determinative of the question.
&lt;/blockquote&gt;

Of course the form of words matters one way or the other, as do many other things. Many variables affect ranking results. 

&lt;blockquote&gt;
Plurality is a difference in cardinality, not concept, so by the very nature of what LSI does -- &quot;by producing a set of concepts related to the documents and terms&quot; and &quot;comparing the documents in the concept space&quot; -- the results for plural forms *have* to be nearly, if not completely, identical to the results for singular forms.
&lt;/blockquote&gt;

The point in dispute is that whether or not stemming is implemented makes a difference and will affect the outcome of the SVD algorithm used in LSI, simply because tokens (regardless of their cardinality or functionality) are reduced to mere numbers.

&lt;blockquote&gt;
If this is not true, then I contend their concept of &quot;concept&quot; must not be very conceptual after all!  :-)
&lt;/blockquote&gt;

That is precisely the point. LSI is a misnomer and does not derive semantic information (meaning, concepts, etc) as previously claimed in the early literature. 

Most of those claims are 10 to 20 years old. Why keep citing outdated research? 

Those papers made reference to a hidden (latent) structure between words masked by &quot;noisy&quot; words, which was referred to as the latent semantic structure embedded in a corpus. SVD brings back this structure. &quot;Semantics&quot; in those papers refers to that hidden structure.

What those early LSI manuscripts failed to report is that such a structure is the result of high-order co-occurrence paths. These are easy to visualize by means of directed graphs. I call these Latent Graphs. To sum up, LSI does not derive semantic information (meanings, concepts, etc).

To quote Kontostathis&#039;s excellent work:

&quot;LSI is a dimensionality reduction approach for modeling documents. It was originally thought to bring out the ‘latent semantics’ within a corpus of documents. However, LSI does not derive ‘semantic’ information per se. It does captures higher order term co-occurrence information [19], and we prefer to state that LSI captures ‘term relationship’ information, rather than ‘latent semantic’ information.&quot; 
(http://csdl2.computer.org/comp/proceedings/hicss/2007/2755/00/27550073c.pdf).

One more thing. Word semantics (meaning, concepts, etc) can be affected by word order (exchangeability of terms), but there is a problem: in most LSI implementations, word order is not accounted for. How, then, can we talk about concepts within the context of LSI while ignoring the exchangeability of both terms and documents? 

This is why new models, like LDA (Latent Dirichlet Allocation), have been proposed. A discussion on Vector Space, LSI, PLSI, and LDA is available at http://irthoughts.wordpress.com/2009/04/03/vector-space-probabilistic-lsi-and-lda/</description>
		<content:encoded><![CDATA[<p>Thank you for stopping by. My reply follows. </p>
<p>First, your effort of putting out a video to debunk SEO myths in relation to LSI is, as mentioned before, a noble one and should be pursued by others. I see we have some points we can agree on, so I will highlight those throughout this reply, when necessary. </p>
<blockquote><p>
To explain the essentials of LSI and describe how it should appear in search results in just 13 minutes and 39 seconds did require some &#8220;short cuts&#8221; as you have pointed out.
</p></blockquote>
<p>Indeed, it is not possible. Students who took my graduate course on Search Engine Architectures came from different backgrounds (computer science engineering, information security, programming, and one or two from marketing/web development). I spent a few lectures and computer lab sessions teaching them the theory behind LSI and how to do SVD runs on small cases. Getting one&#8217;s feet wet with SVD and LSI is not something that can be done in a few minutes from a video.</p>
<blockquote><p>
But our goal is not the creation of scientists, but instead the development of successful business people, and to accomplish that we need to provide primarily two things: (1) practical skills that do work and (2) enough theory to protect our students from untruths.
</p></blockquote>
<p>I cannot agree more with (1) and that your goal should not be the creation of scientists. Often those creatures are &#8220;created&#8221; in real universities and colleges. No offense intended to the organization you belong to.</p>
<p>With respect to (2), it would more likely be difficult to protect &#8220;students from untruths&#8221; by resorting to theoretical inaccuracies, particularly using flawed arguments. </p>
<p>Before debunking something we need to know at least what exactly we are trying to debunk. A critical-thinking student will challenge any teacher on this one, regardless of the teacher&#8217;s tenure, stature, or reputation. In fact, I encourage my students to do that, so they are not blind followers.</p>
<blockquote><p>
&#8230;I stand by the conclusion that, as regards the importance of LSI in the ranking algorithm, the existence of significant differences in search results for singular and plural forms is determinative of the question.
</p></blockquote>
<p>Of course the form of words matters one way or the other, as do many other things. Many variables affect ranking results. </p>
<blockquote><p>
Plurality is a difference in cardinality, not concept, so by the very nature of what LSI does &#8212; &#8220;by producing a set of concepts related to the documents and terms&#8221; and &#8220;comparing the documents in the concept space&#8221; &#8212; the results for plural forms *have* to be nearly, if not completely, identical to the results for singular forms.
</p></blockquote>
<p>The point in dispute is that whether or not stemming is implemented makes a difference and will affect the outcome of the SVD algorithm used in LSI, simply because tokens (regardless of their cardinality or functionality) are reduced to mere numbers.</p>
<blockquote><p>
If this is not true, then I contend their concept of &#8220;concept&#8221; must not be very conceptual after all!  <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />
</p></blockquote>
<p>That is precisely the point. LSI is a misnomer and does not derive semantic information (meaning, concepts, etc) as previously claimed in the early literature. </p>
<p>Most of those claims are 10 to 20 years old. Why keep citing outdated research? </p>
<p>Those papers made reference to a hidden (latent) structure between words masked by &#8220;noisy&#8221; words, which was referred to as the latent semantic structure embedded in a corpus. SVD brings back this structure. &#8220;Semantics&#8221; in those papers refers to that hidden structure.</p>
<p>What those early LSI manuscripts failed to report is that such a structure is the result of high-order co-occurrence paths. These are easy to visualize by means of directed graphs. I call these Latent Graphs. To sum up, LSI does not derive semantic information (meanings, concepts, etc).</p>
<p>To quote Kontostathis&#8217;s excellent work:</p>
<p>&#8220;LSI is a dimensionality reduction approach for modeling documents. It was originally thought to bring out the ‘latent semantics’ within a corpus of documents. However, LSI does not derive ‘semantic’ information per se. It does captures higher order term co-occurrence information [19], and we prefer to state that LSI captures ‘term relationship’ information, rather than ‘latent semantic’ information.&#8221;<br />
(<a href="http://csdl2.computer.org/comp/proceedings/hicss/2007/2755/00/27550073c.pdf" rel="nofollow">http://csdl2.computer.org/comp/proceedings/hicss/2007/2755/00/27550073c.pdf</a>).</p>
<p>One more thing. Word semantics (meaning, concepts, etc) can be affected by word order (exchangeability of terms), but there is a problem: in most LSI implementations, word order is not accounted for. How, then, can we talk about concepts within the context of LSI while ignoring the exchangeability of both terms and documents? </p>
<p>This is why new models, like LDA (Latent Dirichlet Allocation), have been proposed. A discussion on Vector Space, LSI, PLSI, and LDA is available at <a href="http://irthoughts.wordpress.com/2009/04/03/vector-space-probabilistic-lsi-and-lda/" rel="nofollow">http://irthoughts.wordpress.com/2009/04/03/vector-space-probabilistic-lsi-and-lda/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by leslierohde</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-739</link>
		<dc:creator>leslierohde</dc:creator>
		<pubDate>Fri, 10 Apr 2009 15:04:29 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-739</guid>
		<description>Thanks for your fine review and feedback.

To explain the essentials of LSI and describe how it should appear in search results in just 13 minutes and 39 seconds did require some &quot;short cuts&quot; as you have pointed out.

But our goal is not the creation of scientists, but instead the development of successful business people, and to accomplish that we need to provide primarily two things: (1) practical skills that do work and (2) enough theory to protect our students from untruths.  This most recent video provides some of the latter, while the next one provides the former.

But theoretical inaccuracies aside, I stand by the conclusion that, as regards the importance of LSI in the ranking algorithm, the existence of significant differences in search results for singular and plural forms is determinative of the question.

Plurality is a difference in cardinality, not concept, so by the very nature of what LSI does -- &quot;by producing a set of concepts related to the documents and terms&quot; and &quot;comparing the documents in the concept space&quot; -- the results for plural forms *have* to be nearly, if not completely, identical to the results for singular forms.

If this is not true, then I contend their concept of &quot;concept&quot; must not be very conceptual after all!  :-)

Thanks again.</description>
		<content:encoded><![CDATA[<p>Thanks for your fine review and feedback.</p>
<p>To explain the essentials of LSI and describe how it should appear in search results in just 13 minutes and 39 seconds did require some &#8220;short cuts&#8221; as you have pointed out.</p>
<p>But our goal is not the creation of scientists, but instead the development of successful business people, and to accomplish that we need to provide primarily two things: (1) practical skills that do work and (2) enough theory to protect our students from untruths.  This most recent video provides some of the latter, while the next one provides the former.</p>
<p>But theoretical inaccuracies aside, I stand by the conclusion that, as regards the importance of LSI in the ranking algorithm, the existence of significant differences in search results for singular and plural forms is determinative of the question.</p>
<p>Plurality is a difference in cardinality, not concept, so by the very nature of what LSI does &#8212; &#8220;by producing a set of concepts related to the documents and terms&#8221; and &#8220;comparing the documents in the concept space&#8221; &#8212; the results for plural forms *have* to be nearly, if not completely, identical to the results for singular forms.</p>
<p>If this is not true, then I contend their concept of &#8220;concept&#8221; must not be very conceptual after all!  <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>Thanks again.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Finally SEOs are getting the LSI Myth! by SEO&#8217;s Get Schooled on LSI &#124; SEO Design Solutions</title>
		<link>http://irthoughts.wordpress.com/2009/04/09/finally-seos-are-getting-the-lsi-myth/#comment-737</link>
		<dc:creator>SEO&#8217;s Get Schooled on LSI &#124; SEO Design Solutions</dc:creator>
		<pubDate>Fri, 10 Apr 2009 04:07:24 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=854#comment-737</guid>
		<description>[...] has had a great deal of dilution on the topic in addition to grand standing and embellishment. This particular response was from Dr. E. Garcia stemming from a video produced from Stompernet on LSI and how it does not [...]</description>
		<content:encoded><![CDATA[<p>[...] has had a great deal of dilution on the topic in addition to grand standing and embellishment. This particular response was from Dr. E. Garcia stemming from a video produced from Stompernet on LSI and how it does not [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Space, Probabilistic LSI, and LDA by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2009/04/03/vector-space-probabilistic-lsi-and-lda/#comment-735</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 08 Apr 2009 14:29:09 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=839#comment-735</guid>
		<description>Hi, submitera:

Thank you for stopping by. Let me know which forum discusses this.</description>
		<content:encoded><![CDATA[<p>Hi, submitera:</p>
<p>Thank you for stopping by. Let me know which forum discusses this.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Space, Probabilistic LSI, and LDA by submitera</title>
		<link>http://irthoughts.wordpress.com/2009/04/03/vector-space-probabilistic-lsi-and-lda/#comment-734</link>
		<dc:creator>submitera</dc:creator>
		<pubDate>Tue, 07 Apr 2009 19:56:31 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=839#comment-734</guid>
		<description>that&#039;s a gr8 post on this topic..helped me a lot in a forum debate and also cleared my mind...thanx again!</description>
		<content:encoded><![CDATA[<p>that&#8217;s a gr8 post on this topic..helped me a lot in a forum debate and also cleared my mind&#8230;thanx again!</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths by SEOs and Their IDF Myths: Part 3 &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/06/17/seos-and-their-idf-myths/#comment-731</link>
		<dc:creator>SEOs and Their IDF Myths: Part 3 &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 20 Mar 2009 15:59:30 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=219#comment-731</guid>
		<description>[...] and Their IDF Myths: Part&#160;3 By E. Garcia  In SEOs and their IDF Myths, we covered how many are mistaking the measure of term specificity known as Inverse Document [...]</description>
		<content:encoded><![CDATA[<p>[...] and Their IDF Myths: Part&nbsp;3 By E. Garcia  In SEOs and their IDF Myths, we covered how many are mistaking the measure of term specificity known as Inverse Document [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Understanding TFIDF by SEOs and Their IDF Myths: Part 3 &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/#comment-730</link>
		<dc:creator>SEOs and Their IDF Myths: Part 3 &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 20 Mar 2009 15:46:02 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=237#comment-730</guid>
		<description>[...] Understanding TFIDF, we wrote a [...]</description>
		<content:encoded><![CDATA[<p>[...] Understanding TFIDF, we wrote a [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by SEOs and Their IDF Myths: Part 3 &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-729</link>
		<dc:creator>SEOs and Their IDF Myths: Part 3 &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 20 Mar 2009 15:45:54 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-729</guid>
		<description>[...] SEOs and their IDF Myths: Part 2, we exposed some of [...]</description>
		<content:encoded><![CDATA[<p>[...] SEOs and their IDF Myths: Part 2, we exposed some of [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Similarity, Pearson, and Spearman Coefficients by Centering Data in PCA &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/10/29/similarity-pearson-and-spearman-coefficients/#comment-727</link>
		<dc:creator>Centering Data in PCA &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 13 Mar 2009 14:11:55 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=460#comment-727</guid>
		<description>[...] http://irthoughts.wordpress.com/2008/10/29/similarity-pearson-and-spearman-coefficients/ [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://irthoughts.wordpress.com/2008/10/29/similarity-pearson-and-spearman-coefficients/" rel="nofollow">http://irthoughts.wordpress.com/2008/10/29/similarity-pearson-and-spearman-coefficients/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on On SVD and PCA: Some Applications by Centering Data in PCA &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2007/05/05/on-svd-and-pca-some-applications/#comment-726</link>
		<dc:creator>Centering Data in PCA &#171; IR Thoughts</dc:creator>
		<pubDate>Fri, 13 Mar 2009 14:11:49 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2007/05/05/on-svd-and-pca-some-applications/#comment-726</guid>
		<description>[...] http://irthoughts.wordpress.com/2007/05/05/on-svd-and-pca-some-applications/ [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://irthoughts.wordpress.com/2007/05/05/on-svd-and-pca-some-applications/" rel="nofollow">http://irthoughts.wordpress.com/2007/05/05/on-svd-and-pca-some-applications/</a> [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
