<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments for IR Thoughts</title>
	<atom:link href="http://irthoughts.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://irthoughts.wordpress.com</link>
	<description>News, Papers, and Theses on Information Retrieval, Data Mining, and Search Engine Technologies.</description>
	<pubDate>Fri, 25 Jul 2008 15:21:34 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
		<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by regcharie</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-641</link>
		<dc:creator>regcharie</dc:creator>
		<pubDate>Fri, 25 Jul 2008 04:41:54 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-641</guid>
		<description>Thanks Dr. Garcia. 

You have given me untold hours of interesting reading. 

:)

Reg</description>
		<content:encoded><![CDATA[<p>Thanks Dr. Garcia. </p>
<p>You have given me untold hours of interesting reading. </p>
<p> <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Reg</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-640</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 25 Jul 2008 02:26:53 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-640</guid>
		<description>Hi, Reg Charie:

Thank you for stopping by.

Indeed. To compute IDF, and thus tf-IDF, one must know N, the size of a search engine's entire collection. 

Over the years, some IR researchers have tried to estimate N by resorting to the term independence assumption. The result is estimated N values that vary so wildly that they cannot be trusted, at least not with Web search engines.

The term independence assumption is the source of all kinds of inconsistencies in IR scoring functions, including vector space models. When one thinks it through, the notion of term specificity itself is inherently divorced from term independence.

On other matters, here are some links to posts related to this thread.

http://irthoughts.wordpress.com/2008/07/21/seos-and-their-exhaustivity-search-myths/

http://irthoughts.wordpress.com/2008/07/14/claps-and-slaps/

http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/

http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/

http://irthoughts.wordpress.com/2007/07/19/seos-and-still-their-lsi-misconceptions/

http://irthoughts.wordpress.com/2007/05/03/latest-seo-incoherences-lsi/</description>
		<content:encoded><![CDATA[<p>Hi, Reg Charie:</p>
<p>Thank you for stopping by.</p>
<p>Indeed. To compute IDF, and thus tf-IDF, one must know N, the size of a search engine&#8217;s entire collection. </p>
<p>Over the years, some IR researchers have tried to estimate N by resorting to the term independence assumption. The result is estimated N values that vary so wildly that they cannot be trusted, at least not with Web search engines.</p>
<p>The term independence assumption is the source of all kinds of inconsistencies in IR scoring functions, including vector space models. When one thinks it through, the notion of term specificity itself is inherently divorced from term independence.</p>
<p>On other matters, here are some links to posts related to this thread.</p>
<p><a href="http://irthoughts.wordpress.com/2008/07/21/seos-and-their-exhaustivity-search-myths/" rel="nofollow">http://irthoughts.wordpress.com/2008/07/21/seos-and-their-exhaustivity-search-myths/</a></p>
<p><a href="http://irthoughts.wordpress.com/2008/07/14/claps-and-slaps/" rel="nofollow">http://irthoughts.wordpress.com/2008/07/14/claps-and-slaps/</a></p>
<p><a href="http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/" rel="nofollow">http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/</a></p>
<p><a href="http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/" rel="nofollow">http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/</a></p>
<p><a href="http://irthoughts.wordpress.com/2007/07/19/seos-and-still-their-lsi-misconceptions/" rel="nofollow">http://irthoughts.wordpress.com/2007/07/19/seos-and-still-their-lsi-misconceptions/</a></p>
<p><a href="http://irthoughts.wordpress.com/2007/05/03/latest-seo-incoherences-lsi/" rel="nofollow">http://irthoughts.wordpress.com/2007/05/03/latest-seo-incoherences-lsi/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by regcharie</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-639</link>
		<dc:creator>regcharie</dc:creator>
		<pubDate>Thu, 24 Jul 2008 19:27:00 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-639</guid>
		<description>What an interesting thread.
I like your style Dr. Garcia. 

When going through the series of posts these things struck me. 

1). andyed Says:
July 3, 2008 at 7:33 pm
“Worse than that, your reference to me saying TF-IDF is new is entirely inaccurate. In no way did I suggest this is a new concept. My quote was “The buzzword in IR”, not the new buzzword.” 
My eye was caught by the last sentence. 
“I was reviewing a new website doing some interesting word frequency analyses.”

As a 12+ year practitioner of SEO, I never cease to be amazed by what some fertile minds come up with to inflate their own value. 

Let me put this in context as I interpreted it. 

“I was reviewing a new website doing some interesting word frequency analyses”

And just HOW were you doing this analysis andyed? Using TF-IDF as you imply? 
Are you privy to Google’s algorithms? 
Using any other technology is simply a waste of the client’s money. 

I am not saying KW analysis is a waste of time, but having to dissect each (new) client’s site and weigh the value of the current composition is ludicrous. 
ANY practitioner of SEO worth their salt knows how to build the site properly from a results-driven “template”. Stop trying to “game” the engines. 

2). danthies Says:
July 4, 2008 at 1:21 am
I have to agree with danthies. 
andyed is using TF-IDF as “A word or phrase connected with a specialized field or group that usually sounds important or technical and is used primarily to impress laypersons.” With emphasis on “usually sounds important or technical and is used primarily to impress laypersons”.
The more you can impress, the more you can charge. Build the BS mystique. 

3). E. Garcia Says:
July 4, 2008 at 11:37 am
In IR we know exactly what tf-idf means, stands for, and what it does.

Of course, but the general population does not. And the article is for the general population. 

4). E. Garcia Says:
July 4, 2008 at 2:15 pm

“He has stated that TFIDF is a buzzword in IR and I contend that is not and explained why.”

Of course he would. To have it so would imply more importance to his use of the term for the general market. 

5). E. Garcia Says:
July 4, 2008 at 10:06 pm
“Still, I want to be fair. Please go ahead and start discussing the merits of the tool here.”

I used TheRarestWords.com on a couple of my sites and learned nothing useful. 
Results were as expected. 

6). Going back to your article on understanding TF-IDF I agree strongly that “IDF as the TFIDF product, aij = tfij*IDFi, does not estimate term importance either. The importance of a term, a string, a passage, a message, etc is linked to many things like its meaning (semantics) and amount of information carried  (entropy). A TF-IDF product does not evaluate either one.” 

The semantics are the key. 

Even if TF-IDF were the be-all and end-all of evaluating keywords and keyword phrases across the database, it would still mean nothing without the exact vectors used by the search engine(s). Using it as a buzzword is simply hyperbole. 

Writing for the search engines is not rocket science. 
Writing the search engines is. 

Reg Charie.</description>
		<content:encoded><![CDATA[<p>What an interesting thread.<br />
I like your style Dr. Garcia. </p>
<p>When going through the series of posts these things struck me. </p>
<p>1). andyed Says:<br />
July 3, 2008 at 7:33 pm<br />
“Worse than that, your reference to me saying TF-IDF is new is entirely inaccurate. In no way did I suggest this is a new concept. My quote was “The buzzword in IR”, not the new buzzword.”<br />
My eye was caught by the last sentence.<br />
“I was reviewing a new website doing some interesting word frequency analyses.”</p>
<p>As a 12+ year practitioner of SEO, I never cease to be amazed by what some fertile minds come up with to inflate their own value. </p>
<p>Let me put this in context as I interpreted it. </p>
<p>“I was reviewing a new website doing some interesting word frequency analyses”</p>
<p>And just HOW were you doing this analysis andyed? Using TF-IDF as you imply?<br />
Are you privy to Google’s algorithms?<br />
Using any other technology is simply a waste of the client’s money. </p>
<p>I am not saying KW analysis is a waste of time, but having to dissect each (new) client’s site and weigh the value of the current composition is ludicrous.<br />
ANY practitioner of SEO worth their salt knows how to build the site properly from a results-driven “template”. Stop trying to “game” the engines. </p>
<p>2). danthies Says:<br />
July 4, 2008 at 1:21 am<br />
I have to agree with danthies.<br />
andyed is using TF-IDF as “A word or phrase connected with a specialized field or group that usually sounds important or technical and is used primarily to impress laypersons.” With emphasis on “usually sounds important or technical and is used primarily to impress laypersons”.<br />
The more you can impress, the more you can charge. Build the BS mystique. </p>
<p>3). E. Garcia Says:<br />
July 4, 2008 at 11:37 am<br />
In IR we know exactly what tf-idf means, stands for, and what it does.</p>
<p>Of course, but the general population does not. And the article is for the general population. </p>
<p>4). E. Garcia Says:<br />
July 4, 2008 at 2:15 pm</p>
<p>“He has stated that TFIDF is a buzzword in IR and I contend that is not and explained why.”</p>
<p>Of course he would. To have it so would imply more importance to his use of the term for the general market. </p>
<p>5). E. Garcia Says:<br />
July 4, 2008 at 10:06 pm<br />
“Still, I want to be fair. Please go ahead and start discussing the merits of the tool here.”</p>
<p>I used TheRarestWords.com on a couple of my sites and learned nothing useful.<br />
Results were as expected. </p>
<p>6). Going back to your article on understanding TF-IDF I agree strongly that “IDF as the TFIDF product, aij = tfij*IDFi, does not estimate term importance either. The importance of a term, a string, a passage, a message, etc is linked to many things like its meaning (semantics) and amount of information carried  (entropy). A TF-IDF product does not evaluate either one.” </p>
<p>The semantics are the key. </p>
<p>Even if TF-IDF were the be-all and end-all of evaluating keywords and keyword phrases across the database, it would still mean nothing without the exact vectors used by the search engine(s). Using it as a buzzword is simply hyperbole. </p>
<p>Writing for the search engines is not rocket science.<br />
Writing the search engines is. </p>
<p>Reg Charie.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Call to SEOs Claiming to Sell LSI by Claps and Slaps &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/#comment-634</link>
		<dc:creator>Claps and Slaps &#171; IR Thoughts</dc:creator>
		<pubDate>Mon, 14 Jul 2008 12:32:51 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/#comment-634</guid>
		<description>[...] is something funny about SEOs that sell snake oil ( http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/ ) They get angry to their bones when we expose their myths and lies through IR knowledge, but they [...]</description>
		<content:encoded><![CDATA[<p>[...] is something funny about SEOs that sell snake oil ( <a href="http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/" rel="nofollow">http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/</a> ) They get angry to their bones when we expose their myths and lies through IR knowledge, but they [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on For SEO Spammers: AIRWeb 2008 Presentations by Claps and Slaps &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/04/29/for-seo-spammers-airweb-2008-presentations/#comment-633</link>
		<dc:creator>Claps and Slaps &#171; IR Thoughts</dc:creator>
		<pubDate>Mon, 14 Jul 2008 12:29:26 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=206#comment-633</guid>
		<description>[...] http://irthoughts.wordpress.com/2008/04/29/for-seo-spammers-airweb-2008-presentations/ [...]</description>
		<content:encoded><![CDATA[<p>[...] <a href="http://irthoughts.wordpress.com/2008/04/29/for-seo-spammers-airweb-2008-presentations/" rel="nofollow">http://irthoughts.wordpress.com/2008/04/29/for-seo-spammers-airweb-2008-presentations/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-626</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Mon, 07 Jul 2008 16:36:25 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-626</guid>
		<description>Wow. I came back from my two-day Fourth of July vacation and my inbox has all sorts of dizzying SEO comments that add no value to the discussion. 

For your information, sir, IR Thoughts is a blog where we comment on IR and search engines and debunk search marketing myths, like the many promoted by SEOs. We dissect these through IR knowledge, not hearsay.

We didn’t go after you, Dan. You came here voluntarily, with a spaghetti-like defense, throwing all sorts of false statements to see what sticks to the wall. Are you still bleeding from NY SES 2005?

Refluxing words just shows you are short of your very own. To waste your will over the beautiful Fourth of July weekend in such a manner might suggest that you are a very lonely, depressive man. You need a vacation.

Since your comments add no value to the discussion, you have lost your chance, so you and your friends are out of here. BTW, more entertaining than chasing rare terms most users don’t care to search for is reading &lt;a href="http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/" rel="nofollow"&gt;Understanding TFIDF&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Wow. I came back from my two-day Fourth of July vacation and my inbox has all sorts of dizzying SEO comments that add no value to the discussion. </p>
<p>For your information, sir, IR Thoughts is a blog where we comment on IR and search engines and debunk search marketing myths, like the many promoted by SEOs. We dissect these through IR knowledge, not hearsay.</p>
<p>We didn’t go after you, Dan. You came here voluntarily, with a spaghetti-like defense, throwing all sorts of false statements to see what sticks to the wall. Are you still bleeding from NY SES 2005?</p>
<p>Refluxing words just shows you are short of your very own. To waste your will over the beautiful Fourth of July weekend in such a manner might suggest that you are a very lonely, depressive man. You need a vacation.</p>
<p>Since your comments add no value to the discussion, you have lost your chance, so you and your friends are out of here. BTW, more entertaining than chasing rare terms most users don’t care to search for is reading <a href="http://irthoughts.wordpress.com/2008/07/07/understanding-tfidf/" rel="nofollow">Understanding TFIDF</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-624</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sat, 05 Jul 2008 02:06:10 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-624</guid>
		<description>&lt;blockquote&gt;
Doctor, there's nothing to "defend" - your "nitpick attack" is without merit. I was just pointing out that it's as easy to find fault with your own choice of words, and even easier to question your motivation. Readers will easily see that this is so without me taking it any further.
&lt;/blockquote&gt;

In http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom

Edmonds wrote:

"The buzzword in IR is TFIDF, or term frequency inverse document frequency. This is a method for giving more importance to the less common words in a document that match the query. Mid-range frequency words get discounted, but they're likely key terms, if the page is truly relevant, and often repeated."

If you read his first post here, he did not clarify anything, but reaffirmed what he said: "The buzzword in IR is TFIDF". 

There is a difference between buzzword and jargon, and between IR and SEO.

Then along you came. I have seen you do this before to promote yourself or your associates.

First you dropped by with a definition defense/diversion. Good try. 

That didn't work. So you switched to a different scenario (SEO buzzwords), then to accusations... 

Speaking of nitpick attacks, I think the nitpicker is you after all. Get a 4th of July life!

&lt;blockquote&gt;
So, shall we discuss the merits of the tool? Or if you want to pick something apart, we could take a look at the "auto-SE-wordizer" for fun:
&lt;/blockquote&gt;

Now, on to the substantive part:

Their rare parser looks like a wanderer/sampler matching certain words from documents; nothing novel. How does TFIDF play into the picture, and what document collection size is used to compute IDF, if IDF is used at all?

I have seen many of these tools come, go, and eventually get ignored, often because they add little or no value to the bottom line of a business.

Still, I want to be fair. Please go ahead and start discussing the merits of the tool here.</description>
		<content:encoded><![CDATA[<blockquote><p>
Doctor, there&#8217;s nothing to &#8220;defend&#8221; - your &#8220;nitpick attack&#8221; is without merit. I was just pointing out that it&#8217;s as easy to find fault with your own choice of words, and even easier to question your motivation. Readers will easily see that this is so without me taking it any further.
</p></blockquote>
<p>In <a href="http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom" rel="nofollow">http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom</a></p>
<p>Edmonds wrote:</p>
<p>&#8220;The buzzword in IR is TFIDF, or term frequency inverse document frequency. This is a method for giving more importance to the less common words in a document that match the query. Mid-range frequency words get discounted, but they&#8217;re likely key terms, if the page is truly relevant, and often repeated.&#8221;</p>
<p>If you read his first post here, he did not clarify anything, but reaffirmed what he said: &#8220;The buzzword in IR is TFIDF&#8221;. </p>
<p>There is a difference between buzzword and jargon, and between IR and SEO.</p>
<p>Then along you came. I have seen you do this before to promote yourself or your associates.</p>
<p>First you dropped by with a definition defense/diversion. Good try. </p>
<p>That didn&#8217;t work. So you switched to a different scenario (SEO buzzwords), then to accusations&#8230; </p>
<p>Speaking of nitpick attacks, I think the nitpicker is you after all. Get a 4th of July life!</p>
<blockquote><p>
So, shall we discuss the merits of the tool? Or if you want to pick something apart, we could take a look at the &#8220;auto-SE-wordizer&#8221; for fun:
</p></blockquote>
<p>Now, on to the substantive part:</p>
<p>Their rare parser looks like a wanderer/sampler matching certain words from documents; nothing novel. How does TFIDF play into the picture, and what document collection size is used to compute IDF, if IDF is used at all?</p>
<p>I have seen many of these tools come, go, and eventually get ignored, often because they add little or no value to the bottom line of a business.</p>
<p>Still, I want to be fair. Please go ahead and start discussing the merits of the tool here.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by danthies</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-622</link>
		<dc:creator>danthies</dc:creator>
		<pubDate>Fri, 04 Jul 2008 23:08:41 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-622</guid>
		<description>Doctor, there's nothing to "defend" - your "nitpick attack" is without merit. I was just pointing out that it's as easy to find fault with your own choice of words, and even easier to question your motivation. Readers will easily see that this is so without me taking it any further.

So, shall we discuss the merits of the tool? Or if you want to pick something apart, we could take a look at the "auto-SE-wordizer" for fun:
http://rarestblog.com/2008/05/auto-sewordizer-automatic-search-engines-words-optimizer/</description>
		<content:encoded><![CDATA[<p>Doctor, there&#8217;s nothing to &#8220;defend&#8221; - your &#8220;nitpick attack&#8221; is without merit. I was just pointing out that it&#8217;s as easy to find fault with your own choice of words, and even easier to question your motivation. Readers will easily see that this is so without me taking it any further.</p>
<p>So, shall we discuss the merits of the tool? Or if you want to pick something apart, we could take a look at the &#8220;auto-SE-wordizer&#8221; for fun:<br />
<a href="http://rarestblog.com/2008/05/auto-sewordizer-automatic-search-engines-words-optimizer/" rel="nofollow">http://rarestblog.com/2008/05/auto-sewordizer-automatic-search-engines-words-optimizer/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-621</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Jul 2008 18:15:07 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-621</guid>
		<description>&lt;blockquote&gt;
Having expended several hundred of your own words so far arguing the case for "slightly inaccurate word use" by Mr. Edmonds, you haven't persuaded me that his comments are "false, inaccurate, and shallow."

Some folks seem to have a need to be "right" all the time. You seem to be such a person, as I've watched you change your methods over the years from simply illuminating the subject to going after people.

This time, you've ignored the actual content of Andy Edmonds' post, over the use of the word "buzzword." 

Which, I still contend, is exactly what tf-idf is when it's "dropped" by SEOs who want to give the impression that they have some sort of secret sauce. LSI is also tossed out as a buzzword.

Frankly, I think you would agree with this, if you weren't trying so hard to be "right."
&lt;/blockquote&gt;

Hey, Dan; this is simple. 

He has stated that TFIDF is a buzzword in IR and I contend that is not and explained why. So far, neither he nor you has shown any evidence that it is a buzzword in IR.

In your previous post, it was you who raised the "definition defense", and I simply showed you why that defense cannot be sustained for tf-IDF on the merits of those definitions. That is not going after anyone. 

Now you are switching to TFIDF as a buzzword in SEO circles, which is a different matter and topic. I have to agree with you on that one. But again, that is a different scenario. If Edmonds wanted to say "the buzzword in SEO", why not say it?

After that switching defense, you raised the "motivation defense". Good try, and equally wrong about me.

Sorry you are now taking things personally. If I contend that some folks within the SEO sphere are incorrect in some assessments, or expose some of these guys for selling false claims, misrepresentation, and snake oil, and you don't like it, I am sorry for you.

&lt;blockquote&gt;
What you're doing here is often interesting, often useful, often helpful.  But you're not always right. 
&lt;/blockquote&gt;

I am not always right, by the way, nor do I pretend to be. Take that accusation you know where...

&lt;blockquote&gt;
Should I expend a thousand words dissecting your "false, inaccurate, and shallow" characterization of TheRarestWords.com as a "tf-idf tool," or stick to the subject?
&lt;/blockquote&gt;

That is up to you. You tell me.

As mentioned before, let's discuss the merits of the tool. Shall we?</description>
		<content:encoded><![CDATA[<blockquote><p>
Having expended several hundred of your own words so far arguing the case for &#8220;slightly inaccurate word use&#8221; by Mr. Edmonds, you haven&#8217;t persuaded me that his comments are &#8220;false, inaccurate, and shallow.&#8221;</p>
<p>Some folks seem to have a need to be &#8220;right&#8221; all the time. You seem to be such a person, as I&#8217;ve watched you change your methods over the years from simply illuminating the subject to going after people.</p>
<p>This time, you&#8217;ve ignored the actual content of Andy Edmonds&#8217; post, over the use of the word &#8220;buzzword.&#8221; </p>
<p>Which, I still contend, is exactly what tf-idf is when it&#8217;s &#8220;dropped&#8221; by SEOs who want to give the impression that they have some sort of secret sauce. LSI is also tossed out as a buzzword.</p>
<p>Frankly, I think you would agree with this, if you weren&#8217;t trying so hard to be &#8220;right.&#8221;
</p></blockquote>
<p>Hey, Dan; this is simple. </p>
<p>He has stated that TFIDF is a buzzword in IR and I contend that is not and explained why. So far, neither he nor you has shown any evidence that it is a buzzword in IR.</p>
<p>In your previous post, it was you who raised the &#8220;definition defense&#8221;, and I simply showed you why that defense cannot be sustained for tf-IDF on the merits of those definitions. That is not going after anyone. </p>
<p>Now you are switching to TFIDF as a buzzword in SEO circles, which is a different matter and topic. I have to agree with you on that one. But again, that is a different scenario. If Edmonds wanted to say &#8220;the buzzword in SEO&#8221;, why not say it?</p>
<p>After that switching defense, you raised the &#8220;motivation defense&#8221;. Good try, and equally wrong about me.</p>
<p>Sorry you are now taking things personally. If I contend that some folks within the SEO sphere are incorrect in some assessments, or expose some of these guys for selling false claims, misrepresentation, and snake oil, and you don&#8217;t like it, I am sorry for you.</p>
<blockquote><p>
What you&#8217;re doing here is often interesting, often useful, often helpful.  But you&#8217;re not always right.
</p></blockquote>
<p>I am not always right, by the way, nor do I pretend to be. Take that accusation you know where&#8230;</p>
<blockquote><p>
Should I expend a thousand words dissecting your &#8220;false, inaccurate, and shallow&#8221; characterization of TheRarestWords.com as a &#8220;tf-idf tool,&#8221; or stick to the subject?
</p></blockquote>
<p>That is up to you. You tell me.</p>
<p>As mentioned before, let&#8217;s discuss the merits of the tool. Shall we?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by danthies</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-620</link>
		<dc:creator>danthies</dc:creator>
		<pubDate>Fri, 04 Jul 2008 16:55:40 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-620</guid>
		<description>Having expended several hundred of your own words so far arguing the case for "slightly inaccurate word use" by Mr. Edmonds, you haven't persuaded me that his comments are "false, inaccurate, and shallow."

Some folks seem to have a need to be "right" all the time. You seem to be such a person, as I've watched you change your methods over the years from simply illuminating the subject to going after people.

This time, you've ignored the actual content of Andy Edmonds' post, over the use of the word "buzzword." 

Which, I still contend, is exactly what tf-idf is when it's "dropped" by SEOs who want to give the impression that they have some sort of secret sauce. LSI is also tossed out as a buzzword.

Frankly, I think you would agree with this, if you weren't trying so hard to be "right."

What you're doing here is often interesting, often useful, often helpful.  But you're not always right. Should I expend a thousand words dissecting your "false, inaccurate, and shallow" characterization of TheRarestWords.com as a "tf-idf tool," or stick to the subject?

If you're going to go after people, Dr. Garcia, please don't lose credibility and do your homework. &lt;a href="http://www.google.com" rel="nofollow"&gt;Start here&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Having expended several hundred of your own words so far arguing the case for &#8220;slightly inaccurate word use&#8221; by Mr. Edmonds, you haven&#8217;t persuaded me that his comments are &#8220;false, inaccurate, and shallow.&#8221;</p>
<p>Some folks seem to have a need to be &#8220;right&#8221; all the time. You seem to be such a person, as I&#8217;ve watched you change your methods over the years from simply illuminating the subject to going after people.</p>
<p>This time, you&#8217;ve ignored the actual content of Andy Edmonds&#8217; post, over the use of the word &#8220;buzzword.&#8221; </p>
<p>Which, I still contend, is exactly what tf-idf is when it&#8217;s &#8220;dropped&#8221; by SEOs who want to give the impression that they have some sort of secret sauce. LSI is also tossed out as a buzzword.</p>
<p>Frankly, I think you would agree with this, if you weren&#8217;t trying so hard to be &#8220;right.&#8221;</p>
<p>What you&#8217;re doing here is often interesting, often useful, often helpful.  But you&#8217;re not always right. Should I expend a thousand words dissecting your &#8220;false, inaccurate, and shallow&#8221; characterization of TheRarestWords.com as a &#8220;tf-idf tool,&#8221; or stick to the subject?</p>
<p>If you&#8217;re going to go after people, Dr. Garcia, please don&#8217;t lose credibility and do your homework. <a href="http://www.google.com" rel="nofollow">Start here</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-619</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Jul 2008 15:37:43 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-619</guid>
		<description>For those interested, here is the link to Wikipedia: http://en.wikipedia.org/wiki/Buzzword

One more thing for definition lovers and pseudo-teachers out there, based on Wikipedia and the previous definitions:

The statement that "the buzzword &lt;strong&gt;in IR&lt;/strong&gt; is TFIDF" is by all means false, innaccurate, and shallow. &lt;strong&gt;In IR&lt;/strong&gt; tf-idf is not a buzzword or "the buzzword", nor is obscure, a new concept, trivial, a camouflage for saying nothing in particular, used to impress, etc, etc. &lt;strong&gt;In IR&lt;/strong&gt; we know exactly what tf-idf means, stands for, and what it does.</description>
		<content:encoded><![CDATA[<p>For those interested, here is the link to Wikipedia: <a href="http://en.wikipedia.org/wiki/Buzzword" rel="nofollow">http://en.wikipedia.org/wiki/Buzzword</a></p>
<p>One more thing for definition lovers and pseudo-teachers out there, based on Wikipedia and the previous definitions:</p>
<p>The statement that &#8220;the buzzword <strong>in IR</strong> is TFIDF&#8221; is by all means false, inaccurate, and shallow. <strong>In IR</strong> tf-idf is not a buzzword or &#8220;the buzzword&#8221;, nor is it obscure, a new concept, trivial, a camouflage for saying nothing in particular, or something used to impress. <strong>In IR</strong> we know exactly what tf-idf means, what it stands for, and what it does.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-617</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Jul 2008 14:03:20 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-617</guid>
		<description>Hi, Dan:

Thank you for stopping by. The last time we exchanged a few words was back at SES New York 2005, I believe. I hope you are doing well.

Dan, you are more than welcome to disagree.

&lt;blockquote&gt;...a "buzzword" is exactly what tf-IDF is.&lt;/blockquote&gt;

Sorry, Dan. I understand where you are coming from, but I have to disagree with you as well. Referring to IDF as a "buzzword" should be avoided. Why?

"Buzzword" is often used in reference to something that is new and obscure. He might not have literally used the qualifier, but by using "buzzword" the message that came across was one of presenting tf-idf as something new and obscure.

Dan, since you want to go with the "definition defense", a definition is only as good and accurate as whoever proposes it (Google, in this case).

The Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/buzzword) lists this for "buzzword" (emphasis added to stress intention):

1: an important-sounding usually technical word or phrase &lt;strong&gt;often of little meaning&lt;/strong&gt; used chiefly to impress laymen 
2: &lt;strong&gt;a voguish word&lt;/strong&gt; or phrase —called also buzz phrase 

tf-idf stands for term frequency*inverse document frequency and has a clear meaning. It is not a term of little meaning used chiefly to impress, nor is it a voguish word or phrase. The scoring model it defines is quite simple:

tf = number of times a term appears in a document
idf = log(N/ni)

where N is the collection size and ni is the number of documents that mention query term i. Neither IDF nor tf*idf can be computed without knowing N, the number of documents in the collection. If term independence is assumed, any computed estimate of N, IDF, and tf*idf can be meaningless, especially with a dumb tool that does not use large-scale web collections.

Wikipedia has this line about "buzzword" (emphasis added to stress intention):

Buzzwords differ from jargon in that they have &lt;strong&gt;the function of impressing or of obscuring meaning&lt;/strong&gt;, while jargon (ideally) has a well-defined technical meaning, if only to specialists. However, the hype surrounding &lt;strong&gt;new technologies&lt;/strong&gt; often turns technical terms into buzzwords. 

Wikipedia also lists these "Reasons for using buzzwords" (emphasis added to stress intention):

Reasons for using buzzwords

With any stipulative neologism, such as "quark," &lt;strong&gt;to describe new concepts&lt;/strong&gt;, without the danger of over-simplification and confusion that can arise from using words and phrases with previously established, commonplace meanings. 

To control thought by being &lt;strong&gt;intentionally vague&lt;/strong&gt;. In management, stating organizational goals by using &lt;strong&gt;words with unclear meanings&lt;/strong&gt; but positive connotations prevents anybody from questioning the directions and intentions of these decisions, especially if many such words are used.[2] (See also newspeak.) 
To boost creativity among listeners by compelling them to think of the applications and particulars on their own. 
&lt;strong&gt;To make something trivial&lt;/strong&gt; seem to have greater import and stature. 
&lt;strong&gt;To impress&lt;/strong&gt; a judge or examiner by seeming familiar with a theory or principle by dint of mere name-dropping, as with "cognitive dissonance" or the "Heisenberg Uncertainty Principle." 
&lt;strong&gt;To provide a camouflage for saying nothing in particular.&lt;/strong&gt; 

Indeed, none of the above "definitions" applies to IDF or tf-idf. Quite honestly, the word is insulting and a bad descriptor when used by someone who pretends to be a teacher, which is why your apparent "definition defense" cannot be sustained.

Now let's discuss the merits of the tf-idf tool that was reviewed.</description>
		<content:encoded><![CDATA[<p>Hi, Dan:</p>
<p>Thank you for stopping by. The last time we exchanged a few words was back at SES New York 2005, I believe. I hope you are doing well.</p>
<p>Dan, you are more than welcome to disagree.</p>
<blockquote><p>&#8230;a &#8220;buzzword&#8221; is exactly what tf-IDF is.</p></blockquote>
<p>Sorry, Dan. I understand where you are coming from, but I have to disagree with you as well. Referring to IDF as a &#8220;buzzword&#8221; should be avoided. Why?</p>
<p>&#8220;Buzzword&#8221; is often used in reference to something that is new and obscure. He might not have literally used the qualifier, but by using &#8220;buzzword&#8221; the message that came across was one of presenting tf-idf as something new and obscure.</p>
<p>Dan, since you want to go with the &#8220;definition defense&#8221;, a definition is only as good and accurate as whoever proposes it (Google, in this case).</p>
<p>The Merriam-Webster dictionary (http://www.merriam-webster.com/dictionary/buzzword) lists this for &#8220;buzzword&#8221; (emphasis added to stress intention):</p>
<p>1: an important-sounding usually technical word or phrase <strong>often of little meaning</strong> used chiefly to impress laymen<br />
2: <strong>a voguish word</strong> or phrase —called also buzz phrase </p>
<p>tf-idf stands for term frequency*inverse document frequency and has a clear meaning. It is not a term of little meaning used chiefly to impress, nor is it a voguish word or phrase. The scoring model it defines is quite simple:</p>
<p>tf = number of times a term appears in a document<br />
idf = log(N/ni)</p>
<p>where N is the collection size and ni is the number of documents that mention query term i. Neither IDF nor tf*idf can be computed without knowing N, the number of documents in the collection. If term independence is assumed, any computed estimate of N, IDF, and tf*idf can be meaningless, especially with a dumb tool that does not use large-scale web collections.</p>
<p>Wikipedia has this line about &#8220;buzzword&#8221; (emphasis added to stress intention):</p>
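<p>A minimal sketch of the two definitions above, assuming a tiny in-memory collection of pre-tokenized documents and the natural logarithm (the comment does not specify a log base). The toy collection and the helper names are hypothetical; the point is only that idf, and therefore tf*idf, cannot be computed without knowing N.</p>

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    """Number of times the term appears in the document."""
    return Counter(doc_tokens)[term]

def idf(term, collection):
    """log(N / n_i): N is the collection size, n_i the number of
    documents that mention the term. N must be known exactly."""
    N = len(collection)
    n_i = sum(1 for doc in collection if term in doc)
    if n_i == 0:
        raise ValueError(f"{term!r} does not occur in the collection")
    return math.log(N / n_i)

def tf_idf(term, doc_tokens, collection):
    return tf(term, doc_tokens) * idf(term, collection)

# Hypothetical toy collection with N = 4 documents
collection = [
    ["search", "engine", "ranking"],
    ["search", "query", "search"],
    ["porter", "stemmer"],
    ["vector", "space", "model"],
]

# "search" occurs twice in document 1 and in 2 of the 4 documents: 2 * log(4/2)
weight = tf_idf("search", collection[1], collection)
print(weight)
```

<p>Change the collection (and so N or n_i) and the weight changes, even though the document itself does not.</p>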
<p>Buzzwords differ from jargon in that they have <strong>the function of impressing or of obscuring meaning</strong>, while jargon (ideally) has a well-defined technical meaning, if only to specialists. However, the hype surrounding <strong>new technologies</strong> often turns technical terms into buzzwords. </p>
<p>Wikipedia also lists these &#8220;Reasons for using buzzwords&#8221; (emphasis added to stress intention):</p>
<p>Reasons for using buzzwords</p>
<p>With any stipulative neologism, such as &#8220;quark,&#8221; <strong>to describe new concepts</strong>, without the danger of over-simplification and confusion that can arise from using words and phrases with previously established, commonplace meanings. </p>
<p>To control thought by being <strong>intentionally vague</strong>. In management, stating organizational goals by using <strong>words with unclear meanings</strong> but positive connotations prevents anybody from questioning the directions and intentions of these decisions, especially if many such words are used.[2] (See also newspeak.)<br />
To boost creativity among listeners by compelling them to think of the applications and particulars on their own.<br />
<strong>To make something trivial</strong> seem to have greater import and stature.<br />
<strong>To impress</strong> a judge or examiner by seeming familiar with a theory or principle by dint of mere name-dropping, as with &#8220;cognitive dissonance&#8221; or the &#8220;Heisenberg Uncertainty Principle.&#8221;<br />
<strong>To provide a camouflage for saying nothing in particular.</strong> </p>
<p>Indeed, none of the above &#8220;definitions&#8221; applies to IDF or tf-idf. Quite honestly, the word is insulting and a bad descriptor when used by someone who pretends to be a teacher, which is why your apparent &#8220;definition defense&#8221; cannot be sustained.</p>
<p>Now let&#8217;s discuss the merits of the tf-idf tool that was reviewed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by danthies</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-616</link>
		<dc:creator>danthies</dc:creator>
		<pubDate>Fri, 04 Jul 2008 05:21:36 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-616</guid>
		<description>I hate to be disagreeable, Dr. Garcia, but a "buzzword" is exactly what tf-IDF is. Here's a definition offered by Google when I searched for buzzword. :D

"A word or phrase connected with a specialized field or group that usually sounds important or technical and is used primarily to impress laypersons."

Isn't that exactly what you've been talking about? And isn't it, in fact, quite accurate?</description>
		<content:encoded><![CDATA[<p>I hate to be disagreeable, Dr. Garcia, but a &#8220;buzzword&#8221; is exactly what tf-IDF is. Here&#8217;s a definition offered by Google when I searched for buzzword. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_biggrin.gif' alt=':D' class='wp-smiley' /> </p>
<p>&#8220;A word or phrase connected with a specialized field or group that usually sounds important or technical and is used primarily to impress laypersons.&#8221;</p>
<p>Isn&#8217;t that exactly what you&#8217;ve been talking about? And isn&#8217;t it, in fact, quite accurate?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-615</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Jul 2008 02:31:33 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-615</guid>
		<description>It appears that some are getting their knowledge from Wikipedia, which does not always give accurate definitions.

For instance, Wikipedia (http://en.wikipedia.org/wiki/Tf-idf) incorrectly describes tf-idf as a measure used to evaluate how important a word is to a document in a collection.

“The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document.”

It is not true that tf*idf is used to evaluate term importance. A term weight score does not equate to term importance. Personally, I don’t know of any current IR colleague who claims such a thing.

The other problem with Wikipedia’s definition is that it assumes a function f of the form f(term importance) = tf*idf and therefore that “the importance increases proportionally to the number of times a word appears in the document.”

Obviously, this is an incorrect definition. The importance of a term does not necessarily increase proportionally to the number of times a word appears in a document. A term repeated x times is not x times more important. Similarly, a document repeating a term x times is not x times more pertinent to the term.

In addition, placing the very same document in different collections might change its tf*idf weights, but the importance of a term to that document remains the same, contradicting the assumed proportionality between term importance and the number of times a word appears in the document.

If that crap is what SEOs are teaching to their peers, they are messengers of misinformation.</description>
		<content:encoded><![CDATA[<p>It appears that some are getting their knowledge from Wikipedia, which does not always give accurate definitions.</p>
<p>For instance, Wikipedia (http://en.wikipedia.org/wiki/Tf-idf) incorrectly describes tf-idf as a measure used to evaluate how important a word is to a document in a collection.</p>
<p>“The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document.”</p>
<p>It is not true that tf*idf is used to evaluate term importance. A term weight score does not equate to term importance. Personally, I don’t know of any current IR colleague who claims such a thing.</p>
<p>The other problem with Wikipedia’s definition is that it assumes a function f of the form f(term importance) = tf*idf and therefore that “the importance increases proportionally to the number of times a word appears in the document.”</p>
<p>Obviously, this is an incorrect definition. The importance of a term does not necessarily increase proportionally to the number of times a word appears in a document. A term repeated x times is not x times more important. Similarly, a document repeating a term x times is not x times more pertinent to the term.</p>
<p>In addition, placing the very same document in different collections might change its tf*idf weights, but the importance of a term to that document remains the same, contradicting the assumed proportionality between term importance and the number of times a word appears in the document.</p>
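<p>The collection dependence described above can be illustrated with a small sketch. Assuming raw counts for tf and idf = log(N/n_i) with a natural log, one hypothetical document gets a different tf*idf weight for the same term depending only on which made-up collection it sits in, while its text never changes.</p>

```python
import math

def tf_idf(term, doc, collection):
    # tf: raw count in the document; idf: log(N / n_i) over this collection
    tf = doc.count(term)
    N = len(collection)
    n_i = sum(1 for d in collection if term in d)
    return tf * math.log(N / n_i)

# One fixed document, with "retrieval" repeated twice
doc = ["retrieval", "models", "retrieval"]

# Two hypothetical collections that both contain the same document
small = [doc, ["retrieval"], ["stemmer"], ["crawler"]]
large = [doc, ["retrieval"], ["retrieval"], ["stemmer"],
         ["crawler"], ["index"], ["query"], ["ranking"]]

w_small = tf_idf("retrieval", doc, small)   # 2 * log(4/2)
w_large = tf_idf("retrieval", doc, large)   # 2 * log(8/3)
print(w_small, w_large)
```

<p>The weight is a property of the document in a given collection, not a fixed measure of how important the term is to the document.</p>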
<p>If that crap is what SEOs are teaching to their peers, they are messengers of misinformation.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-613</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Jul 2008 00:26:48 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-613</guid>
		<description>&lt;blockquote&gt;
Since you first referenced my post on TF-IDF, I’ve read your writings on it, and realized that I did not speak especially clearly in my post. It’s been on my list to revise it, while preserving the original, to make my message more clear.
&lt;/blockquote&gt;

I hope you do.

&lt;blockquote&gt;
I’ll address your original post once I’ve rewritten it to make my points more clear, but it’s obvious you didn’t do your homework on me  
Worse than that, your reference to me saying TF-IDF is new is entirely inaccurate. In no way did I suggest this is a new concept. My quote was “The buzzword in IR”, not the new buzzword. 
&lt;/blockquote&gt;

IDF and tf-idf are still not buzzwords in IR. Good try.

&lt;blockquote&gt;
Inaccurate and shallow reading like this makes me doubt the quality of all your writings.
&lt;/blockquote&gt;

Sorry to hear that. I might have to reciprocate your feelings about all your writings as well.</description>
		<content:encoded><![CDATA[<blockquote><p>
Since you first referenced my post on TF-IDF, I’ve read your writings on it, and realized that I did not speak especially clearly in my post. It’s been on my list to revise it, while preserving the original, to make my message more clear.
</p></blockquote>
<p>I hope you do.</p>
<blockquote><p>
I’ll address your original post once I’ve rewritten it to make my points more clear, but it’s obvious you didn’t do your homework on me<br />
Worse than that, your reference to me saying TF-IDF is new is entirely inaccurate. In no way did I suggest this is a new concept. My quote was “The buzzword in IR”, not the new buzzword.
</p></blockquote>
<p>IDF and tf-idf are still not buzzwords in IR. Good try.</p>
<blockquote><p>
Inaccurate and shallow reading like this makes me doubt the quality of all your writings.
</p></blockquote>
<p>Sorry to hear that. I might have to reciprocate your feelings about all your writings as well.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths: Part 2 by andyed</title>
		<link>http://irthoughts.wordpress.com/2008/07/03/seos-and-their-idf-myths-part-2/#comment-611</link>
		<dc:creator>andyed</dc:creator>
		<pubDate>Thu, 03 Jul 2008 23:33:00 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=235#comment-611</guid>
		<description>Dr. Garcia,

First of all, thanks for your work communicating about the challenges of search in a very detailed way.

Since you first referenced my post on TF-IDF, I've read your writings on it, and realized that I did not speak especially clearly in my post.  It's been on my list to revise it, while preserving the original, to make my message more clear.

I tried to provide direct feedback through several channels -- but only received email bounces.  It's a tough email world with spam, I get it.

I'll address your original post once I've rewritten it to make my points more clear, but it's obvious you didn't do your homework on me :)

Worse than that, your reference to me saying TF-IDF is new is entirely inaccurate.  In no way did I suggest this is a new concept.  My quote was "The buzzword in IR", not the new buzzword.  I was reviewing a new website doing some interesting word frequency analyses.

Inaccurate and shallow reading like this makes me doubt the quality of all your writings.

Best Regards,
Andy Edmonds</description>
		<content:encoded><![CDATA[<p>Dr. Garcia,</p>
<p>First of all, thanks for your work communicating about the challenges of search in a very detailed way.</p>
<p>Since you first referenced my post on TF-IDF, I&#8217;ve read your writings on it, and realized that I did not speak especially clearly in my post.  It&#8217;s been on my list to revise it, while preserving the original, to make my message more clear.</p>
<p>I tried to provide direct feedback through several channels &#8212; but only received email bounces.  It&#8217;s a tough email world with spam, I get it.</p>
<p>I&#8217;ll address your original post once I&#8217;ve rewritten it to make my points more clear, but it&#8217;s obvious you didn&#8217;t do your homework on me <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Worse than that, your reference to me saying TF-IDF is new is entirely inaccurate.  In no way did I suggest this is a new concept.  My quote was &#8220;The buzzword in IR&#8221;, not the new buzzword.  I was reviewing a new website doing some interesting word frequency analyses.</p>
<p>Inaccurate and shallow reading like this makes me doubt the quality of all your writings.</p>
<p>Best Regards,<br />
Andy Edmonds</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on A Call to SEOs Claiming to Sell LSI by SEOs and their IDF Myths: Part 2 &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/#comment-610</link>
		<dc:creator>SEOs and their IDF Myths: Part 2 &#171; IR Thoughts</dc:creator>
		<pubDate>Thu, 03 Jul 2008 13:26:18 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2007/07/09/a-call-to-seos-claiming-to-sell-lsi/#comment-610</guid>
		<description>[...] SEOs have repeated such misinformation like parrots. It is not the first time. It reminds me of Wall&#8217;s claims about LSI, a topic he wrote extensively on until his ignorance of the topic was exposed in several blogs [...]</description>
		<content:encoded><![CDATA[<p>[...] SEOs have repeated such misinformation like parrots. It is not the first time. It reminds me of Wall&#8217;s claims about LSI, a topic he wrote extensively on until his ignorance of the topic was exposed in several blogs [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths by SEOs and their IDF Myths: Part 2 &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/06/17/seos-and-their-idf-myths/#comment-609</link>
		<dc:creator>SEOs and their IDF Myths: Part 2 &#171; IR Thoughts</dc:creator>
		<pubDate>Thu, 03 Jul 2008 13:24:02 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=219#comment-609</guid>
		<description>[...] and their IDF Myths: Part&#160;2  This post is a continuation of a previous one on the topic of SEO nonsense in relation to inverse document frequency (IDF). IDF has a long [...]</description>
		<content:encoded><![CDATA[<p>[...] and their IDF Myths: Part&nbsp;2  This post is a continuation of a previous one on the topic of SEO nonsense in relation to inverse document frequency (IDF). IDF has a long [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on SEOs and their IDF Myths by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/06/17/seos-and-their-idf-myths/#comment-603</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 18 Jun 2008 16:52:25 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=219#comment-603</guid>
		<description>An example of such nonsense is given in &lt;a href="http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom" rel="nofollow"&gt;http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>An example of such nonsense is given in <a href="http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom" rel="nofollow">http://alwaysbetesting.com/abtest/index.cfm/2008/5/24/Term-Frequency-Inverse-Document-Frequency-TDIDF-Exploring-TheRarestWordscom</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Space Models and Search Engines by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/04/21/vector-space-models-and-search-engines/#comment-602</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sun, 15 Jun 2008 13:50:48 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=202#comment-602</guid>
		<description>Hi, Pavel:

Thank you for expressing your opinion.

I’m inclined to agree with you, but only partially. The convention of referring to log(N/n) as Inverse Document Frequency is a legacy of Dr. Stephen Robertson and the late Karen Sparck Jones, who invented this measure of term specificity. For details, see The IDF Page over at 

http://www.soi.city.ac.uk/~ser/idf.html

I always recommend that my students read link 3 of The IDF Page, "Understanding Inverse Document Frequency: on theoretical arguments for IDF", to really understand what is and is not IDF.

http://www.soi.city.ac.uk/~ser/idfpapers/Robertson_idf_JDoc.pdf 

The N/n is “inverse document frequency”. 

In my opinion, there is nothing wrong with rescaling this using log notation and still referring to it as IDF. Honestly, this is a matter of taste, in the same way that referring to pH as the negative log of the hydrogen ion concentration, pH = -log[H+], is a matter of taste, one that even many chemists don’t agree with. But you can have your own opinion on this, just as Robertson and I do.

Nevertheless, logs are used simply because they are additive and play well with two notions:

1. That document scoring functions tend to be additive.
2. The term independence assumption.

Incidentally, I am finishing the next issue of the IRW newsletter, which revisits the Robertson-Sparck Jones IDF concept and presents a new model for term specificity. I think you will like it.</description>
		<content:encoded><![CDATA[<p>Hi, Pavel:</p>
<p>Thank you for expressing your opinion.</p>
<p>I’m inclined to agree with you, but only partially. The convention of referring to log(N/n) as Inverse Document Frequency is a legacy of Dr. Stephen Robertson and the late Karen Sparck Jones, who invented this measure of term specificity. For details, see The IDF Page over at </p>
<p><a href="http://www.soi.city.ac.uk/~ser/idf.html" rel="nofollow">http://www.soi.city.ac.uk/~ser/idf.html</a></p>
<p>I always recommend that my students read link 3 of The IDF Page, &#8220;Understanding Inverse Document Frequency: on theoretical arguments for IDF&#8221;, to really understand what is and is not IDF.</p>
<p><a href="http://www.soi.city.ac.uk/~ser/idfpapers/Robertson_idf_JDoc.pdf" rel="nofollow">http://www.soi.city.ac.uk/~ser/idfpapers/Robertson_idf_JDoc.pdf</a> </p>
<p>The N/n is “inverse document frequency”. </p>
<p>In my opinion, there is nothing wrong with rescaling this using log notation and still referring to it as IDF. Honestly, this is a matter of taste, in the same way that referring to pH as the negative log of the hydrogen ion concentration, pH = -log[H+], is a matter of taste, one that even many chemists don’t agree with. But you can have your own opinion on this, just as Robertson and I do.</p>
<p>Nevertheless, logs are used simply because they are additive and play well with two notions:</p>
<p>1. That document scoring functions tend to be additive.<br />
2. The term independence assumption.</p>
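<p>A short numeric sketch of the additivity point, using made-up document counts for a two-term query. Under the term independence assumption, the probability that a document contains both terms is the product of the individual probabilities, so the unscaled N/n ratios multiply, while their logs simply add, which suits an additive document scoring function.</p>

```python
import math

N = 1_000_000                                   # hypothetical collection size
doc_counts = {"porter": 1_000, "stemmer": 100}  # hypothetical n for each term

# Unscaled ratio N/n for each term
ratios = {term: N / n for term, n in doc_counts.items()}

# Under term independence, P(both) = P(porter) * P(stemmer),
# so the joint ratio is the product of the individual ratios
joint_ratio = math.prod(ratios.values())

# Logs turn that product into a sum of per-term contributions
log_sum = sum(math.log(r) for r in ratios.values())

print(ratios)                     # {'porter': 1000.0, 'stemmer': 10000.0}
print(joint_ratio)                # 10000000.0
print(log_sum, math.log(joint_ratio))
```

<p>The last line prints the same value twice (up to floating-point rounding): the per-term log scores can be summed without ever forming the product.</p>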
<p>Incidentally, I am finishing the next issue of the IRW newsletter, which revisits the Robertson-Sparck Jones IDF concept and presents a new model for term specificity. I think you will like it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Vector Space Models and Search Engines by textanalytic</title>
		<link>http://irthoughts.wordpress.com/2008/04/21/vector-space-models-and-search-engines/#comment-600</link>
		<dc:creator>textanalytic</dc:creator>
		<pubDate>Sat, 14 Jun 2008 19:51:18 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=202#comment-600</guid>
		<description>Hi Dr. Garcia,

 I liked your post, except for the IDF business, and the TF discussion. Specifically something that really bugs me is this line:

IDF = log(N/n), which I have seen quite a few times in the literature.

We cannot do this anymore. Inverse document frequency has to be defined as 1/document frequency. It's really an insult to students with a good math/physics background to use the log(N/n) definition.

And by taking the log of IDF, we get the information content of the keyword. In our dark ages very few people are aware of Shannon's information theory, so a short discussion of logs and entropy is well warranted.

I'm planning to teach my first course on IR and text mining soon, so I got really interested in your blog.

thx
Pavel

P.S. Wikipedia has the same definition, and you can almost read IDF = log( IDF )</description>
		<content:encoded><![CDATA[<p>Hi Dr. Garcia,</p>
<p> I liked your post, except for the IDF business, and the TF discussion. Specifically something that really bugs me is this line:</p>
<p>IDF = log(N/n), which I have seen quite a few times in the literature.</p>
<p>We cannot do this anymore. Inverse document frequency has to be defined as 1/document frequency. It&#8217;s really an insult to students with a good math/physics background to use the log(N/n) definition.</p>
<p>And by taking the log of IDF, we get the information content of the keyword. In our dark ages very few people are aware of Shannon&#8217;s information theory, so a short discussion of logs and entropy is well warranted.</p>
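<p>A minimal sketch of the relationship described above, assuming document frequency is normalized to a probability, df = n/N (an assumption; the comment does not say how df is normalized). With that convention 1/df equals N/n, and its base-2 log is the Shannon self-information, in bits, of finding the term in a randomly chosen document. The counts and function names are hypothetical.</p>

```python
import math

def inverse_document_frequency(N, n):
    # Assumed convention: df = n / N, so IDF = 1 / df = N / n
    return N / n

def information_content_bits(N, n):
    # Self-information of "a random document contains the term": -log2(n / N)
    p = n / N
    return -math.log2(p)

N, n = 1024, 8   # hypothetical: the term occurs in 8 of 1024 documents
print(inverse_document_frequency(N, n))   # 128.0
print(information_content_bits(N, n))     # 7.0
```

<p>Under this convention the two views coincide: log2(IDF) = log2(128) = 7 bits.</p>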
<p>I&#8217;m planning to teach my first course on IR and text mining soon, so I got really interested in your blog.</p>
<p>thx<br />
Pavel</p>
<p>P.S. Wikipedia has the same definition, and you can almost read IDF = log( IDF )</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on For SEO Spammers: AIRWeb 2008 Presentations by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/04/29/for-seo-spammers-airweb-2008-presentations/#comment-595</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 14 May 2008 20:01:15 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=206#comment-595</guid>
		<description>Thank you for sharing.</description>
		<content:encoded><![CDATA[<p>Thank you for sharing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 9 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/05/09/search-engines-architecture-week-9/#comment-594</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 14 May 2008 18:54:21 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=210#comment-594</guid>
		<description>Hi, Gina:

By now you should have a copy of the practice exam. I am emailing a copy of the Lab 1 answers, concentrating on those the class found a bit hard.</description>
		<content:encoded><![CDATA[<p>Hi, Gina:</p>
<p>By now you should have a copy of the practice exam. I am emailing a copy of the Lab 1 answers, concentrating on those the class found a bit hard.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 9 by gina</title>
		<link>http://irthoughts.wordpress.com/2008/05/09/search-engines-architecture-week-9/#comment-592</link>
		<dc:creator>gina</dc:creator>
		<pubDate>Tue, 13 May 2008 00:52:05 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=210#comment-592</guid>
		<description>Hello Professor,

Please remember to send us the practice exam and the Lab 1 answers.

Thanks,
Gina</description>
		<content:encoded><![CDATA[<p>Hello Professor,</p>
<p>Please remember to send us the practice exam and the Lab 1 answers.</p>
<p>Thanks,<br />
Gina</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on For SEO Spammers: AIRWeb 2008 Presentations by notgerner</title>
		<link>http://irthoughts.wordpress.com/2008/04/29/for-seo-spammers-airweb-2008-presentations/#comment-591</link>
		<dc:creator>notgerner</dc:creator>
		<pubDate>Mon, 12 May 2008 01:42:19 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=206#comment-591</guid>
		<description>Dr. Garcia,  
Your post prompted me to try my hand at a very naive approach to spam detection using features present only in the domain name, if only to see how difficult it would be to get started on this problem.  I thought you might be interested:

http://www.seomoz.org/blog/building-a-better-spam-detector

In any case, you continue to be an inspiration.  Thanks!</description>
		<content:encoded><![CDATA[<p>Dr. Garcia,<br />
Your post prompted me to try my hand at a very naive approach to spam detection using features present only in the domain name, if only to see how difficult it would be to get started on this problem. I thought you might be interested:</p>
<p><a href="http://www.seomoz.org/blog/building-a-better-spam-detector" rel="nofollow">http://www.seomoz.org/blog/building-a-better-spam-detector</a></p>
<p>In any case, you continue to be an inspiration. Thanks!</p>
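<p>As a rough illustration of the idea (features taken only from the domain name), here is a minimal Python sketch. The particular features (length, number of labels, hyphens, digits, vowel ratio) are hypothetical examples chosen for illustration; they are not necessarily the ones used in the linked post, and a real detector would still need labeled data and a classifier on top of them.</p>
<pre>
```python
def domain_features(domain):
    """Extract a few simple features from a domain name.

    The feature set is a hypothetical illustration, not a validated spam signal.
    """
    name = domain.lower().strip().rstrip(".")
    labels = name.split(".")
    # Everything except the last label (the TLD), with the dots removed.
    body = "".join(labels[:-1]) if len(labels) > 1 else name
    vowels = sum(ch in "aeiou" for ch in body)
    return {
        "length": len(name),
        "num_labels": len(labels),
        "num_hyphens": name.count("-"),
        "num_digits": sum(ch.isdigit() for ch in name),
        "tld": labels[-1],
        # Share of vowels among all non-TLD characters (hyphens and digits included).
        "vowel_ratio": vowels / len(body) if body else 0.0,
    }


if __name__ == "__main__":
    print(domain_features("cheap-pills-4u.com"))
```
</pre>
<p>Feature dictionaries like this one could then be fed to any off-the-shelf classifier trained on domains already labeled as spam or not spam.</p>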
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Building the Porter Stemmer by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/04/22/building-the-porter-stemmer/#comment-584</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Sun, 04 May 2008 15:12:23 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=203#comment-584</guid>
		<description>Hi, Gina:

Correct.</description>
		<content:encoded><![CDATA[<p>Hi, Gina:</p>
<p>Correct.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Building the Porter Stemmer by gina</title>
		<link>http://irthoughts.wordpress.com/2008/04/22/building-the-porter-stemmer/#comment-583</link>
		<dc:creator>gina</dc:creator>
		<pubDate>Sun, 04 May 2008 13:29:44 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=203#comment-583</guid>
		<description>Hello Professor,

I understand we will be reusing the query normalizer from Lab 4. In that exercise, we only removed trailing stops. However, I am guessing that for this exercise we should fully tokenize the query. Is this right?

Thanks,
Gina</description>
		<content:encoded><![CDATA[<p>Hello Professor,</p>
<p>I understand we will be reusing the query normalizer from Lab 4. In that exercise, we only removed trailing stops. However, I am guessing that for this exercise we should fully tokenize the query. Is this right?</p>
<p>Thanks,<br />
Gina</p>
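<p>A minimal Python sketch of what fully tokenizing the query could look like, as opposed to only stripping trailing stops. Lowercasing and splitting on every run of non-alphanumeric characters are assumptions made for illustration; the actual Lab 4 normalizer may use different rules.</p>
<pre>
```python
import re


def tokenize_query(query):
    """Lowercase a query and split it into word tokens (hypothetical lab normalizer).

    Any run of non-letter, non-digit characters (spaces, punctuation, underscores)
    acts as a separator, so punctuation anywhere in the query is dropped, not only
    at the end. Python's \\W is Unicode-aware, so accented letters are kept.
    """
    return [tok for tok in re.split(r"[\W_]+", query.lower()) if tok]


if __name__ == "__main__":
    print(tokenize_query("What is an Inverted File?"))  # ['what', 'is', 'an', 'inverted', 'file']
```
</pre>
<p>Each resulting token could then be passed to the Porter stemmer on its own.</p>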
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Understanding Search Engines by Vector Space Models and Search Engines &#171; IR Thoughts</title>
		<link>http://irthoughts.wordpress.com/2008/04/03/understanding-search-engines/#comment-582</link>
		<dc:creator>Vector Space Models and Search Engines &#171; IR Thoughts</dc:creator>
		<pubDate>Mon, 21 Apr 2008 15:02:00 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=191#comment-582</guid>
		<description>[...] Vector Space Models and Search&#160;Engines  This 23rd, I&#8217;ll be at UPRB.edu presenting the talk Understanding Search Engines. http://irthoughts.wordpress.com/2008/04/03/understanding-search-engines/ [...]</description>
		<content:encoded><![CDATA[<p>[...] Vector Space Models and Search&nbsp;Engines  This 23rd, I&#8217;ll be at UPRB.edu presenting the talk Understanding Search Engines. <a href="http://irthoughts.wordpress.com/2008/04/03/understanding-search-engines/" rel="nofollow">http://irthoughts.wordpress.com/2008/04/03/understanding-search-engines/</a> [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Experiment in Parsing Techniques by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/04/14/experiment-in-parsing-techniques/#comment-581</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Tue, 15 Apr 2008 15:13:26 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=197#comment-581</guid>
		<description>1. Correction:

During the last lecture, I gave you a CSS file with block lines ending in semicolons. One of the students (Gina) pointed out to me that this should not be done. She is right: the CSS will neither validate nor be interpreted by Firefox (IE is more lenient in this respect). Thanks, Gina. :)

Please remove the semicolons and you will see the CSS in action in both browsers. When I wrote the CSS files and the companion JavaScript instructions, my text editor was set to end all lines with a semicolon, which explains the outcome. I forgot to double-check the output in both browsers. My fault.

Speaking of semicolons, Yahoo!'s Douglas Crockford, the creator of JavaScript Object Notation (JSON), recommends ending function blocks with a semicolon, especially when they are part of prototype assignments. http://yuiblog.com/blog/2007/01/24/video-crockford-tjpl/

Interestingly, he also suggests using === instead of == in JavaScript to avoid type coercion. My old scripts use ===, but when I run them with the latest version of IE on Windows Vista they seem to misbehave. With == they seem to work just fine. I'm researching what causes the glitch in my scripts.

2. Reminder:

To get full credit for your lab reports, you need to turn in both hard and electronic copies of them.

Don't forget to document all findings, including installation how-tos if any are needed.</description>
		<content:encoded><![CDATA[<p>1. Correction:</p>
<p>During the last lecture, I gave you a CSS file with block lines ending in semicolons. One of the students (Gina) pointed out to me that this should not be done. She is right: the CSS will neither validate nor be interpreted by Firefox (IE is more lenient in this respect). Thanks, Gina. <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Please remove the semicolons and you will see the CSS in action in both browsers. When I wrote the CSS files and the companion JavaScript instructions, my text editor was set to end all lines with a semicolon, which explains the outcome. I forgot to double-check the output in both browsers. My fault.</p>
<p>Speaking of semicolons, Yahoo!&#8217;s Douglas Crockford, the creator of JavaScript Object Notation (JSON), recommends ending function blocks with a semicolon, especially when they are part of prototype assignments. <a href="http://yuiblog.com/blog/2007/01/24/video-crockford-tjpl/" rel="nofollow">http://yuiblog.com/blog/2007/01/24/video-crockford-tjpl/</a></p>
<p>Interestingly, he also suggests using === instead of == in JavaScript to avoid type coercion. My old scripts use ===, but when I run them with the latest version of IE on Windows Vista they seem to misbehave. With == they seem to work just fine. I&#8217;m researching what causes the glitch in my scripts.</p>
<p>2. Reminder:</p>
<p>To get full credit for your lab reports, you need to turn in both hard and electronic copies of them.</p>
<p>Don&#8217;t forget to document all findings, including installation how-tos if any are needed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 4 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/04/04/search-engines-architecture-week-4/#comment-580</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 11 Apr 2008 15:51:14 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2008/04/04/search-engines-architecture-week-4/#comment-580</guid>
<description>Since this is a hands-on course, any roadblock is a learning opportunity.

Try other alternatives.

1. Google for other Perl scripts (there are plenty).
2. Try C code such as the old Lycos Scoutget program.
http://robot-club.com/lti/lycos/scoutget.html
3. Try any PHP- or Java-based spider.

Here is some bare-bones pseudocode. The crawler should:

1. Get a web page via an HTTP request (it can be via AJAX) and save it to a directory.
2. Scrape the links from that page.
3. Pick a link from the page and repeat step 1.

Modifications:

In step 2, the program adds the scraped links to a list of links, and in step 3 it picks a link from that list instead of from the page.

Once we have saved enough documents to the directory, we can index them with Terrier.</description>
<content:encoded><![CDATA[<p>Since this is a hands-on course, any roadblock is a learning opportunity.</p>
<p>Try other alternatives.</p>
<p>1. Google for other Perl scripts (there are plenty).<br />
2. Try C code such as the old Lycos Scoutget program.<br />
<a href="http://robot-club.com/lti/lycos/scoutget.html" rel="nofollow">http://robot-club.com/lti/lycos/scoutget.html</a><br />
3. Try any PHP- or Java-based spider.</p>
<p>Here is some bare-bones pseudocode. The crawler should:</p>
<p>1. Get a web page via an HTTP request (it can be via AJAX) and save it to a directory.<br />
2. Scrape the links from that page.<br />
3. Pick a link from the page and repeat step 1.</p>
<p>Modifications:</p>
<p>In step 2, the program adds the scraped links to a list of links, and in step 3 it picks a link from that list instead of from the page.</p>
<p>Once we have saved enough documents to the directory, we can index them with Terrier.</p>
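<p>The steps above, including the modification, can be sketched in Python roughly as follows. This is a minimal breadth-first version: scraped links go into a list (a queue here), and the next page is taken from that list. Passing the fetch function in as a parameter is a hypothetical design choice so the sketch can be tried without network access. Politeness rules (robots.txt, delays between requests) are deliberately left out.</p>
<pre>
```python
from collections import deque
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class LinkScraper(HTMLParser):
    """Collect the href values of all anchor tags in a page (step 2)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, out_dir, max_pages=10):
    """Breadth-first crawl starting at seed.

    fetch(url) is a caller-supplied (hypothetical) function that returns the page
    HTML as a string, or None on failure. Each fetched page is saved to out_dir
    as page-N.html (step 1). Returns the list of URLs that were saved.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    frontier = deque([seed])  # the list of links still to visit
    seen = {seed}
    saved = []
    while frontier:
        if len(saved) >= max_pages:
            break
        url = frontier.popleft()  # step 3: pick a link from the list
        page = fetch(url)
        if page is None:
            continue
        (out / f"page-{len(saved)}.html").write_text(page, encoding="utf-8")
        saved.append(url)
        scraper = LinkScraper()
        scraper.feed(page)  # step 2: scrape links from the page
        for href in scraper.links:
            link = urljoin(url, href)  # resolve relative links
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return saved
```
</pre>
<p>For real pages, fetch could wrap urllib.request.urlopen and return None when a request fails; the output directory can then be indexed with Terrier.</p>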
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 4 by panzernieves</title>
		<link>http://irthoughts.wordpress.com/2008/04/04/search-engines-architecture-week-4/#comment-579</link>
		<dc:creator>panzernieves</dc:creator>
		<pubDate>Fri, 11 Apr 2008 15:05:44 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/2008/04/04/search-engines-architecture-week-4/#comment-579</guid>
		<description>Hi Prof.:

We've been working with the Perl spider for a while now, and although it seems to work (at least it runs!!!), it doesn't seem to be working right. It never gets the right results (it doesn't get any results, that is). I'll document what I do understand about the code, but there's not much I can do for now.

G. Nieves</description>
		<content:encoded><![CDATA[<p>Hi Prof.:</p>
<p>We&#8217;ve been working with the Perl spider for a while now, and although it seems to work (at least it runs!!!), it doesn&#8217;t seem to be working right. It never gets the right results (it doesn&#8217;t get any results, that is). I&#8217;ll document what I do understand about the code, but there&#8217;s not much I can do for now.</p>
<p>G. Nieves</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 3 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/28/search-engines-architecture-week-3/#comment-578</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Apr 2008 18:59:21 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=188#comment-578</guid>
		<description>Hi, Gina:

I just ran a fresh search for inverted file without any problems. This is what I did. I'm using the Windows version of Terrier from a USB removable drive. I tested on Vista and on XP.

1. Ran Terrier, indexing its own documentation.
2. Searched for inverted file as a query.
3. Double-clicked the bin/interactive_terrier.bat file.
4. Searched for inverted file as a query.

After the standard headers, I got:

Set TERRIER_HOME to be J:\terr
WARNING: The file terrier.prop
rrier\etc\terrier.properties
 Assuming the value of terrier
INFO - time to intialise index
Please enter your query: inverted file

        Displaying 1-82 result
0 326 112 2.908176030920721
1 753 471 2.8691891125133053
2 426 212 2.776936903771998
3 759 477 2.747474445016238
4 424 210 2.691801832968892
5 745 463 2.478234538724751
6 741 459 2.4410476260123852
7 734 452 2.4026089191907287
8 1000 665 2.351714630117981
9 427 213 2.3437764460259336
10 402 188 2.3294244235277524
11 975 640 2.2693842354980465
12 425 211 2.2625182087305125
13 422 208 2.2102952746378635
14 301 87 2.067004898225863
15 429 215 2.06550056347662
16 548 279 2.024848127420526
17 245 78 2.024848127420526
18 197 75 2.007975485950941
19 76 40 1.9954047945790332
20 404 190 1.979005166160501
21 703 421 1.9562281732712443
22 339 125 1.9536831254119893
23 434 220 1.9235037768768704
24 756 474 1.7579234589850747
25 28 10 1.7505691014173455
26 439 225 1.7188073021593937
27 406 192 1.6978730490041303
28 436 222 1.6698926255833755
29 747 465 1.6273903541804449
30 338 124 1.5987128520302278
31 22 5 1.5618933140413125
32 19 2 1.4967606279394443
33 751 469 1.480498642467794
34 124 63 1.4586748548300594
35 437 223 1.4486166892075465
36 749 467 1.424296235607222
37 468 254 1.3813768603244259
38 752 470 1.3636373922215335
39 304 90 1.2695069276928859
40 707 425 1.2595481602180154
41 968 633 1.2427295999967598
42 340 126 1.238697939749856
43 758 476 1.2361806398396333
44 718 436 1.2171117485149614
45 112 51 1.2044548267644286
46 303 89 1.191849419444905
47 1013 678 1.1900218043921074
48 716 434 1.175694715373152
49 305 91 1.1144461445417837
50 373 159 1.071949456510424
51 350 136 1.0442178750271376
52 120 59 1.032249223156422
53 39 21 1.0289066140518082
54 754 472 1.021245562355564
55 431 217 1.0112196888981333
56 40 22 0.9783128316736401
57 33 15 0.952665942151352
58 130 69 0.9323554684113402
59 401 187 0.9298191911149045
60 606 324 0.9031691289936831
61 662 380 0.8991516469240083
62 736 454 0.8835616760231381
63 323 109 0.882295943944168
64 126 65 0.8768326329975594
65 403 189 0.8603341765453334
66 667 385 0.853341277999745
67 742 460 0.8361417417726925
68 668 386 0.8064130750674726
69 412 198 0.7467488439890779
70 123 62 0.745745069118931
71 1007 672 0.6855982352785376
72 121 60 0.6414169643360372
73 115 54 0.612631867415782
74 717 435 0.5569985846392175
75 117 56 0.5425438654855919
76 128 67 0.5311998902670805
77 118 57 0.4688456765488282
78 127 66 0.44246701888818213
79 113 52 0.42745882075231073
80 125 64 0.34905485374098644
81 75 39 0.20774592741474804
Please enter your query:</description>
		<content:encoded><![CDATA[<p>Hi, Gina:</p>
<p>I just ran a fresh search for inverted file without any problems. This is what I did. I&#8217;m using the Windows version of Terrier from a USB removable drive. I tested on Vista and on XP.</p>
<p>1. Ran Terrier, indexing its own documentation.<br />
2. Searched for inverted file as a query.<br />
3. Double-clicked the bin/interactive_terrier.bat file.<br />
4. Searched for inverted file as a query.</p>
<p>After the standard headers, I got:</p>
<p>Set TERRIER_HOME to be J:\terr<br />
WARNING: The file terrier.prop<br />
rrier\etc\terrier.properties<br />
 Assuming the value of terrier<br />
INFO - time to intialise index<br />
Please enter your query: inverted file</p>
<p>        Displaying 1-82 result<br />
0 326 112 2.908176030920721<br />
1 753 471 2.8691891125133053<br />
2 426 212 2.776936903771998<br />
3 759 477 2.747474445016238<br />
4 424 210 2.691801832968892<br />
5 745 463 2.478234538724751<br />
6 741 459 2.4410476260123852<br />
7 734 452 2.4026089191907287<br />
8 1000 665 2.351714630117981<br />
9 427 213 2.3437764460259336<br />
10 402 188 2.3294244235277524<br />
11 975 640 2.2693842354980465<br />
12 425 211 2.2625182087305125<br />
13 422 208 2.2102952746378635<br />
14 301 87 2.067004898225863<br />
15 429 215 2.06550056347662<br />
16 548 279 2.024848127420526<br />
17 245 78 2.024848127420526<br />
18 197 75 2.007975485950941<br />
19 76 40 1.9954047945790332<br />
20 404 190 1.979005166160501<br />
21 703 421 1.9562281732712443<br />
22 339 125 1.9536831254119893<br />
23 434 220 1.9235037768768704<br />
24 756 474 1.7579234589850747<br />
25 28 10 1.7505691014173455<br />
26 439 225 1.7188073021593937<br />
27 406 192 1.6978730490041303<br />
28 436 222 1.6698926255833755<br />
29 747 465 1.6273903541804449<br />
30 338 124 1.5987128520302278<br />
31 22 5 1.5618933140413125<br />
32 19 2 1.4967606279394443<br />
33 751 469 1.480498642467794<br />
34 124 63 1.4586748548300594<br />
35 437 223 1.4486166892075465<br />
36 749 467 1.424296235607222<br />
37 468 254 1.3813768603244259<br />
38 752 470 1.3636373922215335<br />
39 304 90 1.2695069276928859<br />
40 707 425 1.2595481602180154<br />
41 968 633 1.2427295999967598<br />
42 340 126 1.238697939749856<br />
43 758 476 1.2361806398396333<br />
44 718 436 1.2171117485149614<br />
45 112 51 1.2044548267644286<br />
46 303 89 1.191849419444905<br />
47 1013 678 1.1900218043921074<br />
48 716 434 1.175694715373152<br />
49 305 91 1.1144461445417837<br />
50 373 159 1.071949456510424<br />
51 350 136 1.0442178750271376<br />
52 120 59 1.032249223156422<br />
53 39 21 1.0289066140518082<br />
54 754 472 1.021245562355564<br />
55 431 217 1.0112196888981333<br />
56 40 22 0.9783128316736401<br />
57 33 15 0.952665942151352<br />
58 130 69 0.9323554684113402<br />
59 401 187 0.9298191911149045<br />
60 606 324 0.9031691289936831<br />
61 662 380 0.8991516469240083<br />
62 736 454 0.8835616760231381<br />
63 323 109 0.882295943944168<br />
64 126 65 0.8768326329975594<br />
65 403 189 0.8603341765453334<br />
66 667 385 0.853341277999745<br />
67 742 460 0.8361417417726925<br />
68 668 386 0.8064130750674726<br />
69 412 198 0.7467488439890779<br />
70 123 62 0.745745069118931<br />
71 1007 672 0.6855982352785376<br />
72 121 60 0.6414169643360372<br />
73 115 54 0.612631867415782<br />
74 717 435 0.5569985846392175<br />
75 117 56 0.5425438654855919<br />
76 128 67 0.5311998902670805<br />
77 118 57 0.4688456765488282<br />
78 127 66 0.44246701888818213<br />
79 113 52 0.42745882075231073<br />
80 125 64 0.34905485374098644<br />
81 75 39 0.20774592741474804<br />
Please enter your query:</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 3 by gina</title>
		<link>http://irthoughts.wordpress.com/2008/03/28/search-engines-architecture-week-3/#comment-577</link>
		<dc:creator>gina</dc:creator>
		<pubDate>Fri, 04 Apr 2008 17:55:53 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=188#comment-577</guid>
<description>I already did; it's still not returning any results.</description>
<content:encoded><![CDATA[<p>I already did; it&#8217;s still not returning any results.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 3 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/28/search-engines-architecture-week-3/#comment-576</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 04 Apr 2008 17:26:03 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=188#comment-576</guid>
		<description>Hi, Gina:

Try this:

1. Index some files with Terrier Desktop and then do a search.
2. Keeping that window open, try running interactive_terrier.bat for the same search.</description>
		<content:encoded><![CDATA[<p>Hi, Gina:</p>
<p>Try this:</p>
<p>1. Index some files with Terrier Desktop and then do a search.<br />
2. Keeping that window open, try running interactive_terrier.bat for the same search.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 3 by gina</title>
		<link>http://irthoughts.wordpress.com/2008/03/28/search-engines-architecture-week-3/#comment-575</link>
		<dc:creator>gina</dc:creator>
		<pubDate>Fri, 04 Apr 2008 15:11:06 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=188#comment-575</guid>
		<description>Hello Professor,

I've been playing around with Terrier, searching the contents of the collection, and I learned how to index a specific collection. While searches with desktop_terrier.bat return the correct results, I haven't been able to get the same results with interactive_terrier.bat. It keeps returning "No results".

I renamed the /etc/terrier.properties file and modified it to suit my document paths. However, with or without modifying the file, it does not return any results.

I have searched the Terrier wiki and forums, and even Google, but I haven't found a solution. Can you shed any light on this issue?

Thanks,
Gina</description>
		<content:encoded><![CDATA[<p>Hello Professor,</p>
<p>I&#8217;ve been playing around with Terrier, searching the contents of the collection, and I learned how to index a specific collection. While searches with desktop_terrier.bat return the correct results, I haven&#8217;t been able to get the same results with interactive_terrier.bat. It keeps returning &#8220;No results&#8221;.</p>
<p>I renamed the /etc/terrier.properties file and modified it to suit my document paths. However, with or without modifying the file, it does not return any results.</p>
<p>I have searched the Terrier wiki and forums, and even Google, but I haven&#8217;t found a solution. Can you shed any light on this issue?</p>
<p>Thanks,<br />
Gina</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 3 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/28/search-engines-architecture-week-3/#comment-574</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Thu, 03 Apr 2008 12:50:14 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=188#comment-574</guid>
		<description>Hi, Luis:

Yes.</description>
		<content:encoded><![CDATA[<p>Hi, Luis:</p>
<p>Yes.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 3 by luisjaniel</title>
		<link>http://irthoughts.wordpress.com/2008/03/28/search-engines-architecture-week-3/#comment-573</link>
		<dc:creator>luisjaniel</dc:creator>
		<pubDate>Thu, 03 Apr 2008 03:01:40 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=188#comment-573</guid>
<description>Greetings, Professor,

We also have to turn in the labs in digital format, but should they be in a compressed file so they can be copied from one flash drive to another?

Thanks.</description>
<content:encoded><![CDATA[<p>Greetings, Professor,</p>
<p>We also have to turn in the labs in digital format, but should they be in a compressed file so they can be copied from one flash drive to another?</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-571</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 28 Mar 2008 12:26:03 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-571</guid>
<description>Hi, Gerardo:

Double-check the naming conventions. Shlens and Smith also use different naming conventions. We can discuss it tomorrow in class.

Cheers</description>
<content:encoded><![CDATA[<p>Hi, Gerardo:</p>
<p>Double-check the naming conventions. Shlens and Smith also use different naming conventions. We can discuss it tomorrow in class.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by panzernieves</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-570</link>
		<dc:creator>panzernieves</dc:creator>
		<pubDate>Fri, 28 Mar 2008 05:28:45 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-570</guid>
		<description>Hi,

To test the pca1 program given in the papers, I advise you to use the example given in the PCA and SPCA Tutorial by Dr. Garcia.

To obtain the same results as in the tutorial, type:

[signals, PC, V] = pca1(data);

where

1) data is the transpose of X (the first matrix in the tutorial, whose columns are ordered Weight, Height, Age),
2) PC is equivalent to V in the tutorial,
3) V holds the diagonal elements of matrix S in the tutorial.

Not sure what signals really means, but it sure ain't Step 4's YV!!!
Can anybody shed any light on this matter???

Cheers!!!

G. Nieves</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>To test the pca1 program given in the papers, I advise you to use the example given in the PCA and SPCA Tutorial by Dr. Garcia.</p>
<p>To obtain the same results as in the tutorial, type:</p>
<p>[signals, PC, V] = pca1(data);</p>
<p>where</p>
<p>1) data is the transpose of X (the first matrix in the tutorial, whose columns are ordered Weight, Height, Age),<br />
2) PC is equivalent to V in the tutorial,<br />
3) V holds the diagonal elements of matrix S in the tutorial.</p>
<p>Not sure what signals really means, but it sure ain&#8217;t Step 4&#8217;s YV!!!<br />
Can anybody shed any light on this matter???</p>
<p>Cheers!!!</p>
<p>G. Nieves</p>
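<p>On the question of what signals holds: the following is a minimal pure-Python sketch of a covariance-based PCA written in the style of pca1 (data is M x N, one row per variable and one column per observation). It assumes pca1 follows the usual recipe of subtracting each row&#8217;s mean, eigen-decomposing the covariance matrix, and sorting the variances in decreasing order; under that assumption, signals = PC' * data is the mean-centered data projected onto the principal components. Please check this against the listing in the paper. The Jacobi eigenvalue routine is a stand-in for Matlab&#8217;s eig, so eigenvector signs (and therefore the signs of signals) may differ from Matlab&#8217;s output.</p>
<pre>
```python
import math


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Assumed stand-in for Matlab's eig on a covariance matrix. Returns
    (values, vectors), where column k of vectors is the unit eigenvector
    for values[k]. tol is relative to the Frobenius norm of the input.
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if tol * tol * scale >= off:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle of the rotation G that makes (G' A G)[p][q] zero.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A = A G (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A = G' A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V = V G accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca1(data):
    """Covariance-based PCA in the style of pca1 (a sketch, not the paper's code).

    data is M x N: M variables (rows) by N observations (columns).
    Returns (signals, PC, V): projected data, PCs as columns, variances.
    """
    m, n = len(data), len(data[0])
    means = [sum(row) / n for row in data]
    centered = [[x - mu for x in row] for row, mu in zip(data, means)]
    # Covariance matrix, normalized by N - 1.
    cov = [[sum(centered[i][k] * centered[j][k] for k in range(n)) / (n - 1)
            for j in range(m)] for i in range(m)]
    values, vectors = jacobi_eigen(cov)
    order = sorted(range(m), key=lambda k: values[k], reverse=True)
    variances = [values[k] for k in order]
    pcs = [[vectors[i][k] for k in order] for i in range(m)]
    # signals = PC' * centered data: coordinates of each observation on each PC.
    signals = [[sum(pcs[i][j] * centered[i][k] for i in range(m)) for k in range(n)]
               for j in range(m)]
    return signals, pcs, variances
```
</pre>
<p>For the tutorial&#8217;s example, data would be the transpose of X, with one row each for Weight, Height, and Age.</p>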
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by panzernieves</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-567</link>
		<dc:creator>panzernieves</dc:creator>
		<pubDate>Wed, 26 Mar 2008 19:12:31 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-567</guid>
		<description>Correction and more suggestions:

Spectrum := set of all eigenvalues of a given matrix.

Matlab:
[PC, V] = eig(covariance);

Scilab:
[PC,V] = spec(covariance);

To use a user-built function, you need to load it first. Once the function is written, type in the Scilab environment:

getf("C:\...whatever path...\cov.sce");

In this case I was trying to load my function cov.

Once it is loaded, simply type:

&#62;&#62;[c] = cov(A);

PS: If there is an error in the code, fix it, save it, and load the function again!!!

Cheers!!!

G. Nieves</description>
		<content:encoded><![CDATA[<p>Correction and more suggestions:</p>
<p>Spectrum := set of all eigenvalues of a given matrix.</p>
<p>Matlab:<br />
[PC, V] = eig(covariance);</p>
<p>Scilab:<br />
[PC,V] = spec(covariance);</p>
<p>To use a user-built function, you need to load it first. Once the function is written, type in the Scilab environment:</p>
<p>getf("C:\...whatever path...\cov.sce");</p>
<p>In this case I was trying to load my function cov.</p>
<p>Once it is loaded, simply type:</p>
<p>&gt;&gt;[c] = cov(A);</p>
<p>PS: If there is an error in the code, fix it, save it, and load the function again!!!</p>
<p>Cheers!!!</p>
<p>G. Nieves</p>
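<p>Since the cov.sce source was not posted, here is a minimal Python sketch of what a user-built sample covariance function typically computes, useful for checking results. It assumes each row of A is an observation and each column a variable (the orientation used by Matlab&#8217;s built-in cov) and normalizes by N - 1; the posted Scilab function may use different conventions.</p>
<pre>
```python
def cov(a):
    """Sample covariance matrix of A.

    Assumes rows are observations and columns are variables, normalized by N - 1.
    """
    n = len(a)  # number of observations
    if n in (0, 1):
        raise ValueError("need at least two observations")
    m = len(a[0])  # number of variables
    means = [sum(row[j] for row in a) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in a]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]
```
</pre>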
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by panzernieves</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-566</link>
		<dc:creator>panzernieves</dc:creator>
		<pubDate>Wed, 26 Mar 2008 12:12:29 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-566</guid>
		<description>Hi,

Still having problems with:

1) [PC, V] = eig(covariance) using Scilab. Well, I found a possible solution for this:

"The Spectrum of a Matrix is the set of all eigenvectors of a matrix"

That is why I suggest you use 

                 [PC, V] = spec(A); 

instead. If you're using Scilab, this function yields the same results (mathematically speaking, that is) as [PC, V] = eig(covariance)!!!

For more information, just type help spec in the Scilab environment.

2) If you need to express your results with two decimals, just type format bank in Matlab or format 5 in Scilab (thanks to A. Paris for this one).

3) All functions need to end with endfunction, so be careful with the code. I'm still working on this.

More info? Just keep on blogging (is that even a word/verb?).

Cheers!!!

G.Nieves</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>Still having problems with:</p>
<p>1) [PC, V] = eig(covariance) using Scilab. Well, I found a possible solution for this:</p>
<p>&#8220;The Spectrum of a Matrix is the set of all eigenvectors of a matrix&#8221;</p>
<p>That is why I suggest you use </p>
<p>                 [PC, V] = spec(A); </p>
<p>instead. If you&#8217;re using Scilab, this function yields the same results (mathematically speaking, that is) as [PC, V] = eig(covariance)!!!</p>
<p>For more information, just type help spec in the Scilab environment.</p>
<p>2) If you need to express your results with two decimals, just type format bank in Matlab or format 5 in Scilab (thanks to A. Paris for this one).</p>
<p>3) All functions need to end with endfunction, so be careful with the code. I&#8217;m still working on this.</p>
<p>More info? Just keep on blogging (is that even a word/verb?).</p>
<p>Cheers!!!</p>
<p>G.Nieves</p>
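<p>&#8220;The same results (mathematically speaking)&#8221; here means the same eigenpairs up to ordering and sign, since two tools may list eigenvalues in a different order or return an eigenvector multiplied by -1. A minimal Python sketch of such a comparison follows; the helper names are hypothetical, and the eigenpairs are assumed to be given as Python lists (eigenvectors as the columns of a row-major matrix). It also assumes distinct eigenvalues: with repeated eigenvalues the eigenvectors are not unique, and a component-wise check is not meaningful.</p>
<pre>
```python
def normalize_eigs(values, vectors):
    """Put eigenpairs in a canonical form (hypothetical helper).

    vectors is row-major; column k is the eigenvector for values[k].
    Pairs are sorted by decreasing eigenvalue, and each vector is flipped so that
    its largest-magnitude component is positive.
    """
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k], reverse=True)
    cols = []
    for k in order:
        col = [vectors[i][k] for i in range(n)]
        pivot = max(col, key=abs)
        sign = 1.0 if pivot >= 0 else -1.0
        cols.append([sign * x for x in col])
    vals = [values[k] for k in order]
    vecs = [[cols[k][i] for k in range(n)] for i in range(n)]
    return vals, vecs


def same_eigensystem(first, second, tol=1e-9):
    """True if two (values, vectors) pairs agree up to ordering and sign."""
    vals1, vecs1 = normalize_eigs(*first)
    vals2, vecs2 = normalize_eigs(*second)
    n = len(vals1)
    if len(vals2) != n:
        return False
    values_match = all(tol >= abs(a - b) for a, b in zip(vals1, vals2))
    vectors_match = all(tol >= abs(vecs1[i][k] - vecs2[i][k])
                        for i in range(n) for k in range(n))
    return values_match and vectors_match
```
</pre>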
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Principal Component Analysis Tutorial and Other Algos by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/18/principal-component-analysis-tutorial/#comment-565</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Wed, 26 Mar 2008 00:34:01 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=184#comment-565</guid>
		<description>The tutorial is available now 
http://irthoughts.wordpress.com/2008/03/25/pca-and-spca-tutorial/</description>
		<content:encoded><![CDATA[<p>The tutorial is available now<br />
<a href="http://irthoughts.wordpress.com/2008/03/25/pca-and-spca-tutorial/" rel="nofollow">http://irthoughts.wordpress.com/2008/03/25/pca-and-spca-tutorial/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by panzernieves</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-564</link>
		<dc:creator>panzernieves</dc:creator>
		<pubDate>Tue, 25 Mar 2008 03:14:34 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-564</guid>
		<description>Hi,

I did Exercise 4 using the provided code. I copied and pasted it into Matlab and had to make minor corrections to get it to run. I didn't use Scilab for this problem, but I will try it in Scilab tomorrow, make corrections if necessary, and tell you whatever I find out.

Cheers!!!

G. Nieves</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I did Exercise 4 using the provided code. I copied and pasted it into Matlab and had to make minor corrections to get it to run. I didn&#8217;t use Scilab for this problem, but I will try it in Scilab tomorrow, make corrections if necessary, and tell you whatever I find out.</p>
<p>Cheers!!!</p>
<p>G. Nieves</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-563</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Tue, 25 Mar 2008 00:05:03 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-563</guid>
		<description>Hi, Luis:

It is easier if you can provide Excel files and reference them in the report as doc-1.xls, doc-2.xls, etc. Place all files, documents, etc. in a single folder and zip it. Also provide a hard copy of the report.</description>
		<content:encoded><![CDATA[<p>Hi, Luis:</p>
<p>It is easier if you can provide Excel files and reference them in the report as doc-1.xls, doc-2.xls, etc. Place all files, documents, etc. in a single folder and zip it. Also provide a hard copy of the report.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by luisjaniel</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-562</link>
		<dc:creator>luisjaniel</dc:creator>
		<pubDate>Mon, 24 Mar 2008 22:13:19 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-562</guid>
<description>Greetings, Dr. García,

I am writing to ask whether the entire Lab has to be turned in using the template you sent, including the parts that specify EXCEL.

Thanks.</description>
<content:encoded><![CDATA[<p>Greetings, Dr. García,</p>
<p>I am writing to ask whether the entire Lab has to be turned in using the template you sent, including the parts that specify EXCEL.</p>
<p>Thanks.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by ageigel</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-560</link>
		<dc:creator>ageigel</dc:creator>
		<pubDate>Thu, 20 Mar 2008 21:52:35 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-560</guid>
<description>The following is a list of issues I have found so far in the Scilab exercise that can be remedied:

* Extra ’ symbols were inserted in the functions, and they generate syntax errors.

* Comments using the M-file % convention give errors; I had to replace them with //.

* All functions must end with endfunction. I had to add these to the code.

* An ASCII-to-number function does not exist in Scilab. The only alternative I found is scanf, which in C/C++ can cause lots of overflow problems.

* The Scilex GUI is unstable, and linking functions to Tcl libraries gives errors at compile time.

These are other errors I have gotten so far that I have not been able to solve. Does anybody have a recommendation?

* In Shlens's code: the repmat substitute, the ones command found at http://www.scilab.org/product/dic-mat-sci/M2SCI.htm, produces matrix multiplication consistency errors.

* In Lindsey's program, the line finaleigs = eigenvectors(:,1:dimensions); is giving errors. I looked up the function eigenvectors in Scilab, and it does not appear on the site, nor in her code.</description>
		<content:encoded><![CDATA[<p>The following is a list of problems I have found so far in the Scilab exercise that can be remedied:</p>
<p>* Extra ’ symbols inserted in the functions generate syntax errors.</p>
<p>* MATLAB M-file-style comments (the % convention) give errors; I had to replace them with //.</p>
<p>* All functions must end with endfunction. I had to add these to the code.</p>
<p>* Scilab does not have an ASCII-to-number function. The only alternative I found is scanf, which in C/C++ can cause many overflow problems.</p>
<p>* The Scilex GUI is unstable, and linking functions to Tcl libraries gives errors at compilation time.</p>
<p>These are other errors I have gotten so far that I have not been able to solve. Does anybody have a recommendation?</p>
<p>* In Shlens&#8217;s code: the substitute for repmat, the ones command listed at <a href="http://www.scilab.org/product/dic-mat-sci/M2SCI.htm" rel="nofollow">http://www.scilab.org/product/dic-mat-sci/M2SCI.htm</a>, produces errors about inconsistent matrix multiplication.</p>
<p>* In Lindsay&#8217;s program, the line finaleigs = eigenvectors(:,1:dimensions); gives errors. I looked up the function eigenvectors in Scilab, and it appears neither on the site nor in her code.</p>
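<p>The two failing steps can be sketched in plain Python. This is not Scilab and not the tutorial code. It assumes the data matrix holds one observation per row and two variables, as in the two-dimensional worked example of Lindsay Smith's tutorial; the orientation in the code you are porting may differ.</p>
<p>The ones-based replacement for <code>repmat(mean_row, n, 1)</code> only conforms when the column of ones is n x 1 and the mean is a 1 x m row. Using a ones vector with the wrong orientation is one plausible, unconfirmed, source of the inconsistent-multiplication error.</p>
<p>In the sketch, <code>eigenvectors</code> is an ordinary matrix variable whose columns are the eigenvectors, sorted by decreasing eigenvalue. <code>finaleigs</code> keeps its first <code>dimensions</code> columns. The helper names (<code>mat_mul</code>, <code>repmat_via_ones</code>, <code>center</code>, <code>sample_cov</code>, <code>eig_sym_2x2</code>, <code>pca_2d</code>) are hypothetical.</p>

```python
import math


def mat_mul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x r)."""
    if len(a[0]) != len(b):
        # Same kind of failure Scilab reports for nonconformable operands.
        raise ValueError("inconsistent multiplication: %dx%d times %dx%d"
                         % (len(a), len(a[0]), len(b), len(b[0])))
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def repmat_via_ones(mean_row, n):
    """Analogue of repmat(mean_row, n, 1): ones(n, 1) times the 1 x m mean row."""
    ones_col = [[1.0] for _ in range(n)]  # n x 1 column of ones
    return mat_mul(ones_col, [mean_row])  # (n x 1)(1 x m) gives n x m


def center(data):
    """Subtract each column's mean; data holds one observation per row."""
    n, m = len(data), len(data[0])
    mean_row = [sum(row[j] for row in data) / n for j in range(m)]
    means = repmat_via_ones(mean_row, n)
    return [[data[i][j] - means[i][j] for j in range(m)] for i in range(n)]


def sample_cov(centered):
    """Covariance matrix of already-centered data, dividing by n - 1."""
    n, m = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(m)]
            for i in range(m)]


def eig_sym_2x2(c):
    """Eigenpairs of a symmetric 2 x 2 matrix, sorted largest eigenvalue first."""
    a, b, d = c[0][0], c[0][1], c[1][1]
    if b == 0:
        # Already diagonal: the eigenvectors are the coordinate axes.
        pairs = [(a, [1.0, 0.0]), (d, [0.0, 1.0])]
    else:
        mid = (a + d) / 2
        radius = math.hypot((a - d) / 2, b)
        pairs = []
        for lam in (mid + radius, mid - radius):
            norm = math.hypot(b, lam - a)  # (b, lam - a) solves (C - lam*I)v = 0
            pairs.append((lam, [b / norm, (lam - a) / norm]))
    return sorted(pairs, key=lambda p: p[0], reverse=True)


def pca_2d(data, dimensions):
    """Return (eigenvalues, finaleigs, projected data) for two-variable data."""
    centered = center(data)
    pairs = eig_sym_2x2(sample_cov(centered))
    # eigenvectors: 2 x 2 matrix whose columns are the eigenvectors.
    eigenvectors = [[pairs[col][1][row] for col in range(2)] for row in range(2)]
    # Python analogue of: finaleigs = eigenvectors(:, 1:dimensions);
    finaleigs = [row[:dimensions] for row in eigenvectors]
    return [p[0] for p in pairs], finaleigs, mat_mul(centered, finaleigs)


# Hypothetical toy data (not from the tutorial): one observation per row.
values, finaleigs, scores = pca_2d([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]], dimensions=1)
print(values, finaleigs)
```

<p>With this toy data, the eigenvalues come out as roughly 5 and 0. The single kept column is approximately (0.447, 0.894), i.e. (1, 2)/sqrt(5). Calling <code>mat_mul</code> with a 1 x 3 row of ones instead of a 3 x 1 column raises the inconsistent-multiplication error.</p>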
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-558</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Mon, 17 Mar 2008 13:30:59 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-558</guid>
		<description>Dear Students:

In Lab 1 Part 4, you need to provide modified Scilab code for PCA. You also need to reproduce the example given in Lindsay Smith's tutorial. Use EXCEL and any SVD calculator.

Be aware that the accepted formula for sample covariance uses n - 1 in the denominator, not n. If your version of EXCEL divides by n, correct the results by multiplying them by n/(n-1).

I am putting together a tutorial on PCA to help you.

Cheers</description>
		<content:encoded><![CDATA[<p>Dear Students:</p>
<p>In Lab 1 Part 4, you need to provide modified Scilab code for PCA. You also need to reproduce the example given in Lindsay Smith&#8217;s tutorial. Use EXCEL and any SVD calculator. </p>
<p>Be aware that the accepted formula for sample covariance uses n - 1 in the denominator, not n. If your version of EXCEL divides by n, correct the results by multiplying them by n/(n-1).</p>
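<p>Here is a minimal Python sketch of this correction. It assumes the spreadsheet value was computed with n in the denominator; for example, Excel&#8217;s <code>COVAR</code> worksheet function returns the population covariance. The function names and the data are hypothetical.</p>

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists; divides by n - 1 if sample, else by n."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n


def population_to_sample(cov_pop, n):
    """Rescale a divide-by-n covariance to the divide-by-(n - 1) estimate."""
    return cov_pop * n / (n - 1)


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 9.0]  # hypothetical data
pop = covariance(xs, ys, sample=False)  # 11/4 = 2.75
print(pop, population_to_sample(pop, len(xs)), covariance(xs, ys))
```

<p>Here the rescaled value and the direct divide-by-(n - 1) estimate agree: both equal 11/3.</p>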
<p>I am putting together a tutorial on PCA to help you.</p>
<p>Cheers</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-557</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Mon, 17 Mar 2008 12:23:35 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-557</guid>
		<description>Hi, Nieves:

Thanks. 

Indeed, we don't have classes that week, according to the academic calendar at http://www.pupr.edu/academiccalendar/ac-wi05.pdf</description>
		<content:encoded><![CDATA[<p>Hi, Nieves:</p>
<p>Thanks. </p>
<p>Indeed, we don&#8217;t have classes that week, according to the academic calendar at <a href="http://www.pupr.edu/academiccalendar/ac-wi05.pdf" rel="nofollow">http://www.pupr.edu/academiccalendar/ac-wi05.pdf</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines Architecture Week 2 by panzernieves</title>
		<link>http://irthoughts.wordpress.com/2008/03/14/search-engines-architecture-week/#comment-556</link>
		<dc:creator>panzernieves</dc:creator>
		<pubDate>Mon, 17 Mar 2008 05:10:22 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=183#comment-556</guid>
		<description>Prof.:

I checked the academic calendar, and according to it there are no classes this Saturday. Just letting you and the rest of the group know. Cheers!!!

G. Nieves</description>
		<content:encoded><![CDATA[<p>Prof.:</p>
<p>I checked the academic calendar, and according to it there are no classes this Saturday. Just letting you and the rest of the group know. Cheers!!!</p>
<p>G. Nieves</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Search Engines for Penetration Testing Course by Nishu</title>
		<link>http://irthoughts.wordpress.com/2008/03/07/search-engines-for-penetration-testing-course/#comment-551</link>
		<dc:creator>Nishu</dc:creator>
		<pubDate>Tue, 11 Mar 2008 09:23:29 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=179#comment-551</guid>
		<description>Thanks for the information :) I thought the same but wasn't sure.</description>
		<content:encoded><![CDATA[<p>Thanks for the information <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> I thought the same but wasn&#8217;t sure.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
