<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>IR Thoughts</title>
	<atom:link href="http://irthoughts.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://irthoughts.wordpress.com</link>
	<description>Thoughts on Information Retrieval, Data Mining, and Search Engines</description>
	<lastBuildDate>Fri, 27 Jan 2012 01:34:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='irthoughts.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>IR Thoughts</title>
		<link>http://irthoughts.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://irthoughts.wordpress.com/osd.xml" title="IR Thoughts" />
	<atom:link rel='hub' href='http://irthoughts.wordpress.com/?pushpress=hub'/>
		<item>
		<title>When and Why not to take arithmetic averages</title>
		<link>http://irthoughts.wordpress.com/2012/01/26/when-and-why-not-to-take-arithmetic-averages/</link>
		<comments>http://irthoughts.wordpress.com/2012/01/26/when-and-why-not-to-take-arithmetic-averages/#comments</comments>
		<pubDate>Thu, 26 Jan 2012 17:52:58 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Marketing Research]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>
		<category><![CDATA[Web Mining Course]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/2012/01/26/when-and-why-not-to-take-arithmetic-averages/</guid>
		<description><![CDATA[Correlation coefficients, coefficients of variations, standard deviations, slopes, tangents, cosines, densities, temperatures, dissimilar ratios, and intensive properties in general are &#8230;<p><a href="http://irthoughts.wordpress.com/2012/01/26/when-and-why-not-to-take-arithmetic-averages/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1888&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.miislita.com/statistics/on-the-non-additivity-correlation-coefficients.pdf">Correlation coefficients</a>, coefficients of variation, standard deviations, slopes, tangents, cosines, densities, temperatures, dissimilar ratios, and intensive properties in general are not additive. Therefore, a meaningful arithmetic average cannot be computed from any of them.</p>
<p>Still, from time to time some <a href="http://www.seomoz.org/blog/correlation-data-for-seo-and-social-media-analysis-part-1-whiteboard-friday">&#8220;experts&#8221; and pseudo &#8220;scientists&#8221;</a> do that.</p>
<p>Want to know why this is neither mathematically nor statistically valid? That is the subject of a paper I wrote, which is about to be published in <a href="http://www.tandf.co.uk/journals/journal.asp?issn=0361-0926&amp;linktype=145">Communications in Statistics &#8211; Theory and Methods</a> (by Taylor &amp; Francis).</p>
<p>Incidentally, I will give the search marketing community a preview of the topic. Thanks to my dear friend Mike Grehan, this is what I&#8217;ll be speaking about at SES New York in March 2012.</p>
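<p>The non-additivity of correlation coefficients can be illustrated with a small numerical sketch. The Python below is not taken from the paper; the two samples and the helper name <code>pearson</code> are hypothetical, chosen only so that the arithmetic is easy to check by hand. It compares the arithmetic average of two per-sample correlation coefficients with the coefficient computed from the pooled data.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical samples (illustrative only, not data from the paper).
x1, y1 = [1, 2, 3], [1, 2, 3]   # perfectly positively correlated
x2, y2 = [4, 5, 6], [6, 5, 4]   # perfectly negatively correlated

r1 = pearson(x1, y1)
r2 = pearson(x2, y2)

# Arithmetic average of the two coefficients.
mean_r = (r1 + r2) / 2

# Coefficient computed from the pooled observations.
pooled_r = pearson(x1 + x2, y1 + y2)

print("r1 =", r1, "r2 =", r2)
print("average of r1 and r2 =", mean_r)
print("pooled r =", round(pooled_r, 4))
```

<p>In this made-up case each sample is perfectly correlated on its own (r = 1 and r = -1), so the arithmetic average of the two coefficients is 0, while the pooled data give r = 13.5/17.5, about 0.77. Which pooled or weighted estimate is appropriate depends on the data; the sketch only shows that the plain average need not agree with it.</p>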
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/marketing-research/'>Marketing Research</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>, <a href='http://irthoughts.wordpress.com/category/web-mining-course/'>Web Mining Course</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2012/01/26/when-and-why-not-to-take-arithmetic-averages/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>l&#8217;Hopital&#8217;s Rule and the 0^0 Power Controversy</title>
		<link>http://irthoughts.wordpress.com/2012/01/23/lhopitals-rule-and-the-00-power-controversy/</link>
		<comments>http://irthoughts.wordpress.com/2012/01/23/lhopitals-rule-and-the-00-power-controversy/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 14:29:30 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1737</guid>
		<description><![CDATA[I&#8217;m currently working on some nice formulas that require l&#8217;Hopital&#8217;s Rule (sometimes written as l&#8217;Hospital (with the &#8220;s&#8221; silent. The &#8230;<p><a href="http://irthoughts.wordpress.com/2012/01/23/lhopitals-rule-and-the-00-power-controversy/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1737&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently working on some nice formulas that require l&#8217;Hopital&#8217;s Rule (sometimes written l&#8217;Hospital, with the &#8220;s&#8221; silent; the &#8220;o&#8221; is also written with a &#8220;hat&#8221;) and came across a note from Professor Stephen A. Fulling in which he mentions <a href="http://www.askamathematician.com/2010/12/q-what-does-00-zero-raised-to-the-zeroth-power-equal-why-do-mathematicians-and-high-school-teachers-disagree/">the never-ending zero-to-the-zero-power controversy</a>.</p>
<p><a href="http://www.askamathematician.com/2010/12/q-what-does-00-zero-raised-to-the-zeroth-power-equal-why-do-mathematicians-and-high-school-teachers-disagree/">Not even mathematicians agree</a> on what the result should be. It has been argued that the answer is a matter of convenience, a notion that is controversial in, if not contrary to, mathematics. What&#8217;s your take on the issue?</p>
<p>To learn more about this rule and when it should be applied, see <a href="http://en.wikipedia.org/wiki/L'H%C3%B4pital's_rule">Wikipedia</a> or <a href="http://mathworld.wolfram.com/LHospitalsRule.html">Wolfram MathWorld</a>.</p>
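<p>As a small worked example of the rule (added here for illustration; it is not taken from Professor Fulling&#8217;s note), consider the limit of x^x as x approaches 0 from the right, which has the indeterminate form 0^0. Taking logarithms turns it into a quotient to which l&#8217;Hopital&#8217;s Rule applies. Two other limits of the same 0^0 form are shown for comparison:</p>

```latex
\[
\lim_{x \to 0^+} \ln\left(x^x\right)
  = \lim_{x \to 0^+} \frac{\ln x}{1/x}
  = \lim_{x \to 0^+} \frac{1/x}{-1/x^2}
  = \lim_{x \to 0^+} (-x)
  = 0,
\qquad\text{so}\qquad
\lim_{x \to 0^+} x^x = e^0 = 1 .
\]
\[
\lim_{x \to 0^+} 0^x = 0,
\qquad
\lim_{x \to 0^+} \left(e^{-1/x}\right)^{x} = e^{-1} .
\]
```

<p>The first limit is consistent with the convention 0^0 = 1, but the other two approach the same form and give different values, which is one reason the question tends to be treated as a matter of convention.</p>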
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2012/01/23/lhopitals-rule-and-the-00-power-controversy/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>My IPAM Lost Pictures</title>
		<link>http://irthoughts.wordpress.com/2012/01/20/my-ipam-lost-pictures/</link>
		<comments>http://irthoughts.wordpress.com/2012/01/20/my-ipam-lost-pictures/#comments</comments>
		<pubDate>Fri, 20 Jan 2012 14:32:07 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Latent Semantic Indexing]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1729</guid>
		<description><![CDATA[On January 23-27, 2006 I was at the Institute for Pure and Applied Mathematics, UCLA, California attending a now infamous &#8230;<p><a href="http://irthoughts.wordpress.com/2012/01/20/my-ipam-lost-pictures/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1729&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>From January 23 to 27, 2006, I was at the Institute for Pure and Applied Mathematics (IPAM) at UCLA, California, attending the now-infamous Document Space Workshop. I took some pictures but did not find them until now.</p>
<p>I&#8217;ve posted them on my Facebook page. In them I pose with the then-director of IPAM and with world-recognized LSI expert Dr. Michael Berry and his former students. To learn more about the workshop and the speakers, follow this link: <a href="http://www.miislita.com/ipam/ipam-document-space-workshop.pdf">http://www.miislita.com/ipam/ipam-document-space-workshop.pdf</a></p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/latent-semantic-indexing/'>Latent Semantic Indexing</a>, <a href='http://irthoughts.wordpress.com/category/machine-learning/'>Machine Learning</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2012/01/20/my-ipam-lost-pictures/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Accessing Wikipedia Today</title>
		<link>http://irthoughts.wordpress.com/2012/01/18/accessing-wikipedia-today/</link>
		<comments>http://irthoughts.wordpress.com/2012/01/18/accessing-wikipedia-today/#comments</comments>
		<pubDate>Wed, 18 Jan 2012 15:31:50 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1726</guid>
		<description><![CDATA[I have no problem accessing and navigating the spanish version of wikipedia (http://es.wikipedia.org/wiki/Wikipedia:Portada ). Ha. That much for their Internet &#8230;<p><a href="http://irthoughts.wordpress.com/2012/01/18/accessing-wikipedia-today/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1726&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I have no problem accessing and navigating the Spanish version of Wikipedia (<a href="http://es.wikipedia.org/wiki/Wikipedia:Portada">http://es.wikipedia.org/wiki/Wikipedia:Portada</a>). Ha. So much for their Internet &#8220;block&#8221; Dark Day.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/miscellaneous/'>Miscellaneous</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2012/01/18/accessing-wikipedia-today/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Happy New Year!</title>
		<link>http://irthoughts.wordpress.com/2011/12/31/happy-new-year/</link>
		<comments>http://irthoughts.wordpress.com/2011/12/31/happy-new-year/#comments</comments>
		<pubDate>Sun, 01 Jan 2012 02:41:10 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/2011/12/31/happy-new-year/</guid>
		<description><![CDATA[We wish you all a great data mining 2012.     Filed under: Data Mining<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1717&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We wish you all a great data mining 2012.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/12/31/happy-new-year/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>A New Weighting Strategy</title>
		<link>http://irthoughts.wordpress.com/2011/12/27/a-new-weighting-strategy/</link>
		<comments>http://irthoughts.wordpress.com/2011/12/27/a-new-weighting-strategy/#comments</comments>
		<pubDate>Tue, 27 Dec 2011 19:08:34 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Human-Computer Interaction]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Marketing Research]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Quack Science]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>
		<category><![CDATA[Web Mining Course]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1701</guid>
		<description><![CDATA[I received this morning from the editors of Communications in Statistics: Theory and Methods confirmation that they accepted and will &#8230;<p><a href="http://irthoughts.wordpress.com/2011/12/27/a-new-weighting-strategy/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1701&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This morning I received confirmation from the editors of <a href="http://www.tandf.co.uk/journals/journal.asp?issn=0361-0926&amp;linktype=145">Communications in Statistics: Theory and Methods</a> that they have accepted and will publish my peer-reviewed paper on a new model for statistical analysis. It should be out in 2012.</p>
<p>Once it is published, you will understand the SEO (* SEOmoz, I should say) nonsense of computing arithmetic averages of correlation coefficients and why <span style="text-decoration:underline;">some</span> meta-analysis studies published in the past (* Hunter-Schmidt; Hedges-Olkin) are flawed and invalid.</p>
<p>It took me several meals and many research hours to figure it out. I hope that colleagues in IR, data mining, and statistics find new applications for the model.</p>
<p>The model can be applied to many fields, including marketing, business, risk analysis, data mining, signal processing, engineering, clinical trials, and almost any field or knowledge domain that involves the calculation of weighted statistics. I look forward to discussing it online once it is published.</p>
<p>Happy New Year.</p>
<p>PS. (*) I&#8217;ve edited this post to make these points obvious. So, the issue of arithmetically averaging correlations has been raised and killed for good before the scientific and statistical community.</p>
<p>PS. Just in: Last night (Jan-03-2012) I received news from one of the editors of the journal that the paper was assigned to issue 41 (8). Look for its title: <em>The Self-Weighting Model</em> (in Spanish, something like &#8220;<em>El Modelo de Autoponderacion</em>&#8221;). I forgot to mention that this journal is published biweekly, so things are moving fast. What a way of ending 2011 and starting 2012!!!</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/human-computer-interaction/'>Human-Computer Interaction</a>, <a href='http://irthoughts.wordpress.com/category/machine-learning/'>Machine Learning</a>, <a href='http://irthoughts.wordpress.com/category/marketing-research/'>Marketing Research</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/quack-science/'>Quack Science</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>, <a href='http://irthoughts.wordpress.com/category/web-mining-course/'>Web Mining Course</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/12/27/a-new-weighting-strategy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Merry Christmas: the google hacking way</title>
		<link>http://irthoughts.wordpress.com/2011/12/13/merry-christmas-the-google-hacking-way/</link>
		<comments>http://irthoughts.wordpress.com/2011/12/13/merry-christmas-the-google-hacking-way/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 15:13:28 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Spam]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1698</guid>
		<description><![CDATA[Yesterday we had a brainstorming session with our programmers on google hacking. It is soooooo easy to grab php codes, &#8230;<p><a href="http://irthoughts.wordpress.com/2011/12/13/merry-christmas-the-google-hacking-way/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1698&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">Yesterday we had a brainstorming session with our programmers on Google hacking. It is soooooo easy to grab PHP code, passwords, and databases from all over the Web, thanks to sloppy coders. For instance, do a search for</p>
<p style="text-align:left;">index.of<br />
index.of/php<br />
index.of/pswd<br />
index.of/db<br />
index.of/mda<br />
index.of/pgp</p>
<p style="text-align:left;">or check the list at http://www.thenetworkadministrator.com/googlesearches.htm. These types of searches will spit out directory trees.</p>
<p style="text-align:left;">There are many &#8220;smart cookies&#8221; posting derivatives of these lists all over the Web.</p>
<p style="text-align:left;">And how about typos?</p>
<p style="text-align:left;">Try filetype command searches with extra characters in extensions like</p>
<p style="text-align:left;">0php<br />
1php<br />
phps<br />
php.</p>
<p style="text-align:left;">etc.</p>
<p style="text-align:left;">Servers will spit out entire PHP source files.</p>
<p style="text-align:left;">The biggest offenders are large sites like those under .edu, .gov, and .org, not to mention large .com and .net sites.</p>
<p style="text-align:left;">Ho, Ho, Ho, Merry Christmas, Santa.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/spam/'>Spam</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/12/13/merry-christmas-the-google-hacking-way/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Google early years and LSI</title>
		<link>http://irthoughts.wordpress.com/2011/11/24/google-early-years-and-lsi/</link>
		<comments>http://irthoughts.wordpress.com/2011/11/24/google-early-years-and-lsi/#comments</comments>
		<pubDate>Thu, 24 Nov 2011 15:19:53 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1692</guid>
		<description><![CDATA[For years many SEOs fooled their own peers with the assertion that LSI was something new that Google implemented. Some &#8230;<p><a href="http://irthoughts.wordpress.com/2011/11/24/google-early-years-and-lsi/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1692&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>For years many SEOs fooled their own peers with the assertion that LSI was something new that Google implemented. Some have even claimed LSI was a proprietary algorithm from Google. I&#8217;ve spent sooooo many years debunking all this crap and a few other urban legends from unscrupulous SEOs.</p>
<p>On this Thanksgiving Day I am thankful that all these myths have been debunked to no end: LSI-rank correlations, LDA-rank correlations, KD-rank correlations, additiveness of correlation coefficients, blah, blah, blah&#8230;  I am also thankful that along came this: <a href="http://infolab.stanford.edu/~sergey/349/">http://infolab.stanford.edu/~sergey/349/</a></p>
<p>LSI?</p>
<p>Known from the onset by Google.</p>
<p>A cost-effective implementation in a large-scale, dynamic environment such as the Web?</p>
<p>Nope.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1692/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1692/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1692/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1692/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1692/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1692/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1692/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1692/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1692&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/11/24/google-early-years-and-lsi/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Our Whois Miner is Getting Smarter</title>
		<link>http://irthoughts.wordpress.com/2011/11/21/our-whois-miner-is-getting-smarter/</link>
		<comments>http://irthoughts.wordpress.com/2011/11/21/our-whois-miner-is-getting-smarter/#comments</comments>
		<pubDate>Mon, 21 Nov 2011 19:26:36 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1689</guid>
		<description><![CDATA[Welcome to the new and improved interface of Minerazzi&#8217;s Whois Miner (beta). The Whois Miner is getting smarter. It now gives you &#8230;<p><a href="http://irthoughts.wordpress.com/2011/11/21/our-whois-miner-is-getting-smarter/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1689&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Welcome to the new and improved interface of Minerazzi&#8217;s Whois Miner (beta).</p>
<p>The Whois Miner is getting smarter. It now gives you the alternate name of a whois server service. It also has network mining features, enabling you to mine registrant DNS lookups, Headers, NS/MX records, and contact emails.</p>
<p>In addition, we keep expanding our index of whois servers. Follow the link below.</p>
<p>Use it for free while you can.</p>
<p><a href="http://www.minerazzi.com/labs/whois.php">http://www.minerazzi.com/labs/whois.php</a></p>
<p>Play with it and send some feedback.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1689/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1689/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1689/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1689/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1689/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1689/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1689/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1689/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1689&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/11/21/our-whois-miner-is-getting-smarter/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>New Whois User Interface</title>
		<link>http://irthoughts.wordpress.com/2011/10/25/new-whois-user-interface/</link>
		<comments>http://irthoughts.wordpress.com/2011/10/25/new-whois-user-interface/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 14:19:41 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[IR Tools]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1675</guid>
		<description><![CDATA[Last night we uploaded a new user interface (UI) for the Minerazzi Multiple Whois Miner (http://www.minerazzi.com/labs/whois.php). &#160; Added support to: &#8230;<p><a href="http://irthoughts.wordpress.com/2011/10/25/new-whois-user-interface/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1675&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Last night we uploaded a new user interface (UI) for the Minerazzi Multiple Whois Miner (<a href="http://www.minerazzi.com/labs/whois.php">http://www.minerazzi.com/labs/whois.php</a>).</p>
<p>&nbsp;</p>
<p>Added support for:</p>
<p>1. generic top-level domains (gTLDs).</p>
<p>2. country-code TLDs.</p>
<p>3. subdomain TLDs.</p>
<p>4. state persistence of form fields (without cookies, sessions, or JavaScript; just pure PHP).</p>
<p>&nbsp;</p>
<p>As we keep improving and adding new TLDs and whois servers to its index, we expect this to become a destination for our regular users.</p>
<p>The tool was designed in such a way that even support for the upcoming dotBrand Revolution is possible.</p>
<p>&nbsp;</p>
<p>Enjoy it</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/ir-tools/'>IR Tools</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1675/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1675/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1675/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1675/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1675/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1675/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1675/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1675/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1675&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/10/25/new-whois-user-interface/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>The myth of SEO myth listers</title>
		<link>http://irthoughts.wordpress.com/2011/10/10/the-myth-of-seo-myth-listers/</link>
		<comments>http://irthoughts.wordpress.com/2011/10/10/the-myth-of-seo-myth-listers/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 17:46:41 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1672</guid>
		<description><![CDATA[http://searchenginewatch.com/article/2115733/Top-10-SEO-Myths-Dispelled Apparently they don&#8217;t get it. And I thought SEO &#8220;statistical studies&#8221; was something out of this world. SEO myth &#8230;<p><a href="http://irthoughts.wordpress.com/2011/10/10/the-myth-of-seo-myth-listers/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1672&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><a href="http://searchenginewatch.com/article/2115733/Top-10-SEO-Myths-Dispelled">http://searchenginewatch.com/article/2115733/Top-10-SEO-Myths-Dispelled</a></p>
<p>Apparently they don&#8217;t get it. And I thought SEO &#8220;statistical studies&#8221; were something out of this world. SEO myth listers are worse.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1672/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1672/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1672/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1672/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1672/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1672/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1672/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1672/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1672&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/10/10/the-myth-of-seo-myth-listers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>New Interface of Minerazzi</title>
		<link>http://irthoughts.wordpress.com/2011/09/29/new-interface-of-minerazzi/</link>
		<comments>http://irthoughts.wordpress.com/2011/09/29/new-interface-of-minerazzi/#comments</comments>
		<pubDate>Thu, 29 Sep 2011 20:56:24 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1667</guid>
		<description><![CDATA[Hello, miners. The new interface of Minerazzi ( http://www.minerazzi.com ) is up and running! Have a nice mining day. Filed &#8230;<p><a href="http://irthoughts.wordpress.com/2011/09/29/new-interface-of-minerazzi/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1667&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Hello, miners. The new interface of Minerazzi ( <a href="http://www.minerazzi.com">http://www.minerazzi.com</a> ) is up and running! Have a nice mining day.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1667/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1667/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1667/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1667/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1667/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1667/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1667/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1667/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1667&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/09/29/new-interface-of-minerazzi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>The Scope Hypothesis in IR: Who is Right?</title>
		<link>http://irthoughts.wordpress.com/2011/08/13/the-scope-hypothesis-in-ir-who-is-right/</link>
		<comments>http://irthoughts.wordpress.com/2011/08/13/the-scope-hypothesis-in-ir-who-is-right/#comments</comments>
		<pubDate>Sat, 13 Aug 2011 13:22:48 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[IR Tools]]></category>
		<category><![CDATA[IR Tutorials]]></category>
		<category><![CDATA[Queries]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1661</guid>
		<description><![CDATA[In previous posts, we have presented two tutorials on Okapi BM25 and BM25F, which are based on the Verbosity and &#8230;<p><a href="http://irthoughts.wordpress.com/2011/08/13/the-scope-hypothesis-in-ir-who-is-right/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1661&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">In previous posts, we have presented two tutorials on Okapi BM25 and BM25F, which are based on the Verbosity and Scope Hypotheses.</p>
<p style="text-align:left;"><strong>However&#8230;</strong></p>
<p style="text-align:left;">Here I would like to reference research on both sides of the Scope Hypothesis.</p>
<p style="text-align:left;">In the abstract of &#8220;Revisiting the relationship between document length and relevance&#8221; (<a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.3786&amp;rep=rep1&amp;type=pdf">http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.141.3786&amp;rep=rep1&amp;type=pdf</a>), Losada, D.E., Azzopardi, L. and Baillie, M. (2008) state:</p>
<p style="text-align:left;">&#8220;The scope hypothesis in Information Retrieval (IR) states that a relationship exists between document length and relevance, such that the likelihood of relevance increases with document length. A number of empirical studies have provided statistical evidence supporting the scope hypothesis. However, these studies make the implicit assumption that modern test collections are complete (i.e. all documents are assessed for relevance). As a consequence the observed evidence is misleading. In this paper we perform a deeper analysis of document length and relevance taking into account that test collections are incomplete. We first demonstrate that previous evidence supporting the scope hypothesis was an artefact of the test collection, where there is a bias towards longer documents in the pooling process. We evaluate whether this length bias affects system comparison when using incomplete test collections. The results indicate that test collections are problematic when considering MAP as a measure of effectiveness but are relatively robust when using bpref. The implications of the study indicate that retrieval models should not be tuned to favour longer documents, and that designers of new test collections should take measures against length bias during the pooling process in order to create more reliable and robust test collections.&#8221;</p>
<p style="text-align:left;"><strong>Really&#8230;?</strong></p>
<p style="text-align:left;">However, in the abstract of &#8220;Enhancing ad-hoc relevance weighting using probability density estimation&#8221; (<a href="http://www.sigir2011.org/papershow.asp?PID=104">http://www.sigir2011.org/papershow.asp?PID=104</a>), Zhou, Huang, and He (2011) state:</p>
<p style="text-align:left;">&#8220;Classical probabilistic information retrieval (IR) models, e.g. BM25, deal with document length based on a trade-off between the Verbosity hypothesis, which assumes the independence of a document&#8217;s relevance of its length, and the Scope hypothesis, which assumes the opposite. Despite the effectiveness of the classical probabilistic models, the potential relationship between document length and relevance is not fully explored to improve retrieval performance. In this paper, we conduct an in-depth study of this relationship based on the Scope hypothesis that document length does have its impact on relevance. We study a list of probability density functions and examine which of the density functions fits the best to the actual distribution of the document length. Based on the studied probability density functions, we propose a length-based BM25 relevance weighting model, called BM25L, which incorporates document length as a substantial weighting factor. Extensive experiments conducted on standard TREC collections show that our proposed BM25L markedly outperforms the original BM25 model, even if the latter is optimized.&#8221;</p>
<p style="text-align:left;"><strong>My take&#8230;</strong></p>
<p style="text-align:left;">I haven&#8217;t reviewed BM25L vs. BM25F, yet. Still the question on the Scope Hypothesis is intriguing. For what I can tell (and this is my sole opinion), if an author writes more about a topic or several topics in a given document, more likely he will be using more instances of index terms. A cluster of the top index term density values (IDs) spreaded over said document should give some insight about its scope. We have developed a tool that computes these clusters. We are testing now whether that would translate into an improved relevance.</p>
<p style="text-align:left;">Assuming that Web IR systems out there (e.g,, search engines) use these algorithms or derivatives of these: What would be the implications for content writers trying to understand algos based on the Verbosity and Scope Hypotheses? Hello, copywriters, SEOs, etc. This puppy is nice to watch.</p>
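<p style="text-align:left;">To make the trade-off between the two hypotheses concrete, here is a minimal sketch of how the standard BM25 length normalization parameter b moves between them. With b = 0 document length is ignored (closer to the Scope view); with b = 1 term frequency is fully normalized by length (closer to the Verbosity view). The function names, the sample lengths, and the default values k1 = 1.2 and b = 0.75 are illustrative assumptions, not taken from either paper.</p>

```python
def length_norm(doc_len, avg_doc_len, b):
    """BM25 length normalization factor: B = (1 - b) + b * (dl / avdl)."""
    return (1.0 - b) + b * (doc_len / avg_doc_len)


def tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Term-frequency part of BM25 for one term in one document."""
    norm = length_norm(doc_len, avg_doc_len, b)
    return tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    avg_len = 500  # made-up average document length, in tokens
    for b in (0.0, 0.75, 1.0):
        # Same raw term frequency in a short and in a long document.
        short = tf_component(tf=3, doc_len=250, avg_doc_len=avg_len, b=b)
        long_ = tf_component(tf=3, doc_len=1000, avg_doc_len=avg_len, b=b)
        print(f"b={b:.2f}  short doc: {short:.3f}  long doc: {long_:.3f}")
```

<p style="text-align:left;">At b = 0 both documents get the same weight for the term; as b grows, the same three occurrences count for less in the longer document.</p>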
<p style="text-align:left;">
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/ir-tools/'>IR Tools</a>, <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>, <a href='http://irthoughts.wordpress.com/category/queries/'>Queries</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1661/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1661/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1661/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1661/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1661/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1661/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1661/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1661/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1661&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/08/13/the-scope-hypothesis-in-ir-who-is-right/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>BM25 and BM25F: Implications to SEO and Web Design</title>
		<link>http://irthoughts.wordpress.com/2011/08/04/bm25-and-bm25f-implications-to-seo-and-web-design/</link>
		<comments>http://irthoughts.wordpress.com/2011/08/04/bm25-and-bm25f-implications-to-seo-and-web-design/#comments</comments>
		<pubDate>Thu, 04 Aug 2011 14:15:01 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[IR Tutorials]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1655</guid>
		<description><![CDATA[Yesterday we published two great tutorials on the BM25 and BM25F algorithms. The &#8220;take away home&#8221; from the theory behind these &#8230;<p><a href="http://irthoughts.wordpress.com/2011/08/04/bm25-and-bm25f-implications-to-seo-and-web-design/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1655&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Yesterday we published two great tutorials on the BM25 and BM25F algorithms.</p>
<p>The take-home message from the theory behind these algorithms:</p>
<p>1. A term (e.g., a keyword) has more information gain when it occurs for the very first time.</p>
<p>2. A term most likely weighs more in a title field than in other fields.</p>
<p>3. The weight of a term and its occurrence frequency are not linearly related.</p>
<p>4. A linear combination of field scores that destroys term dependencies is contraindicated (See BM25F).</p>
<p>Most SEOs know 1 and 2 well.</p>
<p>As a term has more information gain during its first occurrences, a document about specific terms should mention these at the beginning, particularly in the title tag. For testing purposes, and since end users assume that a large headline is the actual title of a document (which it is not), we like to repeat the title tag content in an h1 header that is placed prominently at the beginning of the copy. Keywords from the title are then repeated early in the document body. In this way, one can write for both end users and search engines. If a search engine uses some form of the above algorithms (which we don&#8217;t know if they do), that base is covered, too. You don&#8217;t have to adopt this strategy unless you want to. It is just our way of conducting tests, but it is a flexible approach.</p>
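<p>One way to read points 1 and 3 is through the saturating term-frequency curve of BM25. The sketch below assumes a document of average length (so the length normalization factor equals 1) and the commonly quoted value k1 = 1.2; the function names are ours, for illustration only.</p>

```python
def saturation(tf, k1=1.2):
    """BM25 term-frequency curve for a document of average length."""
    return tf * (k1 + 1.0) / (tf + k1)


def marginal_gains(max_tf, k1=1.2):
    """Extra weight contributed by the 1st, 2nd, ..., max_tf-th occurrence."""
    return [saturation(n, k1) - saturation(n - 1, k1) for n in range(1, max_tf + 1)]


if __name__ == "__main__":
    for n, gain in enumerate(marginal_gains(6), start=1):
        print(f"occurrence {n}: +{gain:.3f}")
    # The curve never exceeds k1 + 1, no matter how often the term repeats.
    print(f"upper bound: {1.2 + 1.0:.1f}")
```

<p>The first occurrence contributes the largest increment and every later one contributes less, which is why repeating a keyword many times buys very little under this kind of model.</p>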
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1655/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1655/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1655/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1655/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1655/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1655/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1655/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1655/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1655&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/08/04/bm25-and-bm25f-implications-to-seo-and-web-design/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>New Tutorials: Okapi BM25F and BM25</title>
		<link>http://irthoughts.wordpress.com/2011/08/03/new-tutorials-okapi-bm25f-and-bm25/</link>
		<comments>http://irthoughts.wordpress.com/2011/08/03/new-tutorials-okapi-bm25f-and-bm25/#comments</comments>
		<pubDate>Wed, 03 Aug 2011 14:21:18 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[IR Tutorials]]></category>
		<category><![CDATA[Queries]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1652</guid>
		<description><![CDATA[We have a new tutorial on Okapi Simple BM25 with Extension to Multiple Fields. http://www.miislita.com/information-retrieval-tutorial/okapi-simple-bm25f-tutorial.pdf Unlike the BM25, this model &#8230;<p><a href="http://irthoughts.wordpress.com/2011/08/03/new-tutorials-okapi-bm25f-and-bm25/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1652&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We have a new tutorial on Okapi Simple BM25 with Extension to Multiple Fields.</p>
<p><a href="http://www.miislita.com/information-retrieval-tutorial/okapi-simple-bm25f-tutorial.pdf">http://www.miislita.com/information-retrieval-tutorial/okapi-simple-bm25f-tutorial.pdf</a></p>
<p>Unlike BM25, this model (known as Simple BM25F) incorporates the structure of documents into the scoring process.</p>
<p>&nbsp;</p>
<p>In addition, we&#8217;ve uploaded a new, improved, and expanded version of the Okapi Best Match 25 tutorial.</p>
<p><a href="http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf">http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf</a></p>
<p>&nbsp;</p>
<p>Have a great IR day!</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>, <a href='http://irthoughts.wordpress.com/category/queries/'>Queries</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/08/03/new-tutorials-okapi-bm25f-and-bm25/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Minerazzi Crawler and Whois Updates: Email Addresses, Reverse DNS, IPv4 Mapping, Navigation</title>
		<link>http://irthoughts.wordpress.com/2011/07/11/minerazzi-crawler-and-whois-updates-email-addresses-reverse-dns-ipv4-mapping-navigation/</link>
		<comments>http://irthoughts.wordpress.com/2011/07/11/minerazzi-crawler-and-whois-updates-email-addresses-reverse-dns-ipv4-mapping-navigation/#comments</comments>
		<pubDate>Mon, 11 Jul 2011 13:11:02 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Homeland Security]]></category>
		<category><![CDATA[IR Quizzes]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1648</guid>
		<description><![CDATA[We keep improving the Minerazzi site (http://www.minerazzi.com). We moved all pages to a php format. In addition, here are recent &#8230;<p><a href="http://irthoughts.wordpress.com/2011/07/11/minerazzi-crawler-and-whois-updates-email-addresses-reverse-dns-ipv4-mapping-navigation/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1648&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">We keep improving the Minerazzi site (<a href="http://www.minerazzi.com">http://www.minerazzi.com</a>). We moved all pages to PHP. In addition, here are the recent changelog entries for the Web Crawler (<a href="http://www.minerazzi.com/labs/crawlinker.php">http://www.minerazzi.com/labs/crawlinker.php</a>):</p>
<p style="text-align:left;">07-05-11: Email address extraction, deduplication, and sorting capabilities added.<br />
07-04-11: Design and copy changes.<br />
07-03-11: Navigation menu restored and bug fixed.<br />
07-03-11: Navigation menu removed to test bug.<br />
07-02-11: Top-bottom quick navigation menu added.<br />
07-02-11: Day/Time Stamp, Reverse DNS, and IPv4 List capabilities added.<br />
07-02-11: Integration with Whois Tool.</p>
<p style="text-align:left;">The Whois Database Retriever (<a href="http://www.minerazzi.com/labs/whois.php">http://www.minerazzi.com/labs/whois.php</a>) now features suffix/prefix stripping capabilities. This means that users only need to enter a candidate domain name without any alias or extension and the tool scans multiple registrar databases. We expect to add some additional features to this time-saving application.</p>
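<p style="text-align:left;">As a rough illustration of what suffix/prefix stripping could look like, here is a minimal Python sketch. It is not the code behind the Whois tool: the function names, the prefix list, and the short list of extensions are all assumptions made for the example, and no registrar lookup is performed.</p>

```python
# Hypothetical prefixes and extensions; the actual tool's lists are not given.
PREFIXES = ("http://", "https://", "www.")
EXTENSIONS = ("com", "net", "org")


def strip_to_label(text):
    """Reduce input such as 'www.Example.com' to the bare label 'example'."""
    label = text.strip().lower()
    for prefix in PREFIXES:
        if label.startswith(prefix):
            label = label[len(prefix):]
    label = label.split("/")[0]  # drop any path after the host name
    return label.split(".")[0]   # drop the extension(s)


def candidate_domains(text):
    """Build one candidate domain per assumed extension for later lookups."""
    label = strip_to_label(text)
    return [f"{label}.{ext}" for ext in EXTENSIONS]


print(candidate_domains("http://www.Example.com/page"))
```

<p style="text-align:left;">Each candidate would then be submitted to the corresponding registrar database. Note that this simple version treats the first dot-separated part as the name, so an input with a subdomain other than www would need extra handling.</p>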
<p style="text-align:left;">In the meantime, we keep beta testing the engine. Our staff of &#8216;miners&#8217; is doing a great job.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/homeland-security/'>Homeland Security</a>, <a href='http://irthoughts.wordpress.com/category/ir-quizzes/'>IR Quizzes</a>, <a href='http://irthoughts.wordpress.com/category/machine-learning/'>Machine Learning</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/software/'>Software</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/07/11/minerazzi-crawler-and-whois-updates-email-addresses-reverse-dns-ipv4-mapping-navigation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>A Tutorial on Okapi BM25</title>
		<link>http://irthoughts.wordpress.com/2011/07/01/a-tutorial-on-okapi-bm25/</link>
		<comments>http://irthoughts.wordpress.com/2011/07/01/a-tutorial-on-okapi-bm25/#comments</comments>
		<pubDate>Fri, 01 Jul 2011 12:41:12 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[IR Tutorials]]></category>
		<category><![CDATA[Queries]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1646</guid>
		<description><![CDATA[We have uploaded a new tutorial: Okapi BM25. See http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf This is a tutorial on the classic Okapi Best Match &#8230;<p><a href="http://irthoughts.wordpress.com/2011/07/01/a-tutorial-on-okapi-bm25/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1646&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We have uploaded a new tutorial: Okapi BM25. See <a href="http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf">http://www.miislita.com/information-retrieval-tutorial/okapi-bm25-tutorial.pdf</a></p>
<p>This is a tutorial on the classic Okapi Best Match 25.</p>
<p>Enjoy it.</p>
<p>&nbsp;</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>, <a href='http://irthoughts.wordpress.com/category/queries/'>Queries</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/07/01/a-tutorial-on-okapi-bm25/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Minerazzi Web Crawler: Now Individually Crawling Form Fields</title>
		<link>http://irthoughts.wordpress.com/2011/06/27/minerazzi-web-crawler-now-individually-crawling-form-fields/</link>
		<comments>http://irthoughts.wordpress.com/2011/06/27/minerazzi-web-crawler-now-individually-crawling-form-fields/#comments</comments>
		<pubDate>Mon, 27 Jun 2011 13:46:29 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Queries]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1643</guid>
		<description><![CDATA[These are additional updates made over the weekend to our crawler at http://www.minerazzi.com/labs/crawlinker.php Changes made include the ability to detect information &#8230;<p><a href="http://irthoughts.wordpress.com/2011/06/27/minerazzi-web-crawler-now-individually-crawling-form-fields/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1643&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>These are additional updates made over the weekend to our crawler at <a href="http://www.minerazzi.com/labs/crawlinker.php">http://www.minerazzi.com/labs/crawlinker.php</a></p>
<p>Changes made include the ability to detect information contained inside selection menus, textareas, and input fields. These changes simplify the examination of name/value pairs used by forms. This is very useful when a document contains multiple forms, when the query mechanisms of target databases must be identified, or when one needs to assess whether a database is susceptible to query injections or script infections. In the last case, the security implications are obvious.</p>
<p>We also moved both the Markup and Robots Text File Reports to the Source Code section. This is a new section that is now listed last among the application&#8217;s reports.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/queries/'>Queries</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/06/27/minerazzi-web-crawler-now-individually-crawling-form-fields/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Minerazzi Web Crawler: Spidering Forms</title>
		<link>http://irthoughts.wordpress.com/2011/06/23/minerazzi-web-crawler-spidering-forms/</link>
		<comments>http://irthoughts.wordpress.com/2011/06/23/minerazzi-web-crawler-spidering-forms/#comments</comments>
		<pubDate>Thu, 23 Jun 2011 12:51:06 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1640</guid>
		<description><![CDATA[This is an update on our Minerazzi Web Crawler. (http://www.minerazzi.com/labs/crawlinker.php) Our crawler is now detecting form tags. Changelogs are given &#8230;<p><a href="http://irthoughts.wordpress.com/2011/06/23/minerazzi-web-crawler-spidering-forms/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1640&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This is an update on our Minerazzi Web Crawler. (<a href="http://www.minerazzi.com/labs/crawlinker.php">http://www.minerazzi.com/labs/crawlinker.php</a>)</p>
<p>Our crawler is now detecting form tags. Changelogs are given below:</p>
<p>06-23-11: Form tags detection capabilities added.</p>
<p>05-08-11: New user interface.</p>
<p>05-06-11: Log file access.</p>
<p>05-01-11: Copy changes.</p>
<p>04-29-11: Script tags detection capabilities added.</p>
<p>04-22-11: DocType, Base, and Link tags detection capabilities added. HTML parsing changes.</p>
<p>04-19-11: Robots Text File detection capabilities added. Meta data parsing changes.</p>
<p>04-18-11: Title and Meta Tags detection capabilities added. Layout changes.</p>
<p>04-16-11: User Environment detection capabilities added.</p>
<p>04-14-11: Timer capabilities added.</p>
<p>04-10-11: Deduplication capabilities added.</p>
<p>04-09-11: Color palette reporting capabilities added.</p>
<p>04-05-11: DNS and MX reporting capabilities added.</p>
<p>04-03-11: Source code reporting capabilities added.</p>
<p>03-30-11: Relative URL resolving capabilities added.</p>
<p>03-28-11: Hypertext wrapping, IP, and headers reporting capabilities added.</p>
<p>&nbsp;</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/software/'>Software</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/06/23/minerazzi-web-crawler-spidering-forms/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IFLA World Congress</title>
		<link>http://irthoughts.wordpress.com/2011/06/20/ifla-world-congress/</link>
		<comments>http://irthoughts.wordpress.com/2011/06/20/ifla-world-congress/#comments</comments>
		<pubDate>Mon, 20 Jun 2011 14:21:58 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Conferences]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1638</guid>
		<description><![CDATA[The International Federation of Library Associations and Institutions (IFLA) is the leading international body representing the interests of library and &#8230;<p><a href="http://irthoughts.wordpress.com/2011/06/20/ifla-world-congress/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1638&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The International Federation of Library Associations and Institutions (IFLA) is the leading international body representing the interests of library and information services and their users.</p>
<p>They are holding their world congress here at the Centro de Convenciones de Puerto Rico from August 13 to 18. See <a href="http://www.ifla.org/">http://www.ifla.org/</a></p>
<p>See also conference program at <a href="http://www.ifla.org/en/news/ifla-wlic-2011-final-programme">http://www.ifla.org/en/news/ifla-wlic-2011-final-programme</a></p>
<p>We will see you there.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/conferences/'>Conferences</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/06/20/ifla-world-congress/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Federated Searches and The Search Paradox</title>
		<link>http://irthoughts.wordpress.com/2011/06/14/federated-searches-and-the-search-paradox/</link>
		<comments>http://irthoughts.wordpress.com/2011/06/14/federated-searches-and-the-search-paradox/#comments</comments>
		<pubDate>Tue, 14 Jun 2011 19:36:15 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Queries]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1633</guid>
		<description><![CDATA[The problem with federated searches (aka, parallel, broadcast, or meta searches) is that they are implemented under the assumption that &#8230;<p><a href="http://irthoughts.wordpress.com/2011/06/14/federated-searches-and-the-search-paradox/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1633&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The problem with federated searches (aka, parallel, broadcast, or meta searches) is that they are implemented under the assumption that the more sources are searched at one time, the better for the user. This is not necessarily the case. Easy search results do not equate to the right results. Submitting a query on a given topic to dissimilar databases with dissimilar content or about dissimilar topics often produces off-topic or irrelevant results. Indeed, easy searching does not necessarily translate into a relevant experience.</p>
<p>And there is also the question of how to rank the results. Two strategies are frequently used: (a) data appending and (b) data fusion.</p>
<p>Some of the early federated search engines for the Web did (a) and were soon called meta search engines. These search tools simply returned a long list of results by appending the top N ranked results from each database in tandem. Obviously, this strategy failed to recognize the top M relevant results in this huge list and was soon phased out in favor of (b).</p>
<p>In (b), arithmetic or weighted averages of the top N results from the different databases are computed. The problem with this approach is that it is very subjective. In order to compute an arithmetic or weighted relevance score, who decides how much weight should be assigned to a given ranked result from a given database? No matter which weighting criteria are used, in the end it is still a subjective score and one that does not necessarily improve the end-user search experience. Just the opposite.</p>
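<p>To make the two strategies concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of any particular engine: the database names, documents, scores, and weights are invented, and the scores are assumed to be already comparable across databases, which is itself a strong assumption.</p>

```python
from collections import defaultdict

# Hypothetical top results per database as (document, relevance score) pairs.
RANKED_LISTS = {
    "db_a": [("doc1", 0.9), ("doc2", 0.5)],
    "db_b": [("doc2", 0.8), ("doc3", 0.7)],
}


def append_results(ranked_lists, n):
    """Strategy (a): concatenate the top-n results of each database in tandem."""
    merged = []
    for results in ranked_lists.values():
        merged.extend(doc for doc, _score in results[:n])
    return merged


def fuse_results(ranked_lists, weights, n):
    """Strategy (b): rank documents by a weighted average of their scores."""
    totals = defaultdict(float)
    for source, results in ranked_lists.items():
        for doc, score in results[:n]:
            totals[doc] += weights[source] * score
    weight_sum = sum(weights.values())
    fused = {doc: total / weight_sum for doc, total in totals.items()}
    # Highest fused score first; ties are broken by document name.
    return sorted(fused, key=lambda d: (-fused[d], d))


print(append_results(RANKED_LISTS, 2))
print(fuse_results(RANKED_LISTS, {"db_a": 1, "db_b": 1}, 2))
print(fuse_results(RANKED_LISTS, {"db_a": 3, "db_b": 1}, 2))
```

<p>With equal weights, doc2 comes first because both databases returned it. Once db_a is trusted three times as much as db_b, doc1 moves to the top. Nothing in the data dictates either choice of weights, which is the subjectivity described above.</p>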
<p>This leads to what I call &#8220;The Search Paradox&#8221;: Information gateways as information roadblocks.</p>
<p>To learn more about this, visit this old link: <a href="http://www.accessmylibrary.com/article-1G1-182034526/federated-search-101-alexis.html">http://www.accessmylibrary.com/article-1G1-182034526/federated-search-101-alexis.html</a></p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/queries/'>Queries</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/06/14/federated-searches-and-the-search-paradox/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Intute: Another Scholarly Search Project Closing Operations</title>
		<link>http://irthoughts.wordpress.com/2011/05/31/intute-another-scholarly-search-project-closing-operations/</link>
		<comments>http://irthoughts.wordpress.com/2011/05/31/intute-another-scholarly-search-project-closing-operations/#comments</comments>
		<pubDate>Tue, 31 May 2011 18:25:04 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1626</guid>
		<description><![CDATA[The fate of any good technology idea disconnected from a business/revenue model is not promising. This is true for commercial and academic projects &#8230;<p><a href="http://irthoughts.wordpress.com/2011/05/31/intute-another-scholarly-search-project-closing-operations/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1626&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>The fate of any good technology idea disconnected from a business/revenue model is not promising. This is true for commercial and academic projects, or for any project for that matter. Sooner or later, even grant-funded projects will have their reality-check day. Consider the case of Intute, which will be closing by July 2011; i.e., in about a month.</p>
<p>According to their FAQs page at <a href="http://www.intute.ac.uk/faq.html">http://www.intute.ac.uk/faq.html</a>,</p>
<blockquote><p>Why have JISC made this decision?</p>
<p>As stated in the JISC statement about the Intute review, when services &#8220;reach the end of their existing funding cycle it is always intended, wherever possible, that they move from being fully funded to being part-funded or fully sustained by other sources&#8221;. Unfortunately in the current economic climate no realistic alternative funding model for Intute as it currently stands has been identified.</p>
<p>However, we are working to ensure that the legacy of Intute lives on, and we are working with other organisations in the sector to find a new home for Intute content.</p>
<p>Why can&#8217;t Intute continue without JISC funding?</p>
<p>Over the last three years, we have investigated alternative funding models for Intute, including alternative grant funding, subscription and advertising/sponsorship, and we have spoken to librarians, academics and students to find out what they think. Unfortunately, we have been unable to find a model that will sustain Intute in its current form into the future.</p></blockquote>
<p>and</p>
<blockquote><p>Can you open up Intute for community updating and contributions? This model may be a better fit now with the rise of social /community web 2.0 ways of working.</p>
<p>We have looked at the possibility of facilitating a community generated resource catalogue, and investigated exporting all of our resources to Delicious. However, in December 2010 reports circulated that Yahoo will be shutting down Delicious. With Intute funding ending in July 2011, the uncertainty surrounding Delicious means that further investigations are unlikely.</p>
<p>Is there any way to save Intute? What about an internet fundraising drive or trying to raise funds from institutions, foundations or advertising?</p>
<p>In principle &#8211; maybe, but in practice we have investigated alternative funding models for Intute, including alternative grant funding, subscription and advertising/sponsorship, and we have been unable to find a model that will sustain Intute in its current form into the future. Intute as it stands costs over 1 million a year to run excluding the contributions associated with housing staff at our different partner institutions.</p></blockquote>
<p>Intute was created by a consortium of seven universities, working together with a whole host of partners.</p>
<p>The Intute consortium was:</p>
<ul>
<li>University of Birmingham</li>
<li>University of Bristol</li>
<li>Heriot-Watt University</li>
<li>The University of Manchester</li>
<li>Manchester Metropolitan University</li>
<li>University of Nottingham</li>
<li>University of Oxford</li>
</ul>
<p>It is striking that, with so much talent across these institutions, Intute&#8217;s fate is as described above.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/miscellaneous/'>Miscellaneous</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1626/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1626/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1626/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1626/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1626/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1626/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1626/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1626/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1626&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/05/31/intute-another-scholarly-search-project-closing-operations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Building a Better Whois Tool</title>
		<link>http://irthoughts.wordpress.com/2011/05/10/building-a-better-whois-tool/</link>
		<comments>http://irthoughts.wordpress.com/2011/05/10/building-a-better-whois-tool/#comments</comments>
		<pubDate>Tue, 10 May 2011 17:18:24 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1623</guid>
		<description><![CDATA[At minerazzi, we have built a better version of our whois tool. Unlike similar tools, ours can access multiple registrar databases. Feel &#8230;<p><a href="http://irthoughts.wordpress.com/2011/05/10/building-a-better-whois-tool/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1623&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>At minerazzi, we have built a better version of our whois tool.</p>
<p>Unlike similar tools, ours can access multiple registrar databases.</p>
<p>Feel free to play with it at</p>
<p><a href="http://www.minerazzi.com/labs/whois.php">http://www.minerazzi.com/labs/whois.php</a></p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/software/'>Software</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1623/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1623/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1623/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1623/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1623/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1623/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1623/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1623/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1623&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/05/10/building-a-better-whois-tool/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>The minerazzi crawler: now crawling script tags and few more tags</title>
		<link>http://irthoughts.wordpress.com/2011/04/29/the-minerazzi-crawler-now-crawling-script-tags-and-few-more-tags/</link>
		<comments>http://irthoughts.wordpress.com/2011/04/29/the-minerazzi-crawler-now-crawling-script-tags-and-few-more-tags/#comments</comments>
		<pubDate>Fri, 29 Apr 2011 20:19:14 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[IR Tools]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1619</guid>
		<description><![CDATA[We are pleased to enable new crawling capabilities for the minerazzi web crawler: 04-29-11: Script tags detection capabilities added. 04-22-11: &#8230;<p><a href="http://irthoughts.wordpress.com/2011/04/29/the-minerazzi-crawler-now-crawling-script-tags-and-few-more-tags/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1619&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>We are pleased to enable new crawling capabilities for the minerazzi web crawler:</p>
<p>04-29-11: Script tags detection capabilities added.<br />
04-22-11: DocType, Base, and Link tags detection capabilities added.</p>
<p>If you are a Web developer, these new features of our crawler can help you to mine programming &#8220;gems&#8221; by examining, isolating, and collecting script lines that have been embedded in the source code of documents. And by crawling or accessing URLs of external files or link tags, programmers can view and dissect hidden scripts.</p>
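<p>As an illustration of what script tag detection can involve, here is a minimal Python sketch. It is not the minerazzi crawler&#8217;s code; the class name ScriptCollector and its attributes are hypothetical, and it relies only on the standard html.parser module. It records the src of external scripts and the text of inline scripts found in a document.</p>

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect external script URLs and inline script bodies (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.external_sources = []  # values of src attributes on script tags
        self.inline_scripts = []    # text found between script tags
        self._in_script = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external_sources.append(src)
            self._in_script = True
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a script element.
        if self._in_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_script:
            code = "".join(self._buffer).strip()
            if code:
                self.inline_scripts.append(code)
            self._in_script = False
```

<p>Feeding a page&#8217;s source to an instance with feed() fills both lists; the external URLs could then be fetched and inspected separately.</p>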
<p>To learn more about these new features or previously added features, visit</p>
<p><a href="http://www.minerazzi.com/labs/crawlinker.php">http://www.minerazzi.com/labs/crawlinker.php</a></p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/ir-tools/'>IR Tools</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/software/'>Software</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1619/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1619/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1619/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1619/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1619/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1619/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1619/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1619/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1619&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/04/29/the-minerazzi-crawler-now-crawling-script-tags-and-few-more-tags/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>New additions to the minerazzi web crawler</title>
		<link>http://irthoughts.wordpress.com/2011/04/18/new-additions-to-the-minerazzi-web-crawler/</link>
		<comments>http://irthoughts.wordpress.com/2011/04/18/new-additions-to-the-minerazzi-web-crawler/#comments</comments>
		<pubDate>Mon, 18 Apr 2011 20:09:33 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[IR Tools]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1604</guid>
		<description><![CDATA[Back in 03-25-11 we released the minerazzi web crawler and link checker tool. As a beta, it is not perfect &#8230;<p><a href="http://irthoughts.wordpress.com/2011/04/18/new-additions-to-the-minerazzi-web-crawler/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1604&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Back on 03-25-11 we released the <a href="http://www.minerazzi.com/labs/crawlinker.php">minerazzi web crawler and link checker tool</a>. As a beta, it is not perfect and needs improvement. It is the online version of the crawler used by the minerazzi search architecture (beta). Hopefully, it will evolve into a diagnostic tool.</p>
<p>We mentioned that the online version will undergo several changes, all intended to provide an online &#8220;web crawler for the masses&#8221;. The idea is to put users in control of the crawling process, since current crawlers lack human intuition about which URL from a to-do list to crawl next.</p>
<p>We are pleased to announce the following changes.</p>
<p><strong>Changelogs</strong><br />
04-19-11: Robots Text File detection capabilities added. Meta data parsing changes. (*)<br />
04-18-11: Title and Meta Tags detection capabilities added. Layout changes.<br />
04-16-11: User Environment detection capabilities added.<br />
04-14-11: Timer capabilities added.<br />
04-10-11: Deduplication capabilities added.<br />
04-09-11: Color palette reporting capabilities added.<br />
04-05-11: DNS and MX reporting capabilities added.<br />
04-03-11: Source code reporting capabilities added.<br />
03-30-11: Relative URL resolving capabilities added.<br />
03-28-11: Hypertext wrapping, ip, and headers reporting capabilities added.</p>
<p>(*) Just added.</p>
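<p>Two of the items above, relative URL resolving and deduplication, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the crawler&#8217;s implementation: the function name resolve_and_deduplicate is hypothetical, and duplicates are defined here simply as identical absolute URLs once the fragment is dropped.</p>

```python
from urllib.parse import urljoin, urldefrag


def resolve_and_deduplicate(base_url, hrefs):
    """Resolve hrefs against base_url and drop repeats, keeping first-seen order."""
    seen = set()
    resolved = []
    for href in hrefs:
        # urljoin turns a relative reference into an absolute URL;
        # urldefrag strips any fragment so anchors on one page count once.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved
```

<p>A production crawler would likely normalize further (case of the host name, default ports, trailing slashes), which this sketch leaves out.</p>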
<p>There is something for everyone here.</p>
<p>Web Designers: Want to use a color palette from another site or tweak yours? Easy. Launch a crawl of a CSS file already discovered by the crawler.</p>
<p>Web Developers: Want to see diamonds? View the source of any file, including PDFs.</p>
<p>Data miners: Need to mine links? Crawl a document. Better: launch a crawl of a site map file already discovered by the crawler.</p>
<p>Researchers: Want to check system configurations? Check IPs, DNS, MX, and header traces (including data from cookies, sessions, etc.).</p>
<p>More updates are coming soon!</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/ir-tools/'>IR Tools</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/software/'>Software</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1604/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1604/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1604/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1604/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1604/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1604/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1604/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1604/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1604&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/04/18/new-additions-to-the-minerazzi-web-crawler/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>An online Crawler for the masses</title>
		<link>http://irthoughts.wordpress.com/2011/04/11/an-online-crawler-for-the-masses/</link>
		<comments>http://irthoughts.wordpress.com/2011/04/11/an-online-crawler-for-the-masses/#comments</comments>
		<pubDate>Mon, 11 Apr 2011 11:48:43 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Human-Computer Interaction]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1601</guid>
		<description><![CDATA[Since at this time we haven&#8217;t launched an official blog, this post goes&#8230; We are excited to announce several updates &#8230;<p><a href="http://irthoughts.wordpress.com/2011/04/11/an-online-crawler-for-the-masses/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1601&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Since at this time we haven&#8217;t launched an official blog, this post goes&#8230;</p>
<p>We are excited to announce several updates to the <a href="http://www.minerazzi.com/labs/crawlinker.php">minerazzi crawler</a>. This is the online version of the indexing crawler used by the minerazzi search engine (beta).</p>
<p>The long-term goal is to turn this version into a multifunctional mining platform and a crawler for the IT masses; i.e., a crawler to be used by IR researchers, data miners, webmasters, developers, etc. That is, a crawler that even Web designers and the average public can use.</p>
<p>You&#8217;re welcome to give it a try. Keep in mind the tool is still in beta. While you are there, also feel free to test the <a href="http://www.minerazzi.com/labs/whois.php">multiple whois domain name tool</a>.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/human-computer-interaction/'>Human-Computer Interaction</a>, <a href='http://irthoughts.wordpress.com/category/machine-learning/'>Machine Learning</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>, <a href='http://irthoughts.wordpress.com/category/software/'>Software</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1601/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1601/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1601/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1601/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1601/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1601/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1601/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1601/" /></a> <img alt="" border="0" 
src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1601&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/04/11/an-online-crawler-for-the-masses/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IRW-2011-04-03: The minerazzi project</title>
		<link>http://irthoughts.wordpress.com/2011/04/05/irw-2011-04-03-the-minerazzi-project/</link>
		<comments>http://irthoughts.wordpress.com/2011/04/05/irw-2011-04-03-the-minerazzi-project/#comments</comments>
		<pubDate>Tue, 05 Apr 2011 18:15:21 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Newsletters]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1594</guid>
		<description><![CDATA[The current issue of IRW is out and should arrive in subscribers&#8217; inboxes today. In this issue of IRW, we &#8230;<p><a href="http://irthoughts.wordpress.com/2011/04/05/irw-2011-04-03-the-minerazzi-project/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1594&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" src="http://www.miislita.com/irw/minerazzi-project.png" alt="minerazzi" /></p>
<p style="text-align:left;">The current issue of IRW is out and should arrive in subscribers&#8217; inboxes today.</p>
<p style="text-align:left;">In this issue of IRW, we introduce the minerazzi project and two useful tools available at <a href="http://www.minerazzi.com">minerazzi.com</a></p>
<p style="text-align:left;">Web Crawler and Link Checker Tool (<a href="http://www.minerazzi.com/labs/crawlinker.php">http://www.minerazzi.com/labs/crawlinker.php</a>).</p>
<p style="text-align:left;">Multiple Whois Domain Name Tool (<a href="http://www.minerazzi.com/labs/whois.php">http://www.minerazzi.com/labs/whois.php</a>).</p>
<p style="text-align:left;">This is a research project conducted in association with several scholars and the private sector.</p>
<p style="text-align:left;">Enjoy it.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1594/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1594/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1594/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1594/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1594/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1594/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1594/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1594/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1594&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/04/05/irw-2011-04-03-the-minerazzi-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://www.miislita.com/irw/minerazzi-project.png" medium="image">
			<media:title type="html">minerazzi</media:title>
		</media:content>
	</item>
		<item>
		<title>On Telneting and other nifty protocols</title>
		<link>http://irthoughts.wordpress.com/2011/02/28/on-telneting-and-other-nifty-protocols/</link>
		<comments>http://irthoughts.wordpress.com/2011/02/28/on-telneting-and-other-nifty-protocols/#comments</comments>
		<pubDate>Mon, 28 Feb 2011 13:55:41 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Internet Engineering]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1590</guid>
		<description><![CDATA[I&#8217;ve installed a new server and a few services using Windows Vista. These do not come pre-activated and must be installed. &#8230;<p><a href="http://irthoughts.wordpress.com/2011/02/28/on-telneting-and-other-nifty-protocols/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1590&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">I&#8217;ve installed a new server and a few services using Windows Vista. These do not come pre-activated and must be installed. This morning I feel like sharing, so here goes this post.</p>
<p style="text-align:left;"> </p>
<p style="text-align:left;">In Windows Vista, you need to install the Telnet Client:</p>
<p style="text-align:left;">1. Navigate to Start &gt; Control Panel &gt; Programs &gt; Programs and Features &gt; Turn Windows features on or off.  If you are prompted for an administrator password or confirmation, type the password or provide confirmation.</p>
<p style="text-align:left;">2. In the Windows Features dialog box, select the Telnet Client check box.</p>
<p style="text-align:left;">3. Click OK. The installation might take several minutes.</p>
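<p style="text-align:left;">The same installation can also be scripted. The sketch below is an assumption on my part rather than part of the steps above: it wraps the Windows Package Manager command pkgmgr with the /iu (install update) switch and the TelnetClient feature name, and the Python function names are hypothetical. It must be run from an elevated (administrator) session on Windows.</p>

```python
import subprocess
import sys


def build_feature_install_command(feature_name="TelnetClient"):
    """Return the pkgmgr command line that installs one Windows feature."""
    return ["pkgmgr", "/iu:" + feature_name]


def install_feature(feature_name="TelnetClient"):
    """Run pkgmgr to install the feature; requires administrator rights."""
    if sys.platform != "win32":
        raise RuntimeError("pkgmgr is only available on Windows")
    return subprocess.run(build_feature_install_command(feature_name), check=True)
```

<p style="text-align:left;">Calling install_feature() with no arguments is meant to mirror ticking the Telnet Client check box in the Windows Features dialog.</p>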
<p style="text-align:left;"> </p>
<p style="text-align:left;">Other nifty installs available are:</p>
<ul>
<li>RIP Listener</li>
<li>SNMP</li>
<li>Simple TCP/IP services</li>
<li>Telnet Server</li>
<li>TFTP Client</li>
</ul>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/internet-engineering/'>Internet Engineering</a>  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/irthoughts.wordpress.com/1590/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godelicious/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/delicious/irthoughts.wordpress.com/1590/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gofacebook/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/facebook/irthoughts.wordpress.com/1590/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gotwitter/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/twitter/irthoughts.wordpress.com/1590/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gostumble/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/stumble/irthoughts.wordpress.com/1590/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/godigg/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/digg/irthoughts.wordpress.com/1590/" /></a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/goreddit/irthoughts.wordpress.com/1590/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/reddit/irthoughts.wordpress.com/1590/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1590&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/02/28/on-telneting-and-other-nifty-protocols/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IRW 2011-4-2: n-Grams and Association Measures</title>
		<link>http://irthoughts.wordpress.com/2011/02/18/irw-2011-4-2-n-grams-and-association-measures/</link>
		<comments>http://irthoughts.wordpress.com/2011/02/18/irw-2011-4-2-n-grams-and-association-measures/#comments</comments>
		<pubDate>Fri, 18 Feb 2011 13:37:02 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[Web Mining Course]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1587</guid>
		<description><![CDATA[The current issue of IRW should reach subscribers&#8217; inboxes during the day. This is Part Two of the series &#8230;<p><a href="http://irthoughts.wordpress.com/2011/02/18/irw-2011-4-2-n-grams-and-association-measures/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1587&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" src="http://www.miislita.com/irw/n-grams-and-association-measures.png" alt="n-grams-and-association-measures" /></p>
<p style="text-align:left;"> </p>
<p style="text-align:left;">The current issue of IRW should reach subscribers&#8217; inboxes during the day.</p>
<p style="text-align:left;">This is Part Two of the series on statistical analysis of n-grams. This is a text mining analysis technique widely used in information retrieval and data mining in general. In this issue we cover the implementation of association measures derived from contingency tables.</p>
<p style="text-align:left;">The QA section explains how to conduct a Chi Square Test for tables with many items; i.e., beyond the usual 2 x 2 contingency tables.</p>
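<p style="text-align:left;">For readers who want a feel for the calculation, here is a minimal Python sketch of the Pearson chi-square statistic for a table with any number of rows and columns. It is not taken from the newsletter; the function name and the sample counts are hypothetical. Expected counts are computed as (row total x column total) / grand total, and the degrees of freedom as (rows - 1) x (columns - 1).</p>

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2 x 3 table of observed counts.
observed_counts = [[10, 20, 30], [20, 20, 20]]
statistic, dof = chi_square_statistic(observed_counts)
print(statistic, dof)
```

<p style="text-align:left;">The statistic is then compared against the chi-square critical value for the computed degrees of freedom at the chosen significance level. The sketch assumes every expected count is nonzero.</p>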
<p style="text-align:left;">Enjoy it.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/web-mining-course/'>Web Mining Course</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/02/18/irw-2011-4-2-n-grams-and-association-measures/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://www.miislita.com/irw/n-grams-and-association-measures.png" medium="image">
			<media:title type="html">n-grams-and-association-measures</media:title>
		</media:content>
	</item>
		<item>
		<title>Are we near the end of hardcopy scholarly journals?</title>
		<link>http://irthoughts.wordpress.com/2011/02/14/are-we-near-the-end-of-hardcopy-scholarly-journals/</link>
		<comments>http://irthoughts.wordpress.com/2011/02/14/are-we-near-the-end-of-hardcopy-scholarly-journals/#comments</comments>
		<pubDate>Mon, 14 Feb 2011 15:46:08 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Marketing Research]]></category>
		<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1578</guid>
		<description><![CDATA[According to Lang (2010), we could ask the question whether hardcopy scholarly journals are near the end.  I know, I &#8230;<p><a href="http://irthoughts.wordpress.com/2011/02/14/are-we-near-the-end-of-hardcopy-scholarly-journals/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1578&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">According to Lang (2010), we could ask whether hardcopy scholarly journals are near the end. I know, I know. This is kind of an elephant in the room.</p>
<p style="text-align:left;">Lang raises the question based on the following bullet points:</p>
<p style="text-align:left;">1. Forty-page Articles Are Dead.<br />
2. Survey Articles Are Dead.<br />
3. Journal Issues Are Dead.<br />
4. Page Numbers Are Dead.<br />
5. Copy Editing Is Dead.<br />
6. Peer Reviewing Might Be Dying Too.<br />
7. The Article as a Unit of Publication Is Dead.</p>
<p style="text-align:left;">Lang then concludes with a question and call to action.</p>
<p style="text-align:left;"><strong>A New Beginning for Scholarly Publishing?</strong></p>
<p style="text-align:left;">&#8220;So let’s abandon all the 20th-century baggage of traditional journals, and move to a more rational model for scholarly publication, with no copy editors, no reviewers, no redundancy, and no unnecessary delays. A concrete step would be to give each ACL member a DOI for a unipaper, and then ask them to non-redundantly populate this with a sequence, or a tree, of numbered paragraphs that consolidate all their work on a topic. Then, to get things moving, the present journal could insist that some proportion of citations be to paragraphs within these unipapers, with hyperlinks embedded right there in the citations. What are we waiting for?&#8221;</p>
<p style="text-align:left;">Feel free to take issue with any of the above points.</p>
<p style="text-align:left;">My opinion? Lang has very good arguments. However, I would say that, due to the changing times (read: smartphones, tablets, blogs, social networks, etc.), many hardcopy scholarly journals are actually evolving, while the weaker ones, or those unfit to change, are dying off, a natural phenomenon observed in online ecosystems. This is not unique to scholarly journals. The same is true for any hardcopy journal, newspaper, magazine, or newsletter.</p>
<p>With more retailers giving discounts and even freebies just for showing a tweet about their products or services at their store, who knows what the fate of flyers, coupons, etc. will be.</p>
<p style="text-align:left;">Publishers that don&#8217;t adjust their business models to the changing times are doomed to become the next LPs, 8-tracks, cassette tapes, etc.</p>
<p style="text-align:left;">Lang, N. (2010) Are We Near the End of the Journal. Computational Linguistics Volume 36, Number 4.  Retrieved from <a href="http://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00019">http://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00019</a>  </p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/marketing-research/'>Marketing Research</a>, <a href='http://irthoughts.wordpress.com/category/miscellaneous/'>Miscellaneous</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/02/14/are-we-near-the-end-of-hardcopy-scholarly-journals/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Spreading some news</title>
		<link>http://irthoughts.wordpress.com/2011/02/10/spreading-some-news/</link>
		<comments>http://irthoughts.wordpress.com/2011/02/10/spreading-some-news/#comments</comments>
		<pubDate>Thu, 10 Feb 2011 12:39:45 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[Vector Space Models]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1575</guid>
		<description><![CDATA[Back to blogging. I&#8217;ve been very busy putting together a paper on a weighting model and answering feedback received from &#8230;<p><a href="http://irthoughts.wordpress.com/2011/02/10/spreading-some-news/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1575&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">Back to blogging. I&#8217;ve been very busy putting together a paper on a weighting model and answering feedback received from colleagues on it.</p>
<p style="text-align:left;">This might explain why the January IRW newsletter is delayed. It should arrive in subscribers&#8217; inboxes during the day. The February issue will be out in about one week. These are back-to-back issues on Statistical Analysis of N-Grams.</p>
<p style="text-align:left;">Part 1: N-Grams &amp; Contingency Tables</p>
<p style="text-align:left;">Part 2: N-Grams &amp; Association Measures</p>
<p style="text-align:left;">On other matters, a few years ago a PhD student published an excellent application of the Vector Space Model to protein analysis. You can revisit the post at <a href="http://irthoughts.wordpress.com/2008/11/12/vector-space-model-and-protein-retrieval/">http://irthoughts.wordpress.com/2008/11/12/vector-space-model-and-protein-retrieval/</a>.</p>
<p style="text-align:left;">If you know of applications of VSM in other disciplines, let me know. I&#8217;m interested in multidisciplinary stuff.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/vector-space-models/'>Vector Space Models</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/02/10/spreading-some-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>On the Non-Additivity of Correlation Coefficients</title>
		<link>http://irthoughts.wordpress.com/2011/01/07/on-the-non-additivity-of-correlation-coefficients/</link>
		<comments>http://irthoughts.wordpress.com/2011/01/07/on-the-non-additivity-of-correlation-coefficients/#comments</comments>
		<pubDate>Fri, 07 Jan 2011 15:08:34 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[IR Tutorials]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1569</guid>
		<description><![CDATA[Regardless of your research field, sooner or later you need to generate average statistics, for instance a weighted correlation coefficient &#8230;<p><a href="http://irthoughts.wordpress.com/2011/01/07/on-the-non-additivity-of-correlation-coefficients/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1569&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Regardless of your research field, sooner or later you need to generate average statistics, for instance a weighted correlation coefficient between any two variables, x and y.</p>
<p>Computing weighted averages of correlation coefficients depends on the weighting strategy used: unit weights, sample sizes, optimal weights, within/between study variances, etc. Most textbooks advocate the use of Fisher&#8217;s Z Transformation to, for instance, compute confidence intervals and average correlations.</p>
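<p>As a point of reference, the textbook procedure can be sketched in a few lines of Python. This is only a minimal illustration of averaging correlations through Fisher&#8217;s Z with the common n - 3 weights (the inverse of the approximate variance of z); it is not the alternative approach presented in the tutorial, and the function names are hypothetical.</p>

```python
import math


def fisher_z(r):
    """Fisher's Z transformation of a correlation r, with -1 < r < 1."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Map a Z value back to the correlation scale."""
    return math.tanh(z)


def average_correlation(rs, ns):
    """Average correlations in Z space, weighting each by n - 3.

    rs: correlation coefficients; ns: matching sample sizes (each above 3).
    """
    weights = [n - 3 for n in ns]
    z_bar = sum(w * fisher_z(r) for w, r in zip(weights, rs)) / sum(weights)
    return inverse_fisher_z(z_bar)


def confidence_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for r (z_crit=1.96 gives about 95%)."""
    z = fisher_z(r)
    se = 1.0 / math.sqrt(n - 3)  # approximate standard error of z
    return inverse_fisher_z(z - z_crit * se), inverse_fisher_z(z + z_crit * se)


if __name__ == "__main__":
    # Made-up values, purely for illustration.
    print(average_correlation([0.2, 0.8], [50, 50]))
    print(confidence_interval(0.5, 30))
```

<p>With equal sample sizes, averaging r = 0.2 and r = 0.8 this way gives roughly 0.57 rather than the arithmetic mean of 0.5, a small reminder that correlation coefficients are not additive.</p>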
<p>One thing that has been bothering me for a long time now is this: what would be the discriminatory power of such weighting strategies when we face identical data sets of correlation values that come from samples with different variability effects in the dependent variable?</p>
<p>Research conducted over the last four months led me to an alternative approach to the above weighting strategies.</p>
<p>At that time I was putting together a new tutorial series on meta-analysis, so this problem diverted my attention and was always in the back of my mind.</p>
<p>So after many meals and long nights, I finally decided to include my research findings as Part 1 of the tutorial series, which you can read here: <a href="http://www.miislita.com/statistics/on-the-non-additivity-correlation-coefficients.pdf">On the Non-Additivity of Correlation Coefficients</a>.</p>
<p>I hope you like it. Since this is relevant to many research areas, please send your feedback through private, confidential email and not through this blog.</p>
<p>PS. If you are interested in testing how the proposed approach compares with other weighting strategies, feel free to contact me. I&#8217;m interested in testing with real (non-simulated) data from any field: science, engineering, education, behavioral &amp; social sciences, allied health, literature, politics, marketing, etc.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/01/07/on-the-non-additivity-of-correlation-coefficients/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Starting 2011 with a great research</title>
		<link>http://irthoughts.wordpress.com/2011/01/06/starting-2011-with-a-great-research/</link>
		<comments>http://irthoughts.wordpress.com/2011/01/06/starting-2011-with-a-great-research/#comments</comments>
		<pubDate>Thu, 06 Jan 2011 19:37:57 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[SEO Myths]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1566</guid>
		<description><![CDATA[Nothing better than starting 2011 with more research work. Check this blog tomorrow, as there will be some value-added good news &#8230;<p><a href="http://irthoughts.wordpress.com/2011/01/06/starting-2011-with-a-great-research/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1566&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Nothing better than starting 2011 with more research work.</p>
<p>Check this blog tomorrow, as there will be some value-added good news for those interested in conducting research at the interface of information retrieval, statistical analysis, and applied mathematics. You&#8217;re welcome to grab a copy of this four-month investigation for use in your own research, as a teaching tool, or to chase away SEO snake oil.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2011/01/06/starting-2011-with-a-great-research/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IRW:2010-12:Visualizing Fisher&#8217;s Z Transformation</title>
		<link>http://irthoughts.wordpress.com/2010/12/31/irw2010-12visualizing-fischers-z-transformation/</link>
		<comments>http://irthoughts.wordpress.com/2010/12/31/irw2010-12visualizing-fischers-z-transformation/#comments</comments>
		<pubDate>Fri, 31 Dec 2010 14:31:26 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1562</guid>
		<description><![CDATA[The current issue of the Information Retrieval Newsletter is out! Due to the Holiday Season, it is a short issue. &#8230;<p><a href="http://irthoughts.wordpress.com/2010/12/31/irw2010-12visualizing-fischers-z-transformation/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1562&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-medium wp-image-1563" title="fisher-Z-transformation" src="http://irthoughts.files.wordpress.com/2010/12/fisher-z-transformation.png?w=400&#038;h=306" alt="Fisher's Z Transformation" width="400" height="306" /></p>
<p>The current issue of the Information Retrieval Newsletter is out! Due to the Holiday Season, it is a short issue.</p>
<p>The article section is dedicated to <em>Fisher&#8217;s Z Transformation</em>, its origins, advantages, and limitations.</p>
<p>We have included a lesser-known visualization of it. A geometrical interpretation helps one understand how the transformation works.</p>
<p>Enjoy it and Happy New Year!</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/12/31/irw2010-12visualizing-fischers-z-transformation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://irthoughts.files.wordpress.com/2010/12/fisher-z-transformation.png?w=400" medium="image">
			<media:title type="html">fisher-Z-transformation</media:title>
		</media:content>
	</item>
		<item>
		<title>DOS Attacks with Links using URL Shorteners</title>
		<link>http://irthoughts.wordpress.com/2010/12/23/dos-attacks-with-links-using-url-shorteners/</link>
		<comments>http://irthoughts.wordpress.com/2010/12/23/dos-attacks-with-links-using-url-shorteners/#comments</comments>
		<pubDate>Thu, 23 Dec 2010 14:08:23 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Hacking]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1560</guid>
		<description><![CDATA[This is interesting: how to launch a DOS attack without ever having to infect a single PC. This is &#8230;<p><a href="http://irthoughts.wordpress.com/2010/12/23/dos-attacks-with-links-using-url-shorteners/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1560&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This is interesting: how to launch a DOS attack without ever having to infect a single PC. This is done through URL shorteners:</p>
<p><a href="http://www.switched.com/2010/12/23/doz-me-can-launch-ddos-attacks-ben-schmidt">Doz.me Can Launch DDoS Attacks Using Shortened URLs</a></p>
<p><a href="http://securitywatch.eweek.com/ddos/url_shortener_is_also_a_ddos_tool.html">URL Shortener Is Also A DDOS Tool</a></p>
<p>Beware of clicking on shortened URLs.</p>
<p>Ho, Ho, Ho.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/hacking/'>Hacking</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/12/23/dos-attacks-with-links-using-url-shorteners/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Upcoming work</title>
		<link>http://irthoughts.wordpress.com/2010/12/20/upcoming-work/</link>
		<comments>http://irthoughts.wordpress.com/2010/12/20/upcoming-work/#comments</comments>
		<pubDate>Mon, 20 Dec 2010 17:29:57 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Newsletters]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1558</guid>
		<description><![CDATA[I&#8217;m putting together a 4-part series on meta-analysis in information retrieval, with feedback from several researchers. It will be an &#8230;<p><a href="http://irthoughts.wordpress.com/2010/12/20/upcoming-work/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1558&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m putting together a 4-part series on meta-analysis in information retrieval, with feedback from several researchers. It will be an interesting series to follow. Several myths will be dispelled once and for all. The current issue of IRW provides a sneak preview. The newsletter and the first article of the series will probably be out this week or next.</p>
<p>Happy Holidays.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/12/20/upcoming-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Understanding Correlation</title>
		<link>http://irthoughts.wordpress.com/2010/11/29/understanding-correlation/</link>
		<comments>http://irthoughts.wordpress.com/2010/11/29/understanding-correlation/#comments</comments>
		<pubDate>Mon, 29 Nov 2010 20:31:08 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[SEO Myths]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1548</guid>
		<description><![CDATA[I came across Professor R.J. Rummel page on Understanding Correlation. This is an old, but still relevant book-like Web page &#8230;<p><a href="http://irthoughts.wordpress.com/2010/11/29/understanding-correlation/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1548&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I came across Professor R.J. Rummel&#8217;s page on <a href="http://www.mega.nu/ampp/rummel/uc.htm">Understanding Correlation</a>. This is an old but still relevant book-like Web page on how to properly interpret correlation coefficients.</p>
<p>In Chapter 4 he discusses the proper way of looking at correlation coefficient values. He writes, and I quote (emphasis added in boldface):</p>
<p>&#8220;As a matter of routine<strong> it is the squared correlations that should be interpreted</strong>. This is because the correlation coefficient is misleading in suggesting the existence of more covariation than exists, and this problem gets worse as the correlation approaches zero. Consider the following correlations and their squares.&#8221;</p>
<p><img src="http://www.mega.nu/ampp/rummel/uc.rtab.gif" alt="" /></p>
<p>&#8220;Note that as the correlation r decrease by tenths, the r<sup>2</sup> decreases by much more. A correlation of .50 only shows that 25 percent variance is in common; a correlation of .20 shows 4 percent in common; and a correlation of .10 shows 1 percent in common (or 99 percent not in common). Thus, squaring should be a healthy corrective to the tendency to consider<strong> low correlations, such as .20 and .30, </strong>as indicating <strong>a meaningful or practical covariation.</strong>&#8221;</p>
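<p>The r versus r<sup>2</sup> comparison in the quote is easy to reproduce. The following is only an illustrative sketch in Python; the function name <code>shared_variance</code> is my own label and does not come from Rummel&#8217;s page.</p>

```python
def shared_variance(r: float) -> float:
    """Return r squared, the fraction of variance two variables have in common."""
    return r * r


# Walk the correlation down by tenths, from 1.0 to 0.1, and show how much
# faster the shared variance shrinks.
for tenths in range(10, 0, -1):
    r = tenths / 10
    print(f"r = {r:.2f}   r^2 = {shared_variance(r):.2f}")
```

<p>Reading the output from top to bottom shows the same pattern as the table: each step of one tenth in r removes a much larger share of the common variance.</p>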
<p>Rummel&#8217;s page is very relevant these days, when SEOs from SEOMOZ and a few other snake-oil marketing sites are buying the bogus claim from Fishkin and Hendrickson that low correlation coefficients in about that range are evidence of LDA scores and Google ranks being &#8220;highly&#8221; correlated.</p>
<p>As mentioned before on this blog, SEO marketers are good at selling that kind of snake oil or &#8220;quack&#8221; science.</p>
<p>Statistical significance does not equate to high correlation. For large enough sample sizes, even very low r values (0.1, 0.01, etc.) eventually become significant, but they still indicate a weak correlation.</p>
<p>On a side note, I&#8217;m reading an IR thesis wherein Spearman&#8217;s and Kendall&#8217;s coefficients are used. Quite interesting.</p>
<p>First PS</p>
<p>According to a Sloan Consulting article at the <a href="http://www.isixsigma.com/index.php?option=com_k2&amp;view=item&amp;id=1335:understanding-scatter-diagrams-and-correlation-analysis&amp;Itemid=204">ISIXSIGMA.COM</a> site, and I quote (emphasis added):</p>
<p>&#8220;As a rule of thumb <strong>a strong correlation </strong>or relationship has an <strong><em>r</em></strong>-value range of between 0.85 to 1, or -0.85 to -1. In a moderate correlation, the <strong><em>r</em></strong>-value ranges from 0.75 to 0.85 or, -0.75 to -0.85. In a <strong>weak correlation, one that is not a very helpful predictor</strong>, <strong><em>r</em></strong> ranges from 0.60 to 0.74 or -0.60 to 0.74. Though an entirely random relationship equals, 0.00, <strong>any relationship that has a correlation <em>r</em>-value that is 0.59 and below is not considered to be a reliable predictor</strong>.&#8221;</p>
<p>According to this <a href="http://www.schoolnet.org.za/twt/06/M6_Understanding_Correlation.pdf">Intel Teach Program</a> document, a correlation between 0 and 0.19 is very weak, while one between 0.2 and 0.39 is weak.</p>
<p>It is true that there are many correlation charts out there, and some disagree on specific degrees or ranges, but they all tend to agree on one thing: a correlation value below 0.20 is a very, very weak correlation, never deemed evidence of variables being &#8220;highly correlated&#8221; as claimed by SEOMOZ in their LDA fiasco posts.</p>
<p>Second PS</p>
<p>Here is a list of reference links wherein these marketers make correlation claims based on quite weak correlation values and in the process keep misleading naive peers and the public. &#8220;Highly correlated&#8221;? &#8220;Remarkably well correlated?&#8221; Evidently Statistics is a Loss for SEOs.</p>
<p><a href="http://www.seomoz.org/blog/lda-correlation-017-not-032">http://www.seomoz.org/blog/lda-correlation-017-not-032</a></p>
<p><a href="http://www.seomoz.org/blog/lda-and-googles-rankings-well-correlated">http://www.seomoz.org/blog/lda-and-googles-rankings-well-correlated</a></p>
<p><a href="http://www.seomoz.org/blog/google-vs-bing-correlation-analysis-of-ranking-elements">http://www.seomoz.org/blog/google-vs-bing-correlation-analysis-of-ranking-elements</a></p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/11/29/understanding-correlation/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://www.mega.nu/ampp/rummel/uc.rtab.gif" medium="image" />
	</item>
		<item>
		<title>IRW:11-2010: Tables of Correlation Features</title>
		<link>http://irthoughts.wordpress.com/2010/11/26/irw11-2010-tables-of-correlation-features/</link>
		<comments>http://irthoughts.wordpress.com/2010/11/26/irw11-2010-tables-of-correlation-features/#comments</comments>
		<pubDate>Fri, 26 Nov 2010 18:29:41 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1544</guid>
		<description><![CDATA[The current issue of IRW is out and should reach subscribers during the day.  In this issue we delve into our Tables &#8230;<p><a href="http://irthoughts.wordpress.com/2010/11/26/irw11-2010-tables-of-correlation-features/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1544&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter" src="http://www.miislita.com/irw/tables-of-correlation-features.png" alt="tables of correlation features" /></p>
<p>The current issue of IRW is out and should reach subscribers during the day. </p>
<p>In this issue we delve into our Tables of Correlation Features.  </p>
<p>The QA section addresses the question of how university administrators can allocate faculty to programs using an interesting formula.</p>
<p>The Who&#8217;s Who in CS is dedicated to one of my heroes: Wesley A. Clark.</p>
<p>Enjoy it and happy Holiday Season.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/11/26/irw11-2010-tables-of-correlation-features/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://www.miislita.com/irw/tables-of-correlation-features.png" medium="image">
			<media:title type="html">tables of correlation features</media:title>
		</media:content>
	</item>
		<item>
		<title>Expanding on Tables of Correlation Features</title>
		<link>http://irthoughts.wordpress.com/2010/11/25/expanding-on-tables-of-correlation-features/</link>
		<comments>http://irthoughts.wordpress.com/2010/11/25/expanding-on-tables-of-correlation-features/#comments</comments>
		<pubDate>Thu, 25 Nov 2010 12:32:30 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1541</guid>
		<description><![CDATA[I&#8217;ve updated and expanded the Tables of Correlation Features article to include: 1. How-to instructions for reproducing the tables. 2. &#8230;<p><a href="http://irthoughts.wordpress.com/2010/11/25/expanding-on-tables-of-correlation-features/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1541&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve updated and expanded the <a href="http://www.miislita.com/statistics/tables-of-correlation-features.pdf">Tables of Correlation Features</a> article to include:</p>
<p>1. How-to instructions for reproducing the tables.<br />
2. Additional statistics theory.<br />
3. Working examples on how to use the tables.<br />
4. An example on how results compare with G*Power software.</p>
<p>Enjoy it and Happy Thank-Day.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/11/25/expanding-on-tables-of-correlation-features/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Tables of Correlation Features</title>
		<link>http://irthoughts.wordpress.com/2010/11/15/tables-of-correlation-features/</link>
		<comments>http://irthoughts.wordpress.com/2010/11/15/tables-of-correlation-features/#comments</comments>
		<pubDate>Mon, 15 Nov 2010 18:58:00 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1535</guid>
		<description><![CDATA[I have constructed several Tables of Correlation Features . I found these quite useful for quickly determining statistical significance of r &#8230;<p><a href="http://irthoughts.wordpress.com/2010/11/15/tables-of-correlation-features/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1535&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>I have constructed several <a href="http://www.miislita.com/statistics/tables-of-correlation-features.pdf">Tables of Correlation Features</a>. I have found these quite useful for quickly determining the statistical significance of r values and for discriminating between several correlation features. They are great for data mining and for use in other disciplines or fields.</p>
<p>These are presented for the convenience of analysts and for use with statistical and practical significance tests. Readers requiring additional theory or statistical data corresponding to confidence levels and/or degrees of freedom not covered in the tables are referred to the literature.</p>
<p>In the future I will provide some practical applications of these. I hope you like the tables.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/11/15/tables-of-correlation-features/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>On Statistical Significance and SEO Statistical “Studies”</title>
		<link>http://irthoughts.wordpress.com/2010/11/08/on-statistical-significance-and-seo-statistical-%e2%80%9cstudies%e2%80%9d/</link>
		<comments>http://irthoughts.wordpress.com/2010/11/08/on-statistical-significance-and-seo-statistical-%e2%80%9cstudies%e2%80%9d/#comments</comments>
		<pubDate>Mon, 08 Nov 2010 15:13:06 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Quack Science]]></category>
		<category><![CDATA[SEO Myths]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1518</guid>
		<description><![CDATA[Back in 2008, Jan M. Hoem wrote an interesting reflexion paper titled “The reporting of statistical significance in scientific journals” &#8230;<p><a href="http://irthoughts.wordpress.com/2010/11/08/on-statistical-significance-and-seo-statistical-%e2%80%9cstudies%e2%80%9d/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1518&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">Back in 2008, Jan M. Hoem wrote an interesting reflection paper titled “The reporting of statistical significance in scientific journals” (Volume 18, Article 15, pages 437-442; 3 June 2008; <a href="http://www.demographic-research.org/volumes/vol18/15/18-15.pdf">http://www.demographic-research.org/volumes/vol18/15/18-15.pdf</a>). The piece was an expanded version of a previous paper (<a href="http://www.demogr.mpg.de/papers/working/wp-2007-037.pdf">http://www.demogr.mpg.de/papers/working/wp-2007-037.pdf</a>).</p>
<p style="text-align:left;">He wrote (and I quote):</p>
<p style="text-align:left;">“Scientific journals in most empirical disciplines have regulations about how authors should report the precision of their estimates of model parameters and other model elements. Some journals that overlap fully or partly with the field of demography demand as a strict prerequisite for publication that a p-value, a confidence interval, or a standard deviation accompany any parameter estimate. I feel that this rule is sometimes applied in an overly mechanical manner. Standard deviations and p-values produced routinely by general-purpose software are taken at face value and included without questioning, and features that have too high a p-value or too large a standard deviation are too easily disregarded as being without interest because they appear not to be statistically significant. In my opinion authors should be discouraged from adhering to this practice, and flexibility rather than rigidity should be encouraged in the reporting of statistical significance. One should also encourage thoughtful rather than mechanical use of p-values, standard deviations, confidence intervals, and the like. Here is why:”</p>
<p style="text-align:left;">Hoem then dissects five points related to the misuse of statistical significance results and automatic software solutions. I’m listing them below.</p>
<ol style="text-align:left;">
<li>The scientific importance of an empirical finding depends much more on its contribution to the development or falsification of a substantive theory than on the values of indicators of statistical significance.</li>
<li>Measures of statistical significance may be misleading. When a model has been developed through repeated use of tests of significance to include and exclude covariates, to split or combine levels on categorical covariates, and to determine other model features, the user often loses control over statistical-significance values, and the values computed by standard software may be completely misleading.</li>
<li>Standard p-values can be insufficiently precise indicators of statistical significance, particularly if their values are given only in grouped levels, which are often indicated by asterisks beside parameter estimates (“* = p&lt;0.1, ** = p&lt;0.05, *** = p&lt;0.01”, and so on).</li>
<li>It may be more important for an understanding of demographic behavior or other phenomena studied to know whether the inclusion of a categorical covariate in its entirety contributes significantly to an improvement of the model than to know the significance indicators of each of its levels.</li>
<li>Standard deviations, when used, should be reported for interesting contrasts, not for features selected automatically by statistical software.</li>
</ol>
<p style="text-align:left;">I completely agree with Hoem.</p>
<p style="text-align:left;"><strong>SEOMOZ and their statistical “studies”</strong></p>
<p style="text-align:left;">These days search engine optimization marketers (SEOs/SEMs) keep misinterpreting statistical results spit out by software without stopping to think about the significance behind the significance, especially when it comes to a correlation coefficient, r (Pearson, Spearman, etc.).</p>
<p style="text-align:left;">When one reads SEO hearsay and urban legends at SEOMOZ that present very small correlation coefficients (0.17, 0.32, etc.) derived from large sample sizes as evidence that variables are “highly correlated” or “well correlated”, it is time to stop and question such “studies”. For reference, see the following links:</p>
<p style="text-align:left;"><a href="http://www.seomoz.org/blog/lda-and-googles-rankings-well-correlated">http://www.seomoz.org/blog/lda-and-googles-rankings-well-correlated</a><br />
<a href="http://www.seomoz.org/blog/lda-correlation-017-not-032">http://www.seomoz.org/blog/lda-correlation-017-not-032</a><br />
<a href="http://irthoughts.wordpress.com/2010/04/23/beware-of-seo-statistical-studies/">http://irthoughts.wordpress.com/2010/04/23/beware-of-seo-statistical-studies/</a>  </p>
<p style="text-align:left;">Fortunately, leading search marketers like Danny Sullivan have questioned those “studies” at a recent search engine conference</p>
<p style="text-align:left;"><a href="http://outspokenmedia.com/internet-marketing-conferences/evening-forum-with-danny-sullivan/">http://outspokenmedia.com/internet-marketing-conferences/evening-forum-with-danny-sullivan/</a>,</p>
<p style="text-align:left;">and that was even before SEOMOZ admitted that the 0.32 result had to be corrected to 0.17.</p>
<p style="text-align:left;">Sean Golliher, founder and publisher of the Search Engine Marketing Journal (SEMJ.org), has also questioned their results (<a href="http://www.seangolliher.com/2010/uncategorized/185/">http://www.seangolliher.com/2010/uncategorized/185/</a>), which Hendrickson from SEOMOZ still insists on defending.</p>
<p style="text-align:left;">Since then, they have never disclosed the source of the mistake, dismissing it as just a programming error. Unfortunately, they are still claiming that a 0.15 – 0.30 range validates their “studies” (<a href="http://www.seo.co.uk/seo-news/seo-tools/the-seomoz-lda-tool-%E2%80%93-our-disappointing-findings.html">http://www.seo.co.uk/seo-news/seo-tools/the-seomoz-lda-tool-%E2%80%93-our-disappointing-findings.html</a>).</p>
<p style="text-align:left;"><strong>Small and Large Sample Sizes</strong></p>
<p style="text-align:left;">When William Sealy Gosset (aka “Student”) proposed, and Ronald A. Fisher expanded on, the test later termed Student’s t-test of significance, the test was meant to be used to assess information from small samples, not from large samples. I have discussed the case of small sample sizes in another post (<a href="http://irthoughts.wordpress.com/2010/10/18/on-correlation-coefficients-and-sample-size/">http://irthoughts.wordpress.com/2010/10/18/on-correlation-coefficients-and-sample-size/</a> ).</p>
<p style="text-align:left;">In order to apply a t-test (and other small sample analysis tests) to large samples, divide-and-conquer techniques, like stratification, were eventually developed. In the case of correlation and regression, the reason for doing this is that applying something like a t-test to, for instance, a single correlation coefficient coming from a huge sample can produce misleading results. Let’s see why.</p>
<p style="text-align:left;">For large enough sample sizes, any nonzero correlation coefficient, even a very small one, eventually becomes significant (t-observed &gt; t-table). At that point it might be tempting to assume that the variables in question are highly correlated. Wrong assumption!</p>
<p style="text-align:left;">The fact is that statistical significance does not necessarily equate to variables being highly correlated and vice versa. Let’s address this point in two parts: (1) the question of statistical significance and (2) the question of high correlation.</p>
<p style="text-align:left;"><strong>Statistical Significance: Bigger is not always better</strong></p>
<p style="text-align:left;">As noted in a Wikipedia entry, “given a sufficiently large sample size, a statistical comparison will always show a significant difference unless the population effect size is exactly zero” (<a href="http://en.wikipedia.org/wiki/Effect_size">http://en.wikipedia.org/wiki/Effect_size</a>). I have discussed effect size and power analysis in a previous post (<a href="http://irthoughts.wordpress.com/2010/10/21/on-power-analysis-and-seo-quack-science/">http://irthoughts.wordpress.com/2010/10/21/on-power-analysis-and-seo-quack-science/</a>).</p>
<p style="text-align:left;">The reason for the above effect has a lot to do with the definition of statistical significance itself. Statistical significance is the confidence one has that a given result is not due to random chance.</p>
<p style="text-align:left;">In mathematical terms, the confidence that a result is not due to random chance is given by the following formula by Sackett (<a href="http://en.wikipedia.org/wiki/Statistical_significance">http://en.wikipedia.org/wiki/Statistical_significance</a>, <a href="http://www.cmaj.ca/cgi/content/full/165/9/1226">http://www.cmaj.ca/cgi/content/full/165/9/1226</a>):</p>
<p style="text-align:left;">Confidence = (Signal/Noise)*Sqrt[Sample Size]</p>
<p style="text-align:left;">This simple expression, or derivatives of it, appears in many different scenarios and disciplines. It describes a generic Confidence Function, F, in terms of a Signal, a Noise, and a Sample Size; that is, F(Signal, Noise, Sample Size). In general, such a generic function tells us that:</p>
<ul style="text-align:left;">
<li>Confidence is proportional to a Signal source (S).</li>
<li>Confidence is inversely proportional to a Noise source (N).</li>
<li>Confidence is proportional to a Signal-to-Noise ratio (S/N).</li>
<li>Confidence increases with the Sample Size (with its square root in the expression above).</li>
</ul>
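<p style="text-align:left;">As a rough illustration of these four statements, here is a minimal Python sketch of the expression above. The function name <code>confidence</code> and the numbers fed to it are assumptions of mine, chosen only to show the trend; they are not taken from Sackett&#8217;s article.</p>

```python
import math


def confidence(signal: float, noise: float, sample_size: int) -> float:
    """Sackett-style confidence: (signal / noise) * sqrt(sample size)."""
    return (signal / noise) * math.sqrt(sample_size)


# Keep a weak signal-to-noise ratio fixed and let only the sample size grow.
# The illustrative values 0.03 and 0.97 are hypothetical.
for n in (100, 10_000, 1_000_000):
    print(f"n = {n:9d}   confidence = {confidence(0.03, 0.97, n):.3f}")
```

<p style="text-align:left;">With the Signal and the Noise held fixed, the only thing driving the confidence up is the sample size, which is the effect discussed in the rest of this post.</p>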
<p style="text-align:left;">Let’s apply a version of this expression to correlation. To do this, let Y be the dependent variable and let X be the independent variable. Let’s also make the following substitutions:</p>
<ul style="text-align:left;">
<li>Confidence: expressed as t<sup>2</sup></li>
<li>Signal: expressed as r<sup>2</sup>; i.e., fraction of explained variations in Y (due to X).</li>
<li>Noise: expressed as 1 – r<sup>2</sup>; i.e., fraction of unexplained variations in Y.</li>
<li>Sample Size: expressed as degrees of freedom; i.e., n – 2 for the correlation between two variables.</li>
</ul>
<p style="text-align:left;">F(Signal, Noise, Sample Size) = t-observed<sup>2</sup> = [r<sup>2</sup>/(1 – r<sup>2</sup>)] [n – 2]</p>
<p style="text-align:left;">Taking the square root (Sqrt) of both sides, we obtain the familiar formula for the t-test of a correlation coefficient.</p>
<p style="text-align:left;">t-observed = r*Sqrt[(n – 2)/(1 – r<sup>2</sup>)]</p>
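<p style="text-align:left;">To see how this expression behaves, here is a small Python sketch. The function name <code>t_observed</code> is simply my label for the formula above, and r = 0.17 is used only because it is one of the values discussed in this post.</p>

```python
import math


def t_observed(r: float, n: int) -> float:
    """t statistic for a correlation coefficient r computed from n pairs."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# The same weak correlation produces an ever larger t as the sample grows.
for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:6d}   t-observed = {t_observed(0.17, n):.2f}")
```

<p style="text-align:left;">Nothing about the strength of the relationship changes from one line of output to the next; only n does.</p>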
<p style="text-align:left;">Evidently, for a given r value, t-observed increases when n increases. By rearranging this expression, it is possible to compute, for a large enough sample size, the critical value above which r values will be significant. For very large samples at a 95% confidence level (two-tailed), t-table = 1.96 (<a href="http://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values">http://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values</a>). Setting t-observed = t-table = t = 1.96 in the above expression and solving for r, we obtain that the critical r value is given by</p>
<p style="text-align:left;">r = t/Sqrt[(n – 2) + t<sup>2</sup>]</p>
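<p style="text-align:left;">For readers who want the intermediate steps, the rearrangement can be sketched as follows. This is only a restatement of the two expressions above; the positive root is taken on the assumption that only the magnitude of r matters here.</p>

```latex
\begin{align*}
t^2 &= \frac{r^2}{1 - r^2}\,(n - 2) \\
t^2\,(1 - r^2) &= r^2\,(n - 2) \\
t^2 &= r^2\left[(n - 2) + t^2\right] \\
r &= \frac{t}{\sqrt{(n - 2) + t^2}}
\end{align*}
```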
<p style="text-align:left;">The following table lists values for very small r values and huge sample sizes. I’m intentionally using several decimal places and ignoring significant figure rules since I want to make a point on the small values used. I’m also using a 0.95 confidence level for illustration purposes, but for the large samples I could and <span style="text-decoration:underline;">should</span> use other confidence levels as well.</p>
<div style="text-align:left;">
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="bottom">n</td>
<td valign="bottom">n &#8211; 2</td>
<td valign="bottom">t</td>
<td valign="bottom">r</td>
<td valign="bottom">S = r*r</td>
<td valign="bottom">N = 1 &#8211; r*r</td>
<td valign="bottom">S/N</td>
</tr>
<tr>
<td>1,000</td>
<td>998</td>
<td>1.96</td>
<td>0.0619</td>
<td>0.003835</td>
<td>0.996165</td>
<td valign="bottom">0.003849</td>
</tr>
<tr>
<td>10,000</td>
<td>9998</td>
<td>1.96</td>
<td>0.0196</td>
<td>0.000384</td>
<td>0.999616</td>
<td valign="bottom">0.000384</td>
</tr>
<tr>
<td>100,000</td>
<td>99998</td>
<td>1.96</td>
<td>0.0062</td>
<td>0.000038</td>
<td>0.999962</td>
<td valign="bottom">0.000038</td>
</tr>
<tr>
<td>1,000,000</td>
<td>999998</td>
<td>1.96</td>
<td>0.0020</td>
<td>0.000004</td>
<td>0.999996</td>
<td valign="bottom">0.000004</td>
</tr>
</tbody>
</table>
</div>
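<p style="text-align:left;">The table can be reproduced with a few lines of code. The following Python sketch is not part of the original calculations; the function names <code>critical_r</code> and <code>signal_to_noise</code> are hypothetical, and t = 1.96 is the same arbitrarily set value used in the table.</p>

```python
import math


def critical_r(n, t=1.96):
    """Smallest r that reaches the chosen t value with n observations."""
    return t / math.sqrt((n - 2) + t ** 2)


def signal_to_noise(r):
    """Ratio of explained (r*r) to unexplained (1 - r*r) variation."""
    return r ** 2 / (1 - r ** 2)


# Columns: n, n - 2, t, r, S = r*r, N = 1 - r*r, S/N
for n in (1_000, 10_000, 100_000, 1_000_000):
    r = critical_r(n)
    print(f"{n:,}  {n - 2}  1.96  {r:.4f}  {r * r:.6f}  "
          f"{1 - r * r:.6f}  {signal_to_noise(r):.6f}")
```

<p style="text-align:left;">Passing a different t value, for example <code>critical_r(10, t=2.306)</code>, gives the critical r for a small sample where 1.96 would not be the appropriate table value.</p>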
<p style="text-align:left;">For a sample size of 10,000 observations the critical r is 0.0196 or about 0.02, meaning that for such a huge sample size any r value above this small and critical r value will be significant. However, something interesting is observed from this table: (PS See footnote update)</p>
<p style="text-align:left;">At these large sample sizes the Noise dwarfs the Signal. For instance, at n = 10,000 the amount of Signal is very small (0.000384) while the amount of Noise is above 0.9996 (more than 99.96%), giving a quite trivial S/N ratio. Similar reasoning applies to r = 0.17 (S = 0.0289, N = 0.9711) and r = 0.32 (S = 0.1024, N = 0.8976). The corresponding S/N ratios are trivial.</p>
<p style="text-align:left;">One can also solve the above expression for n to find sample sizes for some small r values and arbitrary t as shown in the following table (PS See footnote update).</p>
<div style="text-align:left;">
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="bottom"><strong>t</strong></td>
<td valign="bottom"><strong>r</strong></td>
<td valign="bottom"><strong>S = r*r</strong></td>
<td valign="bottom"><strong>N = 1 &#8211; r*r</strong></td>
<td valign="bottom"><strong>S/N</strong></td>
<td valign="bottom"><strong>n</strong></td>
<td valign="bottom"><strong>n &#8211; 2</strong></td>
</tr>
<tr>
<td valign="bottom">1.96</td>
<td valign="bottom">0.0200</td>
<td valign="bottom">0.000400</td>
<td valign="bottom">0.999600</td>
<td valign="bottom">0.000400</td>
<td valign="bottom">9602</td>
<td>9600</td>
</tr>
<tr>
<td valign="bottom">1.96</td>
<td valign="bottom">0.1500</td>
<td valign="bottom">0.022500</td>
<td valign="bottom">0.977500</td>
<td valign="bottom">0.023018</td>
<td valign="bottom">169</td>
<td>167</td>
</tr>
<tr>
<td valign="bottom">1.96</td>
<td valign="bottom">0.1700</td>
<td valign="bottom">0.028900</td>
<td valign="bottom">0.971100</td>
<td valign="bottom">0.029760</td>
<td valign="bottom">131</td>
<td>129</td>
</tr>
<tr>
<td valign="bottom">1.96</td>
<td valign="bottom">0.3000</td>
<td valign="bottom">0.090000</td>
<td valign="bottom">0.910000</td>
<td valign="bottom">0.098901</td>
<td valign="bottom">41</td>
<td>39</td>
</tr>
<tr>
<td valign="bottom">1.96</td>
<td valign="bottom">0.3200</td>
<td valign="bottom">0.102400</td>
<td valign="bottom">0.897600</td>
<td valign="bottom">0.114082</td>
<td valign="bottom">36</td>
<td>34</td>
</tr>
</tbody>
</table>
</div>
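<p style="text-align:left;">The sample sizes in this second table follow from solving the same expression for n, that is, n – 2 = t<sup>2</sup>(1 – r<sup>2</sup>)/r<sup>2</sup>. A minimal Python sketch is given below; the function name <code>required_n</code> is hypothetical, and rounding n – 2 to the nearest integer is an assumption made only because it matches the tabulated values.</p>

```python
def required_n(r, t=1.96):
    """Sample size at which a given r just reaches the chosen t value."""
    degrees_of_freedom = t ** 2 * (1 - r ** 2) / r ** 2
    # Rounded to the nearest integer to match the table above.
    return round(degrees_of_freedom) + 2


for r in (0.02, 0.15, 0.17, 0.30, 0.32):
    n = required_n(r)
    print(f"r = {r:.4f}  n = {n}  n - 2 = {n - 2}")
```

<p style="text-align:left;">In a real test one would round up rather than to the nearest integer, and would use the t-table value that corresponds to the resulting degrees of freedom instead of a fixed 1.96.</p>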
<p style="text-align:left;">Still, note that the amount of Noise completely overcomes the Signal, producing trivial S/N ratios. In general, for small r and large n values, significance is achieved at the cost of Noise masking the Signal. When this occurs, statistical significance is not a practical guideline for drawing useful conclusions from the data at hand.</p>
<p style="text-align:left;">This drives the present discussion to the substantive part of the problem missed by SEOs, and that is …</p>
<p style="text-align:left;"><strong>Statistical Significance Does Not Necessarily Mean Highly Correlated Results</strong></p>
<p style="text-align:left;">Simply stated, statistical significance does not necessarily imply that the X, Y variables are highly correlated.</p>
<p style="text-align:left;">A simple scatterplot will convince anyone that for the above small r values there will be no pattern or trend in the data set. The corresponding regression model will be useless for forecasting or inferring anything of value, except that the data spreads so wildly that it has no method to its chaos. What else is to be expected from a data set with a large Noise and small S/N ratio?</p>
<p style="text-align:left;">This is something that SEOs/SEMs still don’t seem to understand: t-observed &gt; t-table does not necessarily mean high correlation, and vice versa. I don’t have any personal stake (or take) against them, but when folks like Hendrickson, Fishkin, and others from SEOMOZ ignore Signal-to-Noise ratios and start referring to small r values as evidence that experimental variables are “highly” or “well” correlated, it is more than fair to call such “studies” Quack “Science”. That label might sound harsh, but in this case it is appropriate.</p>
<p style="text-align:left;">Search engine marketers might be good at selling snake oil, publishing sloppy “studies”, or recanting overhyped statements, but not at doing real Science. They should know better; i.e., that</p>
<ul style="text-align:left;">
<li>“significance” does not mean “correlation”.</li>
<li>“significance” does not mean “important”.</li>
<li>“insignificance” does not mean “unimportant”.</li>
</ul>
<p style="text-align:left;">Statistical “significance” only means that the observed result is unlikely to be due to random chance alone. Therefore, a significant correlation does not necessarily mean a “high”, “well”, or “strong” correlation between variables.</p>
<p style="text-align:left;">To understand all this we need to distinguish between statistical significance and practical significance.</p>
<p style="text-align:left;"><strong>Statistical Significance vs. Practical Significance</strong></p>
<p style="text-align:left;">As stated in this Wikipedia entry (emphasis added in boldface; <a href="http://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Criticism">http://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Criticism</a>):</p>
<blockquote><p>A common misconception is that a statistically significant result is always of practical significance, or demonstrates a large effect in the population. Unfortunately, this problem is commonly encountered in scientific writing. <strong>Given a sufficiently large sample</strong>, extremely <strong>small</strong> and non-notable differences can be found to be <strong>statistically significant</strong>, and statistical significance says nothing about the practical significance of a difference.</p>
<p>Use of the statistical significance test has been called seriously flawed and unscientific by authors Deirdre McCloskey and Stephen Ziliak. They point out that &#8220;insignificance&#8221; does not mean unimportant, and propose that the scientific community should abandon usage of the test altogether, as it can cause false hypotheses to be accepted and true hypotheses to be rejected.</p>
<p>Some statisticians have commented that pure &#8220;significance testing&#8221; has what is actually a rather strange goal of detecting the existence of a &#8220;real&#8221; difference between two populations. <strong>In practice a difference can almost always be found given a large enough sample.</strong> The typically more relevant goal of science is a determination of <strong>causal effect size</strong>. The amount and nature of the difference, in other words, is what should be studied. Many researchers also feel that hypothesis testing is something of a misnomer. In practice a single statistical test in a single study never &#8220;proves&#8221; anything.</p></blockquote>
<p style="text-align:left;">That pretty much settles the question of discerning between statistical significance and practical significance of correlation coefficients, but does not tell us how to quantitatively discern between the two concepts. In an upcoming article, I will derive expressions that might help to quantitatively assess these.</p>
<p style="text-align:left;">Since the tutorial on correlation coefficients <a href="http://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-correlation-coefficients.pdf">http://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-correlation-coefficients.pdf</a> has been updated several times and is getting too long, I will put that upcoming material in a separate PDF file. As a sneak preview, we will be examining extreme cases (too high/low r values, too high/low sample sizes, too high/low signal-to-noise ratios, etc.).</p>
<p style="text-align:left;">PS. I updated this post to fix some little typos.</p>
<p style="text-align:left;">Footnote. I found it erroneous to include the entries for n = 10 and n = 100 in the first table, so I removed them altogether and limited the discussion to the large n values. A reader asked why I used t-table = 1.96 for all entries. I thought it was clear from the discussion that the above tables are meant to show calculations for arbitrarily set t-values. In a real test, you would need to use the actual t values from statistical tables. For instance, for n = 10 you would have to use a t-table value of t = 2.306 at the 0.95 level. You should get</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td valign="top"><strong>n</strong></td>
<td valign="top"><strong>n-2</strong></td>
<td valign="top"><strong>t </strong></td>
<td valign="top"><strong>r</strong></td>
<td valign="top"><strong>S = r*r</strong></td>
<td valign="top"><strong>N = 1 &#8211; r*r</strong></td>
<td valign="top"><strong>S/N</strong></td>
</tr>
<tr>
<td valign="top">10</td>
<td valign="top">8</td>
<td valign="top">2.306</td>
<td valign="top">0.6319</td>
<td valign="top">0.399293</td>
<td valign="top">0.600707</td>
<td valign="top">0.664705</td>
</tr>
</tbody>
</table>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/quack-science/'>Quack Science</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/11/08/on-statistical-significance-and-seo-statistical-%e2%80%9cstudies%e2%80%9d/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IRW-10-2010:Inverted Index Architectures Part Three</title>
		<link>http://irthoughts.wordpress.com/2010/10/25/irw-10-2010inverted-index-architectures-part-three/</link>
		<comments>http://irthoughts.wordpress.com/2010/10/25/irw-10-2010inverted-index-architectures-part-three/#comments</comments>
		<pubDate>Mon, 25 Oct 2010 14:45:21 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1512</guid>
		<description><![CDATA[The current issue of IRW should arrive today to subscribers inbox. It is full of meaty stuff. The featuring article &#8230;<p><a href="http://irthoughts.wordpress.com/2010/10/25/irw-10-2010inverted-index-architectures-part-three/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1512&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:center;"><img class="aligncenter" style="width:100%;" src="http://www.miislita.com/irw/inverted-index-3.png" alt="Subscribe to IRW" /></p>
<div style="font-size:1em;">
<p>The current issue of IRW should arrive in subscribers’ inboxes today. It is full of meaty stuff. The featured article is Part Three of the series on inverted index architectures. We cover positional inverted indexes. A simple example shows how these indexes process advanced searches (AND, NEAR, and EXACT) in order to retrieve documents.</p>
<p>The QA column covers hypothesis testing with correlation coefficients at a given sample size and confidence level.</p>
<p>Enjoy it!</p>
</div>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/machine-learning/'>Machine Learning</a>, <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/10/25/irw-10-2010inverted-index-architectures-part-three/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://www.miislita.com/irw/inverted-index-3.png" medium="image">
			<media:title type="html">Subscribe to IRW</media:title>
		</media:content>
	</item>
		<item>
		<title>On Power Analysis and SEO Quack Science</title>
		<link>http://irthoughts.wordpress.com/2010/10/21/on-power-analysis-and-seo-quack-science/</link>
		<comments>http://irthoughts.wordpress.com/2010/10/21/on-power-analysis-and-seo-quack-science/#comments</comments>
		<pubDate>Thu, 21 Oct 2010 16:47:06 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[IR Tools]]></category>
		<category><![CDATA[Quack Science]]></category>
		<category><![CDATA[SEO Myths]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1505</guid>
		<description><![CDATA[One of the trickiest aspects of publishing statistical studies is the sample size to be used. Not stipulating a valid &#8230;<p><a href="http://irthoughts.wordpress.com/2010/10/21/on-power-analysis-and-seo-quack-science/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=irthoughts.wordpress.com&amp;blog=1041983&amp;post=1505&amp;subd=irthoughts&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">One of the trickiest aspects of publishing statistical studies is the sample size to be used. Not stipulating a valid procedure for estimating a proper sample size can hurt, for instance, a grant proposal. Ethical committees are concerned about the right number of observations in a study, asking submitters to justify on statistical grounds how they arrived at a given sample size. Research projects with too few or too many observations or no sample size methodology at all often get rejected. This is something those conducting SEO quack &#8220;science&#8221; don&#8217;t seem to understand or are not aware of.</p>
<p style="text-align:left;">Samples that are too small are unethical, because the researcher cannot be specific enough about the size of, for example, the effect of a drug in a population. Samples that are too large are also unethical, because they represent a waste of funding. It is true that a large sample improves precision, but it might involve an unjustified cost. Stratification is preferred, but it gets too complicated with huge sample sizes, not to mention that statistical significance does not necessarily scale between samples.</p>
<p style="text-align:left;">As Rahul Dodhia from RavenAnalytics (<a href="http://ravenanalytics.com/Articles/Sample_Size_Calculations.htm">http://ravenanalytics.com/Articles/Sample_Size_Calculations.htm</a>) indicates, a 2000-sample might not be very different from a 20000-sample, but a 200-sample may be very different from a 2000-sample, even though in each case the sample ratio is 10. So, a large sample is not always justified, even if such a sample size improves statistical significance and precision.</p>
<p style="text-align:left;">Consider the case of search engine ranking results. Upon a query, search engines are capable of finding many results, frequently in the range of thousands or millions of results per query. Still, search engines and retrieval systems show users a limited answer set. For instance, Google limits its viewable answer set to a maximum of 1,000 results (100 pages, 10 results/page).</p>
<p style="text-align:left;">As in most retrieval systems, relevant results accumulate in the first few result pages, forming clusters. This is in agreement with van Rijsbergen’s Cluster Hypothesis, which states that documents that cluster together have a similar relevance to a given query. Moving down the list of search results, one often finds cluster transitions wherein the quality and aboutness of documents is polluted with off-topic content.</p>
<p style="text-align:left;">Documents buried in a list of results often contain content irrelevant to the initial query or full of spam techniques. If one wants to conduct a statistical study of ranking results versus a particular document feature, one can do better by considering a sample from the first few result pages than from the entire answer set of 1,000 results.</p>
<p style="text-align:left;">In general, in a non-search engine scenario one cannot just arbitrarily select large samples to “force” the statistical significance of very low correlation coefficients and then use those values to draw conclusions. Furthermore, what is the selection criterion for using 1,000 or 10,000 results?</p>
<p style="text-align:left;">Simply stated: If 10,000 observations are arbitrarily selected, why not use 100,000 or 1,000,000 instead? We already know that very small correlation coefficients between any arbitrary pair of random variables will be significant at those huge sample levels, anyway. And?</p>
<p style="text-align:left;">As noted in a Wikipedia entry, “given a sufficiently large sample size, a statistical comparison <strong><span style="text-decoration:underline;">will always</span></strong> show a significant difference unless the population <strong><span style="text-decoration:underline;">effect size</span></strong> is exactly zero” (<a href="http://en.wikipedia.org/wiki/Effect_size">http://en.wikipedia.org/wiki/Effect_size</a>).</p>
<p style="text-align:left;">For example, a correlation coefficient of r = 0.04 would be significant at a 95% confidence level if coming from a 10,000-sample (t-calc = 4.003 &gt;&gt; t-table = 1.96) while a correlation coefficient of r = 0.01 would be significant at a 95% confidence level if coming from a 100,000-sample (t-calc = 3.162 &gt;&gt; t-table = 1.96). And? This proves nothing, especially when the magnitude of a “signal” approaches the magnitude of its “noise”.</p>
<p style="text-align:left;">As noted at the above Wikipedia entry, a correlation coefficient of 0.1 is strongly statistically significant when sample size is 1000, (t-calc = 3.175 &gt;&gt; t-table = 1.96) but reporting only the small p-value from this analysis could be misleading if a correlation of 0.1 is too small to be of interest in a particular application. (<a href="http://en.wikipedia.org/wiki/Effect_size">http://en.wikipedia.org/wiki/Effect_size</a> ).</p>
<p style="text-align:left;">Statistical significance of extremely small r values is not surprising, as it is just a mathematical consequence of the fact that a t-value is a function (F) of a weighted ratio: the ratio of explained-to-unexplained variations weighted by the number of degrees of freedom:</p>
<p style="text-align:left;">F(r, n) = t = SQRT[(r<sup>2</sup>/(1 – r<sup>2</sup>))*(n – 2)]<br />
F(r, n) = t = r*SQRT[((n – 2)/(1 – r<sup>2</sup>))]</p>
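<p style="text-align:left;">The t-calc values quoted above can be checked with a short Python sketch. It simply evaluates both forms of F(r, n); the function names are hypothetical, and the first form is written for non-negative r only.</p>

```python
import math


def t_from_ratio(r, n):
    """First form: t = SQRT[(r*r / (1 - r*r)) * (n - 2)], non-negative r."""
    return math.sqrt((r ** 2 / (1 - r ** 2)) * (n - 2))


def t_from_r(r, n):
    """Second form: t = r * SQRT[(n - 2) / (1 - r*r)]."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# The (r, n) pairs discussed in the text.
for r, n in ((0.04, 10_000), (0.01, 100_000), (0.1, 1_000)):
    print(f"r = {r}  n = {n:,}  t-calc = {t_from_r(r, n):.3f}")
```

<p style="text-align:left;">Holding r fixed and increasing n in either function makes the growth of t with sample size easy to see.</p>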
<p style="text-align:left;">For a given r value, increasing n increases t. No surprise here. What a math equation tells you is one thing; what the nature and obvious boundaries of a physical system tell you is another.</p>
<p style="text-align:left;">At trivially low r values, any claim with regard to the statistical significance or strength of some results proves nothing, and one cannot do much with such trivial r values. For instance, for r = 0.04, r<sup>2</sup> = 0.0016, meaning that 1 – r<sup>2</sup> = 0.9984, or 99.84%, of the variations in the dependent variable (y) are not explained by variations in the independent variable (x).</p>
<p style="text-align:left;">In such a scenario, assessing the effect of x on y is a futile exercise. Such a model would be useless for drawing conclusions or predicting anything. And here is the point that many SEOs at SEOMOZ (Fishkin, Hendrickson, and others elsewhere; see <a href="http://www.seo.co.uk/seo-news/seo-tools/the-seomoz-lda-tool-%E2%80%93-our-disappointing-findings.html">http://www.seo.co.uk/seo-news/seo-tools/the-seomoz-lda-tool-%E2%80%93-our-disappointing-findings.html</a>) don’t seem to grasp:</p>
<p style="text-align:center;">When a correlation coefficient is useless for all practical purposes.</p>
<p style="text-align:left;">If the raw data constantly changes, that’s another “Chaos Layer” that compounds the problem.</p>
<p style="text-align:left;"><strong>Enter Cohen’s Power</strong></p>
<p style="text-align:left;">According to Cohen’s work, when conducting a sample size study of correlation coefficients, one needs to consider the required confidence level and power of the test, the desired probability for Type I and Type II Errors, and the hypothesized or anticipated correlation coefficient (<a href="http://www.medcalc.be/manual/correlation_coefficient.php">http://www.medcalc.be/manual/correlation_coefficient.php</a> ). One cannot just use an arbitrary sample size for testing things.</p>
<p style="text-align:left;">In general, given any three of the following, the fourth one can be determined (<a href="http://www.statmethods.net/stats/power.html">http://www.statmethods.net/stats/power.html</a> ):</p>
<p style="text-align:left;">1. sample size<br />
2. effect size<br />
3. significance level = P(Type I error) = probability of finding an effect that is not there<br />
4. power = 1 &#8211; P(Type II error) = probability of finding an effect that is there</p>
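<p style="text-align:left;">To illustrate how fixing three of these quantities determines the fourth, here is a minimal Python sketch for correlation coefficients. It is not Cohen’s exact procedure nor the method used by any of the tools mentioned in this post; it assumes the common Fisher z approximation, in which the transformed r is treated as normal with standard error 1/Sqrt(n – 3), and it assumes a non-zero r. The function names and the default values (0.05 significance level, 0.80 power) are illustrative assumptions.</p>

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def sample_size_for_r(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect a non-zero correlation r (two-tailed)."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STANDARD_NORMAL.inv_cdf(power)
    effect = math.atanh(abs(r))  # Fisher z transform of r (assumption)
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


def power_for_r(r, n, alpha=0.05):
    """Approximate power of the two-tailed test, ignoring the far tail."""
    z_alpha = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = math.atanh(abs(r)) * math.sqrt(n - 3)
    return STANDARD_NORMAL.cdf(shift - z_alpha)


for r in (0.04, 0.1, 0.3):
    n = sample_size_for_r(r)
    print(f"r = {r}  n = {n}  power at that n = {power_for_r(r, n):.3f}")
```

<p style="text-align:left;">Under these assumptions the required sample size grows quickly as the anticipated r shrinks, which is why the anticipated effect size has to be stated before choosing n. Dedicated power analysis software may use exact methods and give slightly different numbers.</p>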
<p style="text-align:left;">One also needs to consider what is the statistical parameter that is undergoing the power analysis. One needs to ask questions like the following:</p>
<p style="text-align:left;">Are we testing means from a given group? <a href="http://www.nss.gov.au/nss/home.nsf/pages/Sample+Size+Calculator+Description?OpenDocument">http://www.nss.gov.au/nss/home.nsf/pages/Sample+Size+Calculator+Description?OpenDocument</a></p>
<p style="text-align:left;">Are we testing means from different groups? <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC137461/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC137461/</a></p>
<p style="text-align:left;">Are we testing correlation coefficients? Read Simon’s take on the impact of sample size on the desired level of precision in correlation coefficients (<a href="http://www.childrens-mercy.org/stats/weblog2005/CorrelationCoefficient.asp">http://www.childrens-mercy.org/stats/weblog2005/CorrelationCoefficient.asp</a> ).</p>
<p style="text-align:left;">Are we interested in significance level, effect size, sample size, or power?</p>
<p style="text-align:left;">When conducting an effect size analysis one must keep in mind that effect sizes estimate the strength of a possible relationship, rather than assigning a significance level. Effect sizes do not determine significance levels, or vice versa.</p>
<p style="text-align:left;"><strong>So, how do we go about implementing Power Analysis?</strong></p>
<p style="text-align:left;">For those interested in implementing power analysis in the R language, I recommend the libraries described at <a href="http://www.statmethods.net/stats/power.html">http://www.statmethods.net/stats/power.html</a></p>
<p style="text-align:left;">Software for conducting power analysis is also available elsewhere, as shown in the following table. My favorites are G*Power and SPSS SamplePower (<a href="http://www.spss.com/software/statistics/samplepower/">http://www.spss.com/software/statistics/samplepower/</a>).</p>
<table style="float:left;" border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td colspan="2" width="535" valign="top"><strong>Power Analysis Software</strong><br />Source: <a href="http://www.epibiostat.ucsf.edu/biostat/sampsize.html">http://www.epibiostat.ucsf.edu/biostat/sampsize.html</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>Software</strong></td>
<td width="432" valign="top"><strong>Remarks</strong></td>
</tr>
<tr>
<td width="103" valign="top"><strong>G*Power</strong> License: Free</td>
<td width="432" valign="top">Uses both exact and approximate methods to calculate power. It will deal with sample size/power calculations for t-tests, 1-way ANOVAs, regression, correlation, and chi-square goodness of fit. For t-tests and ANOVAs you find the effect size by supplying mean and variance information. For correlation coefficients the effect size is a function of r<sup>2</sup>. <a href="http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/">http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>PC-Size</strong> License: Free</td>
<td width="432" valign="top">Deals with sample size/power calculations for t-tests, 1-way and 2-way ANOVA, simple regression, correlation, and comparison of proportions. <a href="http://www.esf.edu/efb/gibbs/monitor/usingDSTPLANandPCSIZE.pdf">http://www.esf.edu/efb/gibbs/monitor/usingDSTPLANandPCSIZE.pdf</a><br />
<a href="ftp://ftp.simtel.net/pub/simtelnet/msdos/statstcs/size102.zip">ftp://ftp.simtel.net/pub/simtelnet/msdos/statstcs/size102.zip</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>DSTPLAN</strong> License: Free</td>
<td width="432" valign="top">Uses approximate methods to calculate power. It will calculate sample size/power for t-tests, correlation, a difference in proportions, 2xN contingency tables, and various survival analysis designs. <a href="http://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=41">http://biostatistics.mdanderson.org/SoftwareDownload/SingleSoftware.aspx?Software_Id=41</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>PS</strong> License: Free</td>
<td width="432" valign="top">Performs sample size/power calculations for t-tests, Chi-square, Fisher&#8217;s exact, McNemar&#8217;s, simple regression, and survival analysis. <a href="http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize">http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>TIBCO Spotfire S+</strong><strong> </strong>License: Paid</td>
<td width="432" valign="top">The only commercially-supported statistical analysis software that delivers a cross-platform IDE for the award-winning S programming language, the ability to analyze gigabyte class data sets on the desktop, and a package system for sharing, reuse and deployment of analytics in the enterprise and in validated environments. Used widely in validated production environments (e.g., 21 CFR Part 11). <a href="http://spotfire.tibco.com/products/s-plus/statistical-analysis-software.aspx">http://spotfire.tibco.com/products/s-plus/statistical-analysis-software.aspx</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>NQuery Advisor</strong> License: Paid</td>
<td width="432" valign="top">Performs sample size/ power calculations for t-tests, 1 and 2 way ANOVAS, tests of contrasts in 1-way ANOVAs, univariate repeated measures designs, regression (simple, multiple and logistic), correlation, difference of proportions, 2XN contingency tables, and survival analyses. <a href="http://www.statsol.ie/nquery/nquery.htm">http://www.statsol.ie/nquery/nquery.htm</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>PASS</strong> License: Paid</td>
<td width="432" valign="top">Performs sample size/power calculations for z-tests, t-tests, 1, 2, and 3-way ANOVAs, univariate repeated measures designs, regression (simple, multiple and logistic), correlations, difference in proportions, 2xN contingency tables, survival analyses and simple non-parametric analyses.  <a href="http://www.ncss.com/pass.html">http://www.ncss.com/pass.html</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>Stata</strong> License: Paid</td>
<td width="432" valign="top">It has some simple built-in power and sample size functions. <a href="http://www.stata.com/">http://www.stata.com/</a></td>
</tr>
<tr>
<td width="103" valign="top"><strong>SPSS SamplePower</strong> License: Paid</td>
<td width="432" valign="top">If your sample size is too small, you could miss important research findings. If it&#8217;s too large, you could waste valuable time and resources. Finds the right sample size for your research in minutes and tests the possible results before you begin your study. Strikes the right balance among confidence level, statistical power, effect size, and sample size. Compares the effects of different study parameters with its flexible analytical tools. <a href="http://www.spss.com/software/statistics/samplepower/">http://www.spss.com/software/statistics/samplepower/</a></td>
</tr>
</tbody>
</table>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/ir-tools/'>IR Tools</a>, <a href='http://irthoughts.wordpress.com/category/quack-science/'>Quack Science</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/10/21/on-power-analysis-and-seo-quack-science/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>On Correlation Coefficients and Sample Size</title>
		<link>http://irthoughts.wordpress.com/2010/10/18/on-correlation-coefficients-and-sample-size/</link>
		<comments>http://irthoughts.wordpress.com/2010/10/18/on-correlation-coefficients-and-sample-size/#comments</comments>
		<pubDate>Mon, 18 Oct 2010 15:25:19 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[IR Tutorials]]></category>
		<category><![CDATA[SEO Myths]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1499</guid>
		<description><![CDATA[Today I updated my Tutorial on Correlation Coefficients to include a new section on the effect of sample size on &#8230;<p><a href="http://irthoughts.wordpress.com/2010/10/18/on-correlation-coefficients-and-sample-size/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>Today I updated my <a href="http://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-correlation-coefficients.pdf">Tutorial on Correlation Coefficients</a> to include a new section on the effect of sample size on the significance of correlation coefficients. This was motivated by some comments from search engine marketers on correlation strengths (<a href="http://searchenginewatch.com/3641002">http://searchenginewatch.com/3641002</a>). The new material, given below, might help those interested in learning whether a reported correlation coefficient is statistically different from zero. Enjoy it.</p>
<p>The problem with correlation strength scales is that they say nothing about how the size of a sample affects the significance of a correlation coefficient. This important issue is addressed here.</p>
<p>Consider three different correlation coefficients: 0.50, 0.35, and 0.17. Assume that we want to test whether there is a significant relationship between the two variables at hand. The null hypothesis (H0) to be tested is that these r values are not statistically different from zero (rho = 0). How to proceed?</p>
<p>As recommended by Stevens (17), for rho = 0, H0 can be tested using a two-tailed (i.e., two-sided) t-test with n – 2 degrees of freedom at a given confidence level, usually 95%. If t<sub>calculated</sub> ≥ t<sub>table</sub>, H0 is rejected. However, if t<sub>calculated</sub> &lt; t<sub>table</sub>, H0 is not rejected and there is no significant correlation between the variables.</p>
<p>Here t<sub>calculated</sub> is computed as r/SE<sub>r</sub> = r*SQRT[(n – 2)/(1 – r<sup>2</sup>)], where SE<sub>r</sub> = SQRT[(1 – r<sup>2</sup>)/(n – 2)] is the standard error of r, while t<sub>table</sub> values are obtained from the literature (<a href="http://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values">http://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values</a>). Table 2 summarizes the results of testing the null hypothesis at different sample sizes.</p>
<div>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td colspan="7" valign="top"><strong>Table 2. </strong><strong><em>H<sub>0</sub></em></strong> <strong>tests at different sample sizes; two-tailed, 95% confidence.</strong></td>
</tr>
<tr>
<td valign="top"><strong><em>n</em></strong></td>
<td valign="top"><strong><em>df = n &#8211; 2</em></strong></td>
<td valign="top"><strong><em>r</em></strong></td>
<td valign="top"><strong><em>SE<sub>r</sub></em></strong></td>
<td valign="top"><strong><em>t(calc)</em></strong></td>
<td valign="top"><strong><em>t (0.95)</em></strong></td>
<td valign="top"><strong><em>Reject </em></strong>(<em>H<sub>0</sub></em> : <em>rho</em><em> = 0</em>)<strong><em>?</em></strong></td>
</tr>
<tr>
<td valign="top">5</td>
<td valign="top">3</td>
<td valign="top">0.50</td>
<td valign="top">0.50</td>
<td valign="top">1.000</td>
<td valign="top">3.182</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">10</td>
<td valign="top">8</td>
<td valign="top">0.50</td>
<td valign="top">0.31</td>
<td valign="top">1.633</td>
<td valign="top">2.306</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">12</td>
<td valign="top">10</td>
<td valign="top">0.50</td>
<td valign="top">0.27</td>
<td valign="top">1.826</td>
<td valign="top">2.228</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">14</td>
<td valign="top">12</td>
<td valign="top">0.50</td>
<td valign="top">0.25</td>
<td valign="top">2.000</td>
<td valign="top">2.179</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">20</td>
<td valign="top">18</td>
<td valign="top">0.50</td>
<td valign="top">0.20</td>
<td valign="top">2.449</td>
<td valign="top">2.101</td>
<td valign="top">reject</td>
</tr>
<tr>
<td valign="top">30</td>
<td valign="top">28</td>
<td valign="top">0.50</td>
<td valign="top">0.16</td>
<td valign="top">3.055</td>
<td valign="top">2.048</td>
<td valign="top">reject</td>
</tr>
<tr>
<td valign="top">40</td>
<td valign="top">38</td>
<td valign="top">0.50</td>
<td valign="top">0.14</td>
<td valign="top">3.559</td>
<td valign="top">2.024</td>
<td valign="top">reject</td>
</tr>
<tr>
<td valign="top">50</td>
<td valign="top">48</td>
<td valign="top">0.50</td>
<td valign="top">0.13</td>
<td valign="top">4.000</td>
<td valign="top">2.011</td>
<td valign="top">reject</td>
</tr>
<tr>
<td colspan="7" valign="top"> </td>
</tr>
<tr>
<td valign="top">5</td>
<td valign="top">3</td>
<td valign="top">0.35</td>
<td valign="top">0.54</td>
<td valign="top">0.647</td>
<td valign="top">3.182</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">10</td>
<td valign="top">8</td>
<td valign="top">0.35</td>
<td valign="top">0.33</td>
<td valign="top">1.057</td>
<td valign="top">2.306</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">12</td>
<td valign="top">10</td>
<td valign="top">0.35</td>
<td valign="top">0.30</td>
<td valign="top">1.182</td>
<td valign="top">2.228</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">14</td>
<td valign="top">12</td>
<td valign="top">0.35</td>
<td valign="top">0.27</td>
<td valign="top">1.294</td>
<td valign="top">2.179</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">20</td>
<td valign="top">18</td>
<td valign="top">0.35</td>
<td valign="top">0.22</td>
<td valign="top">1.585</td>
<td valign="top">2.101</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">30</td>
<td valign="top">28</td>
<td valign="top">0.35</td>
<td valign="top">0.18</td>
<td valign="top">1.977</td>
<td valign="top">2.048</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">40</td>
<td valign="top">38</td>
<td valign="top">0.35</td>
<td valign="top">0.15</td>
<td valign="top">2.303</td>
<td valign="top">2.024</td>
<td valign="top">reject</td>
</tr>
<tr>
<td valign="top">50</td>
<td valign="top">48</td>
<td valign="top">0.35</td>
<td valign="top">0.14</td>
<td valign="top">2.589</td>
<td valign="top">2.011</td>
<td valign="top">reject</td>
</tr>
<tr>
<td colspan="7" valign="top"> </td>
</tr>
<tr>
<td valign="top">5</td>
<td valign="top">3</td>
<td valign="top">0.17</td>
<td valign="top">0.57</td>
<td valign="top">0.299</td>
<td valign="top">3.182</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">10</td>
<td valign="top">8</td>
<td valign="top">0.17</td>
<td valign="top">0.35</td>
<td valign="top">0.488</td>
<td valign="top">2.306</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">12</td>
<td valign="top">10</td>
<td valign="top">0.17</td>
<td valign="top">0.31</td>
<td valign="top">0.546</td>
<td valign="top">2.228</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">14</td>
<td valign="top">12</td>
<td valign="top">0.17</td>
<td valign="top">0.28</td>
<td valign="top">0.598</td>
<td valign="top">2.179</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">20</td>
<td valign="top">18</td>
<td valign="top">0.17</td>
<td valign="top">0.23</td>
<td valign="top">0.732</td>
<td valign="top">2.101</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">30</td>
<td valign="top">28</td>
<td valign="top">0.17</td>
<td valign="top">0.19</td>
<td valign="top">0.913</td>
<td valign="top">2.048</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">40</td>
<td valign="top">38</td>
<td valign="top">0.17</td>
<td valign="top">0.16</td>
<td valign="top">1.063</td>
<td valign="top">2.024</td>
<td valign="top">don&#8217;t reject</td>
</tr>
<tr>
<td valign="top">50</td>
<td valign="top">48</td>
<td valign="top">0.17</td>
<td valign="top">0.14</td>
<td valign="top">1.195</td>
<td valign="top">2.011</td>
<td valign="top">don&#8217;t reject</td>
</tr>
</tbody>
</table>
</div>
<p>The table shows the sample size at which a given r value becomes high enough to be statistically significant.</p>
<p>For n = 14, none of the three r values (0.50, 0.35, and 0.17) is statistically different from zero.</p>
<p>For n = 30, r = 0.50 is statistically different from zero while r = 0.35 and r = 0.17 are not.</p>
<p>Conversely, among the sample sizes tested, r = 0.50 is not statistically different from zero when n is equal to or less than 14, while r = 0.35 is not statistically different from zero when n is equal to or less than 30.</p>
<p>Finally, r = 0.17 is not statistically different from zero at any of the sample sizes tested.</p>
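<p>For readers who want to check Table 2 themselves, the following is a minimal Python sketch of the test described above. The names <code>T_TABLE_95</code>, <code>t_calculated</code>, and <code>reject_h0</code> are hypothetical and chosen only for illustration; the critical values are copied from the t (0.95) column of the table rather than computed.</p>

```python
import math

# Two-tailed critical t values at 95% confidence, copied from Table 2
# and keyed by degrees of freedom (df = n - 2).
T_TABLE_95 = {3: 3.182, 8: 2.306, 10: 2.228, 12: 2.179,
              18: 2.101, 28: 2.048, 38: 2.024, 48: 2.011}


def t_calculated(r, n):
    """Return (SE_r, t) for a correlation coefficient r and sample size n."""
    df = n - 2
    se_r = math.sqrt((1 - r ** 2) / df)  # standard error of r
    return se_r, r / se_r                # t = r / SE_r


def reject_h0(r, n):
    """True if H0: rho = 0 is rejected (two-tailed, 95% confidence)."""
    _, t = t_calculated(r, n)
    return abs(t) >= T_TABLE_95[n - 2]


# Reprint the rows of Table 2.
for r in (0.50, 0.35, 0.17):
    for n in (5, 10, 12, 14, 20, 30, 40, 50):
        se_r, t = t_calculated(r, n)
        verdict = "reject" if reject_h0(r, n) else "don't reject"
        print(f"n={n:2d}  r={r:.2f}  SE_r={se_r:.2f}  t={t:.3f}  {verdict}")
```

<p>The sketch only covers the sample sizes listed in the table. For other values of n, the critical value would have to be looked up in a t table or obtained from a statistics package.</p>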
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>, <a href='http://irthoughts.wordpress.com/category/spam/'>Spam</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/10/18/on-correlation-coefficients-and-sample-size/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>On Inverted Indexes</title>
		<link>http://irthoughts.wordpress.com/2010/10/15/on-inverted-indexes/</link>
		<comments>http://irthoughts.wordpress.com/2010/10/15/on-inverted-indexes/#comments</comments>
		<pubDate>Fri, 15 Oct 2010 12:17:04 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Newsletters]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1495</guid>
		<description><![CDATA[The upcoming issue of IRW will be out soon. This will be Part 3 of the series on inverted index &#8230;<p><a href="http://irthoughts.wordpress.com/2010/10/15/on-inverted-indexes/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>The upcoming issue of IRW will be out soon. This will be Part 3 of the series on inverted index architectures.</p>
<p>In Part 1 we covered different types of inverted indexes: Boolean, non-positional indexes.</p>
<p>In Part 2 we covered some techniques for fast indexing.</p>
<p>In Part 3 we will cover more on positional inverted indexes, examples, and techniques for fast intersection of posting lists.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/10/15/on-inverted-indexes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Understanding Accuracy and Precision</title>
		<link>http://irthoughts.wordpress.com/2010/10/14/understanding-accuracy-and-precision/</link>
		<comments>http://irthoughts.wordpress.com/2010/10/14/understanding-accuracy-and-precision/#comments</comments>
		<pubDate>Thu, 14 Oct 2010 21:11:08 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[IR Tutorials]]></category>
		<category><![CDATA[SEO Myths]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1485</guid>
		<description><![CDATA[Students often have a hard time understanding the difference between accuracy and precision, particularly when they read quack &#8220;science&#8221; &#8220;studies&#8221; while surfing the Web. &#8230;<p><a href="http://irthoughts.wordpress.com/2010/10/14/understanding-accuracy-and-precision/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>Students often have a hard time understanding the difference between accuracy and precision, particularly when they read quack &#8220;science&#8221; &#8220;studies&#8221; while surfing the Web. This post might help them grasp these concepts.</p>
<p><strong>What is Accuracy?</strong></p>
<p>Accuracy is a term describing deviation of an experimental value from a target value. A target value is a value accepted as ‘true’. Constants, fundamental quantities, and theoretical values are considered ‘true values’. Thus, accuracy is proximity to a true value.</p>
<p>To illustrate, assume that a quantity x is measured. Its true value is x<sub>t</sub> = 1.00 and we report an experimental value x<sub>e</sub> of 0.90. The absolute error of this observation is |x<sub>e</sub> – x<sub>t</sub>| = 0.10 and its relative error is (|x<sub>e</sub> – x<sub>t</sub>|/x<sub>t</sub>)*100 = 10%. The accuracy is the ratio of the experimental value to the true value. When expressed as a percent, it is called relative accuracy. In this case, x<sub>e</sub>/x<sub>t</sub> = 0.90/1.00 = 0.90, which corresponds to 90% accuracy.</p>
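<p>The arithmetic above can be sketched in a few lines of Python. The function name <code>accuracy_report</code> is hypothetical, and the sketch assumes a nonzero true value.</p>

```python
def accuracy_report(x_e, x_t):
    """Return absolute error, relative error (%), and relative accuracy (%).

    x_e is the experimental value and x_t the true value (assumed nonzero).
    """
    abs_error = abs(x_e - x_t)
    rel_error_pct = abs_error / x_t * 100
    rel_accuracy_pct = x_e / x_t * 100
    return abs_error, rel_error_pct, rel_accuracy_pct


# The example from the text: x_t = 1.00, x_e = 0.90.
abs_err, rel_err, rel_acc = accuracy_report(0.90, 1.00)
print(f"absolute error    = {abs_err:.2f}")
print(f"relative error    = {rel_err:.0f}%")
print(f"relative accuracy = {rel_acc:.0f}%")
```

<p>The values are rounded when printed because floating-point subtraction does not give exactly 0.10.</p>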
<p><strong>What is Precision?</strong></p>
<p>Precision has been loosely defined as how reproducible experimental results are. However, modern convention makes a careful distinction between reproducibility (between-run precision) and repeatability (within-run precision). Furthermore, according to Freiser (1992):</p>
<ul>
<li><strong>Repeatability</strong> is the closeness of agreement between individual experimental results obtained with the same method on identical test material or samples, under the same conditions (same operator, same apparatus, same laboratories, and same intervals of time).</li>
<li><strong>Reproducibility</strong> is the closeness of agreement between individual experimental results obtained with the same method on identical test material or samples, but under different conditions (different operator, different apparatus, different laboratories, and different intervals of time).</li>
</ul>
<p>Note that the source of dispersion and errors in the experimental results is different in each case. Therefore, arbitrarily expressing the precision of results in terms of standard deviations without considering how the data was collected (within- or between-run precision) should be avoided.</p>
<p>Similarly, comparing any two standard deviations, or standard errors for that matter, without regard for how the data was collected (experimental conditions, number of degrees of freedom, different sampling times, etc.) should also be avoided. In particular, estimating or comparing precisions from data sets that constantly change between sampling times is a futile exercise.</p>
<p>Last but not least, the precision of a measurement depends on the measuring scale used. For instance, saying “He is about 55 years old” is less precise than saying “He is 660 months old”, which in turn is less precise than saying “He is 20,075 days old”.</p>
<p>References</p>
<p>Freiser, H. (1992). Concepts and Calculations in Analytical Chemistry. Chapter 12, p. 203. CRC Press, Boca Raton.</p>
<p>Miller, J. C. &amp; Miller, J. N. (1984). Statistics for Analytical Chemistry. Chapter 1, p.19. Wiley, New York.</p>
<p>PS. I had swapped repeatability and reproducibility. That is now fixed, along with a few more typos. Thanks, Dr. J. C., for pointing that out.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/ir-tutorials/'>IR Tutorials</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/10/14/understanding-accuracy-and-precision/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>Introduction to Nemeth Uniform Braille System (NUBS)</title>
		<link>http://irthoughts.wordpress.com/2010/10/07/introduction-to-nemeth-uniform-braille-system-nubs/</link>
		<comments>http://irthoughts.wordpress.com/2010/10/07/introduction-to-nemeth-uniform-braille-system-nubs/#comments</comments>
		<pubDate>Thu, 07 Oct 2010 18:32:08 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Data Mining]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1473</guid>
		<description><![CDATA[I&#8217;m currently experimenting and trying to develop an encryption method using the Braille System and some kabbalistic elements as mapping components. &#8230;<p><a href="http://irthoughts.wordpress.com/2010/10/07/introduction-to-nemeth-uniform-braille-system-nubs/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m currently experimenting and trying to develop an encryption method using the Braille System and some kabbalistic elements as mapping components. There is something about the beauty of this system that has attracted me for a long time.</p>
<p>The implications of 6- and 8-dot matrix notation systems for IR are many. After reading about the <a href="http://www.braille2000.com/brl2000/docs/NUBS803.pdf">Nemeth Uniform Braille System</a>, you might grasp the point.</p>
<p>Check also the Unicode characters for Braille at <a href="http://unicode.org/book/ch12.pdf">http://unicode.org/book/ch12.pdf</a></p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/data-mining/'>Data Mining</a>, <a href='http://irthoughts.wordpress.com/category/programming/'>Programming</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/10/07/introduction-to-nemeth-uniform-braille-system-nubs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IRW:2010-9: Inverted Index Architectures Part Two</title>
		<link>http://irthoughts.wordpress.com/2010/09/30/irw2010-9-inverted-index-architectures-part-two/</link>
		<comments>http://irthoughts.wordpress.com/2010/09/30/irw2010-9-inverted-index-architectures-part-two/#comments</comments>
		<pubDate>Thu, 30 Sep 2010 13:38:58 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[IR Tools]]></category>
		<category><![CDATA[Newsletters]]></category>
		<category><![CDATA[Search Engines Architecture Course]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1469</guid>
		<description><![CDATA[The current issue of IRW is out! This is Part Two of the series on inverted index architectures, a 3-part &#8230;<p><a href="http://irthoughts.wordpress.com/2010/09/30/irw2010-9-inverted-index-architectures-part-two/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align:center;"><img class="aligncenter" src="http://www.miislita.com/irw/inverted-index-2.gif" alt="inverted index" /></p>
<p>The current issue of IRW is out!</p>
<p>This is Part Two of the series on inverted index architectures, a 3-part series organized as follows:</p>
<p>Part One: Inverted Index Types<br />
Part Two: Fast Indexing Techniques<br />
Part Three: Fast Intersecting and Sharding</p>
<p>Tasks related to indexing, searching and processing are also discussed.</p>
<p>The QA section features short JavaScript one-liners aimed at helping readers understand what tokenization is and how it is implemented.</p>
<p>Although not described in the newsletter, it is possible to construct these types of components with scripting languages. As a matter of fact, we have built an entire forward index and inverted index written entirely in JavaScript. Once computed, the inverted index can be kept in memory. This works for small collections. For larger collections, we read/write it to a text file using ActiveX, and its posting lists are then intersected in the usual way. However, for really large collections this is not effective and a database solution is recommended. The point to be made is that constructing a JavaScript-based search engine at the client and with real components, not a mere over-sized look-up &#8220;site search tool&#8221;, is possible. Since ActiveX is Microsoft&#8217;s land, it is not a universal solution. As a quick enterprise solution for small collections, it is ok, I guess.</p>
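<p>To give a rough idea of the in-memory case, here is a minimal sketch of tokenization, a non-positional inverted index, and posting-list intersection. It is written in Python rather than JavaScript and is not the code from the newsletter; the function names, the tokenization rule, and the sample documents are assumptions made purely for illustration.</p>

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to a sorted posting list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(p1, p2):
    """Merge-style intersection of two sorted posting lists."""
    i, j, out = 0, 0, []
    while len(p1) > i and len(p2) > j:
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p2[j] > p1[i]:
            i += 1  # advance the list with the smaller doc id
        else:
            j += 1
    return out


# Hypothetical three-document collection.
docs = {
    1: "Inverted index architectures",
    2: "Fast indexing techniques for an inverted index",
    3: "Fast intersecting of posting lists",
}
index = build_inverted_index(docs)
print(intersect(index["inverted"], index["index"]))  # documents with both terms
print(intersect(index["fast"], index["index"]))
```

<p>Keeping each posting list sorted is what allows the intersection to walk both lists once instead of comparing every pair of document ids.</p>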
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/ir-tools/'>IR Tools</a>, <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>, <a href='http://irthoughts.wordpress.com/category/search-engines-architecture-course/'>Search Engines Architecture Course</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/09/30/irw2010-9-inverted-index-architectures-part-two/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>

		<media:content url="http://www.miislita.com/irw/inverted-index-2.gif" medium="image">
			<media:title type="html">inverted index</media:title>
		</media:content>
	</item>
		<item>
		<title>On the SEOMOZ LDA Fiasco</title>
		<link>http://irthoughts.wordpress.com/2010/09/17/on-the-seomoz-lda-fiasco/</link>
		<comments>http://irthoughts.wordpress.com/2010/09/17/on-the-seomoz-lda-fiasco/#comments</comments>
		<pubDate>Fri, 17 Sep 2010 18:20:21 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Quack Science]]></category>
		<category><![CDATA[SEO Myths]]></category>
		<category><![CDATA[Spam]]></category>
		<category><![CDATA[Statistics and Mathematics]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1463</guid>
		<description><![CDATA[LDA and Google’s ranks well correlated? After the hilarious example of this guy with the SEOMOZ LDA tool (http://smackdown.blogsblogsblogs.com/2010/09/09/proof-that-the-new-seomoz-tools-is-at-least-half-accurate/ ) &#8230;<p><a href="http://irthoughts.wordpress.com/2010/09/17/on-the-seomoz-lda-fiasco/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p style="text-align:left;">Are LDA and Google’s ranks well correlated?</p>
<p style="text-align:left;">After the hilarious example of this guy with the SEOMOZ LDA tool (<a href="http://smackdown.blogsblogsblogs.com/2010/09/09/proof-that-the-new-seomoz-tools-is-at-least-half-accurate/">http://smackdown.blogsblogsblogs.com/2010/09/09/proof-that-the-new-seomoz-tools-is-at-least-half-accurate/</a>), I can only laugh out loud. Has anyone tried something like that?</p>
<p style="text-align:left;">Regarding the new fiasco with their LDA tool (<a href="http://www.seomoz.org/blog/lda-correlation-017-not-032">http://www.seomoz.org/blog/lda-correlation-017-not-032</a>): Oh, no, another one… What can I say? They sound pathetic and apologetic. The words overhyped, shitty, sloppy, flawed, etc. are not enough to describe their &#8220;research work&#8221;.</p>
<p style="text-align:left;">What will happen now to those Mute Speakerphones that were misled? Those who listen to fools become fools themselves.</p>
<p style="text-align:left;">I don’t feel any sympathy for their 15 minutes of “honesty”. The damage to naïve readers was already done.</p>
<p style="text-align:left;">Also, note that this latest flaw was discovered by them. It was not the result of any peer review process by external referees, as those coming to their defense would like to believe.</p>
<p style="text-align:left;">As mentioned before, beware of SEOs’ statistical “studies” and their quack “science” (<a href="http://irthoughts.wordpress.com/2010/04/23/beware-of-seo-statistical-studies/">http://irthoughts.wordpress.com/2010/04/23/beware-of-seo-statistical-studies/</a>), especially those coming from SEOMOZ.</p>
<p style="text-align:left;">Probably their snake oil will make a comeback soon. (Oh, no. Again?)</p>
<p style="text-align:left;">If they still think they have a valid LDA implementation, why not announce it at David Blei’s Topic-Models, wherein a community of LDA experts will review it and compare it against other implementations?</p>
<p style="text-align:left;">Two things can happen:</p>
<p style="text-align:left;">(a) It will be reviewed.</p>
<p style="text-align:left;">(b) It will be ignored.</p>
<p style="text-align:left;">I “invite” them to do so.</p>
<p style="text-align:left;">Please, just don’t show up with your snake oil, yellow shoes, your SEO mom, paid cheerleaders, vested investors, overhyped claims, etc., etc.</p>
<p style="text-align:left;">PS.</p>
<p style="text-align:left;">More on their hype machine here: <a href="http://skitzzo.com/archives/seomoz-hype-machine.php">http://skitzzo.com/archives/seomoz-hype-machine.php</a></p>
<p style="text-align:left;">It appears that even Danny Sullivan is not buying SEOmoz’s &#8220;research&#8221; on LDA. According to this write-up, &#8220;He didn’t think it was the remarkable change that SEOmoz was making it out to be.&#8221; (<a href="http://outspokenmedia.com/internet-marketing-conferences/evening-forum-with-danny-sullivan/">http://outspokenmedia.com/internet-marketing-conferences/evening-forum-with-danny-sullivan/</a>). He even questioned their &#8220;highly correlated&#8221; numbers. And that was even before they recanted.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/quack-science/'>Quack Science</a>, <a href='http://irthoughts.wordpress.com/category/seo-myths/'>SEO Myths</a>, <a href='http://irthoughts.wordpress.com/category/spam/'>Spam</a>, <a href='http://irthoughts.wordpress.com/category/statistics-and-mathematics/'>Statistics and Mathematics</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/09/17/on-the-seomoz-lda-fiasco/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
		<item>
		<title>IRW:August: Inverted Index Architectures: Part One</title>
		<link>http://irthoughts.wordpress.com/2010/09/06/irwaugustinverted-index-architectures-part-one/</link>
		<comments>http://irthoughts.wordpress.com/2010/09/06/irwaugustinverted-index-architectures-part-one/#comments</comments>
		<pubDate>Mon, 06 Sep 2010 21:01:32 +0000</pubDate>
		<dc:creator>egarcia</dc:creator>
				<category><![CDATA[Newsletters]]></category>

		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=1458</guid>
		<description><![CDATA[The current issue of IRW is out: In this issue of IRW, we cover what some IRs consider the heart &#8230;<p><a href="http://irthoughts.wordpress.com/2010/09/06/irwaugustinverted-index-architectures-part-one/">Continue reading &#187;</a></p>]]></description>
			<content:encoded><![CDATA[<p>The current issue of IRW is out:</p>
<p><span style="font-size:x-small;">In this issue of IRW, we cover what some IRs consider the heart of a search engine: its inverted index. This is a 3-part subject, organized as follows: </span></p>
<p>Part One: Inverted Index Types</p>
<p>Part Two: Fast Indexing Techniques</p>
<p>Part Three: Fast Intersecting &amp; Sharding</p>
<p>Enjoy it.</p>
<br />Filed under: <a href='http://irthoughts.wordpress.com/category/newsletters/'>Newsletters</a>]]></content:encoded>
			<wfw:commentRss>http://irthoughts.wordpress.com/2010/09/06/irwaugustinverted-index-architectures-part-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/2d26d7051f681fdbb28379876c940a32?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">irthoughts</media:title>
		</media:content>
	</item>
	</channel>
</rss>
