<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Predating Link Models and PageRank</title>
	<atom:link href="http://irthoughts.wordpress.com/2008/02/14/predating-link-models-and-pagerank/feed/" rel="self" type="application/rss+xml" />
	<link>http://irthoughts.wordpress.com/2008/02/14/predating-link-models-and-pagerank/</link>
	<description>News, Papers, and Theses on Information Retrieval, Data Mining, and Search Engine Technologies.</description>
	<pubDate>Fri, 25 Jul 2008 15:29:27 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
		<item>
		<title>By: E. Garcia</title>
		<link>http://irthoughts.wordpress.com/2008/02/14/predating-link-models-and-pagerank/#comment-539</link>
		<dc:creator>E. Garcia</dc:creator>
		<pubDate>Fri, 15 Feb 2008 12:55:20 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=168#comment-539</guid>
		<description>Thanks, snunes, for stopping by.

&lt;a href="http://research.microsoft.com/~najork/sigir2007.pdf?0sr=p" rel="nofollow"&gt;Najork's paper at SIGIR 2007&lt;/a&gt; suggests that link-spammers targeting PageRank may play a role in their finding.

&lt;blockquote&gt;
The fact that features based on outgoing links underperform those based on incoming links matches our expectations; if anything, it is mildly surprising that outgoing links provide a useful signal for ranking at all. On the other hand, the fact that in-degree features outperform PageRank under all measures is quite surprising. A possible explanation is that link-spammers have been targeting the published PageRank algorithm for many years, and that this has led to anomalies in the web graph that affect PageRank, but not other link-based features that explore only a distance-1 neighborhood of the result set. Likewise, it is surprising that simple query-independent features such as in-degree, which might estimate global quality but cannot capture relevance to a query, would outperform query-dependent features such as HITS authority scores.&lt;/blockquote&gt;

I look forward to AIRWeb 2008, where I hope to peer-review manuscripts along those lines, or at least to read what others conducting link-based spam research submit.

On the other hand, Najork et al. used breadth-first crawling to collect links and phantom nodes to avoid the sink problem.

It would be interesting to see how their results compare if these experimental conditions are changed.</description>
		<content:encoded><![CDATA[<p>Thanks, snunes, for stopping by.</p>
<p><a href="http://research.microsoft.com/~najork/sigir2007.pdf?0sr=p" rel="nofollow">Najork&#8217;s paper at SIGIR 2007</a> suggests that link-spammers targeting PageRank may play a role in their finding.</p>
<blockquote><p>
The fact that features based on outgoing links underperform those based on incoming links matches our expectations; if anything, it is mildly surprising that outgoing links provide a useful signal for ranking at all. On the other hand, the fact that in-degree features outperform PageRank under all measures is quite surprising. A possible explanation is that link-spammers have been targeting the published PageRank algorithm for many years, and that this has led to anomalies in the web graph that affect PageRank, but not other link-based features that explore only a distance-1 neighborhood of the result set. Likewise, it is surprising that simple query-independent features such as in-degree, which might estimate global quality but cannot capture relevance to a query, would outperform query-dependent features such as HITS authority scores.</p></blockquote>
<p>I look forward to AIRWeb 2008, where I hope to peer-review manuscripts along those lines, or at least to read what others conducting link-based spam research submit.</p>
<p>On the other hand, Najork et al. used breadth-first crawling to collect links and phantom nodes to avoid the sink problem.</p>
<p>It would be interesting to see how their results compare if these experimental conditions are changed.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: snunes</title>
		<link>http://irthoughts.wordpress.com/2008/02/14/predating-link-models-and-pagerank/#comment-538</link>
		<dc:creator>snunes</dc:creator>
		<pubDate>Fri, 15 Feb 2008 11:42:56 +0000</pubDate>
		<guid isPermaLink="false">http://irthoughts.wordpress.com/?p=168#comment-538</guid>
		<description>What do you think of the findings made by Najork et al. (2007) that conclude that HITS is not very different from a simple inlink count? Web-based features are theoretically very robust but it seems that in practice the results are not as good as expected. Content-based features are much more interesting.

"We were quite surprised to find that HITS, a query-dependent feature, is about as effective as web page in-degree, the most simpleminded query-independent link-based feature."

in HITS on the Web: How does it Compare?
http://research.microsoft.com/research/pubs/view.aspx?0rc=p&#38;type=Publication&#38;id=1734

Great blog! Thanks for your posts.</description>
		<content:encoded><![CDATA[<p>What do you think of the findings made by Najork et al. (2007) that conclude that HITS is not very different from a simple inlink count? Web-based features are theoretically very robust but it seems that in practice the results are not as good as expected. Content-based features are much more interesting.</p>
<p>&#8220;We were quite surprised to find that HITS, a query-dependent feature, is about as effective as web page in-degree, the most simpleminded query-independent link-based feature.&#8221;</p>
<p>in HITS on the Web: How does it Compare?<br />
<a href="http://research.microsoft.com/research/pubs/view.aspx?0rc=p&amp;type=Publication&amp;id=1734" rel="nofollow">http://research.microsoft.com/research/pubs/view.aspx?0rc=p&amp;type=Publication&amp;id=1734</a></p>
<p>Great blog! Thanks for your posts.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
