The current issue of IRW features Web Scraping as a vehicle for conducting Web Mining.

As mentioned in the newsletter, there are many things that can be done with scrapers. For instance, below is a comparison of the number of script tags (<script …>…</script>) and link tags (<link … />) declared in several index pages, extracted with two scrapers mentioned in the IRW article: the Script and Link Tag Scrapers. As expected, pages with a lot of content tend to have more scripts.

| Search Engines | Script Tags | Link Tags |
| * | 15 | 2 |
| | 12 | 1 |
| | 10 | 0 |
| | 4 | 0 |
| | 1 | 0 |


| Socially-oriented Sites | Script Tags | Link Tags |
| | 38 | 5 |
| | 9 | 3 |
| | 7 | 13 |
| | 5 | 4 |
| ** | 1 | 6 |


* At the time of the analysis, redirects to the alias, but the same results are obtained.

** and return the same results.
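
Counting tags of this kind can be sketched in a few lines. The following is only a minimal illustration using Python's standard html.parser module. It is not the Script and Link Tag Scrapers from the IRW article, whose implementation is not shown here, and the names TagCounter and count_script_and_link_tags are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Hypothetical helper: counts opening <script> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase. Self-closing tags
        # such as <link ... /> also reach this method by default.
        if tag in ("script", "link"):
            self.counts[tag] += 1


def count_script_and_link_tags(html):
    """Return (script_count, link_count) for an HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts["script"], parser.counts["link"]


if __name__ == "__main__":
    # Made-up page source, used only to exercise the counter.
    page = (
        "<html><head>"
        '<link rel="stylesheet" href="style.css" />'
        '<script src="a.js"></script>'
        "<script>var x = 1;</script>"
        "</head><body><p>Hello</p></body></html>"
    )
    scripts, links = count_script_and_link_tags(page)
    print("script tags:", scripts, "link tags:", links)
```

To count tags on a live index page, the HTML string could first be fetched, for example with urllib.request.urlopen, and then passed to the same function. Counts obtained this way reflect only the markup served at request time, so they may differ from the figures above.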

On the other hand, Web Scraping can unveil potential Web Vulnerabilities in an architecture, so there is a positive side to the story.

In the right hands, scrapers can do great things. In the wrong ones, they can be a nightmare.

Unfortunately, hackers know well that scrapers can be embedded into malware and get their hands on source code. Ask victims of such scrapers, like Google and other companies.

Besides legal issues and an unfriendly landscape (censorship), it appears they got tired of Chinese hackers picking on them, so they are pulling out of China, or threatening to do so.

Beaten at their own game: brain power.
