A new miner is available at Minerazzi.com: The Information Retrieval Collection (http://www.minerazzi.com/irc). What can you do with it? Use this … More
Month: December 2014
Building topic-specific collections, the easy way
We have improved the Minerazzi platform (http://www.minerazzi.com) by adding new features. These include an internal filter for deduplicating URLs, which … More
Lessons learned from building an IR collection
We are currently building the Information Retrieval Collection (IRC) with the Minerazzi platform. URLs pointing to resources like articles from … More
Improving the Data Structures and Algorithms Collection
We have almost doubled the index of the Data Structures and Algorithms (DSAC) miner. In addition, we are moving to … More
Unveiling Link Honey Pots with Minerazzi
In Web Spam Taxonomy, Gyongyi and Garcia-Molina describe several web spam techniques, one being honey pots. They describe these as … More
Minerazzi: Allowing Users to Recrawl Search Results
Effective immediately, Minerazzi (http://www.minerazzi.com) allows users to recursively recrawl search results. Why is recrawling so important? The purpose of allowing … More