
Effective immediately, Minerazzi (http://www.minerazzi.com) allows users to recursively recrawl search results.

Why is recrawling so important?

The purpose of allowing users to recrawl URLs is to expose them to new content and to involve them in learning through discovery, turning their searches into a mining activity. This makes more sense than limiting their search experience to inspecting zillions of cached records from a search engine index. The problem with the latter is that those records are frequently outdated or irrelevant, not to mention that in that scenario users are simply passive spectators.

Allowing users to recrawl search results has many advantages and possibilities. For instance, users can use the discovered URLs to build curated collections, self-guide investigative work, or gather link intelligence from sites, directories, blogs, forums, or social networks. In general, recrawling allows users to discover hidden paths to fresh, new, or rich content.

Considering that the total number of primary and secondary URLs defines the reach of a microindex, in theory recursive recrawling should result in an endless reach.
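To make the idea of reach concrete, here is a minimal sketch in Python. It is not Minerazzi's implementation: the function name `reach`, the `max_depth` parameter, and the in-memory `links` dictionary standing in for real page fetches are all assumptions made for illustration. It counts the distinct primary URLs plus every secondary URL discovered by following links up to a given number of recrawl rounds.

```python
from collections import deque


def reach(primary_urls, links, max_depth):
    """Count distinct URLs reachable from the primary URLs.

    `links` is a hypothetical mapping of URL -> list of URLs found on
    that page; a real crawler would fetch and parse each page instead.
    `max_depth` is the number of recrawl rounds allowed.
    """
    seen = set(primary_urls)
    queue = deque((url, 0) for url in seen)
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not recrawl beyond the allowed number of rounds
        for found in links.get(url, []):
            if found not in seen:
                seen.add(found)
                queue.append((found, depth + 1))
    return len(seen)


# Toy link graph (made-up data, for illustration only).
toy_links = {"a": ["b", "c"], "b": ["d"], "d": ["e"]}
```

With this toy graph, each additional recrawl round can only keep or grow the count, which is the sense in which reach is open-ended as long as new links keep turning up.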

At this time, we do not recrawl .css and .pdf files, but we do recrawl the most common file formats (.php, .asp, .aspx, .html, .htm, .js, etc.). However, if the content of a file is dynamic, obfuscated, or poorly coded, it will most likely return garbage or nothing.
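As an illustration of the file-type rule above, the following Python sketch skips URLs whose path ends in .css or .pdf and lets everything else through. The names `SKIPPED_EXTENSIONS`, `is_recrawlable`, and `filter_recrawlable` are hypothetical, and checking the extension of the URL path is an assumption about how such a rule could be applied, not a description of how Minerazzi does it.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: only the two extensions named in the post are skipped.
SKIPPED_EXTENSIONS = {".css", ".pdf"}


def is_recrawlable(url):
    """Return True unless the URL path ends in a skipped extension."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


def filter_recrawlable(urls):
    """Keep only the URLs that pass the extension check, in order."""
    return [url for url in urls if is_recrawlable(url)]
```

Note that a check like this says nothing about whether a page will yield useful links; dynamic or obfuscated content can still return garbage or nothing.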

Having said that, we invite you to try the recrawling experience with our public miners.