As you might know by now, Minerazzi is a platform for building ‘miners’. We define a miner as a topic-specific search engine that allows end-users to search, index, mine, and recrawl online resources. The goal is to turn web searchers into data miners as a natural evolution of the traditional concept of searching.

Minerazzi proposes a different search paradigm. Each miner built with the platform puts users at the center of the search experience, as participants rather than mere spectators.

For instance, consider one of the main features of Minerazzi: Recrawling.

Recrawling is a guided search activity, a discovery learning technique that allows users to quickly discover new resources associated with a given search result.

To illustrate, suppose you are using the Data Structures and Algorithms Collection (DSAC) miner (http://www.minerazzi.com/dsac). At the time of writing, searching for something as specific as [ bloom filter ] returns two records with the following URLs:

1. http://xlinux.nist.gov/dads/HTML/bloomFilter.html
2. http://en.wikipedia.org/wiki/Bloom_filter

Recrawling the first URL retrieves 17 secondary URLs: 10 External (58.82%) and 7 Internal (41.18%), for a 1.43 ratio.

Recrawling the second URL retrieves 308 secondary URLs: 159 External (51.62%) and 149 Internal (48.38%), for a 1.07 ratio.
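The percentages and ratios quoted above can be reproduced with simple arithmetic. The following is only a minimal sketch, assuming the ratio is the number of External URLs divided by the number of Internal URLs; the function name link_stats is hypothetical and not part of Minerazzi.

```python
def link_stats(external: int, internal: int) -> dict:
    """Summarize a recrawl result from its External and Internal URL counts.

    Assumption: the reported ratio is External / Internal.
    """
    total = external + internal
    if total == 0:
        raise ValueError("no secondary URLs to summarize")
    return {
        "total": total,
        "external_pct": round(100 * external / total, 2),
        "internal_pct": round(100 * internal / total, 2),
        # The ratio is undefined when there are no Internal URLs.
        "ratio": round(external / internal, 2) if internal else None,
    }


first = link_stats(external=10, internal=7)
second = link_stats(external=159, internal=149)
print(first)
print(second)
```

With the counts from the two records above, this yields 58.82%, 41.18%, and 1.43 for the first URL, and 51.62%, 48.38%, and 1.07 for the second.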

Secondary URLs are sorted by type and then alphabetically. These are URLs that are related in some way to a resource's content (e.g., front-end) and design (e.g., back-end).
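To give a rough idea of what a recrawl involves, here is a sketch that extracts the URLs referenced by a page, labels each one as Internal or External, and sorts them by type and then alphabetically. It is not Minerazzi's implementation. It assumes that a URL is Internal when its host name matches that of the recrawled page, and that secondary URLs come from href and src attributes; the names LinkCollector and classify_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs found in href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()  # a set drops duplicate URLs

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the recrawled page.
                absolute = urljoin(self.base_url, value)
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.add(absolute)


def classify_links(base_url, html):
    """Return (type, url) pairs sorted by type, then alphabetically by URL."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    base_host = urlparse(base_url).hostname
    typed = []
    for url in collector.links:
        # Assumption: same host name means Internal, anything else External.
        kind = "Internal" if urlparse(url).hostname == base_host else "External"
        typed.append((kind, url))
    return sorted(typed)


page = (
    '<a href="/dads/">index</a>'
    '<a href="http://en.wikipedia.org/wiki/Bloom_filter">wiki</a>'
    '<a href="mailto:someone@example.com">mail</a>'
    '<script src="http://example.com/a.js"></script>'
)
result = classify_links("http://xlinux.nist.gov/dads/HTML/bloomFilter.html", page)
for kind, url in result:
    print(kind, url)
```

The sample page is made up for illustration. Sorting the (type, url) pairs places External entries before Internal ones and orders the URLs alphabetically within each type.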

Clicking on a secondary URL takes the user to that resource. To copy an entire list of URLs to the clipboard, click the top-right { S } link and press Ctrl + C. After that, the user can export the list from the clipboard to a text file or spreadsheet.

We are now testing the consumption of URLs from third-party search engine result pages. The goal is to allow users to quickly exhaust results from those third parties.

There is no way back to merely searching.
