This morning we added an entirely new data set to the Puerto Rico Collection (http://www.minerazzi.com/prbusca). The set was built by indexing the https://data.pr.gov portal, which is itself a hub to many data sets. To verify this, search for [ puerto rico data portal ]. Economists, statisticians, and local researchers are familiar with that portal.
This is an example of using Minerazzi to build customized collections from a hub site. One only needs to use the Search Inside feature from a search result to start discovering a whole new world of data mining possibilities.
BTW, US government versions are in the pipeline. Because many agencies are too big for a single collection, these will be agency-specific.
Sure, you can do a site search in Google by prefixing the portal's domain with the site: operator (for example, site:data.pr.gov) and submitting that as a query. However, you will not be able to data mine the results and organize curated collections while searching, nor will you be able to search inside a Google result. To me, this is an interesting search paradigm. Whether it might be a game changer, time will tell. As it stands, I’m just happy with what we have accomplished with the platform.
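For readers who have not used the site: operator, here is a minimal Python sketch of how such a query can be put together. The function name site_query_url and the sample term "census" are made up for illustration; the only assumption about Google is its usual https://www.google.com/search?q=... address format.

```python
from urllib.parse import urlencode, urlparse


def site_query_url(site: str, terms: str = "") -> str:
    """Build a Google search URL restricted to one site (illustrative helper)."""
    # Accept either a full URL or a bare host name; keep only the host part.
    host = urlparse(site).netloc or site
    # The operator goes in front of the host, followed by any search terms.
    query = f"site:{host} {terms}".strip()
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_query_url("https://data.pr.gov", "census"))
```

This only builds the address to open in a browser. It does nothing with the results, which is exactly the limitation discussed above.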
In addition, when using Minerazzi’s Search Inside crawler, the tool will pull current versions of documents as they actually exist on remote servers. That is, these are not precached records from an index.
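To illustrate the general idea of working from a live page rather than a cached copy, here is a rough Python sketch that requests a page at the moment you ask for it and lists the links it contains. This is not Minerazzi's code, and I am not claiming the crawler works this way; the names LinkCollector, extract_links, and fetch_live_links are hypothetical and use only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute link targets from anchor tags (illustrative only)."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    """Return the links found in an HTML string, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_live_links(url: str, timeout: float = 10.0) -> list[str]:
    """Download the page as it exists right now and return its links."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, url)
```

Calling fetch_live_links("https://data.pr.gov") would hit the remote server each time, so the result reflects whatever the page contains at that moment rather than a stored snapshot.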
On another matter, as a further example of mining authority hubs with Minerazzi, this coming week we will be launching the University Sites Collection (USC). Librarians, students, and teachers will be able to build curated university resource collections to their heart’s content using this new Minerazzi miner. A worldwide universities version is in the pipeline.
Overall, with Minerazzi, it is really simple to extract and build collections on practically any subject, business, or knowledge domain.