The Chemistry Organizations Miner (http://www.minerazzi.com/chemorgs/) is our newest productivity-driven search engine. This micro-index helps you find worldwide chemistry organizations, societies, … More
Category: Crawlers
Document tree flattening as an exploration technique for data mining .xml files (sitemaps, feeds, inventories, raw data, etc.)
Two of our tools, Web Feed Flattener and Feed URLs Extractor, were updated and now accept files with the .xml … More
Fractals Miner
Fractals Miner: Fractal Patterns and Growth Phenomena – Theory, Experiments, & more. Available now at http://minerazzi.com/fractals/ Research the fractal geometry … More
Python Search Engine
Here is a Python-based search engine with an implementation inspired by one of our papers at the old Mi Islita.com … More
Domain Extractor Tool
The Domain Extractor is a new Minerazzi tool, available now at http://www.minerazzi.com/tools/domain/extractor.php The tool extracts domains and subdomains from up … More
Detecting Bogus HTTP Status Codes
We have added a new algorithm to the MUST tool, available at http://www.minerazzi.com/tools/must/must.php The tool now automatically detects bogus HTTP … More
An In-Context Topic Crawler
I have completed a breadth-first in-context crawler that traverses the Web, recursively discovering links in two modes: 1. continuous mode: … More