
The Domain Extractor is a new Minerazzi tool, available now at

http://www.minerazzi.com/tools/domain/extractor.php

The tool extracts domains and subdomains from up to 10,000 URLs at once. Larger sets are reduced to conform to this limit, which helps avoid browser crashes.

From the input set, the Domain Extractor returns a set consisting of domains and subdomains. The results are deduplicated and sorted in alphabetical order.
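The behavior described above can be approximated in a few lines of Python. This is only a minimal sketch, not the tool's actual implementation: the function name `extract_hosts`, the use of `urllib.parse`, and the choice to truncate oversized input are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

MAX_URLS = 10_000  # the per-run limit stated for the tool


def extract_hosts(urls, limit=MAX_URLS):
    """Return the deduplicated, alphabetically sorted hostnames found in urls."""
    hosts = set()
    # Assumption: input beyond the limit is simply cut off.
    for url in list(urls)[:limit]:
        url = url.strip()
        if not url:
            continue
        # urlsplit only recognizes the host when a scheme or "//" is present.
        if "://" not in url:
            url = "//" + url
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed entries instead of failing the whole run
        if host:
            hosts.add(host.lower())
    return sorted(hosts)
```

Note that this sketch treats `www.example.com` and `example.com` as two distinct entries, since both domains and subdomains are kept in the output.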

The tool comes in handy when one wants to extract domains from databases or other sources in chunks of up to 10,000 at a time.
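For a source larger than the limit, one way to prepare the input is to split it into chunks beforehand. The helper below is a hypothetical sketch (the name `chunked` is not part of any Minerazzi tool) that yields successive lists of at most 10,000 items from any iterable, such as the rows of a database query.

```python
from itertools import islice


def chunked(items, size=10_000):
    """Yield successive lists of at most `size` items from an iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            break
        yield chunk
```

Each chunk can then be pasted into the tool, or processed, one at a time.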

It can be conveniently combined with our other tools, such as

The FQU Bot
http://www.minerazzi.com/tools/fqu/fqu.php

and

MUST
http://www.minerazzi.com/tools/must/must.php

The Domain Extractor is a simple, light, but powerful tool. It can be used as part of a crawling strategy: once domains and subdomains are extracted, the chunks of URLs can be sent to a queue for crawlers to revisit.
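As a rough illustration of the queueing step, the sketch below places one seed URL per extracted host on a standard FIFO queue that crawler workers could consume. The function name `enqueue_seeds`, the default `http` scheme, and the idea of rebuilding a root URL from each host are assumptions for the example, not a description of how any particular crawler works.

```python
from queue import Queue


def enqueue_seeds(hosts, scheme="http"):
    """Put one seed URL per host on a FIFO queue for crawler workers."""
    crawl_queue = Queue()
    for host in hosts:
        # Assumption: the root page of each host is the revisit target.
        crawl_queue.put(f"{scheme}://{host}/")
    return crawl_queue
```

A worker would then call `crawl_queue.get()` repeatedly until the queue is empty.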

Another application consists of querying a search engine, extracting the URLs from its results page, and then processing them through the tool.
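The URL extraction step could look something like the following sketch, which collects absolute links from a saved results page using Python's standard `html.parser` module. Fetching the page is left out, and the names `LinkCollector` and `extract_links` are hypothetical. Real results pages often wrap links in redirects, so extra cleanup may be needed.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from the <a href="..."> tags of a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep only absolute links; relative ones point back to the engine.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return the absolute links found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

The resulting list of URLs can be pasted into the Domain Extractor as its input set.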

There might be other applications, but the above can give you an idea of how handy the tool can be.
