A nice collection of Public Record Databases is now available at http://www.minerazzi.com/pubdbs/
While building a financial miner, we found sites committing a lot of crimes against metadata. The most recent are courtesy of the SEC.gov and Investor.gov sites. Perhaps the result of bad copy rewritten by software or humans?
These are great sites for finding financial and business information, but some of their pages contain poorly written meta tag data that make indexers go ga-ga gu-gu.
To illustrate, check the meta description tags of the pages at the following URLs:
Links and CSS style instructions declared as meta description data? Great!
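A check like this is easy to automate. Below is a minimal sketch, assuming Python and its standard html.parser module; the class name, the function name, and the SUSPICIOUS pattern are hypothetical helpers written for illustration, not part of any Minerazzi tool. It pulls the meta description out of a page and flags values that look like markup, links, or CSS instead of a plain summary.

```python
import re
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Collects the content of every <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.descriptions = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute names arrive lowercased; values do not, so normalize here.
        if (attr.get("name") or "").lower() == "description":
            self.descriptions.append(attr.get("content") or "")


# Heuristic only (an assumption): HTML tags, URLs, CSS rule bodies,
# or inline style attributes inside a description are treated as debris.
SUSPICIOUS = re.compile(
    r"<[a-z/!]|https?://|\{[^}]*:[^}]*\}|style\s*=", re.IGNORECASE
)


def suspicious_descriptions(html):
    """Return the meta descriptions that appear to contain markup or CSS."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    return [d for d in parser.descriptions if SUSPICIOUS.search(d)]
```

An indexer could call suspicious_descriptions() on each fetched page and fall back to the page title or body text whenever the returned list is not empty.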
We have improved our Minerazzi Hosts Miner (MHM) available at http://www.minerazzi.com/tools/mhm/mhm.php
The tool now provides alternate searches. We found that the discovered alternate searches sometimes retrieve additional resources.
Among other useful applications, the tool simplifies the building of topic-specific collections and micro-indexes.
For instance, querying microsoft.com retrieves 6 results:
MHM then suggests several alternate searches. One of these is
Querying this new address (which at the time of writing resolves to 126.96.36.199) retrieves 74 new results:
That’s the beta test phase we are in at Minerazzi (http://www.minerazzi.com). This time we are testing some nice tools.
Stay ahead here, with a new search experience:
Puerto Rico Daily News & Image Searches. Driving traffic to Puerto Rico’s best media sites. The fastest way to find news and images relevant to Puerto Rico. Coming soon to http://www.miislita.com
I think this can be applied to many knowledge domains without repeating the mistakes made by similar services across the Web. For now, baby steps.
A nice update, indeed: http://www.miislita.com/web-crawler/web-crawler.php
Our popular tool, The Web Crawler, is back! This new iteration of the tool is a lot faster because it is based on a different strategy: extracting HREF sets and then refining them to get URLs that qualify for status checks. So the tool also works as a link checker.
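The extract-then-refine idea can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the tool's actual code; HrefCollector, qualified_urls, and check_status are hypothetical names, and the refinement rules (resolve relative links, drop fragments, keep only http and https) are assumptions about what "qualified" might mean.

```python
import urllib.error
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class HrefCollector(HTMLParser):
    """Step 1: gather the raw set of HREF values found in anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.add(value.strip())


def qualified_urls(html, base_url):
    """Step 2: refine the HREF set into absolute http(s) URLs."""
    collector = HrefCollector()
    collector.feed(html)
    urls = set()
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        # mailto:, javascript:, and similar schemes cannot be status-checked.
        if urlparse(absolute).scheme in ("http", "https"):
            urls.add(absolute)
    return sorted(urls)


def check_status(url, timeout=10):
    """Step 3: return the HTTP status code, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a broken link
    except OSError:
        return None  # DNS failure, refused connection, timeout
```

Because the set is deduplicated before any request is made, each URL is checked only once, which is where a design like this would save time. Calling check_status() on every entry returned by qualified_urls() turns the sketch into a simple link checker.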
Another advantage of the above strategy is this:
We have added to our email crawler
the following features:
1. A User Tracking Session (just find the link and click on it) to view current user data.
2. Search for user email addresses in the top search engines and social networks
Give it a try.
We plan to add the tracking session feature to all our pages. The feature is visible for now to give you an idea of how it works, but it can be made invisible to users. Geo and search data can be added in a snap.
Why pay monthly fees when you can have your own tracking service, customized to your needs?