We have improved our Minerazzi Hosts Miner (MHM) available at http://www.minerazzi.com/tools/mhm/mhm.php
The tool now suggests alternate searches. We found that these alternate searches sometimes retrieve additional resources.
Among other useful applications, the tool simplifies the building of topic-specific collections and micro-indexes.
For instance, querying microsoft.com retrieves 6 results:
MHM then suggests several alternate searches. One of these is
Querying this new address (which at the time of writing resolves to 22.214.171.124) retrieves 74 new results:
MHM is a tool for discovering sites on the same host or IP, sites affiliated with each other, and sites that might be your competitors. It is available at
It is great for discovering domain names branded with keywords or known name brands. Excellent also for discovering spam communities, domainers, and more.
You may also use it to build micro-indexes and topic-specific collections (as we do) or to chase down communities of interest to law enforcement agencies, recruiters, etc.
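The idea of grouping sites by a shared host or IP can be illustrated with a short Python sketch. This is not how MHM works internally (its method is not described here); it only resolves a list of candidate hostnames you already have and groups them by IPv4 address. The function name `group_by_ip` and the injectable `resolve` parameter are hypothetical, introduced for illustration.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_ip(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, List[str]]:
    """Group hostnames by the IPv4 address they resolve to.

    `resolve` is a hypothetical hook so the lookup can be replaced
    (e.g., by a cached or offline resolver); by default it is the
    standard forward DNS lookup.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for host in hosts:
        try:
            ip = resolve(host)
        except OSError:
            # Names that do not resolve are skipped rather than aborting the run.
            continue
        groups[ip].append(host)
    # Sort the names in each group so the output is stable.
    return {ip: sorted(names) for ip, names in groups.items()}
```

A forward lookup like this can only confirm that known names share an address; discovering previously unknown sites on a host requires an index of host-to-site data, which is what a tool such as MHM provides.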
If you think that domain searches are just about domain marketing and whois stuff, think again.
Try MDM, our domain intelligence tool, with queries of the form w1.w2… where w1 and w2 are words as described at the end of this link:
This is a follow-up to the “Beauty of XOR and XNOR searches” post, describing possible applications of these search modes to Information Retrieval, Search Marketing, and Web Mining. The post is a snippet taken from http://www.minerazzi.com/help/xor-xnor.php
An IR researcher can test the performance of an LSI algorithm with a sample of documents retrieved through XOR and XNOR searches. Said sample should be rich in co-occurrence cases. Using a similar procedure, search marketers or Web intelligence specialists can identify sets of documents that emphasize keywords somehow related through different co-occurrence paths.
An interesting application consists of extracting all the unique terms (or just the high-frequency ones) from a text source and constructing an XOR query with these. We may refer to this as XORing a text source. This should help one identify a network of co-occurrence paths over a collection and which documents might be relevant to specific combinations of terms from the original source.
The text source can be a title, description, abstract, or paragraph of a document, or even an entire document. However, XORing a large document might be computationally intensive.
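As a rough illustration of XORing a text source, the Python sketch below extracts the unique terms from a text and tests documents against them. It assumes a chained XOR, i.e., a document matches when it contains an odd number of the query terms; how the Minerazzi engine evaluates multi-term XOR queries is not stated here, so treat that rule as an assumption. The names `unique_terms` and `xor_match` are hypothetical, and no stop-word removal or stemming is applied.

```python
import re
from collections import Counter
from typing import List, Optional, Set

TOKEN_RE = re.compile(r"[a-z0-9]+")


def unique_terms(text: str, top_n: Optional[int] = None) -> List[str]:
    """Return the distinct lowercase terms of `text`, sorted alphabetically.

    If `top_n` is given, keep only the `top_n` most frequent terms.
    """
    counts = Counter(TOKEN_RE.findall(text.lower()))
    if top_n is None:
        return sorted(counts)
    return sorted(term for term, _ in counts.most_common(top_n))


def xor_match(terms: List[str], document: str) -> bool:
    """Assumed chained-XOR rule: true when an odd number of `terms` occur."""
    doc_terms: Set[str] = set(TOKEN_RE.findall(document.lower()))
    hits = sum(term in doc_terms for term in terms)
    return hits % 2 == 1
```

For example, `unique_terms("web mining and web search", top_n=1)` keeps only `web`, and a two-term query built from `web` and `mining` matches a document that mentions one of the two terms but not both. The cost grows with the number of terms extracted, which is why limiting the query to high-frequency terms helps with long sources.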
A similar exercise can be done by XNORing a text source. In both cases, the resultant output can be used to identify prospective competitors; i.e., documents relevant to similar concepts or belonging to companies within the same business space.
We are currently testing the XOR and XNOR search modes as a query disambiguation strategy.
PS. Today, 1-9-2014, we added new material that discusses these search modes for disambiguation and clustering. :)
Faster than saying “Look mom: No MAC address needed!”
If during a browser-specific session a user queries a search engine and accesses subscription-based web services provided by the same search engine (web-based email accounts, gadgets, apps…), the IP used for searching can be associated with that user’s web service credentials (username, password…). Therefore, it is possible for a search engine to guess the identity of that user and know what the user is searching for, when, and how. With referrer and click-through data, it is also possible for that search engine to know where said user came from and where he or she went.
In most cases, geolocation data are much more accurate for devices with GPS, like smartphones, and for HTML5-compatible browsers. In general, user privacy becomes increasingly compromised as more web services, apps, and device features are enabled. On the Web, most free stuff is not really free but carries a privacy cost; otherwise, it would not be offered for free.
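The association described above amounts to a join between two kinds of log records. The Python sketch below is a simplified illustration, not a description of any real search engine's pipeline: the record types `LoginEvent` and `SearchEvent`, their fields, and the function `attribute_queries` are all hypothetical, and the match key (same session identifier and same IP) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class LoginEvent(NamedTuple):
    session_id: str
    ip: str
    username: str


class SearchEvent(NamedTuple):
    session_id: str
    ip: str
    query: str


def attribute_queries(
    logins: Iterable[LoginEvent],
    searches: Iterable[SearchEvent],
) -> Dict[str, List[str]]:
    """Attach queries to usernames seen in the same session from the same IP."""
    # Map each (session, IP) pair to the account that logged in from it.
    owner = {(e.session_id, e.ip): e.username for e in logins}
    attributed: Dict[str, List[str]] = defaultdict(list)
    for event in searches:
        user = owner.get((event.session_id, event.ip))
        if user is not None:
            attributed[user].append(event.query)
    return dict(attributed)
```

Note that the join attributes queries to the account, not to the person at the keyboard, so anyone else using the same session is credited to the logged-in user.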
Of course, if during a session a user lends the device to another searcher, the search engine might not be able to guess the identity of that user.
Vulnerability scans via search engines. Includes Google scans and Bing reflections.
PlaceRaider has been called government spyware for smartphones. Expect copycats soon. Download the PlaceRaider article.
The abstract says:
“As smartphones become more pervasive, they are increasingly targeted by malware. At the same time, each new generation of smartphone features increasingly powerful onboard sensor suites. A new strain of ‘sensor malware’ has been developing that leverages these sensors to steal information from the physical environment; e.g., researchers have recently demonstrated how malware can ‘listen’ for spoken credit card numbers through the microphone, or ‘feel’ keystroke vibrations using the accelerometer. Yet the possibilities of what malware can ‘see’ through a camera have been understudied.”
“This paper introduces a novel ‘visual malware’ called PlaceRaider, which allows remote attackers to engage in remote reconnaissance and what we call ‘virtual theft.’ Through completely opportunistic use of the phone’s camera and other sensors, PlaceRaider constructs rich, three dimensional models of indoor environments. Remote burglars can thus ‘download’ the physical space, study the environment carefully, and steal virtual objects from the environment (such as financial documents, information on computer monitors, and personally identifiable information). Through two human subject studies we demonstrate the effectiveness of using mobile devices as powerful surveillance and virtual theft platforms, and we suggest several possible defenses against visual malware.”
We keep improving the Minerazzi site (http://www.minerazzi.com). We moved all pages to PHP. In addition, here is the recent changelog for the Web Crawler (http://www.minerazzi.com/labs/crawlinker.php):
07-05-11: Email address extraction, deduplication, and sorting capabilities added.
07-04-11: Design and copy changes.
07-03-11: Navigation menu restored and bug fixed.
07-03-11: Navigation menu removed to test bug.
07-02-11: Top-bottom quick navigation menu added.
07-02-11: Day/Time Stamp, Reverse DNS, and IPv4 List capabilities added.
07-02-11: Integration with Whois Tool.
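The email extraction, deduplication, and sorting capability listed in the changelog above can be approximated in a few lines of Python. This is a minimal sketch, not the crawler's actual code: the pattern `EMAIL_RE` is a deliberately simple assumption that misses some valid addresses, and lowercasing the whole address before deduplication is a simplification.

```python
import re
from typing import List

# Simplified pattern (an assumption): local part, "@", then a dotted domain.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text: str) -> List[str]:
    """Return the distinct email addresses found in `text`, sorted."""
    # A set removes duplicates; lowercasing treats Bob@... and bob@... as one.
    found = {match.lower() for match in EMAIL_RE.findall(text)}
    return sorted(found)
```

Given page text that mentions the same address twice with different capitalization, the function returns it once, in alphabetical order with the other addresses.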
The Whois Database Retriever (http://www.minerazzi.com/labs/whois.php) now features suffix/prefix stripping capabilities. This means that users only need to enter a candidate domain name without any alias or extension and the tool scans multiple registrar databases. We expect to add some additional features to this time-saving application.
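Suffix/prefix stripping of this kind can be sketched in Python as follows. The sketch assumes that "alias" means a leading `www.` and that "extension" means the last dot-separated label; the tool's real rules are not documented here. The function names and the list of extensions passed to `candidate_domains` are hypothetical.

```python
from typing import Iterable, List


def candidate_name(user_input: str) -> str:
    """Reduce user input such as a URL or hostname to a bare candidate name."""
    name = user_input.strip().lower()
    # Drop a scheme such as "http://" if present.
    if "://" in name:
        name = name.split("://", 1)[1]
    # Drop any path after the hostname.
    name = name.split("/", 1)[0]
    # Strip the assumed alias prefix.
    if name.startswith("www."):
        name = name[4:]
    # Strip the last label, treated here as the extension.
    if "." in name:
        name = name.rsplit(".", 1)[0]
    return name


def candidate_domains(user_input: str, extensions: Iterable[str]) -> List[str]:
    """Expand the bare name into one domain per extension to look up."""
    name = candidate_name(user_input)
    return [f"{name}.{ext}" for ext in extensions]
```

Each generated domain would then be submitted to a Whois lookup. Stripping only the last label is a limitation: multi-label extensions such as `co.uk` would need a list of known suffixes.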
In the meantime, we keep beta testing the engine. Our staff of ‘miners’ is doing a great job.
Thanks to the Internet, hackers are (or soon will be) invading your cell phone, car, and TV.
“The Energizer DUO Trojan: What You Need to Know” reports that the Energizer USB charger has been infected with a nasty Trojan.
“Ford Motor Rolls Out New Security Features To Prevent Car-Hacking” reports that Ford is taking steps to prevent hackers from literally car-jacking your vehicle.
“Google, DISH Network in Set-top Tests” reports that Google is moving to provide search services through your TV. With TVs soon hitting the market with Internet widgets and similar technologies, your TV sessions will soon be subject to hacking.
So, very soon: hackers, spammers, and marketers in your car, phone, and TV.
To secure a job, get certified in Internet security-related technologies. Or how about Multimedia Search Marketing (MSM)? That’s a great new acronym to think about.