Here are some services, apparently used by Wikipedia, for mapping IPs to other resources. They are great for business intelligence, tracking users, spammers, marketing research,…
We used them to test yahoo.com’s IP (220.127.116.11).
Replace any instances of 18.104.22.168 with the one you want to test.
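The substitution step can be sketched in Python. This is only an illustration: the URL templates below are hypothetical placeholders (the actual services are not listed here), and `lookup_urls` and `reverse_dns_name` are names made up for this sketch. The only library used is Python's standard `ipaddress` module.

```python
import ipaddress

# Hypothetical lookup-URL templates; substitute those of the services you use.
TEMPLATES = [
    "https://whois.example.com/lookup?ip={ip}",
    "https://geo.example.com/{ip}",
]


def lookup_urls(ip, templates=TEMPLATES):
    """Validate an IP address and substitute it into each URL template."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return [template.format(ip=str(addr)) for template in templates]


def reverse_dns_name(ip):
    """Return the name a reverse DNS (PTR) query would use for this address."""
    return ipaddress.ip_address(ip).reverse_pointer
```

Calling `lookup_urls("220.127.116.11")` returns one URL per template with the address filled in, and a malformed address fails early instead of producing a broken link.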
This is a new Minerazzi.com miner, available at
This miner allows users to find web spam and anti-spam resources, including research papers and conferences on adversarial IR strategies. Users can search by topic, author, doi, or event.
We updated our Keywords Spam Detector,
and added new content that editors, writers, SEOs, and others might find useful. I’m reproducing below some of the new material added.
Recommendations for Writing Titles
- Search Engines
Search engines might process entire titles, but tend to display fewer than about 70 characters in their search results. So you may want to limit web page titles to around that mark, say between 60 and 65 characters.
- Academic Journals
Some editorial guidelines, like those of JAMA, limit the length of titles to 150 characters for reports of research and other major articles and to 100 characters for Editorials, Viewpoints, Commentaries, and Letters (JAMA, 2016).
- Words Usage in Titles
The average length of a word in English, Spanish, and similar languages is about six characters. Thus, on average, a 60-character title amounts to about 10 words, regardless of whether these are unique terms. This is just a reference mark, as text estimates can be influenced by other variables. For instance, text averages can be topic-sensitive and influenced by syntactic structure (Busch-Lauer, 2000).
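The arithmetic above can be turned into a quick check. The following is a minimal sketch, not the logic of any Minerazzi tool: the function name `title_report` and its defaults (a 65-character limit and six characters per word) are assumptions taken from the figures quoted above.

```python
def title_report(title, max_chars=65, avg_word_len=6):
    """Summarize a title's length against the rough marks discussed above."""
    length = len(title)
    return {
        "chars": length,
        # Actual whitespace-separated word count.
        "words": len(title.split()),
        # Rule-of-thumb estimate: characters divided by average word length.
        "estimated_words": round(length / avg_word_len),
        "within_limit": length <= max_chars,
    }
```

For a 60-character title the estimate is 60 / 6 = 10 words. The real word count will drift from that estimate depending on topic and syntax, as noted above.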
How short is too short?
- The length of a title is a relative concept. By current standards, a 60-character title, which amounts to about 10 words, is considered fair enough for search engines, very short for most academic journals, but too long for songs. Indeed, a recent study found that song titles with one or a few words are on the rise and preferred (Kopf, 2016). However, these types of titles are not informative enough for search engines and academic journals.
- Generally speaking, articles with short titles are more attractive to readers than those with longer titles because the latter are frequently perceived as complex, confusing, or boring. If readers do not find a title attractive or cannot understand it, there is little chance that they will read or cite its abstract or the full paper (Deng, 2015; Chawla, 2015).
- A 2015 study confirmed that academic papers with short titles receive more citations per paper, being more attractive to readers than articles with longer titles (Letchford, Moat, & Preis, 2015).
- A 2012 study found that short-titled articles have higher viewing and citation rates than those with longer titles. Similarly, articles with results-describing titles are cited more often than those with methods-describing titles (Paiva, Nogueira Lima, & Ribeiro Paiva, 2012). The same study found that titles that contained a question mark, referred to a specific geographical region, or used a colon or a hyphen were associated with a lower number of citations.
Visit the Keywords Spam Detector page to learn more about the topic or to follow the referenced studies. It might at least help you to investigate why artists like Rihanna and Justin Bieber prefer one-word song titles.
This is a new tool, available at
Term repetition abuse is considered an adversarial IR practice known as keyword spam. See the list of practices we fought at AIRWeb at http://airweb.cse.lehigh.edu/2007/cfp.html
This tool can help you to write better titles, abstracts, descriptions, paragraphs, or full text by allowing you to detect and fix over-repeated terms. The tool uses a proprietary algorithm for detecting frequency-based spam.
Once detected, over-repeated terms can be edited by either reducing their term frequency or diluting the input by adding unique terms not present in the original text.
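To make the two remedies concrete, here is a minimal frequency-based sketch. It is not the tool's proprietary algorithm, which is not described here. The function name `over_repeated_terms`, the tokenizer, and the thresholds `max_share` and `min_count` are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def over_repeated_terms(text, max_share=0.2, min_count=2):
    """Return terms whose share of all terms in the text exceeds max_share.

    A term is flagged only if it occurs at least min_count times, so a
    single word in a very short text is not reported.
    """
    terms = re.findall(r"[a-z0-9]+", text.lower())
    total = len(terms)
    counts = Counter(terms)
    return {
        term: count / total
        for term, count in counts.items()
        if count >= min_count and count / total > max_share
    }


spammy = "cheap shoes cheap shoes cheap deals"
# Dilution: same repeated terms, plus unique terms not in the original text.
diluted = spammy + " on leather boots sandals and sneakers for runners walkers hikers"
```

With these thresholds, `over_repeated_terms(spammy)` flags both "cheap" and "shoes", while `over_repeated_terms(diluted)` flags nothing, because the added unique terms lower each term's share. A practical detector would likely also ignore common stop words, which this sketch does not.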
MHM is a tool for discovering sites on the same host or IP, and for discovering sites that are affiliated with each other or that might be your competitors. It is available at
It is great for discovering domain names branded with keywords or known name brands. Excellent also for discovering spam communities, domainers, and more.
You may also use it to build micro-indexes and topic-specific collections (as we do) or to chase down communities of interest to law enforcement agencies, recruiters, etc.
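The basic idea of finding sites that share an IP can be sketched as a grouping step. This is not how MHM works internally, which is not described here. The domain names and addresses below are made-up placeholders, and `group_by_ip` is a hypothetical helper that expects domains already resolved to addresses (for instance with `socket.gethostbyname`).

```python
from collections import defaultdict


def group_by_ip(resolved):
    """Group domains by IP and keep only addresses shared by two or more."""
    groups = defaultdict(list)
    for domain, ip in resolved.items():
        groups[ip].append(domain)
    return {ip: sorted(domains) for ip, domains in groups.items() if len(domains) > 1}


# Placeholder data: reserved example domains and documentation-only addresses.
resolved = {
    "shop.example.com": "192.0.2.10",
    "blog.example.net": "192.0.2.10",
    "news.example.org": "203.0.113.5",
}
```

Here `group_by_ip(resolved)` keeps only `192.0.2.10`, the one address hosting more than one of the sample domains.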
5-24-2016 update: This tool, which retrieves email addresses from all over the Web, is now available at
- Visit a public mailing list archive like the one at the Robotstxt.org site, which lists email-rich URLs. Crawl those URLs with our tool. You should be able to grab several hundred email addresses. If a mailing list is too big, it will freeze your browser. In that case, you may want to save it to your local host, break it into several files, and crawl them one at a time.
- Another alternative is to visit public mailing list archives like https://openoffice.apache.org/mailing-lists.html, narrow down your selections and crawl a desired URL with our tool.
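The extraction and the "break it into several files" advice can be sketched as follows. This is an illustration only, not the tool's implementation. The regular expression is a deliberately simple approximation of an email address, and `extract_emails` and `extract_in_chunks` are names invented for this sketch.

```python
import re

# Simplified pattern: local part, "@", then a dotted domain name.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Return the unique, lowercased email addresses found in text."""
    return sorted({match.lower() for match in EMAIL_RE.findall(text)})


def extract_in_chunks(lines, chunk_size=1000):
    """Process a large saved page a slice of lines at a time."""
    found = set()
    for start in range(0, len(lines), chunk_size):
        chunk = "\n".join(lines[start:start + chunk_size])
        found.update(extract_emails(chunk))
    return sorted(found)
```

Working through the lines in fixed-size slices keeps each pass small, which is the same reason for splitting an oversized archive into several files.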
A handy resource:
1. If you are using Spybot Search & Destroy on Windows32 systems, enter
2. Spybot Search & Destroy will list all blocked connections; i.e. those that are redirected to the localhost.
3. You can manually add/delete entries.
4. It is another layer of security!
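The entries listed in step 2 can also be inspected with a short script. The sketch below assumes the blocked connections are stored in the usual hosts-file format (an address followed by one or more host names, with `#` starting a comment). That is an assumption, since the file is not named above. The sample text and the name `blocked_hosts` are made up for illustration.

```python
LOCALHOST_ADDRESSES = {"127.0.0.1", "::1"}


def blocked_hosts(hosts_text):
    """List host names that a hosts-style file redirects to the localhost."""
    blocked = []
    for line in hosts_text.splitlines():
        # Drop comments, then split into address and host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] in LOCALHOST_ADDRESSES:
            blocked.extend(name for name in fields[1:] if name != "localhost")
    return blocked


# Made-up sample in hosts-file format.
sample = """
# Sample entries
127.0.0.1   localhost
127.0.0.1   ads.example.com tracker.example.net  # blocked
192.0.2.7   intranet.example.org
"""
```

With this sample, `blocked_hosts(sample)` returns the two redirected names and skips both the `localhost` entry and the line that points to a non-local address.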
For details, check:
Yesterday we had a brainstorming session with our programmers on Google hacking. It is soooooo easy to grab PHP code, passwords, and databases from all over the Web, thanks to sloppy coders. For instance, do a search for
or check the list at http://www.thenetworkadministrator.com/googlesearches.htm. These types of searches will spit out directory trees.
There are many “smart cookies” posting derivatives of these lists all over the Web.
And how about typos?
Try filetype command searches with extra characters in extensions like
Servers will spit out entire PHP source files.
The worst offenders are large sites, like those in .edu, .gov, and .org, not to mention large .com and .net sites.
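Site owners can check their own files for this problem. The sketch below is a defensive audit under stated assumptions: it supposes the server executes only files ending in `.php` and may serve anything else as plain text, so a stray copy with extra characters after the extension would expose source code. The file names are hypothetical examples, not the ones from the search above, and `exposed_source_files` is a name invented here.

```python
# Assumption: only these extensions are executed by the server.
HANDLED_EXTENSIONS = (".php",)


def exposed_source_files(paths, handled=HANDLED_EXTENSIONS):
    """Flag files that look like PHP source but do not end in a handled extension."""
    flagged = []
    for path in paths:
        name = path.rsplit("/", 1)[-1].lower()
        if ".php" in name and not name.endswith(handled):
            flagged.append(path)
    return flagged


# Hypothetical listing of a web root.
listing = [
    "site/index.php",
    "site/index.php~",
    "site/config.php.bak",
    "site/readme.txt",
]
```

Running `exposed_source_files(listing)` flags the editor backup and the `.bak` copy, which are the kinds of leftovers worth removing from a public web root.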
Ho, Ho, Ho, Merry Christmas, Santa.