MHM is a tool for discovering sites on the same host or IP address, sites that are affiliated with each other, and sites that might be your competitors. It is available at
It is great for discovering domain names branded with keywords or well-known brand names. It is also excellent for discovering spam communities, domainers, and more.
You may also use it to build micro-indexes and topic-specific collections (as we do), or to track down communities of particular interest to law enforcement agencies, recruiters, and others.
Learn about the power of X Searches (short for XOR and XNOR searches) for keyword discovery, disambiguation, clustering, information retrieval, and data mining in general.
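To make the two operators concrete, here is a minimal Python sketch that takes XOR and XNOR in their plain Boolean sense. This is an assumption on our part, not a description of how X Searches are actually implemented. Under that reading, an XOR search for two terms returns documents containing exactly one of them, and an XNOR search returns documents containing both or neither. The corpus and the function names are hypothetical.

```python
# Assumption: XOR/XNOR searches are read in the plain Boolean sense.
# This is an illustration, not the tool's actual implementation.

# Hypothetical toy corpus: document id -> text.
CORPUS = {
    1: "jaguar speed on the highway",
    2: "jaguar habitat in the rainforest",
    3: "car dealership highway exit",
    4: "rainforest conservation news",
}

def docs_with(term: str, corpus: dict) -> set:
    """Return the ids of documents whose whitespace tokens include the term."""
    return {doc_id for doc_id, text in corpus.items() if term in text.split()}

def xor_search(a: str, b: str, corpus: dict) -> set:
    """Documents containing exactly one of the two terms (symmetric difference)."""
    return docs_with(a, corpus) ^ docs_with(b, corpus)

def xnor_search(a: str, b: str, corpus: dict) -> set:
    """Documents containing both terms or neither (complement of the XOR result)."""
    return set(corpus) - xor_search(a, b, corpus)

print(sorted(xor_search("jaguar", "highway", CORPUS)))   # [2, 3]
print(sorted(xnor_search("jaguar", "highway", CORPUS)))  # [1, 4]
```

Here the XOR result separates the documents where "jaguar" and "highway" do not co-occur, which hints at how such searches can help with disambiguation.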
Stay ahead with a new search experience:
Here is a nice article about the risks of misusing big data:
Here are my comments on the topic:
1. Most traditional tests of statistical significance were designed for small data sets, not for big data, unless the big data can be stratified. In general, with a large enough data set, a t-test of a very small correlation coefficient can be forced to become statistically significant, and therefore misleading. This effect is an artifact of the equations involved, not evidence of a practically meaningful relationship.
2. Although no data set is exactly normally distributed, many classical (parametric) statistical analyses require the data to be approximately normally distributed for their findings to be valid. This is one of the first things a peer reviewer of statistical articles will look for. Methods for transforming data toward normality before any analysis do exist, although some data sets cannot be transformed this way, and forcing normality onto them can be inadvisable.
3. Avoid arithmetically adding and averaging correlation coefficients, standard deviations, slopes, cosine similarity measures, and dissimilar ratios in general. They are simply not additive, regardless of what some outdated meta-analysis articles say or what you hear from groupie search marketers (“Why regurgitate in blogs what you don’t understand?”). 🙂
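Point 1 can be checked numerically. For a Pearson correlation r computed from n pairs, the usual statistic for testing against a zero population correlation is t = r√(n − 2)/√(1 − r²), so for a fixed r it grows roughly like √n. The Python sketch below holds a negligible r = 0.01 fixed (an illustrative value) and varies n. To stay dependency-free, it approximates the two-sided p-value with the standard normal distribution, which closely tracks Student's t at these degrees of freedom; an exact t-based p-value would need a statistics library.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """t statistic for testing zero correlation, given sample correlation r and n pairs."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def approx_two_sided_p(t: float) -> float:
    """Two-sided p-value via the normal approximation to Student's t (fine for large n)."""
    return math.erfc(abs(t) / math.sqrt(2.0))

r = 0.01  # illustrative, practically negligible correlation
for n in (100, 10_000, 100_000, 1_000_000):
    t = t_statistic(r, n)
    p = approx_two_sided_p(t)
    print(f"n={n:>9,}  t={t:6.2f}  p~{p:.3g}  significant at 0.05: {p < 0.05}")
```

The correlation never changes; only the sample size does, yet the verdict flips from "not significant" to "significant" once n is large enough.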
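As an illustration of point 2, a logarithmic transformation is one common way to pull right-skewed data toward symmetry (it is the λ = 0 member of the Box-Cox family). The Python sketch below uses hypothetical data in which each value doubles the previous one, and compares the moment-based skewness before and after taking logs. Note that the log only applies to strictly positive values; data with zeros, negatives, or several modes illustrate the point that not every data set can be transformed this way.

```python
import math

def skewness(values: list) -> float:
    """Population (moment-based) skewness g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: 1, 2, 4, ..., 128.
raw = [2.0 ** k for k in range(8)]
# After the log the values are evenly spaced, hence symmetric.
logged = [math.log(v) for v in raw]

print(f"skewness before log: {skewness(raw):.3f}")
print(f"skewness after log:  {skewness(logged):.3f}")
```

Symmetry is only one aspect of normality, so a transformed data set should still be checked (for example with a normal probability plot) before running a parametric test.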
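A quick way to see why point 3 matters for correlation coefficients is to compare the arithmetic average of two subgroup correlations with the correlation of the combined data. The Python sketch below uses hypothetical numbers chosen so that the two disagree sharply.

```python
import math

def pearson_r(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two hypothetical subgroups with perfect, opposite linear trends.
x_a, y_a = [1, 2, 3], [1, 2, 3]   # r = +1
x_b, y_b = [4, 5, 6], [6, 5, 4]   # r = -1

r_a = pearson_r(x_a, y_a)
r_b = pearson_r(x_b, y_b)
naive_average = (r_a + r_b) / 2               # arithmetic mean of the two r values
pooled = pearson_r(x_a + x_b, y_a + y_b)      # r of the combined data

print(f"r_a={r_a:.3f}  r_b={r_b:.3f}")
print(f"arithmetic average of r: {naive_average:.3f}")
print(f"r of the pooled data:    {pooled:.3f}")
```

The averaged value (0) says nothing about the combined data, whose correlation is about 0.771. Neither number is a corrected "average correlation"; the sketch only shows that r values do not combine arithmetically.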
Our popular tool, The Web Crawler, is back! This new iteration is much faster because it is based on a different strategy: it extracts sets of HREFs and then refines them to obtain URLs that qualify for status checks. As a result, the tool also works as a link checker.
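The internals of The Web Crawler are not described here, so the Python sketch below only illustrates the general two-step idea under our own assumptions: first collect the raw HREF set from a page, then refine it into URLs worth a status check. What counts as "qualified" (absolute http/https URLs, fragments dropped, duplicates removed) is our guess, and the function names are hypothetical. The status check itself needs network access, so it is included but not exercised.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.error import HTTPError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen

class HrefCollector(HTMLParser):
    """Step 1: collect the raw href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value.strip())

def qualified_urls(html: str, base_url: str) -> list[str]:
    """Step 2 (assumed rules): resolve relative links, drop #fragments,
    keep only http(s) URLs, and remove duplicates while preserving order."""
    parser = HrefCollector()
    parser.feed(html)
    seen: set[str] = set()
    result: list[str] = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, and similar links
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

def status_of(url: str, timeout: float = 10.0) -> int:
    """Step 3: send a HEAD request and return the HTTP status code.
    Requires network access; DNS or connection failures raise URLError."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
```

For example, `qualified_urls('<a href="/a">A</a><a href="/a#top">A</a>', "https://example.com/")` yields a single URL, `https://example.com/a`, so only one status check is needed instead of two.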
Another advantage of the above strategy is this:
We have just published this short article, based on The Color Miner tool:
Fractalettes: A Fractal Design Strategy to Color Mining and Learning through Discovery – Based on Fractal Geometry, fractalettes are color palettes within color palettes, where each cell contains color space information and relationships. These types of architectures engage end-users in data mining, critical thinking, and learning through discovery.
Indeed, the AZZOO measure outperforms all conventional measures in iris biometrics and handwritten character recognition.
At least that’s what is claimed.
The hilarious picture above shows how some SEOs look when playing scientist. This often happens when they interpret big data.
A few specific scenarios:
1. Applying the statistical theory of small samples to extremely large samples, like …
2. …using large amounts of data to force very small correlation coefficients to become statistically significant.
3. Trying to arithmetically average ratios (like correlation coefficients, standard deviations, slopes, and cosine similarities).
4. Mistaking Cauchy distributions for normal distributions.
5. Adding together intensive properties.
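Scenario 4 is easy to underestimate because both distributions are symmetric and bell-shaped. The Python sketch below compares two-sided tail probabilities of a standard normal and a standard Cauchy distribution (location 0, scale 1; equal scales are a simplifying choice for illustration). The Cauchy tails are far heavier, and the distribution has no finite mean or variance, so sample means and standard deviations computed from Cauchy data do not settle down as n grows.

```python
import math

def normal_two_sided_tail(z: float) -> float:
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2.0))

def cauchy_two_sided_tail(z: float) -> float:
    """P(|X| > z) for a standard Cauchy variable X, from its CDF 1/2 + atan(x)/pi."""
    return 1.0 - (2.0 / math.pi) * math.atan(z)

for z in (2.0, 3.0, 5.0):
    p_norm = normal_two_sided_tail(z)
    p_cauchy = cauchy_two_sided_tail(z)
    print(f"|x| > {z}: normal {p_norm:.2e}, Cauchy {p_cauchy:.3f}, "
          f"ratio {p_cauchy / p_norm:,.0f}x")
```

At three units from the center, a normal model predicts about 0.27% of observations in the tails, while a Cauchy distribution puts roughly 20% there.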
Fortunately, I know of good folks who are doing a great job of educating their search marketing peers (Mike Grehan, Bruce Clay, Danny Sullivan, etc.) without playing scientist.
Correlation coefficients, coefficients of variation, standard deviations, slopes, tangents, cosines, densities, temperatures, dissimilar ratios, and intensive properties in general are not additive. Therefore, meaningful arithmetic averages cannot be computed from any of them.
Still, from time to time some “experts” and pseudo “scientists” do that.
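Two quick numerical illustrations of this non-additivity, using hypothetical data: standard deviations of two groups versus the standard deviation of the combined data, and the densities of two liquids versus the density of their mixture. The density case assumes ideal mixing, so that masses and volumes simply add; with equal masses, the mixture density is then the harmonic mean of the two densities, not their arithmetic mean.

```python
import math

def population_sd(values: list) -> float:
    """Population standard deviation (divides by n)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

# Standard deviations: two hypothetical groups with identical spread.
group_a = [1.0, 2.0, 3.0]
group_b = [11.0, 12.0, 13.0]
avg_sd = (population_sd(group_a) + population_sd(group_b)) / 2
combined_sd = population_sd(group_a + group_b)
print(f"average of SDs: {avg_sd:.3f}, SD of combined data: {combined_sd:.3f}")

# Densities (an intensive property): mix 1 kg of each of two hypothetical
# liquids, assuming ideal mixing so that masses and volumes simply add.
masses_kg = [1.0, 1.0]
densities_kg_per_l = [1.0, 0.5]
volumes_l = [m / d for m, d in zip(masses_kg, densities_kg_per_l)]
avg_density = sum(densities_kg_per_l) / len(densities_kg_per_l)
mixture_density = sum(masses_kg) / sum(volumes_l)
print(f"average of densities: {avg_density:.3f} kg/L, "
      f"mixture density: {mixture_density:.3f} kg/L")
```

In both cases the arithmetic average (about 0.816 and 0.750) differs from the value that actually describes the combined system (about 5.066 and 0.667).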
Want to know why averaging these quantities is not mathematically or statistically valid? That is the subject of a paper I wrote, which is about to be published in Communications in Statistics – Theory and Methods (Taylor & Francis).
Incidentally, I will give the search marketing community a preview of the topic. Thanks to my dear friend Mike Grehan, this will be the subject of my talk at SES New York in March 2012.