Well, here is a light example of a focused RSS news parser: http://www.minerazzi.com/prbusca/
We are testing a random RSS parser that works as an unfocused news aggregator. We are already working on a second version to make it focused, but that will take quite a bit of time.
The goal is to develop a tool that will not only deliver topic-specific RSS feeds to end users, but will also be capable of data mining said RSS news feeds.
In the meantime, you can try the unfocused version by visiting any of our current miners, available at http://www.minerazzi.com.
We have reindexed and expanded the grants and scholarships miner available at http://www.minerazzi.com/grants
As usual, users can scrape search results to find zillions of URLs, links, images, scripts, etc., to build curated collections or gather intelligence, without spending any budget on web scraping services.
More educational miners will be available soon.
On Cosine Similarity
Cosine similarity is commonly used in data mining and information retrieval as a measure of the resemblance between data sets; i.e. how similar or alike these are. It is an important concept used in Vector Space Theory and affine models.
While there are many tools and tutorials on the subject out there, quite often what is missed from these is a clear explanation of the underlying meaning and nature of the variables involved.
Did you know that centering data sets by subtracting the corresponding variable means can and will impact the angle between them and, therefore, the corresponding cosine similarity? Did you know that said change can be used to assess whether the variables are orthogonal, uncorrelated, both, or neither? Do you know what a cosine similarity of zero actually means?
All these and similar questions are addressed with our cosine similarity tool and companion tutorial. Access them now at
To use the tool, simply enter two data sets and select how these are delimited. Then check whether you want to compute their cosine similarity by using them as given (raw mode) or after subtracting each data set's mean (centered mode). To interpret the results from either mode, read the companion tutorial.
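The difference between the two modes can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name cosine_similarity and its centered flag are our own hypothetical choices. In centered mode the result coincides with the Pearson correlation coefficient, which is why comparing both modes tells orthogonal variables apart from uncorrelated ones.

```python
import math

def cosine_similarity(x, y, centered=False):
    """Cosine of the angle between two equal-length data sets.

    centered=False uses the values as given (raw mode);
    centered=True subtracts each data set's mean first (centered mode).
    Hypothetical helper, not taken from the Minerazzi tool.
    """
    if len(x) != len(y):
        raise ValueError("data sets must have the same length")
    if centered:
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        x = [a - mean_x for a in x]
        y = [b - mean_y for b in y]
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Orthogonal in raw mode, yet perfectly anticorrelated once centered.
raw = cosine_similarity([1, 0], [0, 1])
cen = cosine_similarity([1, 0], [0, 1], centered=True)
print(raw, cen)
```

With the data sets [1, 0] and [0, 1], raw mode gives a cosine of zero (the vectors are orthogonal), while centered mode gives a value of about -1 (the variables are perfectly negatively correlated). So a cosine similarity of zero in raw mode does not by itself mean the variables are uncorrelated.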
JStatMiner is a new miner built with Minerazzi and available at http://www.minerazzi.com/jstatminer/
Use it to mine all top statistical journals from around the world!
Whether you are a researcher, librarian, teacher, or student, now you can have easy access to a huge collection of popular and hard-to-find statistical journals.
This tool transforms a data set into z-scores and one/two-tail percentiles.
The tool also computes central tendency and dispersion measures like means, medians, standard deviations, variances, coefficients of variation, and ranges.
Available now at http://www.minerazzi.com/tools
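For readers who want to see the arithmetic behind such a tool, here is a minimal Python sketch. It rests on several assumptions of ours, none confirmed by the tool itself: sample (n - 1) statistics are the default, percentiles come from the standard normal distribution, the one-tail value is the area to the left of z, and the two-tail value is the central area between -|z| and +|z|. All function names are hypothetical.

```python
import math
import statistics

def z_scores(data, sample=True):
    """Standardize data; sample=True uses the n - 1 standard deviation."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data) if sample else statistics.pstdev(data)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / sd for x in data]

def one_tail_percentile(z):
    """Assumed definition: standard normal area to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_tail_percentile(z):
    """Assumed definition: standard normal area between -|z| and +|z|."""
    return math.erf(abs(z) / math.sqrt(2.0))

def summary(data):
    """Central tendency and dispersion measures (sample statistics)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "stdev": sd,
        "variance": statistics.variance(data),
        # Coefficient of variation is undefined for a zero mean.
        "cv": sd / mean if mean != 0 else None,
        "range": max(data) - min(data),
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]
for x, z in zip(data, z_scores(data)):
    print(x, round(z, 3), round(one_tail_percentile(z), 3),
          round(two_tail_percentile(z), 3))
print(summary(data))
```

Whether to divide by n or n - 1 changes every z-score, so it is worth checking which convention a given tool uses before comparing results.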
Our old Color Miner tool is now available at
This is a tool that generates fractalettes.
We define a fractalette as a color palette within a color palette. These types of fractal-like arrays allow you to investigate color-color, color-space, and space-space relationships.
To use it, just submit an absolute URL, complete with its http(s) scheme.
Enjoy it. :)
A short tutorial on the Levenshtein Distance is available now at
Did you know that Levenshtein Distance is at the heart of sequence analysis and text mining-based technologies? It is so simple, elegant, and relevant to many research fields.
The Levenshtein Distance Calculator is back. This tool was removed from our old site, but is now available at
This is a visual and interactive tool great for sequence analysis, text mining, and teaching. A tutorial listing practical applications will soon follow.
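For reference, the Levenshtein distance between two sequences is the minimum number of single-element insertions, deletions, and substitutions needed to turn one into the other. Below is a minimal Python sketch of the standard dynamic-programming computation, keeping only two rows of the edit matrix. It is an illustration only, not the calculator's own code, and the function name levenshtein is our choice.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to transform sequence a into sequence b."""
    # Keep the shorter sequence in the inner loop to save memory.
    if len(a) < len(b):
        a, b = b, a
    # Distances from the empty prefix of a to every prefix of b.
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # prints 3
```

Because the function only indexes and compares elements, it works on any sequences, for example lists of words as well as strings of characters.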
Open Source Projects is a new miner available at http://www.minerazzi.com/osp. It allows you to find or submit all kinds of open source projects. Access open source community resources. Search by software, hardware, or project name.
Looking for open source projects relevant to Apache, Linux, or Windows? Need to be more specific in your search (e.g. search for Weka, JQuery, Aptana, JNode, Ubuntu, Mozilla, etc…)? Want to build your own open source collections? If so, this miner is for you.