The BioInformatics Databases Miner



This is a new miner available at Use it to find or build genome, sequence, proteomics, RNA, pathway, metabolic, microarray, exosomal, PCR, phenotype, taxonomic, carbohydrate, metabolomic, drug design, and imaging collections. Search by topic or database.

Try sample queries such as [ cancer research ], [ rna and dna ], or [ bioinformatics ].

If you are into bioinformatics databases, this miner is for you.

Introducing the Hydrocarbons Parser



This tool extracts chemical information by parsing a set of hydrocarbon formulas of the form CxHy.

The information collected is then used to validate the chemical formulas, compute the number and types of chemical bonds, discriminate between functional isomers, and predict normal boiling point temperatures.
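As a rough illustration of the validation and bond-counting steps, here is a minimal Python sketch. It is not the Hydrocarbons Parser's actual code; the function name `analyze_hydrocarbon` and the simplified validity rules (at least one carbon, an even number of hydrogens, at most 2x + 2 of them) are assumptions. It relies only on valence arithmetic: each H forms one C-H bond, carbon has valence 4, and the degree of unsaturation (rings plus pi bonds) is (2x + 2 - y) / 2. Boiling point prediction is not attempted here.

```python
import re

# Hypothetical pattern for formulas such as "C6H14" or "CH4";
# an omitted subscript means 1.
CXHY = re.compile(r"C(\d*)H(\d*)")

def analyze_hydrocarbon(formula: str) -> dict:
    """Validate a CxHy formula and derive bond counts from valence rules alone."""
    m = CXHY.fullmatch(formula.strip())
    if not m:
        raise ValueError(f"not a CxHy formula: {formula!r}")
    c = int(m.group(1) or 1)
    h = int(m.group(2) or 1)
    # Simplified validity check (an assumption): at least one carbon,
    # an even number of hydrogens, and no more than the 2x + 2 of an alkane.
    if c < 1 or h < 2 or h % 2 or h > 2 * c + 2:
        raise ValueError(f"invalid hydrocarbon formula: {formula!r}")
    bond_pairs = (4 * c + h) // 2  # shared electron pairs (C valence 4, H valence 1)
    return {
        "c_h_bonds": h,                        # each H forms exactly one C-H bond
        "bond_pairs": bond_pairs,              # a double bond counts as two pairs
        "c_c_bond_pairs": bond_pairs - h,      # pairs shared between carbons
        "unsaturation": (2 * c + 2 - h) // 2,  # rings + pi bonds
    }

for f in ["CH4", "C2H4", "C2H2", "C6H6"]:
    print(f, analyze_hydrocarbon(f))
```

Valence arithmetic has limits: C3H6 gives an unsaturation of 1, but it may be propene (one double bond) or cyclopropane (one ring), so the count alone does not tell these isomers apart.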

The tool works without consulting molecular orbital theory or a chemical database. That’s right. Just enter a set of formulas!

The tool is available at

Chemical engineers, teachers, and students might find this tool useful.

Building a Hydrocarbons Parser



Text mining is not limited to computer science; it can also be applied to chemistry.

Indeed, parsing chemical formulas is not much different from parsing words and phrases, with the added advantage that the former has more practical applications than the latter.
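To make the formula-as-words analogy concrete, here is a minimal Python sketch (not the Hydrocarbons Parser's code) that tokenizes a formula much as a word tokenizer splits a sentence: each token is an element symbol followed by an optional count. The name `tokenize_formula` is hypothetical, and the sketch assumes simple formulas without parentheses, hydrates, or charges.

```python
import re

# One token = an element symbol (capital letter plus optional lowercase letter)
# followed by an optional count; a missing count means 1.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def tokenize_formula(formula: str) -> list[tuple[str, int]]:
    """Split a simple formula into (element, count) tokens."""
    tokens = []
    pos = 0
    while pos < len(formula):
        m = TOKEN.match(formula, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos} in {formula!r}")
        tokens.append((m.group(1), int(m.group(2) or 1)))
        pos = m.end()
    return tokens

print(tokenize_formula("C6H12O6"))  # [('C', 6), ('H', 12), ('O', 6)]
```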

To illustrate, we are currently building a Hydrocarbons Parser, a tool that parses an input chemical formula and predicts the number and types of chemical bonds, normal boiling point temperatures, and a few other properties. The predicted data can then be compared with experimental results.

Chemical engineers, chemists, chemistry students, and their teachers may find this tool useful. It is part of an ongoing effort to find more applications of text mining in other disciplines and fields.

Building a Facebook Job Offers Collection



We have expanded the Jobs Miner at so that you can now easily build curated collections of job offers from a specific company, thanks to the recrawling power of Minerazzi.

For instance, search for [ facebook jobs ] and click the “links” tool located below a search result snippet relevant to Facebook. This crawls the links from that URL and presents you with a new list of URLs, each of which can be crawled with the same tool. Export the results as you usually would (by copy/pasting).
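The general idea behind a “links” tool, extracting every link from a page so that each one can be crawled in turn, can be approximated with Python's standard library. This is a minimal sketch under our own assumptions, not Minerazzi's recrawling implementation; `LinkCollector`, `extract_links`, and `crawl_links` are hypothetical names, and the sample page and URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                url = urljoin(self.base_url, value)
                if url not in self.links:  # keep first occurrence only
                    self.links.append(url)

def extract_links(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

def crawl_links(url: str) -> list[str]:
    """Fetch a page and return its links; each result can be crawled again."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return extract_links(html, url)

sample = '<a href="/careers">Careers</a> <a href="https://example.com/jobs/1">Job 1</a>'
print(extract_links(sample, "https://example.com/"))
```

Calling `crawl_links` on any returned URL repeats the process one level deeper, which mirrors the “each being crawlable with the same tool” behavior described above.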

Try it with other companies. If a company you are interested in has a jobs section but is not listed, submit its URL. This is also a nice way of promoting the jobs pages of specific companies.

To gather intelligence from a URL, use the tools that come with the miner.

– Puerto Rico’s Search Engine



As you may know, is now a search engine data miner relevant to Puerto Rico. Its legacy content is being moved to the tools and tutorials sections of

You can use the revamped site to find and mine products and services from Puerto Rico (PR), such as newspapers, hotels, blogs, banks, places, companies, universities, government agencies, reggaeton artists, and more.

Text Mining Tool: A Positional Posting List Generator



A positional inverted index is essentially a set of posting lists that store term weights, term positions, document IDs (docids), and so on, for a collection of documents.

Posting lists can also be generated from a single piece of text.

Such lists come in handy when we want to conduct text forensics or analyze writing styles: for instance, to check for evidence of plagiarism, to attribute authorship, or to analyze how a writer distributes stopwords, rare words, or specific combinations of terms across paragraphs, chapters, and so on.

However, counting words and recording term positions by hand can be time-consuming, unless you have a tool that does it for you.

We have developed precisely such a tool. It is available at

The tool generates posting lists of the form:

term: {frequency value, [array of positions]}

where the frequency value is used as the term weight and the array lists the positions at which the term occurs.
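Here is a minimal Python sketch of how such posting lists can be generated from a single piece of text. It is not the tool's actual code: the function name `positional_postings`, the tokenization (lowercase, runs of letters, digits, and apostrophes), and the use of 1-based positions are all assumptions.

```python
import re
from collections import defaultdict

def positional_postings(text: str) -> dict[str, dict]:
    """Build term -> {"freq": n, "positions": [...]} for a single piece of text."""
    # Assumed tokenization: lowercase, keep runs of letters, digits and apostrophes.
    terms = re.findall(r"[a-z0-9']+", text.lower())
    positions = defaultdict(list)
    for pos, term in enumerate(terms, start=1):  # 1-based positions
        positions[term].append(pos)
    # The frequency doubles as the term weight, as in the format above.
    return {t: {"freq": len(p), "positions": p} for t, p in positions.items()}

postings = positional_postings("The cat saw the other cat.")
for term, post in postings.items():
    print(f"{term}: {{{post['freq']}, {post['positions']}}}")
```

Running it prints one line per term in order of first appearance, e.g. `the: {2, [1, 4]}` and `cat: {2, [2, 6]}`.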

At this time, the tool analyzes plain text only.

With minor modifications, it can be used to build a positional inverted index that stores Robertson’s BM25 weights.
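One way such a modification might look is sketched below in Python: positions are collected per document, then each raw term frequency is replaced by a BM25 weight. This is an illustration, not the tool's implementation. The defaults k1 = 1.2 and b = 0.75 are commonly used values, the non-negative IDF form log(1 + (N - df + 0.5) / (df + 0.5)) is one of several BM25 IDF variants, and `build_bm25_index` and its tokenization are hypothetical.

```python
import math
import re
from collections import defaultdict

def build_bm25_index(docs: list[str], k1: float = 1.2, b: float = 0.75) -> dict:
    """Positional inverted index: term -> {docid: {"bm25": w, "positions": [...]}}."""
    if not docs:
        return {}
    tokenized = [re.findall(r"[a-z0-9']+", d.lower()) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length

    # First pass: positions of every term in every document (1-based).
    positions = defaultdict(dict)  # term -> {docid: [pos, ...]}
    for docid, terms in enumerate(tokenized):
        for pos, term in enumerate(terms, start=1):
            positions[term].setdefault(docid, []).append(pos)

    # Second pass: replace raw frequencies with BM25 weights.
    index = {}
    for term, postings in positions.items():
        df = len(postings)  # number of documents containing the term
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        index[term] = {}
        for docid, pos_list in postings.items():
            tf = len(pos_list)
            dl = len(tokenized[docid])
            weight = idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
            index[term][docid] = {"bm25": weight, "positions": pos_list}
    return index

index = build_bm25_index(["the cat sat", "the dog sat on the mat", "a cat and a dog"])
print(index["the"])
```

Keeping the positions next to each weight lets the same index answer both ranked queries and phrase or proximity queries.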

The Microsoft-Nokia Fiasco



Reality Check! The Microsoft-Nokia Fiasco.

This RSS news item was found with Minerazzi’s Social Pulse Parser (SPP).

A bye bye and a welcome



Mi is now a miner, dedicated exclusively to the indexing of sites relevant to Puerto Rico. Its legacy content is having a second life as it is being moved to the Tools and Tutorials sections of All these changes are part of a broader effort of placing the latter at the center of the action.

Bye bye Mi Islita (2001-2015). Welcome, Mi Islita (2015 – ?).