We have expanded the number of similarity measures that our Binary Similarity Calculator computes from 30 to 72 (and counting…)
Identical measures published under different names have been consolidated into a single record, and distinct measures that share a name have been numbered to tell them apart.
These similarity coefficients have many applications across disciplines: from bioinformatics to chemistry, chemometrics, statistics, data mining, information retrieval, marketing research, etc.
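As a minimal sketch of what such coefficients compute, the snippet below derives the usual 2x2 contingency counts from two binary vectors and evaluates three well-known measures (Jaccard, Dice, and simple matching). The function names and sample vectors are illustrative assumptions, not the calculator's actual code.

```python
def contingency(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors.

    a: positions where both are 1, b: 1 in x only,
    c: 1 in y only, d: positions where both are 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(a, b, c, d):
    """Jaccard: shared presences over all presences (ignores d)."""
    denom = a + b + c
    return a / denom if denom else 0.0


def dice(a, b, c, d):
    """Dice: like Jaccard, but shared presences count double."""
    denom = 2 * a + b + c
    return 2 * a / denom if denom else 0.0


def simple_matching(a, b, c, d):
    """Simple matching: agreements (presences and absences) over all positions."""
    denom = a + b + c + d
    return (a + d) / denom if denom else 0.0


# Hypothetical sample vectors, e.g. two binary fingerprints.
x = [1, 1, 0, 1, 0, 0]
y = [1, 0, 0, 1, 1, 0]
counts = contingency(x, y)          # (2, 1, 1, 2)
print(jaccard(*counts))             # 0.5
print(round(dice(*counts), 3))      # 0.667
```

Returning 0.0 for an empty denominator is a convenience choice in this sketch; other conventions treat that case as undefined.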
The tool is available at
We have also included the new similarity measures proposed by Consonni & Todeschini (2012) and Todeschini et al. (2012).
Our Tutorial on Distance and Similarity has also been updated accordingly. Check it out at
Consonni, V. and Todeschini, R. (2012). New Similarity Coefficients for Binary Data. MATCH Commun. Math. Comput. Chem. 68, 581-592.
Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., and Willett, P. (2012). Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets. J. Chem. Inf. Model. 52 (11).
Local Term Weight Models from Power Transformations
Development of BM25IR: A Best Match Model based on Inverse Regression
In this article we show how power transformations can be used as a common framework for the derivation of local term weights. We found that under some parametric conditions, BM25 and inverse regression produce equivalent results. As a special case of inverse regression, we show that the largest increment in term weight occurs when a term is mentioned for the second time. A model based on inverse regression (BM25IR) is presented. Simulations suggest that BM25IR works fairly well for different BM25 parametric conditions and document lengths.
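For readers unfamiliar with local term weights, here is a minimal sketch of the classic BM25 term-frequency component, showing how each additional mention of a term adds less weight than the one before. This is the standard BM25 formula only. It is not BM25IR, whose exact form is not given in this summary, and the parameter defaults (k1 = 1.2, b = 0.75) and document lengths are common illustrative assumptions.

```python
def bm25_local_weight(tf, k1=1.2, b=0.75, doc_len=100, avg_doc_len=100):
    """Classic BM25 local (within-document) term weight.

    tf: number of times the term occurs in the document.
    k1, b: the usual BM25 saturation and length-normalization parameters.
    """
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + length_norm)


# Weights for 0..5 mentions and the gain contributed by each extra mention.
weights = [bm25_local_weight(tf) for tf in range(6)]
increments = [later - earlier for earlier, later in zip(weights, weights[1:])]

for tf, gain in enumerate(increments, start=1):
    print(f"mention {tf}: weight gain {gain:.4f}")
```

The gains shrink with every repetition and the weight never exceeds k1 + 1, which is the saturation behavior that local weight models of this kind are designed to capture.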
Energy Converter is a new data conversion tool, available at
Easily convert energy units or compute oil, coal, and natural gas energy equivalents, and more, with this one-to-many (O2M) mapping tool. Just input a value and press the Enter key.
To browse a comprehensive list of data conversion and extraction tools, visit
Retail Banking is a new Minerazzi miner, available at
Find products, services, and companies relevant to card or cardless ATM software, digital wallets, mobile payments, payment service providers, and more with this new miner. Search by technologies or keywords.
Recrawl a search result to find additional resources or build your own curated collection.
For additional topic-specific miners, visit http://www.minerazzi.com
This is a new data conversion tool, available now at
No need to fiddle repeatedly with annoying pull-down menus.
Just input a value and press the Enter key. The tool then converts it to all kinds of mass units at once, saving you time and effort.
Supports SI, Avoirdupois, Troy, Apothecaries units, and more.
The tool uses the same design pattern that powers our Length Converter tool at
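A one-to-many conversion of this kind can be sketched by normalizing the input to a base unit and then dividing by every unit's factor. The snippet below is an illustration under that assumption, not the tool's actual implementation; the unit labels and the `convert_mass` function are hypothetical. The factors follow from the exact definition of the grain (64.79891 mg).

```python
GRAIN_KG = 64.79891e-6  # one grain in kilograms (exact by definition)

# Kilograms per unit. Avoirdupois, troy, and apothecaries units are all
# defined as whole (or half) multiples of the grain.
KG_PER_UNIT = {
    "kilogram": 1.0,
    "gram": 1e-3,
    "milligram": 1e-6,
    "tonne": 1e3,
    "grain": GRAIN_KG,
    "ounce (avoirdupois)": 437.5 * GRAIN_KG,
    "pound (avoirdupois)": 7000 * GRAIN_KG,
    "ounce (troy)": 480 * GRAIN_KG,
    "pound (troy)": 5760 * GRAIN_KG,
    "scruple (apothecaries)": 20 * GRAIN_KG,
    "dram (apothecaries)": 60 * GRAIN_KG,
}


def convert_mass(value, unit):
    """Convert one value to every known unit at once (one-to-many)."""
    if unit not in KG_PER_UNIT:
        raise KeyError(f"unknown unit: {unit}")
    kilograms = value * KG_PER_UNIT[unit]
    return {name: kilograms / factor for name, factor in KG_PER_UNIT.items()}


# Usage: a single input produces the whole table of equivalents.
for name, amount in convert_mass(1, "pound (avoirdupois)").items():
    print(f"{amount:.6g} {name}")
```

Routing every conversion through one base unit keeps the table linear in the number of units, so adding a unit means adding a single entry.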
This tool is available at
The tool allows you to do Forward and Reverse DNS lookups. Given a host name, the tool finds its IP. Conversely, given an IP the tool finds the corresponding host.
A forward DNS lookup resolves a host name to an IP address (A record). A reverse DNS lookup resolves an IP address back to a host name using the pointer record type (PTR record).
Thus, the tool does Forward-confirmed reverse DNS (FCrDNS) lookups. This is a networking parameter configuration where a given IP address has both forward (name-to-address) and reverse (address-to-name) Domain Name System (DNS) entries that match each other.
Unlike similar tools which do Forward/Reverse DNS lookups on a single host, our tool does lookups on multiple hosts, saving users time and effort.
To use the tool, enter one host name (or IP) per line, ending each line by pressing the Enter key.
Forward DNS lookups are faster than reverse DNS lookups, so for the latter you may want to check only a few hosts at a time.
Depending on DNS server configurations, lookups with or without the www alias can produce dissimilar results. For instance, yahoo.com and www.yahoo.com return different results.
Our tool can be used to identify Internet service providers (ISPs) that do not provide properly matching DNS and rDNS records. It can also be used to find shared hosting and, when servers are misconfigured, forwarder information leaks.
FCrDNS verification can also be used for whitelisting purposes because spammers and phishers usually cannot bypass this verification when they use zombie computers for email spoofing. That is, the reverse DNS might verify, but it will usually belong to a different domain than the one claimed.
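The forward-confirmed reverse DNS check described above can be sketched with Python's standard `socket` module: resolve the IP to a name (PTR), resolve that name back to addresses (A), and confirm the original IP is among them. This is a minimal IPv4-only illustration, not the tool's implementation; the function names and the injectable resolver parameters are assumptions made so the logic can be exercised without network access.

```python
import socket


def forward_lookup(host):
    """Return the IPv4 addresses (A records) for a host name."""
    return socket.gethostbyname_ex(host)[2]


def reverse_lookup(ip):
    """Return the primary host name from the PTR record for an IP."""
    return socket.gethostbyaddr(ip)[0]


def is_forward_confirmed(ip, forward=forward_lookup, reverse=reverse_lookup):
    """True if the name the IP points to resolves back to the same IP."""
    try:
        name = reverse(ip)
        return ip in forward(name)
    except OSError:
        # socket.herror and socket.gaierror both derive from OSError;
        # a failed lookup means the address cannot be confirmed.
        return False


def check_many(ips, forward=forward_lookup, reverse=reverse_lookup):
    """Run the FCrDNS check over several addresses, one result per IP."""
    return {ip: is_forward_confirmed(ip, forward, reverse) for ip in ips}
```

Calling `check_many` with a list of addresses (one per input line, say) performs live lookups by default; passing stand-in resolvers makes the same logic testable offline.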
We have developed a new tool that simplifies Z-to-P and P-to-Z Transformations. It is available at
Unlike similar tools that handle one input score at a time, our tool computes Z-to-P and P-to-Z transformations over an entire set of input scores, saving users time and effort.
The tool facilitates the work of data miners, statisticians, or anyone who needs to compute Z and P scores without having to consult Z statistical tables.
It is a great tool for students and teachers interested in Statistics.
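For readers who want to reproduce such transformations themselves, here is a minimal sketch using the standard library's `statistics.NormalDist` (Python 3.8+). It assumes P means the cumulative (left-tail) probability under the standard normal curve; the tool may offer other tail conventions, and the function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_to_p(z):
    """Left-tail probability P(Z <= z) for a standard normal variable."""
    return STANDARD_NORMAL.cdf(z)


def p_to_z(p):
    """Z score whose left-tail probability is p (requires 0 < p < 1)."""
    return STANDARD_NORMAL.inv_cdf(p)


def z_to_p_many(z_scores):
    """Transform a whole set of Z scores in one call."""
    return [z_to_p(z) for z in z_scores]


def p_to_z_many(p_values):
    """Transform a whole set of probabilities in one call."""
    return [p_to_z(p) for p in p_values]


# Usage: process an entire list instead of one score at a time.
for z, p in zip([-1.96, 0.0, 1.0, 1.96], z_to_p_many([-1.96, 0.0, 1.0, 1.96])):
    print(f"z = {z:+.2f}  ->  p = {p:.4f}")
```

Note that `inv_cdf` raises `statistics.StatisticsError` for probabilities of exactly 0 or 1, where the Z score is unbounded.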
For crowdsourcers and freelancers:
This is a new minerazzi.com miner, available at
Find work-for-hire jobs and remote employment opportunities. Search by crowdsourcing and freelancing companies, projects, or area of expertise. Get hired!
We are getting closer to Mind Retrieval. The implications of being able to mine the brain are obvious for all sciences, in addition to homeland security, law and order, marketing research, etc.
Last night I came across this news item: “Scientists map brain’s ‘thesaurus’ to help decode inner thoughts.”
Scientists at the University of California, Berkeley, have taken a step in that direction by building a “semantic atlas” that shows in vivid colors and multiple dimensions how the human brain organizes language. The atlas identifies brain areas that respond to words that have similar meanings.
Last year I mentioned that we are getting close to Mind Retrieval.
That post was a reminder of a previous 2010 interview by Nuno Valenzuela, a visionary SEM from Spain. Great guy.
I met Nuno back in 2007 when I was invited to present on Latent Semantic Indexing (LSI) at a search engine congress in Madrid (OJOBuscador).
See conference legacy links here
Here is a link to Nuno’s interview. You may want to resize your browser window:
And some relevant links here:
Unfortunately, the OJOBuscador site is now defunct, so its links are broken.