OKAPI BM25 Tutorial



We have restored, refined, and updated this tutorial and added some historical background.


This is a light tutorial on OKAPI BM25, a Best Match model where local weights are computed as parameterized frequencies and global weights as RSJ weights. Local weights are based on a 2-Poisson model and on the verbosity and scope hypotheses, and global weights on the Robertson-Spärck-Jones Probabilistic Model.
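
To make the local/global split concrete, here is a minimal Python sketch of a BM25-style score. It is an illustration only, not the code from the tutorial: the function names, the toy corpus format (lists of tokens), and the defaults k1=1.2 and b=0.75 are assumptions. The global weight is the RSJ weight with no relevance information, and the local weight is a saturating term frequency normalized by document length.

```python
import math
from collections import Counter

def rsj_idf(n_docs, doc_freq):
    """Global weight: RSJ weight when no relevance information is available."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query.

    corpus is a list of tokenized documents (lists of strings).
    k1 and b are tuning parameters; the defaults are assumed, commonly used values.
    """
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs  # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        f = tf[term]
        if f == 0:
            continue  # a term absent from the document contributes nothing
        df = sum(1 for d in corpus if term in d)  # document frequency
        # Local weight: parameterized frequency with length normalization.
        norm = 1 - b + b * len(doc_terms) / avgdl
        local = f * (k1 + 1) / (f + k1 * norm)
        score += local * rsj_idf(n_docs, df)
    return score

corpus = [
    ["okapi", "bm25", "ranking"],
    ["okapi", "catalog"],
    ["library", "catalog", "search"],
    ["poisson", "model"],
]
scores = [bm25_score(["bm25"], doc, corpus) for doc in corpus]
```

Note that this form of the RSJ weight turns negative for terms that occur in more than half of the documents, so a production system would need to decide how to handle that case.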


In the early 80s, Gillian Venner, Nathalie Mitev, and Stephen Walker (1985, 1987) conducted research that led to the design and evaluation of online public access catalogs (OPACs) at the Polytechnic of Central London (PCL).

The project's initial phases spanned from November 1982 to May 1985. The prototype was named OKAPI (Online Keyword Access to Public Information). As Mitev (1985) wrote:

“Designing an online public access catalogue [OPAC]: Okapi, a catalogue on a local area network [LAN] is the final report of a two-year research project ‘Microprocessor networking in libraries’ which was funded by the British Library and the Department of Trade and Industry, and based at the Polytechnic of Central London.”

“The aim was to produce an OPAC on a LAN, that would be readily usable without training or experience, without sacrificing effectiveness or being tedious for experienced users.”

“The result was a functioning prototype OPAC called Okapi, which has a number of distinctive features: use is eased by coloured keys and a lack of jargon; the system uses search decision trees to select a suitable action at each stage of a search, and it performs automatic Boolean and hyper-Boolean functions where appropriate. The OPAC was installed and evaluated in one of the Polytechnic site libraries.”

Want more? Read the tutorial at


Mayaro Virus (MAYV) Miner



This is a new Minerazzi.com miner that is available now at


Research the scientific literature for the Mayaro Virus (MAYV). Read research and news from CDC, NIH, WHO, and other sources. Search by location, site, or health organization. Recrawl search results to build your own curated collection on MAYV.

This is a new disease with symptoms similar to those of Chikungunya (CHIKV), but stronger. It is now moving into the Caribbean, and soon to Puerto Rico and Florida.


Probabilistic Model Tutorial



This is an updated version of a tutorial on the Robertson-Spärck-Jones Probabilistic Model.

It is available now at


The model computes global weights, known as RSJ weights, based on Independence Assumptions and Ordering Principles for probable relevance. The model subsumes IDF and IDFP as RSJ weights in the absence of relevance information.
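
As a rough illustration of that last point, the sketch below computes an RSJ weight in its usual 0.5-smoothed form and compares it with IDF and IDFP when no relevance information is given. The function names and the example counts are assumptions made for this sketch; N is the collection size, n the number of documents containing the term, R the number of known relevant documents, and r the number of those that contain the term.

```python
import math

def rsj_weight(N, n, R=0, r=0):
    """RSJ weight with 0.5 smoothing.

    With R = r = 0 (no relevance information) it reduces to
    log((N - n + 0.5) / (n + 0.5)).
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def idf(N, n):
    """Classic inverse document frequency."""
    return math.log(N / n)

def idfp(N, n):
    """Probabilistic IDF."""
    return math.log((N - n) / n)

# Hypothetical counts: a term found in 10 of 1000 documents.
no_relevance = rsj_weight(1000, 10)
with_relevance = rsj_weight(1000, 10, R=20, r=8)
```

For a rare term (n much smaller than N), the no-relevance RSJ weight, IDFP, and IDF are all close to one another; once relevance counts are supplied, the weight shifts according to how concentrated the term is in the relevant set.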

Enjoy it.

09-26-2016 Update: A new section was added to the tutorial before the Conclusion section. References were added accordingly. A few lines were edited.

PS: I corrected the original publication date to read “Published: 03-30-2009” which is the correct date. My fault.

Moving Averages Calculator



Calculate several moving averages, including simple, cumulative, and exponential moving averages with this new tool, available at


A great tool for researchers, teachers, and students! Just enter a data set and the range to be shifted.
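
For readers who want to check results by hand, here is a small Python sketch of the three averages named above. It is not the calculator's code: the function names, the `window` parameter (our reading of "the range to be shifted"), and the smoothing factor `alpha` are assumptions.

```python
def simple_moving_average(data, window):
    """Mean of each run of `window` consecutive values."""
    if window < 1 or window > len(data):
        raise ValueError("window must be between 1 and len(data)")
    return [sum(data[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(data))]

def cumulative_moving_average(data):
    """Mean of all values seen so far, at each position."""
    out, total = [], 0.0
    for count, x in enumerate(data, start=1):
        total += x
        out.append(total / count)
    return out

def exponential_moving_average(data, alpha):
    """Exponentially weighted average, seeded with the first value.

    alpha in (0, 1] is an assumed smoothing factor; larger values
    give more weight to recent observations.
    """
    if not data:
        return []
    out = [float(data[0])]
    for x in data[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

data = [2, 4, 6, 8]
sma = simple_moving_average(data, 2)         # [3.0, 5.0, 7.0]
cma = cumulative_moving_average(data)        # [2.0, 3.0, 4.0, 5.0]
ema = exponential_moving_average(data, 0.5)  # [2.0, 3.0, 4.5, 6.25]
```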

72 Binary Similarity Measures



We have expanded the number of similarity measures that our Binary Similarity Calculator computes from 30 to 72 (and counting…)

Identical measures with different names have been consolidated into a single record, and different measures with the same name have been enumerated as necessary.

These similarity coefficients have many applications across disciplines: from bioinformatics to chemistry, chemometrics, statistics, data mining, information retrieval, marketing research, etc.
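
Most binary similarity measures are functions of the same four counts taken from a pair of binary vectors. The Python sketch below shows that pattern with three well-known coefficients (Jaccard, Dice, and Simple Matching). The function names and sample vectors are assumptions for illustration; this is not the calculator's code.

```python
def contingency(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors.

    a: both 1; b: only x is 1; c: only y is 1; d: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = len(x) - a - b - c
    return a, b, c, d

def jaccard(a, b, c, d):
    return a / (a + b + c)

def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c)

def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)

x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
counts = contingency(x, y)          # (2, 1, 1, 1)
j = jaccard(*counts)                # 0.5
s = dice(*counts)                   # 4/6
m = simple_matching(*counts)        # 0.6
```

The sketch does not guard against zero denominators (for example, two all-zero vectors with Jaccard), which a real tool would have to handle.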

The tool is available at


We have also included the new similarity measures proposed by Consonni & Todeschini (2012) and Todeschini et al. (2012).

Our Tutorial on Distance and Similarity was also updated, accordingly. Check it out at




Consonni, V. and Todeschini, R. (2012). New Similarity Coefficients for Binary Data. MATCH Commun. Math. Comput. Chem. 68, 581-592.

Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., and Willett, P. (2012). Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets. J. Chem. Inf. Model. 52 (11).

USA City Distances Calculator



Calculate distances in miles from one USA city to over 300 major cities with this O2M-powered calculator. The tool is available at


Use it to find all possible city-distance combinations at once. No need to mess over and over with annoying pull-down menus.

The distances computed are not driving distances; they are based on geolocation data (latitudes and longitudes) as listed in Wikipedia at


The results are sorted alphabetically by city and state. The data is processed with our One-to-Many algorithm to showcase the many possibilities of O2M-driven tools. Additional O2M tools can be found at
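
For the curious, a one-to-many distance calculation from latitudes and longitudes can be sketched in Python as follows. This is not the tool's code: the post does not say which formula the calculator uses, so the haversine (great-circle) formula, the Earth radius constant, the function names, and the approximate city coordinates below are all assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def one_to_many(origin, cities):
    """Distances from one origin to every city, sorted alphabetically by name."""
    lat0, lon0 = origin
    return sorted((name, haversine_miles(lat0, lon0, lat, lon))
                  for name, (lat, lon) in cities.items())

# Approximate coordinates, for illustration only.
CITIES = {
    "New York, NY": (40.7128, -74.0060),
    "Los Angeles, CA": (34.0522, -118.2437),
    "Chicago, IL": (41.8781, -87.6298),
}

results = one_to_many(CITIES["New York, NY"], CITIES)
for name, miles in results:
    print(f"{name}: {miles:.0f} mi")
```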


Improving MUST



We have tweaked MUST (Minerazzi URL Scoring Tool) to run a bit faster.

Try it with modern socials like








or with all those Old Glory Days URLs from the Search Engines Golden Age (the 90s). Check if they are defunct, redirecting, or still active.

AAAHHH: All those old days with their dumb business models.🙂