OKAPI BM25 Tutorial

We have restored, refined, and updated this tutorial and added some historical background.

Abstract

This is a light tutorial on OKAPI BM25, a Best Match model where local weights are computed as parameterized frequencies and global weights as RSJ weights. Local weights are based on a 2-Poisson model and the verbosity and scope hypotheses, and global weights on the Robertson-Spärck-Jones Probabilistic Model.
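
To make the local/global split concrete, here is a minimal Python sketch of a BM25-style score. It is only an illustration under stated assumptions, not the code from the tutorial: the function names are ours, the defaults k1 = 1.2 and b = 0.75 are merely commonly used values, and the global weight is the RSJ weight in the absence of relevance information.

```python
import math

def local_weight(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Parameterized term frequency: saturates as tf grows and is
    normalized by document length (k1 and b are assumed defaults)."""
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + norm)

def global_weight(N, n):
    """RSJ weight with no relevance information.
    N = documents in the collection, n = documents containing the term."""
    return math.log((N - n + 0.5) / (n + 0.5))

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, N, df):
    """Sum of local * global weights over query terms found in the document.
    doc_tf maps term -> frequency in the document; df maps term -> n."""
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue  # terms absent from the document contribute nothing
        score += local_weight(tf, doc_len, avg_doc_len) * global_weight(N, df[term])
    return score
```

With a document of average length, a term that occurs once gets a local weight of exactly 1, and the weight never exceeds k1 + 1 no matter how often the term repeats.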

Introduction

In the early 1980s, Gillian Venner, Nathalie Mitev, and Stephen Walker (1985, 1987) conducted research that led to the design and evaluation of online public access catalogs (OPACs) at the Polytechnic of Central London (PCL).

The project's initial phases spanned from November 1982 to May 1985. The prototype was named OKAPI (Online Keyword Access to Public Information). As Mitev (1985) wrote:

“Designing an online public access catalogue [OPAC]: Okapi, a catalogue on a local area network [LAN] is the final report of a two-year research project ‘Microprocessor networking in libraries’ which was funded by the British Library and the Department of Trade and Industry, and based at the Polytechnic of Central London.”

“The aim was to produce an OPAC on a LAN, that would be readily usable without training or experience, without sacrificing effectiveness or being tedious for experienced users.”

“The result was a functioning prototype OPAC called Okapi, which has a number of distinctive features: use is eased by coloured keys and a lack of jargon; the system uses search decision trees to select a suitable action at each stage of a search, and it performs automatic Boolean and hyper-Boolean functions where appropriate. The OPAC was installed and evaluated in one of the Polytechnic site libraries.”

Want more? Read the tutorial at

http://www.minerazzi.com/tutorials/okapi-bm25-model.pdf

Mayaro Virus (MAYV) Miner

This is a new Minerazzi.com miner that is available now at

http://www.minerazzi.com/mayaro/

Research the scientific literature for the Mayaro Virus (MAYV). Read research and news from CDC, NIH, WHO, and other sources. Search by location, site, or health organization. Recrawl search results to build your own curated collection on MAYV.

This is a new disease with symptoms similar to those of Chikungunya (CHIKV), but stronger. It is now moving into the Caribbean and is expected to reach Puerto Rico and Florida soon.

Probabilistic Model Tutorial

This is an updated version of a tutorial on the Robertson-Spärck-Jones Probabilistic Model.

It is available now at

http://www.minerazzi.com/tutorials/probabilistic-model-tutorial.pdf

The model computes global weights, known as RSJ weights, based on Independence Assumptions and Ordering Principles for probable relevance. The model subsumes IDF and IDFP as RSJ weights in the absence of relevance information.
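
As a quick illustration of that last point, here is a small Python sketch. It assumes the usual RSJ form with 0.5 corrections, and it takes IDF as log(N/n) and IDFP as log((N - n)/n); the function names are ours and the notation in the tutorial may differ.

```python
import math

def rsj_weight(N, n, R=0, r=0):
    """RSJ weight with the usual 0.5 corrections.
    N = documents in the collection, n = documents containing the term,
    R = known relevant documents, r = relevant documents containing the term."""
    num = (r + 0.5) * (N - n - R + r + 0.5)
    den = (n - r + 0.5) * (R - r + 0.5)
    return math.log(num / den)

def idf(N, n):
    """Classic inverse document frequency (assumed form)."""
    return math.log(N / n)

def idfp(N, n):
    """Probabilistic IDF (assumed form)."""
    return math.log((N - n) / n)

# With no relevance information (R = r = 0) the RSJ weight reduces to
# log((N - n + 0.5) / (n + 0.5)), which tracks IDFP, and IDFP in turn
# approaches IDF when N is much larger than n.
N, n = 1_000_000, 10
weights = (rsj_weight(N, n), idfp(N, n), idf(N, n))
```

For a rare term in a large collection the three values in `weights` are close to each other, which is the sense in which the model subsumes both IDF variants.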

Enjoy it.

09-26-2016 Update: A new section was added to the tutorial before the Conclusion section. References were added accordingly. A few lines were edited.

PS: I corrected the original publication date to read “Published: 03-30-2009” which is the correct date. My fault.

Moving Averages Calculator

Calculate several moving averages, including simple, cumulative, and exponential moving averages with this new tool, available at

http://www.minerazzi.com/tools/moving-averages/calculator.php

A great tool for researchers, teachers, and students! Just enter a data set and the range to be shifted.
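
For readers who want to see what these three averages look like in code, here is a minimal Python sketch. It is not the calculator's source: the function names are ours, `window` stands in for "the range to be shifted", and the smoothing factor `alpha` of the exponential average is left as a parameter because its choice is an assumption.

```python
def simple_moving_average(data, window):
    """Mean of each run of `window` consecutive points."""
    if window <= 0 or window > len(data):
        raise ValueError("window must be between 1 and len(data)")
    return [sum(data[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(data))]

def cumulative_moving_average(data):
    """Mean of all points seen so far, updated at every step."""
    out, total = [], 0.0
    for count, x in enumerate(data, start=1):
        total += x
        out.append(total / count)
    return out

def exponential_moving_average(data, alpha):
    """Each value blends the new point with the previous average.
    The first average is seeded with the first data point (an assumption)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    for x in data:
        out.append(float(x) if not out else alpha * x + (1 - alpha) * out[-1])
    return out

data = [1, 2, 3, 4, 5]
sma = simple_moving_average(data, 3)        # [2.0, 3.0, 4.0]
cma = cumulative_moving_average(data)       # [1.0, 1.5, 2.0, 2.5, 3.0]
ema = exponential_moving_average(data, 0.5) # [1.0, 1.5, 2.25, 3.125, 4.0625]
```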

72 Binary Similarity Measures

We have expanded the number of similarity measures that our Binary Similarity Calculator computes from 30 to 72 (and counting…)

Identical measures with different names have been consolidated into a single record, and different measures with the same name have been enumerated as necessary.

These similarity coefficients have many applications across disciplines: from bioinformatics to chemistry, chemometrics, statistics, data mining, information retrieval, marketing research, etc.
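
As a small illustration, here is a Python sketch of three well-known coefficients built from the usual a, b, c, d counts of a 2x2 contingency table. It is not the calculator's code; the function names are ours, and only Jaccard, Dice, and Simple Matching are shown out of the many measures available.

```python
def contingency_counts(x, y):
    """Count matches and mismatches between two equal-length binary vectors.
    a = 1/1 pairs, b = 1/0 pairs, c = 0/1 pairs, d = 0/0 pairs."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d

def jaccard(a, b, c, d):
    """Shared presences over all presences (undefined if a + b + c == 0)."""
    return a / (a + b + c)

def dice(a, b, c, d):
    """Like Jaccard, but shared presences count twice."""
    return 2 * a / (2 * a + b + c)

def simple_matching(a, b, c, d):
    """All agreements, presences and absences alike, over all positions."""
    return (a + d) / (a + b + c + d)

x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
counts = contingency_counts(x, y)  # (2, 1, 1, 1)
```

Here `jaccard(*counts)` gives 0.5 and `simple_matching(*counts)` gives 0.6; the gap comes from the single 0/0 agreement, which Jaccard ignores.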

The tool is available at

http://www.minerazzi.com/tools/similarity/binary-similarity-calculator.php

We have also included the new similarity measures proposed by Consonni & Todeschini (2012) and Todeschini et al. (2012).

Our Tutorial on Distance and Similarity was also updated accordingly. Check it out at

http://www.minerazzi.com/tutorials/distance-similarity-tutorial.pdf

References

Consonni, V. and Todeschini, R. (2012). New Similarity Coefficients for Binary Data. MATCH Commun. Math. Comput. Chem. 68, 581-592.

Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., and Willett, P. (2012). Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets. J. Chem. Inf. Model. 52 (11).

USA City Distances Calculator

Calculate distances in miles from one USA city to over 300 major cities with this O2M-powered (One-to-Many) calculator. The tool is available at

http://www.minerazzi.com/tools/city-distances/calculator.php

Use it to find all possible city-distance combinations at once. No need to mess over and over with annoying pull-down menus.

The computed distances are not driving distances; they are based on geolocation data (latitudes and longitudes) as listed in Wikipedia at

https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population
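
For the curious, one common way to turn latitudes and longitudes into a distance is the haversine (great-circle) formula. The Python sketch below is only an illustration: we are assuming haversine and a mean Earth radius of 3958.8 miles here, the function names and sample coordinates are ours, and the calculator itself may use a different formula.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumed constant

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, h)))

def distances_from(origin, cities):
    """One-to-many: distance from `origin` (lat, lon) to every city,
    returned as (name, miles) pairs sorted alphabetically by name."""
    lat0, lon0 = origin
    return sorted((name, haversine_miles(lat0, lon0, lat, lon))
                  for name, (lat, lon) in cities.items())

# Approximate coordinates, for illustration only.
new_york = (40.7128, -74.0060)
cities = {
    "Los Angeles, California": (34.0522, -118.2437),
    "Chicago, Illinois": (41.8781, -87.6298),
}
results = distances_from(new_york, cities)
```

Because this is a straight-line distance over the globe, the values come out shorter than any driving route between the same two cities.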

The results are sorted alphabetically by city and state. The data is processed with our One-to-Many algorithm to showcase the many possibilities of O2M-driven tools. Additional O2M tools can be found at

http://www.minerazzi.com/tools/

Improving MUST

We have tweaked MUST (Minerazzi URL Scoring Tool) to run a bit faster.

Try it with modern socials like

http://www.pinterest.com

http://www.snapchat.com

http://www.instagram.com

http://www.vimeo.com

http://www.periscope.com

http://www.facebook.com

http://www.twitter.com

or with all those Old Glory Days URLs from the Search Engines Golden Age (the 90s). Check if they are defunct, redirecting, or still active.
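
If you want to try that kind of check by hand, here is a rough Python sketch using only the standard library. To be clear, this is not how MUST works internally: the three labels, the function names, and the rule of comparing the requested URL with the final one are all our assumptions for illustration.

```python
import urllib.error
import urllib.request

def classify(requested_url, final_url, status):
    """Label a fetch result as 'defunct', 'redirecting', or 'active'.
    status is the HTTP status code, or None if the request failed."""
    if status is None or status >= 400:
        return "defunct"
    if final_url.rstrip("/") != requested_url.rstrip("/"):
        return "redirecting"  # the request ended somewhere else
    return "active"

def check_url(url, timeout=10):
    """Fetch a URL (redirects are followed automatically) and classify it."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(url, resp.geturl(), resp.status)
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failures, timeouts, HTTP errors, and malformed URLs land here.
        return "defunct"
```

One caveat: a site that blocks scripted requests may answer with an error code even though it is alive, so "defunct" here really means "did not answer this request successfully".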

AAAHHH: All those old days with their dumb business models.🙂