BM25F Model Tutorial

We have restored, expanded, and updated our tutorial on the BM25 Extension to Multiple Weighted Fields Model, best known as BM25F. It is now available at

http://www.minerazzi.com/tutorials/bm25f-model-tutorial.pdf

Active links were also added to the References section.

Enjoy it.

Nobel Prize Laureates Miner

This is a new Minerazzi.com miner, available now at

http://www.minerazzi.com/nobel-prize/

Use it to find resources relevant to laureates of the Nobel Prize. Search by laureates, country, discipline, or field. Find Nobel Prize Laureates in Chemistry, Physics, and other fields.

What is a Precision Matrix?

For completeness, we have added the following content to the Exercises section of the Matrix Inverter tool available at

http://www.minerazzi.com/tools/matrix-inverter/gauss-jordan.php

and mentioned in the post

https://irthoughts.wordpress.com/2016/10/05/matrix-inverter-a-matrix-inversion-tool/

The following information was found online (Quora, 2013; StackExchange, 2013a, 2013b).

Let Σ be a covariance matrix and Σ⁻¹ its inverse, commonly referred to as the precision matrix.

With Σ, one observes the unconditional association between variable i and variable j (their covariance, or their correlation once the variables are standardized) by reading off the (i,j)-th entry.

It may be the case that the two variables are correlated but do not depend directly on each other, because another variable k explains their correlation. By computing Σ⁻¹ we can examine whether the variables are partially correlated or conditionally independent.

Σ⁻¹ displays information about the partial correlations of the variables. A partial correlation describes the correlation between variables i and j once you condition on all other variables. If i and j are conditionally independent, then the (i,j)-th element of Σ⁻¹ equals zero. If the data follow a multivariate normal distribution, the converse also holds: a zero element implies conditional independence.

In general, Σ⁻¹ is a measure of how tightly clustered the variables are around the mean (diagonal elements) and the extent to which they do not co-vary with the other variables (off-diagonal elements). The higher the diagonal elements, the tighter the variables are clustered around the mean.

In my opinion, that is the simplest explanation of the subject I have found so far. So there you have a good application for our Matrix Inverter tool.
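To make the explanation concrete, here is a minimal Python sketch. It is not the Matrix Inverter tool's own code. The covariance matrix is a hypothetical example in which variables i and j are each equal to k plus independent unit-variance noise, so they are correlated only through k. The helper names `invert` and `partial_correlation` and the use of exact fractions are assumptions made for illustration.

```python
from fractions import Fraction
from math import sqrt


def invert(matrix):
    """Invert a square matrix exactly by Gauss-Jordan elimination on [A | I]."""
    n = len(matrix)
    aug = [[Fraction(x) for x in row] + [Fraction(int(r == c)) for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear the column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlation(precision, i, j):
    """Partial correlation of i and j given all other variables."""
    return -float(precision[i][j]) / sqrt(float(precision[i][i] * precision[j][j]))


# Hypothetical covariance matrix, variables ordered (i, j, k).
sigma = [[2, 1, 1],
         [1, 2, 1],
         [1, 1, 1]]
precision = invert(sigma)

unconditional = sigma[0][1] / sqrt(sigma[0][0] * sigma[1][1])
print("correlation(i, j)         =", unconditional)
print("partial correlation(i, j) =", partial_correlation(precision, 0, 1))
print("partial correlation(i, k) =", partial_correlation(precision, 0, 2))
```

In this example the (i,j) entry of the covariance matrix is 1, yet the (i,j) entry of the precision matrix is 0: once k is accounted for, i and j carry no further information about each other.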

References

Matrix Inverter: A Matrix Inversion Tool

This is a new tool, available now at

http://www.minerazzi.com/tools/matrix-inverter/gauss-jordan.php

The tool inverts a square matrix using Gauss-Jordan Elimination.

A matrix filled with zeros is returned if the input matrix is non-invertible. This serves as a crude signal that no inverse exists.

A non-invertible square matrix, also called singular or degenerate, is one whose determinant is zero.

The tool can be used to double check calculations of small matrices or as a demo resource.
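For readers who want to see the procedure, the following is a minimal Python sketch of Gauss-Jordan inversion that mimics the behavior described above, including the all-zero matrix returned for singular input. The tool's actual source is not shown here, so the function name `gauss_jordan_inverse`, the partial pivoting, and the tolerance `tol` are assumptions.

```python
def gauss_jordan_inverse(matrix, tol=1e-12):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I].

    Returns an all-zero matrix when the input is (numerically) singular,
    mirroring the crude signal described in the post.
    """
    n = len(matrix)
    # Build the augmented matrix [A | I] with float entries.
    aug = [[float(x) for x in row] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: use the largest remaining entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < tol:
            return [[0.0] * n for _ in range(n)]  # singular: no inverse
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


print(gauss_jordan_inverse([[4, 7], [2, 6]]))   # an invertible matrix
print(gauss_jordan_inverse([[1, 2], [2, 4]]))   # determinant is zero
```

Comparing the pivot against a small tolerance rather than exact zero is a design choice for floating-point input; with exact arithmetic a plain zero test would do.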

Have a nice invertible day. 🙂

Box-Cox Power Transformations Tool

This is a new statistical and data mining tool, available now at

http://www.minerazzi.com/tools/power-transformations/box-cox.php

It greatly simplifies the work of those dealing with data transformation problems.

Enjoy it.

About the tool:

  • This tool lets you transform a data set by applying one or more Box-Cox Power Transformations. The research articles given in the References section of the tool cover this topic.
  • To use the tool, enter one data set value per line. End each line by hitting the Enter key so the values are recognized as individual entries.
  • To apply multiple transforms, check the preset field.
  • To apply a single transform, uncheck the preset field and enter a p value (p ∈ [-2,+2]).
  • Submit or reset the form as needed.
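The steps above can be sketched in Python as follows. This assumes the standard one-parameter Box-Cox form, (x^p - 1)/p for p ≠ 0 and ln(x) for p = 0, applied to strictly positive data; the tool may use a different variant. The names `box_cox`, `box_cox_presets`, and the list `PRESET_POWERS` are hypothetical, and the preset values are illustrative rather than the tool's actual presets.

```python
import math

# Hypothetical preset powers; the tool's real preset list may differ.
PRESET_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2)


def box_cox(values, p):
    """Apply a single Box-Cox power transformation with power p in [-2, 2]."""
    if not -2 <= p <= 2:
        raise ValueError("p must lie in [-2, +2]")
    if any(x <= 0 for x in values):
        raise ValueError("this form of the transform needs strictly positive data")
    if p == 0:
        # The limit of (x**p - 1) / p as p approaches 0 is the natural log.
        return [math.log(x) for x in values]
    return [(x ** p - 1) / p for x in values]


def box_cox_presets(values, powers=PRESET_POWERS):
    """Apply several transforms at once, one per preset power."""
    return {p: box_cox(values, p) for p in powers}


data = [1.0, 2.0, 4.0, 8.0]
print(box_cox(data, 0.5))            # single transform
for p, transformed in box_cox_presets(data).items():
    print(p, transformed)            # multiple transforms
```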

10-02-2016 Update:

We added a new feature to the tool so it now lets users return all non-negative transforms.

OKAPI BM25 Tutorial

We have restored, refined, and updated this tutorial and added some historical background.

Abstract

This is a light tutorial on OKAPI BM25, a Best Match model where local weights are computed as parameterized frequencies and global weights as RSJ weights. Local weights are based on a 2-Poisson model and on the verbosity and scope hypotheses, and global weights on the Robertson-Spärck-Jones Probabilistic Model.
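As a rough illustration of how the local and global weights combine, here is a minimal Python sketch of the commonly published BM25 scoring formula. It is not taken from the tutorial itself. The function names, the toy corpus, and the parameter values k1 = 1.2 and b = 0.75 (frequently used defaults) are assumptions.

```python
import math
from collections import Counter


def rsj_weight(n_docs, doc_freq):
    """Global weight: RSJ weight in the absence of relevance information."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    term_freq = Counter(doc)
    score = 0.0
    for term in set(query):
        f = term_freq[term]
        if f == 0:
            continue
        doc_freq = sum(1 for d in corpus if term in d)
        # Local weight: a saturating, length-normalized term frequency.
        length_norm = 1 - b + b * len(doc) / avg_len
        local = f * (k1 + 1) / (f + k1 * length_norm)
        score += rsj_weight(n_docs, doc_freq) * local
    return score


# Hypothetical toy corpus of tokenized documents.
corpus = [
    ["okapi", "bm25", "model"],
    ["vector", "space", "model"],
    ["boolean", "model", "search"],
    ["library", "catalogue", "search"],
    ["network", "library", "system"],
]
for doc in corpus:
    print(doc, bm25_score(["okapi", "search"], doc, corpus))
```

Note that this form of the RSJ weight becomes negative for a term that occurs in more than half of the documents, which is one reason implementations often adjust it.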

Introduction

In the early 1980s, Gillian Venner, Nathalie Mitev, and Stephen Walker (1985, 1987) conducted research that led to the design and evaluation of online public access catalogs (OPACs) at the Polytechnic of Central London (PCL).

The project's initial phases spanned from November 1982 to May 1985. The prototype was named OKAPI (Online Keyword Access to Public Information). As Mitev (1985) wrote:

“Designing an online public access catalogue [OPAC]: Okapi, a catalogue on a local area network [LAN] is the final report of a two-year research project ‘Microprocessor networking in libraries’ which was funded by the British Library and the Department of Trade and Industry, and based at the Polytechnic of Central London.”

“The aim was to produce an OPAC on a LAN, that would be readily usable without training or experience, without sacrificing effectiveness or being tedious for experienced users.”

“The result was a functioning prototype OPAC called Okapi, which has a number of distinctive features: use is eased by coloured keys and a lack of jargon; the system uses search decision trees to select a suitable action at each stage of a search, and it performs automatic Boolean and hyper-Boolean functions where appropriate. The OPAC was installed and evaluated in one of the Polytechnic site libraries.”

Want more? Read the tutorial at

http://www.minerazzi.com/tutorials/okapi-bm25-model.pdf

Mayaro Virus (MAYV) Miner

This is a new Minerazzi.com miner that is available now at

http://www.minerazzi.com/mayaro/

Research the scientific literature for the Mayaro Virus (MAYV). Read research and news from CDC, NIH, WHO, and other sources. Search by location, site, or health organization. Recrawl search results to build your own curated collection on MAYV.

This is a new disease with symptoms similar to those of Chikungunya (CHIKV), but stronger. It is now moving to the Caribbean and soon to PR and Florida.

Probabilistic Model Tutorial

This is an updated version of a tutorial on the Robertson-Spärck-Jones Probabilistic Model.

It is available now at

http://www.minerazzi.com/tutorials/probabilistic-model-tutorial.pdf

The model computes global weights, known as RSJ weights, based on Independence Assumptions and Ordering Principles for probable relevance. The model subsumes IDF and IDFP as RSJ weights in the absence of relevance information.
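The relationship between RSJ weights, IDF, and IDFP can be sketched in Python. This is an illustrative sketch and not code from the tutorial. It assumes the usual 0.5-smoothed RSJ weight, IDF as log(N/n), and IDFP as log((N - n)/n), where N is the collection size, n the number of documents containing the term, R the number of known relevant documents, and r the number of those containing the term. The function names are hypothetical.

```python
import math


def rsj_weight(N, n, R=0, r=0):
    """RSJ weight with 0.5 smoothing; R = r = 0 means no relevance information."""
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    non_relevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(relevant_odds / non_relevant_odds)


def idf(N, n):
    """Inverse document frequency."""
    return math.log(N / n)


def idfp(N, n):
    """Probabilistic inverse document frequency."""
    return math.log((N - n) / n)


N, n = 100, 10  # hypothetical collection size and document frequency
print("IDF               :", idf(N, n))
print("IDFP              :", idfp(N, n))
print("RSJ, no relevance :", rsj_weight(N, n))
print("RSJ, R=5 and r=4  :", rsj_weight(N, n, R=5, r=4))
```

With R = r = 0 the RSJ weight reduces to log((N - n + 0.5)/(n + 0.5)), which is IDFP up to the smoothing constants. When most of the known relevant documents contain the term, the weight increases.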

Enjoy it.

09-26-2016 Update: A new section was added to the tutorial before the Conclusion section. References were added accordingly. A few lines were edited.

PS: I corrected the original publication date to read “Published: 03-30-2009” which is the correct date. My fault.

Moving Averages Calculator

Calculate several moving averages, including simple, cumulative, and exponential moving averages with this new tool, available at

http://www.minerazzi.com/tools/moving-averages/calculator.php

A great tool for researchers, teachers, and students! Just enter a data set and the range to be shifted.
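For reference, the three averages named above can be sketched in Python as follows. This is not the calculator's own code. The function names, the `window` argument (our reading of "the range to be shifted"), the smoothing factor `alpha`, and seeding the exponential average with the first data value are all assumptions.

```python
def simple_moving_average(values, window):
    """Mean of each consecutive block of `window` values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and the data set size")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def cumulative_moving_average(values):
    """Running mean of all values seen so far."""
    averages, total = [], 0.0
    for count, x in enumerate(values, start=1):
        total += x
        averages.append(total / count)
    return averages


def exponential_moving_average(values, alpha):
    """EMA seeded with the first value; alpha in (0, 1] weights the newest value."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    averages = []
    for x in values:
        averages.append(x if not averages else alpha * x + (1 - alpha) * averages[-1])
    return averages


data = [1, 2, 3, 4, 5]
print(simple_moving_average(data, 3))
print(cumulative_moving_average(data))
print(exponential_moving_average(data, 0.5))
```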

72 Binary Similarity Measures

We have expanded the number of similarity measures that our Binary Similarity Calculator computes from 30 to 72 (and counting…).

Identical measures with different names have been consolidated into a single record, and different measures sharing the same name have been enumerated as necessary.

These similarity coefficients have many applications across disciplines: from bioinformatics to chemistry, chemometrics, statistics, data mining, information retrieval, marketing research, etc.
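To show what such coefficients look like, here is a minimal Python sketch of three well-known ones (Jaccard, Dice, and simple matching), all built from the usual counts a, b, c, d of a 2×2 table for two binary vectors. It is not the calculator's code; the function names and sample vectors are hypothetical, and no guard is included for cases where a denominator is zero.

```python
def contingency(x, y):
    """Count matches and mismatches between two equal-length binary vectors.

    a: both 1, b: only x is 1, c: only y is 1, d: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def jaccard(a, b, c, d):
    return a / (a + b + c)


def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c)


def simple_matching(a, b, c, d):
    return (a + d) / (a + b + c + d)


x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
counts = contingency(x, y)
print("a, b, c, d      =", counts)
print("Jaccard         =", jaccard(*counts))
print("Dice            =", dice(*counts))
print("Simple matching =", simple_matching(*counts))
```

Many of the other coefficients differ mainly in whether and how they count the joint absences d, which is why a single table of counts is enough to compute a large family of measures.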

The tool is available at

http://www.minerazzi.com/tools/similarity/binary-similarity-calculator.php

We have also included the new similarity measures proposed by Consonni & Todeschini (2012) and Todeschini et al. (2012).

Our Tutorial on Distance and Similarity was also updated accordingly. Check it out at

http://www.minerazzi.com/tutorials/distance-similarity-tutorial.pdf

References

Consonni, V. and Todeschini, R. (2012). New Similarity Coefficients for Binary Data. MATCH Commun. Math. Comput. Chem. 68, 581-592.

Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., and Willett, P. (2012). Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets. J. Chem. Inf. Model. 52 (11).