BM25 and BM25F: Implications for SEO and Web Design

We published two great tutorials on the BM25 and BM25F algorithms.

10-11-2016 Link Updates:

BM25F: http://www.minerazzi.com/tutorials/bm25f-model-tutorial.pdf

BM25: http://www.minerazzi.com/tutorials/okapi-bm25-model.pdf

The take-home lessons from the theory behind these algorithms are:

1. A term (e.g., a keyword) carries more information gain the first time it occurs in a document than on later occurrences.

2. A term is likely to weigh more in a title field than in other fields.

3. The weight of a term and its occurrence frequency are not linearly related, so repeating a term x times does not give it x times more weight, nor does it make a document x times more relevant to that term.

4. Linearly combining per-field scores is contraindicated because it destroys term dependencies; BM25F instead combines weighted term frequencies across fields before applying saturation (see BM25F).
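Points 1 and 3 both follow from the saturating term-frequency component of BM25. The sketch below is a minimal illustration, assuming the commonly used defaults k1 = 1.2 and b = 0.75 and leaving out the IDF factor (which only rescales the result); the function name `bm25_tf_weight` is our own, not from any library.

```python
"""Sketch of BM25 term-frequency saturation (assumed defaults k1=1.2, b=0.75)."""


def bm25_tf_weight(tf: int, doc_len: float, avg_doc_len: float,
                   k1: float = 1.2, b: float = 0.75) -> float:
    """Saturating term-frequency component of BM25 for one term.

    Returns 0.0 when the term is absent and approaches k1 + 1 as tf grows,
    so extra repetitions add less and less weight.
    """
    if tf <= 0:
        return 0.0
    # Length normalization: documents longer than average are penalized.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


if __name__ == "__main__":
    previous = 0.0
    for tf in range(1, 6):
        w = bm25_tf_weight(tf, doc_len=500, avg_doc_len=500)
        # Marginal gain: weight added by this one extra occurrence.
        print(f"tf={tf}  weight={w:.3f}  marginal gain={w - previous:.3f}")
        previous = w
    # Eight repetitions versus one: far less than an 8x increase.
    print(f"tf=8 weight: {bm25_tf_weight(8, 500, 500):.3f}")
```

With these assumed parameters and an average-length document, the first occurrence contributes 1.0, the second adds 0.375, and eight occurrences reach only about 1.91 in total, against a ceiling of k1 + 1 = 2.2.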

Most SEOs are already well aware of points 1 and 2.

Because a term carries more information gain during its first occurrences, a document about specific terms should mention them at the beginning, particularly in the title tag. For testing purposes, and because end users tend to assume that a large headline is the actual title of a document (it is not), we like to repeat the title tag content in an h1 header placed prominently at the beginning of the document. Keywords from the title are then repeated early in the document body. In this way, one can write for both end users and search engines. If a search engine uses some form of the above algorithms (we don’t know whether any does), that base is covered, too. You don’t have to adopt this strategy unless you want to. It is simply our way of conducting tests, but it is a reasonable approach.
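To see how the title/h1/body placement interacts with points 2 and 4, the sketch below contrasts one common simplified form of BM25F with a naive linear combination of per-field BM25 scores. Everything here is an illustrative assumption: the `Field` class, the field weights (title 3.0, h1 2.0, body 1.0), the lengths, the IDF value, and k1 = 1.2, b = 0.75. It is not a claim about how any search engine actually weights these fields.

```python
"""Sketch: BM25F versus a linear combination of per-field BM25 scores.

Field weights, lengths, IDF, k1 and b are hypothetical, for illustration only.
"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Field:
    tf: int            # occurrences of the term in this field
    length: float      # field length in tokens
    avg_length: float  # average length of this field across the collection
    weight: float      # field boost (hypothetical values)
    b: float = 0.75    # per-field length normalization strength


def _length_norm(f: Field) -> float:
    return 1.0 - f.b + f.b * f.length / f.avg_length


def linear_combination(fields: list[Field], idf: float, k1: float = 1.2) -> float:
    """Saturate each field separately, then add the weighted field scores."""
    total = 0.0
    for f in fields:
        total += f.weight * idf * f.tf * (k1 + 1.0) / (f.tf + k1 * _length_norm(f))
    return total


def bm25f(fields: list[Field], idf: float, k1: float = 1.2) -> float:
    """Combine weighted, length-normalized frequencies first, then saturate once."""
    pseudo_tf = sum(f.weight * f.tf / _length_norm(f) for f in fields)
    return idf * pseudo_tf * (k1 + 1.0) / (k1 + pseudo_tf)


if __name__ == "__main__":
    idf = 2.0  # assumed IDF of the keyword
    page = [
        Field(tf=1, length=8, avg_length=8, weight=3.0),      # title tag
        Field(tf=1, length=8, avg_length=8, weight=2.0),      # h1 header
        Field(tf=3, length=600, avg_length=600, weight=1.0),  # body
    ]
    print(f"linear combination: {linear_combination(page, idf):.3f}")
    print(f"BM25F:              {bm25f(page, idf):.3f}")
    print(f"single-term ceiling idf*(k1+1): {idf * 2.2:.3f}")
```

Under these assumptions the linear combination scores about 13.14, far above the single-term ceiling of 4.4, because each field saturates on its own and repetitions spread across fields keep adding nearly full weight. BM25F scores about 3.83: the title, h1, and early body mentions still raise the score, but with diminishing returns. For a single field with weight 1, both functions reduce to ordinary BM25.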