A Brief History of Search Results Ranking, by Stephen Robertson

On February 5, 2019, Prof. Stephen Robertson published “A Brief History of Search Results Ranking” (https://ieeexplore.ieee.org/document/8634887). I finally got a complimentary copy from Robertson via ResearchGate. This is a historical article that all IR researchers and search engine marketers (SEOs/SEMs) may want to read. It appears in the IEEE Annals of the History of Computing (https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=85).

His take on PageRank? He writes: “As a secondary aim, I hope to debunk the myth of PageRank, which remains in widespread currency.”

In the paper, Robertson sets the record straight regarding the myth of Google’s PageRank. He wrote:

“So how important was PageRank in the Google ranker? It was one of a large number of features, and contributed something to the overall effectiveness of the ranker. But in my view, it was much less important than doing all the other things well. And in particular, the advantage claimed for PageRank (that it quantifies the authority of a page as viewed by other web page authors) can already be obtained from matching the query against anchor text. Anchor text is a little less subtle than PageRank as a quantitative measure, but on the other hand it is query specific, which PageRank is not. This fact has been recognised in the information retrieval research community, supported by some evaluation work by Hawking and colleagues, for example — work which was published nine years before McCormick’s book.

Matching anchor text well is vital for a good web search engine; using PageRank is useful, but nothing more.”
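
The contrast Robertson draws, a query-independent link score versus query-specific anchor text matching, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the five-page link graph, the page names, the anchor texts, and the simple term-count scoring. It is not Google's ranker, only a minimal power-iteration PageRank next to a naive anchor text match.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def anchor_score(query, anchors):
    """Count query-term occurrences in the anchor texts pointing at each page."""
    terms = query.lower().split()
    scores = {}
    for page, texts in anchors.items():
        words = " ".join(texts).lower().split()
        scores[page] = sum(words.count(t) for t in terms)
    return scores


# Hypothetical toy web: three pages link to "hub", one also links to "bm25".
links = {"a": ["hub"], "b": ["hub"], "c": ["hub", "bm25"], "hub": ["a"]}
anchors = {
    "hub": ["home", "main page", "portal"],
    "bm25": ["bm25 ranking tutorial"],
    "a": ["news"],
}

ranks = pagerank(links)
by_anchor = anchor_score("bm25 tutorial", anchors)
print(max(ranks, key=ranks.get))          # same answer for every query
print(max(by_anchor, key=by_anchor.get))  # depends on the query
```

In this toy graph the PageRank winner is the heavily linked page whatever the query is, while the anchor text score picks the page whose incoming link text actually matches the query terms.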

Dataset for Sequential Sentence Classification

A research group at the CSIE department of National Taiwan University (NTU), supervised by Prof. Shou-De Lin, is preparing to release a new dataset for sequential sentence classification and type classification of paper abstracts on arXiv.
 
This annotation and classification project is available at https://mslab.csie.ntu.edu.tw/~labeler/abstract.php
 
Annotations are made by the article authors, upon invitation from the research group. I’m happy to see that they included my “Local Term Weights Models from Power Transformations | Development of BM25IR” paper (https://arxiv.org/ftp/arxiv/papers/1608/1608.01573.pdf) in the dataset.

PHP Implementation of Text Mining in Digital Libraries using OKAPI BM25 Model

Here is a nice article from the International Journal of Computer Applications Technology and Research, https://ijcat.com/archieve/volume7/issue10/ijcatr07101003.pdf, on text mining digital libraries with OKAPI BM25. The authors successfully used a PHP implementation (XAMPP bundle).

I was happy to see that reference 22 of the article cites one of my tutorials, on the BM25F implementation. It is available at http://www.minerazzi.com/tutorials/bm25f-model-tutorial.pdf (and through researchgate.net).

The authors generously included some PHP snippets, so it should be easy for those who can code to follow along.
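
For readers who want to experiment before opening the article, here is a minimal sketch of Okapi BM25 scoring. It is written in Python rather than PHP and does not reproduce the authors' snippets. The whitespace tokenizer, the parameter defaults (k1 = 1.2, b = 0.75), the sample documents, and the "+1" inside the logarithm (a common variant that keeps the IDF non-negative) are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in docs against query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1.0 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical mini collection.
docs = [
    "digital library text mining",
    "okapi bm25 ranking for digital libraries",
    "cooking recipes",
]
scores = bm25_scores("bm25 ranking", docs)
print(scores)
```

Only the second document contains the query terms, so it is the only one with a non-zero score.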

 

LANL Miner

LANL Miner (http://www.minerazzi.com/lanl). Our newest miner/engine.

Public resources, news, research, people, and more from Los Alamos National Laboratory.

Use its news channel to access feeds relevant to Los Alamos National Laboratory.

Recrawl search results and build your own curated collection of resources.

Want to know about the Manhattan Project? How about research from outer space?

Some goldmine resources found with the LANL Miner:

Los Alamos Research Online (beta)
https://research-online.lanl.gov/oppie/service

Cambridge Structural Database
https://www.lanl.gov/library/find/articles/csd.php

LANL-PRIMO
http://lanl-primo.hosted.exlibrisgroup.com/primo_library/libweb/action/search.do?vid=LANL

Multiple databases
https://www.lanl.gov/library/find/articles/

And a lot more…

Sarcomas Miner

Sarcomas Miner. http://minerazzi.com/sarcomas/

Find resources relevant to sarcomas: types, diagnosis, classification, epidemiology, awareness, and more. Raise awareness by doing research on these types of cancers.

Use its news channel to find the latest news and research articles in one place.

Recrawl search results and build your own curated collection of resources.

W3C Miner

W3C Miner. http://minerazzi.com/w3c/

W3C public resources, news, standards, work groups, and more.

Use its news channel to access several RSS news feeds relevant to the World Wide Web Consortium from a single place.

Recrawl search results and build your own curated collection of resources.

ORNL Miner

ORNL Miner is now available at

http://www.minerazzi.com/ornl

Find ORNL public resources, news, research, people, and more. Access cutting-edge research and technology inventions.

Use its news channel to access several RSS news feeds relevant to Oak Ridge National Laboratory from a single place.

Recrawl search results and build your own curated collection of resources.

Carcinomas Miner

Carcinomas Miner | Find resources relevant to the types of cancers known as carcinomas (http://www.minerazzi.com/carcinomas/).

Recrawl search results and build your own curated collection of resources. Use this miner to extract valuable front/back end data from relevant sites.

You may also use its RSS news channels (http://www.minerazzi.com/carcinomas/spp.php) to find news from around the Web relevant to these and other types of cancers.

Bond Order Calculator Tool

This tool computes bond orders of diatomic species having up to 20 electrons, without using Molecular Orbital Theory! It is available at

http://www.minerazzi.com/tools/bond-order/calculator.php

We developed the tool inspired by Dr. Arijit Das’s set of innovative, time-saving formulae for chemical education. His methodologies are suitable for computer-based learning (CBL) activities or for writing computer programs that solve chemistry problems.

Unlike other bond order calculators, ours does not require you to write Lewis structures or electron configurations, or to count electrons, bonds, orbitals, and atoms. Just enter a chemical formula and the tool will do the rest for you.
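
To give a feel for how such a calculation can be coded, here is a minimal sketch based on piecewise formulae of the kind Das describes, where the bond order depends only on the total electron count n of the diatomic species. The post does not show the tool's own code, so the function names, the small atomic number table, the simple formula parser, and the separate charge argument are all assumptions made for illustration.

```python
import re

# Atomic numbers for H through Ne, enough for diatomic species up to 20 electrons.
ATOMIC_NUMBER = {
    "H": 1, "He": 2, "Li": 3, "Be": 4, "B": 5,
    "C": 6, "N": 7, "O": 8, "F": 9, "Ne": 10,
}


def total_electrons(formula, charge=0):
    """Count electrons in a diatomic formula such as 'O2', 'CO', or 'NO'."""
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    atoms = []
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms.extend([symbol] * int(count or 1))
    if len(atoms) != 2:
        raise ValueError("only diatomic species are supported")
    # A positive charge removes electrons; a negative charge adds them.
    return sum(ATOMIC_NUMBER[a] for a in atoms) - charge


def bond_order(n):
    """Bond order of a diatomic species from its total electron count n (1 to 20)."""
    if not 1 <= n <= 20:
        raise ValueError("electron count must be between 1 and 20")
    if n <= 2:
        return n / 2
    if n <= 6:
        return abs(4 - n) / 2
    if n <= 14:
        return abs(8 - n) / 2
    return (20 - n) / 2


print(bond_order(total_electrons("O2")))             # 2.0
print(bond_order(total_electrons("NO")))             # 2.5
print(bond_order(total_electrons("O2", charge=-1)))  # 1.5 (superoxide ion)
```

The ranges overlap at n = 2, 6, and 14, where the neighboring formulae give the same value, so it does not matter which branch handles those boundary cases.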

In my opinion, students who know how to write programs for solving chemistry problems have an edge when taking quantitative courses like analytical chemistry, instrumental analysis, chemometrics, computational chemistry, and similar courses. I think they might be better prepared for multidisciplinary research work than those who cannot code.

Developing this tool was really gratifying, as the work inspired us to derive an algorithm for predicting the number of unpaired electrons and the magnetic properties of single atoms, diatomic species, and their ions. Hopefully, this algorithm will be available early next year in the form of a new chemistry calculator.

We are also developing a tool that computes bond orders of all kinds of species, including the polyatomic cases.

We are sincerely indebted to Dr. Arijit Das from Ramthakur College, Agartala, West Tripura, India for encouraging us to develop this tool for educators, scholars, and chemistry students.

Note:

This tool, like our Hydrocarbons Parser (http://www.minerazzi.com/tools/hydrocarbons/parser.php), is listed in the City College Chemistry Web Resources Guide at CUNY. Find them both in the guide’s Computational Chemistry category (http://libguides.ccny.cuny.edu/chemistry/computational).

Applications of Binary Fractals to Long Genetic Sequences via a Kronecker Family of Genetic Matrices

CSS Fractal Studio

Back in 2017, Stepanyan & Petoukhov reported that long nucleotide sequences can be modeled as binary fractals by means of Kronecker exponentiation of matrices.

https://www.mdpi.com/2078-2489/8/1/12/pdf

See also https://arxiv.org/ftp/arxiv/papers/1310/1310.8469.pdf

The abstract reads in part:

“This method uses a set of symmetries of biochemical attributes of nucleotides. It also uses the possibility of presentation of every whole set of N-mers as one of the members of a Kronecker family of genetic matrices. With this method, a long nucleotide sequence can be visually represented as an individual fractal-like mosaic or another regular mosaic of binary type.”

We added the fractal resembling the pattern of the nucleotide sequence of the Homo sapiens chromosome 22 genomic scaffold to our Fractal Studio tool at

http://www.minerazzi.com/tools/fractals/studio.php

Researchers can reproduce its binary mosaic, shown above, by just selecting the Homo Sapiens Mosaic option from the tool’s selection menu. Compare the results with Figures 4 and 8 of the Stepanyan & Petoukhov article. Compare also some of the multifractals that the tool generates with some of the genetic mosaics described in the article.
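
For those curious about how Kronecker exponentiation produces a binary fractal, here is a minimal sketch. The 2x2 seed matrix below is a generic Sierpinski-type example chosen for illustration; it is not one of the genetic matrices from the article, and the function names are hypothetical. This is also not the code behind the Fractal Studio tool.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def kron_power(m, n):
    """Kronecker product of m with itself n times (n >= 1)."""
    result = m
    for _ in range(n - 1):
        result = kron(result, m)
    return result


def render(m):
    """Draw a binary matrix as text: '#' for 1 and '.' for 0."""
    return "\n".join("".join("#" if v else "." for v in row) for row in m)


# Assumed seed: a binary matrix whose Kronecker powers form a Sierpinski-like mosaic.
seed = [[1, 1],
        [1, 0]]

mosaic = kron_power(seed, 3)  # an 8 x 8 binary mosaic
print(render(mosaic))
```

Each Kronecker power doubles the side of the mosaic and triples the number of filled cells, which is where the self-similar pattern comes from.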

Multidisciplinary research is a beautiful thing.