The following complementary collections were reindexed and updated:
Information Retrieval, http://www.minerazzi.com/irc
Data Structures & Algorithms, http://www.minerazzi.com/dsac
Both include RSS news channels to Bing, Google, MIT, and arXiv so users can easily find news relevant to these collections.
It is more than clear that quantum computing and searching are the next frontier in Information Security (IS) and Information Retrieval (IR). According to Phys.org:
“The National Institute of Standards and Technology (NIST) is officially asking the public for help heading off a looming threat to information security: quantum computers, which could potentially break the encryption codes used to protect privacy in digital systems. NIST is requesting methods and strategies from the world’s cryptographers, with the deadline less than a year away.”
Read more at:
Now that quantum computers and quantum searches are just around the corner, the implications are many: from search marketing to search apps, from social grids to quantum PCs, from big challenges to big data, from quantum retrieval to mind retrieval. The sky is the limit. Back in 2013 we mentioned quantum searches in the context of XOR/XNOR searches.
A miner on quantum searches will soon be available at http://www.minerazzi.com. In the meantime, see some useful links below:
- Phys.org (2016). NIST asks public to help future-proof electronic information.
- Viamontes, G. F., Markov, I. L., & Hayes, P. (2005). Is Quantum Search Practical?
- Phys.org (2005). Data structures influence speed of quantum search in unexpected ways.
- Quora (2014). How do you use the Grover quantum search algorithm to find all the solutions to some search query?
- Paparo, G. D. & Martin-Delgado, M. A. (2012). Google in a Quantum Network.
- Wang, H., Wu, J., Yang, X., Chen, P., & Yi, X. (2014). An Enhanced Quantum PageRank Algorithm Integrated with Quantum Search.
- Lu, S., Zhang, Y., & Liu, F. (2013). An efficient quantum search engine on unsorted database.
- MIT Technology Review (2011). Quantum PageRank Algorithm Outperforms Classical Version.
We have enhanced the MUST tool, available at
This redirection checker follows URL redirections and reports the initial and final status codes, URLs, and IP addresses.
The tool now:
1. accepts up to 500 URLs per submission.
2. summarizes broken and active URLs.
This is one of several tools that we use in-house, without URL limits, for re-indexing databases and cleaning up crawl results.
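The summary step can be illustrated with a short Python sketch. It is not the tool's code: the function name `summarize_checks`, the tuple layout, and the rule that a URL is "broken" when its final status is missing or 400 and above are all assumptions made for this example, and the URLs are placeholders.

```python
def summarize_checks(checks):
    """Group checked URLs into active, broken, and redirected lists.

    Each check is a hypothetical tuple:
    (url, initial_status, final_url, final_status).
    A status of None means no response was received.
    """
    summary = {"active": [], "broken": [], "redirected": []}
    for url, initial_status, final_url, final_status in checks:
        # Assumed rule: no response or a 4xx/5xx final status means broken.
        if final_status is None or final_status >= 400:
            summary["broken"].append(url)
        else:
            summary["active"].append(url)
        # A 3xx initial status means the URL redirected at least once.
        if initial_status is not None and 300 <= initial_status < 400:
            summary["redirected"].append(url)
    return summary


# Placeholder data, not real crawl results.
sample = [
    ("http://example.com/a", 200, "http://example.com/a", 200),
    ("http://example.com/b", 301, "http://example.com/new-b", 200),
    ("http://example.com/c", 302, "http://example.com/gone", 404),
    ("http://example.com/d", None, None, None),
]
report = summarize_checks(sample)
```

Keeping the classification separate from the network requests makes the rule easy to change, for instance to treat 5xx responses as temporary rather than broken.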
We have restored, refined, and updated this tutorial and added some historical background.
This is a light tutorial on OKAPI BM25, a Best Match model where local weights are computed as parameterized frequencies and global weights as RSJ weights. Local weights are based on a 2-Poisson model and the verbosity and scope hypotheses, and global weights on the Robertson-Spärck-Jones Probabilistic Model.
In the early 1980s, Gillian Venner, Nathalie Mitev, and Stephen Walker (1985, 1987) conducted research that led to the design and evaluation of online public access catalogs (OPACs) at the Polytechnic of Central London (PCL).
The project's initial phases spanned from November 1982 to May 1985. The prototype was named OKAPI (Online Keyword Access to Public Information). As Mitev (1985) wrote:
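The combination of local and global weights can be sketched in Python as follows. This is a minimal illustration, not the tutorial's own code: the function name `bm25_scores`, the defaults `k1=1.2` and `b=0.75`, and the toy documents are assumptions for the example, and the global weight uses the 0.5-smoothed RSJ form that applies when no relevance information is available.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Global weight: RSJ weight without relevance information.
            # It turns negative when a term is in more than half the documents.
            rsj = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Local weight: saturating term frequency, normalized by length.
            norm = k1 * ((1 - b) + b * len(d) / avg_len)
            local = tf[term] * (k1 + 1) / (tf[term] + norm)
            score += local * rsj
        scores.append(score)
    return scores


# Toy collection (assumed data).
docs = [
    ["okapi", "bm25", "model"],
    ["okapi", "okapi", "catalog"],
    ["library", "catalog"],
    ["quantum", "search"],
    ["data", "mining"],
]
okapi_scores = bm25_scores(["okapi"], docs)
```

In this toy collection the second document scores highest for the query `okapi`, since the term occurs twice there and both matching documents have the same length.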
“Designing an online public access catalogue [OPAC]: Okapi, a catalogue on a local area network [LAN] is the final report of a two-year research project ‘Microprocessor networking in libraries’ which was funded by the British Library and the Department of Trade and Industry, and based at the Polytechnic of Central London.”
“The aim was to produce an OPAC on a LAN, that would be readily usable without training or experience, without sacrificing effectiveness or being tedious for experienced users.”
“The result was a functioning prototype OPAC called Okapi, which has a number of distinctive features: use is eased by coloured keys and a lack of jargon; the system uses search decision trees to select a suitable action at each stage of a search, and it performs automatic Boolean and hyper-Boolean functions where appropriate. The OPAC was installed and evaluated in one of the Polytechnic site libraries.”
Want more? Read the tutorial at
This is an updated version of a tutorial on the Robertson-Spärck-Jones Probabilistic Model.
It is available now at
The model computes global weights, known as RSJ weights, based on Independence Assumptions and Ordering Principles for probable relevance. The model subsumes IDF and IDFP as RSJ weights in the absence of relevance information.
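As a rough illustration, the RSJ weight with the commonly used 0.5 correction can be written as a small Python function. The name `rsj_weight` and its argument names are assumptions for this sketch: `N` is the collection size, `n` the number of documents containing the term, `R` the number of known relevant documents, and `r` the number of those that contain the term.

```python
import math


def rsj_weight(N, n, R=0, r=0):
    """RSJ term weight with the 0.5 correction.

    With R = r = 0 (no relevance information) this reduces to
    log((N - n + 0.5) / (n + 0.5)), a smoothed form of log((N - n) / n).
    """
    # Odds that the term occurs in a relevant document.
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    # Odds that the term occurs in a non-relevant document.
    non_relevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(relevant_odds / non_relevant_odds)


# Hypothetical counts: 1000 documents, 10 of them contain the term.
w_no_relevance = rsj_weight(1000, 10)
# Same term, with 5 known relevant documents, 4 of which contain it.
w_with_relevance = rsj_weight(1000, 10, R=5, r=4)
```

With these hypothetical counts, adding relevance evidence raises the weight, because the term is concentrated in the relevant documents.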
09-26-2016 Update: A new section was added to the tutorial before the Conclusion section. References were added accordingly. A few lines were edited.
PS: I corrected the original publication date to read “Published: 03-30-2009” which is the correct date. My fault.
We have expanded the number of similarity measures that our Binary Similarity Calculator computes from 30 to 72 (and counting…)
Identical measures that go by different names have been consolidated into a single record, and different measures that share a name have been enumerated as necessary.
These similarity coefficients have many applications across disciplines: from bioinformatics to chemistry, chemometrics, statistics, data mining, information retrieval, marketing research, etc.
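For readers new to these coefficients, here is a minimal Python sketch of how three widely used ones are computed from the 2x2 contingency counts of two binary vectors. It does not reproduce the calculator's code. The function names and the example vectors are assumptions, and the counts follow the common a, b, c, d convention (both present, first only, second only, both absent).

```python
def contingency(x, y):
    """Return the counts (a, b, c, d) for two equal-length binary vectors."""
    a = sum(1 for i, j in zip(x, y) if i and j)          # present in both
    b = sum(1 for i, j in zip(x, y) if i and not j)      # present in x only
    c = sum(1 for i, j in zip(x, y) if not i and j)      # present in y only
    d = sum(1 for i, j in zip(x, y) if not i and not j)  # absent in both
    return a, b, c, d


def jaccard(a, b, c, d):
    # Ignores joint absences (d).
    return a / (a + b + c)


def dice(a, b, c, d):
    # Like Jaccard, but counts joint presences twice.
    return 2 * a / (2 * a + b + c)


def simple_matching(a, b, c, d):
    # Counts joint presences and joint absences as matches.
    return (a + d) / (a + b + c + d)


# Assumed example vectors.
x = [1, 1, 0, 1, 0, 0]
y = [1, 0, 0, 1, 1, 0]
counts = contingency(x, y)  # (2, 1, 1, 2)
```

Because every function takes the same four counts, further coefficients can be added as one-line functions over `a`, `b`, `c`, and `d`.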
The tool is available at
We have also included the new similarity measures proposed by Consonni & Todeschini (2012) and Todeschini et al. (2012).
Our Tutorial on Distance and Similarity was also updated accordingly. Check it out at
Consonni, V. and Todeschini, R. (2012). New Similarity Coefficients for Binary Data. MATCH Commun. Math. Comput. Chem. 68, 581-592.
Todeschini, R., Consonni, V., Xiang, H., Holliday, J., Buscema, M., and Willett, P. (2012). Similarity Coefficients for Binary Chemoinformatics Data: Overview and Extended Comparison Using Simulated and Real Data Sets. J. Chem. Inf. Model. 52 (11).
We have tweaked MUST (Minerazzi URL Scoring Tool) to run a bit faster.
Try it with modern socials like
or with all those Old Glory Days URLs from the Search Engines Golden Age (the 90s). Check if they are defunct, redirecting, or still active.
AAAHHH: All those old days with their dumb business models. 🙂
The Social Pulse Parser (SPP) has been expanded to include the following categories:
World News (19 rss)
Technology (25 rss)
Data Mining (14 rss)
Search Engine Marketing (9 rss)
Social Media Marketing (12 rss)
Government (6 rss)
Organizations (6 rss)
We expect to track additional categories and RSS resources across more social networks.
Try the SPP now at