The Extended Boolean Model for Information Retrieval

This is an IR tutorial I wrote circa 2006 on the Extended Boolean Model for Information Retrieval (http://www.minerazzi.com/tutorials/term-vector-6.pdf). It may be useful to those interested in learning about the model's applicability and implementation.

The model was developed by Edward Alan Fox in his 1983 PhD thesis: Extending the Boolean and Vector Space Models of Information Retrieval with p-norm Queries and Multiple Concept Types. Ph.D. Dissertation, Cornell University. Retrieved from https://catalog.hathitrust.org/Record/009232562

The tutorial covers the advantages, limitations, and drawbacks of the model. As the p-norm parameter varies from p = 1 to p = infinity, the model's ranking behavior changes from vector space-like to strict Boolean-like.
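
The p = 1 and p = infinity limits can be checked numerically. What follows is a minimal Python sketch, not code from the tutorial. It assumes the commonly cited p-norm similarity formulas with equally weighted query terms and document term weights in [0, 1]; the function names pnorm_or and pnorm_and are made up for illustration.

```python
import math

def pnorm_or(weights, p):
    """Similarity of a document to (t1 OR t2 OR ...), equal query weights assumed."""
    if math.isinf(p):
        return max(weights)          # strict Boolean-like limit
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def pnorm_and(weights, p):
    """Similarity of a document to (t1 AND t2 AND ...), equal query weights assumed."""
    if math.isinf(p):
        return min(weights)          # strict Boolean-like limit
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)

doc = [0.8, 0.2]                     # hypothetical term weights for two query terms
for p in (1, 2, 10, math.inf):
    print(p, round(pnorm_or(doc, p), 3), round(pnorm_and(doc, p), 3))
```

At p = 1 both operators reduce to the plain average of the term weights, so OR and AND rank documents identically, as in a vector space-like model. As p grows, OR drifts toward the maximum weight and AND toward the minimum, which is the strict Boolean-like behavior.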

I know, I know. This was posted way back at
https://irthoughts.wordpress.com/2016/06/06/extended-boolean-model-tutorial/

I’m reposting it in case some readers missed this great classic model.

Introducing Recursive Searches

Introducing Recursive Searches (http://www.minerazzi.com/tools/recursive-searches/recursive-searches.php): A proof of concept for the notion of Recursive Forms.

This tool recursively searches a two-dimensional associative array. The Periodic Table of Elements is used as a case study. We define a recursive form (RF) as an HTML form that uses its previous output as its new input. Recursion is done through a single-line text field.

We are currently developing dozens of RF-based tools. An example use of this particular RF follows.

Can you answer the following question (Q) without consulting the Periodic Table? If not, don’t worry. You are not alone. Most chemistry students can’t, either:

Q: An element has an atomic weight closest to a number X (e.g., X = 50, 66, 100, or 112). What is its atomic number?

Not many chemistry students can figure out the name, symbol, or atomic number of an element with atomic weight closest to a given number X, particularly for transition metals and above. This RF comes to the rescue…
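
The lookup behind question Q can be sketched in a few lines. This is a Python illustration only, not the tool's PHP code. The ELEMENTS table is a small hand-picked subset of the Periodic Table stored as a two-dimensional associative array (a dict of dicts), and the function name closest_by_weight is hypothetical.

```python
# Subset of the Periodic Table: symbol -> {name, atomic number, atomic weight}.
# Only a dozen elements are listed, so "closest" means closest within this subset.
ELEMENTS = {
    "Ti": {"name": "Titanium",   "number": 22, "weight": 47.867},
    "V":  {"name": "Vanadium",   "number": 23, "weight": 50.942},
    "Cr": {"name": "Chromium",   "number": 24, "weight": 51.996},
    "Cu": {"name": "Copper",     "number": 29, "weight": 63.546},
    "Zn": {"name": "Zinc",       "number": 30, "weight": 65.38},
    "Ga": {"name": "Gallium",    "number": 31, "weight": 69.723},
    "Mo": {"name": "Molybdenum", "number": 42, "weight": 95.95},
    "Ru": {"name": "Ruthenium",  "number": 44, "weight": 101.07},
    "Rh": {"name": "Rhodium",    "number": 45, "weight": 102.91},
    "Ag": {"name": "Silver",     "number": 47, "weight": 107.87},
    "Cd": {"name": "Cadmium",    "number": 48, "weight": 112.41},
    "In": {"name": "Indium",     "number": 49, "weight": 114.82},
}

def closest_by_weight(table, x):
    """Return (symbol, record) of the element whose atomic weight is nearest to x."""
    symbol = min(table, key=lambda s: abs(table[s]["weight"] - x))
    return symbol, table[symbol]

for x in (50, 66, 100, 112):
    symbol, record = closest_by_weight(ELEMENTS, x)
    print(x, symbol, record["name"], record["number"])
```

For the four values of X in the question this prints V (23), Zn (30), Ru (44), and Cd (48). In a recursive form, any field of the returned record could be fed back into the same text field as the next query.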

In general, RFs can be implemented for many dissimilar purposes. Check here: http://www.minerazzi.com/tools/index.php

Cosine Similarity and Overlapping Values

The NIH Value Set Authority Center (VSAC) has an oldie-but-goodie document on cosine similarity and overlapping values at https://www.nlm.nih.gov/vsac/support/userforum/VSACUsersForum12.pdf

Happy to see they quoted one of my posts:

https://irthoughts.wordpress.com/2015/04/10/a-cosine-similarity-tool-and-companion-tutorial/

For resources relevant to the topic, see

72 Binary Similarity Measures Tool: http://www.minerazzi.com/tools/similarity/binary-similarity-calculator.php

Cosine Similarity Calculator: http://www.minerazzi.com/tools/cosine-similarity/cosine-similarity-calculator.php

Recursive Mini Converters

Recursive Mini Converters is our newest tool.

This tool is a proof of concept for the notion of recursive forms and recursive mini converters. It is available at

http://minerazzi.com/tools/recursive-mini-converters/recursive-mini-converters.php

On Recursive Forms

If a single-line text field is used for both input and output, it can be called a two-way field. If this is the only text field visible at the front end of a form, we may call said form a two-way mini form. If this form processes input and output by means of conversion factors, then said form defines a mini converter. If the output becomes the new input, the new output depends on the previous one, and a recursive form and a recursive mini converter are obtained.

With that in mind, we show that recursive mini converters:

–can be coded using pure PHP; i.e., no framework, jQuery, or JavaScript is needed.
–are suitable for small-screen displays like smartphones and tablets.
–add non-intrusive unit conversion capabilities to other web tools when placed on the same page.
–can coexist on a web page, independent of each other, and without collisions.

It remains to show that multiple mini converters can be incorporated into a single form, but this is not challenging.
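
The definition of a recursive mini converter given under On Recursive Forms can be mimicked outside a web form. The Python sketch below is only an illustration of the idea, not the tool's PHP code. The helper names make_converter and recurse are hypothetical, and the inch-to-centimeter factor (2.54) is used as a sample conversion factor.

```python
def make_converter(factor):
    """Build a mini converter: one value in, one value out, via a conversion factor."""
    def convert(value):
        return value * factor
    return convert

def recurse(convert, value, steps):
    """Feed each output back in as the next input, as a two-way field would."""
    outputs = []
    for _ in range(steps):
        value = convert(value)   # the previous output becomes the new input
        outputs.append(value)
    return outputs

inches_to_cm = make_converter(2.54)
print(recurse(inches_to_cm, 1.0, 3))
```

Each converter built this way holds its own factor, so several can live side by side without sharing state, which loosely mirrors the "no collisions" point in the list above.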

Applications/Future Work

Recursive mini forms are not just for building unit converters. These can be used for:

–Recursive Queries: Searching multidimensional associative arrays and databases.
–Recursive Calculations: Computing terms from equations.
–Recursive Mapping: Finding Equivalencies.
–Recursive Associations: Concepts Mapping.
–…and more.

Significant Figures Calculator

This tool, The Significant Figures Calculator, lets users compute and edit significant figures.

The tool is based on the theory of relative errors and reports results in conventional and scientific notation. Values involving ambiguous trailing zeros are properly counted.

A handy tool for teachers and students, and for those who need to report quantities to a given number of significant figures.

Giving a precise definition for the correct number of significant figures is quite subtle (Higham, 2002). Different rules for counting significant figures have been reported. There is also the fact that not all significant figures are meaningful figures.

The theory of relative errors can be used to derive safe counting guidelines. See Numerical Analysis, by Burden and Faires, pages 20-22. According to the theory, it can be stated that significant figures can be attributed only to measured quantities for which a relative error can be computed. Thus it is not possible to attribute significant figures to zero as a quantity, exact numbers, conversion factors, and non-measured constants. However, all digits of a measured constant are significant.

Although there are many similar calculators out there, many do not pass what can be called “the test of zeros”. That is, enter one or more zeros as a quantity (0, 0.0, 0.00,…) and check the result. According to the theory of relative errors, no significant figures attribution is possible. Expressing these in scientific notation does not add any artificial significance to them as we still cannot compute relative errors for them.

Let us finally address the question of how many significant figures are in “0.”, i.e., in a zero followed by a decimal point. Some authors claim that this zero has one significant figure. Their argument is that the convention for trailing zeros that end with a decimal point applies. This implies that said zero is trailing itself. The argument is invalid, as we could just as well claim that said zero is a leading zero, leading itself. Again, no relative error can be computed for a self-trailing or self-leading zero.
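
The relative error argument above can be made concrete. The Python sketch below is not the calculator's code. It assumes the textbook-style definition that an approximation p* agrees with p to t significant digits when t is the largest nonnegative integer for which |p - p*| / |p| < 5 x 10^-t; the function name significant_digits and the max_t cap are hypothetical.

```python
import math

def significant_digits(p, p_star, max_t=15):
    """Number of significant digits to which p_star approximates p.

    Returns None when p is zero, because the relative error |p - p*| / |p|
    is then undefined and no attribution of significant figures is possible.
    """
    if p == 0:
        return None
    rel_err = abs(p - p_star) / abs(p)
    t = 0
    # Increase t while the bound 5 x 10^-(t+1) still holds for the relative error.
    while t < max_t and rel_err < 5 * 10 ** -(t + 1):
        t += 1
    return t

print(significant_digits(math.pi, 3.14))   # 3
print(significant_digits(1000, 1001))      # 3
print(significant_digits(0, 0.0))          # None: the "test of zeros"
```

Writing the zero as 0, 0.0, or 0.00 changes nothing here: the division by |p| is undefined in every case, so the function declines to answer rather than invent a count.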

Last updated: 5-23-2021

The Video Finder

I always wanted an easy way of accessing news videos from national TV channels, without having to subscribe to them or pay for third-party tools that do this. So I built my own: The Video Finder (http://www.minerazzi.com/tools/video-finder/video-finder.php).

Users can submit a YouTube channel ID or a URL containing one, and the tool returns a list of links in feed format. Videos in other categories or on other topics can also be accessed with the tool. The tool is based on a previous one (The RAR Parser, http://www.minerazzi.com/tools/rar/feeds-parser.php).
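
The input handling described above can be sketched as follows. This is a Python guess at the general idea, not the tool's actual code. It assumes that channel IDs have the usual form of "UC" followed by 22 URL-safe characters and that the links come from YouTube's public per-channel Atom feed; the function names and the sample ID are made up.

```python
import re

# Assumed channel ID shape: "UC" followed by 22 letters, digits, "_" or "-".
CHANNEL_ID_RE = re.compile(r"UC[0-9A-Za-z_-]{22}")

def extract_channel_id(text):
    """Return the first channel ID found in a bare ID or a URL, or None."""
    match = CHANNEL_ID_RE.search(text)
    return match.group(0) if match else None

def feed_url(channel_id):
    """Build the (assumed) Atom feed address for a channel."""
    return "https://www.youtube.com/feeds/videos.xml?channel_id=" + channel_id

# Hypothetical ID used only to exercise the functions.
sample = "https://www.youtube.com/channel/UCxxxxxxxxxxxxxxxxxxxxxx/videos"
channel_id = extract_channel_id(sample)
print(channel_id)
print(feed_url(channel_id))
```

Fetching and parsing the feed is left out; any feed parser could take over from the URL that feed_url returns.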

Now I have a way of building web pages on a given category (Human Resources, Chemistry, Genetics, Health, Sports,…), listing relevant channels. Great for educational or research purposes.

Fast reverse complement computation of DNA sequences without string concatenation loops

Arguments for why string concatenation in loops (SCL) is a bad programming practice were given in a previous post.

You can also google for additional arguments against SCL.

We now show a simple PHP snippet that makes SCL unnecessary when computing DNA reverse complement sequences. Consider the sequence

ATTAAAGGTTTATACCTTCCC

This sequence corresponds to the first 21 nucleotides of the NCBI Reference Sequence NC_045512.2; i.e., to the first 21 nts of the complete genome of the severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1 (https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2 ).

To compute its reverse complement sequence,

GGGAAGGTATAAACCTTTAAT

we just need to use the following one-liner:

echo strrev(strtr('ATTAAAGGTTTATACCTTCCC', 'ACGT', 'TGCA'));

Voila!

No string splitting, concatenation, or looping are needed!
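
The same idea carries over to other languages. As a Python analogue of the PHP one-liner (my translation, with a hypothetical function name), str.translate plays the role of strtr and slicing with [::-1] plays the role of strrev:

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement every base in one pass, then reverse the whole string."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("ATTAAAGGTTTATACCTTCCC"))  # GGGAAGGTATAAACCTTTAAT
```

This sketch assumes an uppercase sequence made only of A, C, G, and T; any other character passes through unchanged.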

As I always say, graduate students and postdocs who can code have an edge in multidisciplinary research work over those who cannot.

Why string concatenation in loops is a bad programming strategy.

I’m finally back after a long vacation, thanks to the pandemic. Having said that, let’s go back to business.

Why string concatenation in loops (SCL) is a bad programming strategy.

String concatenation in loops (SCL) is often used in sequence analysis; for instance, for finding complement and reverse complement sequences, for DNA-to-RNA transcription, and so forth. It is also used for generating random sequences and for DNA data storage.

I’m currently writing an article on this subject. A little snippet follows:

“SCL is a common strategy because of its simplicity. In many programming languages, however, SCL is discouraged because it generates a large number of temporary objects, consumes memory resources, and increases execution times. As sequences get larger, or if these are processed in batch mode, implementing SCL becomes a computationally inefficient strategy.”
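
To make the pattern concrete, here is what SCL looks like next to a loop-free alternative. This is a Python sketch written for this post, not an excerpt from the article, and the function names are hypothetical. How large the penalty is depends on the language and runtime, so treat it as an illustration of the pattern rather than a benchmark.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def complement_scl(seq):
    """String concatenation in a loop: each += may create a new temporary string."""
    out = ""
    for base in seq:
        out += COMPLEMENT[base]
    return out

def complement_join(seq):
    """Build all the pieces first, then assemble the result in a single join."""
    return "".join(COMPLEMENT[base] for base in seq)

seq = "ATTAAAGGTTTATACCTTCCC"
print(complement_scl(seq))
print(complement_join(seq))
```

Both functions return the same complement sequence; only the second avoids growing a string one character at a time.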

I cannot wait to finish it and provide one-liner solutions, applicable across knowledge domains.

A search in Google for string concatenation in loops provides some reasons for avoiding SCL.

On Fake Research, “Academic” Spam, and ASEO

Oldie but goodie. Little has changed since then, as can be seen from the comparison below.

BEFORE: ASEO

Academic Search Engine Spam and Google Scholar’s Resilience Against it
https://quod.lib.umich.edu/j/jep/3336451.0013.305?view=text;rgn=main

What an Audacious Hoax Reveals About Academia
Three scholars wrote 20 fake papers using fashionable jargon to argue for ridiculous conclusions.
https://www.theatlantic.com/ideas/archive/2018/10/new-sokal-hoax/572212/

NOW: CSEO

CSEO = Covid SEO

Covid Spam/Scam.
https://www.google.com/search?q=coronavirus%20spam

The Almost Binary Heuristic

Yet another bond order calculation heuristic that still fails.

I describe this new heuristic, The Almost Binary Heuristic, in the “What is computed?” section of the bond order calculator at
http://www.minerazzi.com/tools/bond-order/calculator.php

The tool itself is described in the Bond Order Calculator Tool post at https://irthoughts.wordpress.com/2018/12/20/bond-order-calculator-tool/

The Almost Binary Heuristic is aimed at computing bond orders of diatomic species having up to 20 electrons in a straightforward manner. The heuristic can be used to reproduce the results of our bond order calculator tool.

I’ve included the PHP script that generates the so-called “phone number” trick for computing bond orders of diatomic species with up to 20 electrons.

Feel free to copy/rewrite the code with your favorite programming language or use it to build your own bond order calculator tool. Just please keep the credit lines in place. 🙂

Cheers