This is an amazing piece of research: "Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification," by Prof. Leo Duan from the Department of Statistics, University of Florida, Gainesville, FL.
There is a great discussion on weighted averages of correlation coefficients at https://www.researchgate.net/post/average_of_Pearson_correlation_coefficient_values
My most recent comments there are given below.
“The main reason for not averaging correlation coefficients in the arithmetic sense follows.”
“Correlation coefficients cannot be averaged in the arithmetic sense as they are not additive in the arithmetic sense. This is due to the fact that a correlation coefficient is a cosine, and cosines are not additive. This can be understood by mean-centering a paired data set and computing the cosine similarity between the vectors representing the variables involved.”
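The point in the quote above can be checked numerically. The following is a minimal sketch (not part of the original comments; the function name and sample data are illustrative) showing that Pearson's r is exactly the cosine similarity of the mean-centered variable vectors:

```python
import numpy as np

def pearson_as_cosine(x, y):
    """Pearson's r computed as the cosine of the mean-centered vectors."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.dot(xc, yc) / (np.linalg.norm(xc) * np.linalg.norm(yc)))

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r_cosine = pearson_as_cosine(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])  # standard Pearson's r
print(round(r_cosine, 6), round(r_numpy, 6))  # both give 0.8
```

Since cosines of different angles are not additive, neither are correlation coefficients.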
“If a paired data set violates the bivariate normality assumption (often overlooked, as Seifert correctly asserted), that worsens the picture. However, even if it doesn’t violate bivariate normality, the computed average is a mathematically invalid exercise. If a meta-analysis study is based on these averages, the results can be easily challenged on these grounds.”
“Sample-size weighting is a good start, as Seifert asserted. We can certainly do better. We may compute self-weighted averages from one, more than one, or all of the constituent terms of a correlation coefficient, to account for different types of variability information present in the paired data, which otherwise might be ignored by simply sample-size weighting or applying Fisher Transformations. Which self-weighting scheme to use depends on the source of variability information to be considered (https://www.tandfonline.com/doi/abs/10.1080/03610926.2011.654037).”
In the paper, “Development of gene-based molecular markers tagging low alkaloid pauper locus in white lupin (Lupinus albus L.)”, published online on 08-13-2019 in Journal of Applied Genetics, Springer (https://link.springer.com/content/pdf/10.1007%2Fs13353-019-00508-9.pdf), the authors computed the Sokal-Michener (simple matching) and Rogers-Tanimoto coefficients with our Binary Similarity Calculator (http://www.minerazzi.com/tools/similarity/binary-similarity-calculator.php), which computes 72 different resemblance (similarity) measures.
I’m so happy to know that more and more researchers across disciplines are finding new uses for this free-to-use tool.
Building these types of tools is always fun, but it is even more gratifying when they provide that extra handy help to other researchers. That’s why I decided to go the multidisciplinary way in the sciences. Their success is my success.
When you are a multidisciplinary scientist or teacher, one way of measuring your success is by looking at what students and others in different fields and countries do with the tools and resources you develop. Satisfaction goes all the way up when these help make a difference in their lives.
I’m happy to know that in his 2018 PhD thesis “On Enhancing the Security of Time Constrained Mobile Contactless Transactions” (https://pure.royalholloway.ac.uk/portal/files/33898207/Iakovos_Gurulian_PhD_Thesis.pdf), the author, Iakovos Gurulian from the Information Security Group, Department of Mathematics at the prestigious Royal Holloway, University of London, developed a Python program capable of running our Binary Similarity Calculator (http://www.minerazzi.com/tools/similarity/binary-similarity-calculator.php), which computes 72 different similarity measures. See pages 87-89, tables 4.1 and 4.2, and reference 118 of the thesis.
The Tutorial on Distance and Similarity (http://www.minerazzi.com/tutorials/distance-similarity-tutorial.pdf) was also cited as reference 60.
According to https://www.topuniversities.com/, Royal Holloway ranks 6/10 in London and 291/1000 in the world. Famous for its Founder’s Building, one of the most spectacular university buildings in the world, the College was officially opened by Queen Victoria in 1886.
This is the third and last part of a tutorial series on the non-additivity of correlation coefficients.
The bias and nature of correlation coefficients, their transformations, and their approximations to normality are discussed, along with the risks of blindly transforming scores to ranks or arbitrarily converting r-to-Z/Z-to-r values (Fisher Transformations). Shifted-up cosine approximations to normality are also covered.
Not all researchers know that score-to-rank transformations can change the sampling distribution of a statistic (e.g. a correlation coefficient) and that Fisher transformations are sensitive to normality violations. Combining both types of transformations is a recipe for a statistical disaster.
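For readers unfamiliar with the Fisher Transformations mentioned above, here is a minimal sketch (illustrative only; the variable names and sample r values are mine) of the r-to-Z and Z-to-r conversions, and of how a Z-space average differs from a plain arithmetic average of r values:

```python
import math

def r_to_z(r):
    """Fisher r-to-Z: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def z_to_r(z):
    """Fisher Z-to-r: the inverse transformation."""
    return math.tanh(z)

r1, r2 = 0.30, 0.90
arithmetic_mean = (r1 + r2) / 2                      # 0.60
fisher_mean = z_to_r((r_to_z(r1) + r_to_z(r2)) / 2)  # ~0.71
print(round(arithmetic_mean, 4), round(fisher_mean, 4))
```

The two results differ noticeably; and since the Z transformation itself assumes bivariate normality, applying it on top of a score-to-rank transformation compounds the problems described above.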
Alas, some meta-analysis and data analytics folks are guilty of that.
Curating collections requires going to original sources, which is gratifying.
As part of the effort of building a miner on the golden age of Statistics, I researched those from Ronald Fisher’s time who might still be alive. I found one such researcher, who is precisely Fisher’s only PhD student: Calyampudi Radhakrishna Rao, now 98.
I asked Dr. Rao for help in identifying important references and moments from those times. He graciously sent me his CV listing references to all of his glorious books (15), articles (477), and moments.
Even in his retirement, he is still publishing:
Dr. Rao also sent me a PDF with historical photos of him with Mahalanobis, Prime Minister Nehru, Prime Minister Indira Gandhi, and others, and of many glorious moments from his career. What an honor!
His work has impacted so many fields that there are several technical terms bearing his name.
Here is an appealing quote from him:
“We study physics to solve problems in physics, chemistry to solve problems in chemistry, and botany to solve problems in botany. There are no statistical problems which we solve using statistics. We use statistics to provide a course of action with minimum risk in all areas of human endeavor under available evidence. — C. R. Rao”
Ronald Aylmer Fisher was considered an outsider by the statistical establishment of his time.
The links below (1-3) show his struggles and clashes with Karl Pearson, his son Egon, Bowley, their followers, and the Royal Statistical Society (RSS). His life was a story of accomplishments and noise (deceptions and nasty RSS politics). He was too far ahead of his time.
That reminds me of the struggles of another maverick: Benoit Mandelbrot. Eventually, and like Mandelbrot, Fisher’s greatness was recognized. Also like Mandelbrot, he was able to boost the signal-to-noise ratio of his career and life.
Most statisticians consider Fisher the Father of Modern Statistics (https://en.wikipedia.org/wiki/Ronald_Fisher), even though he was not allowed to teach Statistics at the University of Cambridge (they tried to silence Fisher).
Yes, scientists too can be demeaning to other scientists, more for personal reasons than for ideas and the Scientific Method. After all, they are also mostly carbon units called “humans”.
1. Fisher in 1921 https://projecteuclid.org/download/pdfview_1/euclid.ss/1118065041
2. Fisher vs Pearson: A 1935 Exchange from Nature
3. Fisher: The Outsider
R. A. Fisher: how an outsider revolutionized statistics
We have updated and improved our Regression & Correlation Calculator to demonstrate, as shown in the above figure, that a Spearman’s Correlation Coefficient is just a Pearson’s Correlation Coefficient computed from ranks.
The tool uses an algorithm that converts values to ranks and averages any ties that might be present before calculating the correlations. This comes in handy when we need to compute a Spearman’s Correlation Coefficient from ranks with a large number of ties.
We have explained in the “What is Computed?” section of the tool’s page that, as the number of ties increases, the classic textbook formula for computing Spearman’s correlations, rs = 1 − 6Σdᵢ²/(n(n² − 1)), increasingly overestimates the results, even if ties were averaged.
By contrast, computing a Spearman’s as a Pearson’s always works, whether ties are present or not.
To illustrate the above, consider the following two sets:
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]
Using Spearman’s classic equation, rs = 0.6364 ≈ 0.64.
By contrast, rs = 0.5222 ≈ 0.52 when computed as a Pearson coefficient derived from ranks. This is a non-trivial difference.
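The two numbers above can be verified with a short script. This is a minimal sketch (not the page’s tool; the helper `average_ranks` is my own illustration of tie-averaged ranking) that computes both versions for X and Y:

```python
def average_ranks(values):
    """1-based ranks, with ties assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend over the tied block
        avg = (i + j) / 2 + 1           # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

X = list(range(1, 11))
Y = [1] * 9 + [2]
rx, ry = average_ranks(X), average_ranks(Y)
n = len(X)

# Classic textbook formula: rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
rs_classic = 1 - 6 * d2 / (n * (n ** 2 - 1))    # 0.6364

# Pearson's r computed from the same tie-averaged ranks
mx, my = sum(rx) / n, sum(ry) / n
cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
sx = sum((a - mx) ** 2 for a in rx) ** 0.5
sy = sum((b - my) ** 2 for b in ry) ** 0.5
rs_pearson = cov / (sx * sy)                    # 0.5222
print(round(rs_classic, 4), round(rs_pearson, 4))
```

Even with ties averaged, the classic formula overestimates the correlation, as stated above.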
Accordingly, we can make a case as to why we should ditch Spearman’s classic formula for good.
We also demonstrate in the page’s tool why we should never arithmetically add or average Spearman’s correlation coefficients. The same goes for Pearson’s.
Early articles in the literature of correlation coefficients theory failed to recognize the non-additivity of Pearson’s and Spearman’s Correlation Coefficients.
Sad to say, this is sometimes reflected in current research articles, textbooks, and online publications. The worst offenders are some marketers and teachers who, in order to protect their failing models, resist considering up-to-date research on the topic.
PS. Updated on 09-14-2018 to include the numerical example and to rewrite some lines.
I got a copy of this nice research work, written as a book chapter, “Building Classes of Similar Chemical Elements from Binary Compounds and their Stoichiometries,” from its author, Guillermo Restrepo.
It is great to see chemistry research at the intersection of similarity-based classification studies.
Read it. It is a nice work!