New Relevant Tool: https://irthoughts.wordpress.com/2018/09/14/regression-correlation-calculator-updates-and-improvements/
I’ve been asked to explain the difference between the Spearman (S) and Pearson (P) correlation coefficients. It is a good question, as both are frequently used in data mining studies.
I hope this helps.
S is equivalent to P computed on the variables after they have been transformed into rank orders (tied values receive the average of the ranks they span). In that case, we can determine S from the coefficient of determination (D) of a linear regression fitted to the ranks: the magnitude of S is the square root of D. For instance, if D = 0.49, then |S| = 0.7. BTW, D = 0.49 means that 49% of the variation in the ranks is explained by the regression model, while the remaining 51% is not. Thus, to compute S with EXCEL, simply rank-order the variables, fit a linear trendline to a scatter plot of the ranks, and take the square root of the coefficient of determination. Because the square root is never negative, inspect the slope to get the sign: a negative slope means S is negative.
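The recipe above can be sketched in Python with the standard library only. The helper names `average_ranks`, `pearson` and `spearman_via_regression` are hypothetical, written for this illustration rather than taken from any library. The sketch ranks both variables (average ranks for ties), fits a least-squares line to the ranks, takes the square root of R² for the magnitude, and borrows the sign from the slope.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy.
    (Quadratic time, which is fine for a small illustration.)"""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_via_regression(x, y):
    """Rank both variables, fit rank_y = a + b * rank_x by least squares,
    then use sqrt(R^2) for the magnitude and the slope b for the sign."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(rx, ry))
             / sum((a - mx) ** 2 for a in rx))
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(rx, ry))
    ss_tot = sum((b - my) ** 2 for b in ry)
    r_squared = max(0.0, 1 - ss_res / ss_tot)  # guard against tiny negative rounding
    return math.copysign(math.sqrt(r_squared), slope)

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
print(spearman_via_regression(x, y))                # about 0.8286 (= 29/35)
print(pearson(average_ranks(x), average_ranks(y)))  # same value
```

Flipping the sign of every `y` value reverses the ranks, so the slope turns negative and the function returns about -0.8286 while R² stays the same.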
Any change to the original variables that preserves their rank order leaves S unchanged, although it may change P. For a given set of variables, if S is larger than P in absolute value, we might conclude that the variables are consistently (monotonically) related, but not in a linear fashion. However, if S and P are very similar and different from zero, that is an indication of a linear relationship.
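A small sketch of both points, under the same assumptions as before (standard library only, hypothetical helper names). Cubing `y` preserves its order, so S does not move while P does; and a perfectly monotonic but curved pair gives S = 1 with P clearly below 1.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    # 1-based ranks; ties share the mean of their positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(x, y):
    # S is P computed on the ranks
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]
y_cubed = [v ** 3 for v in y]   # order-preserving (monotonic) transform of y

print(spearman(x, y), spearman(x, y_cubed))  # identical: the ranks did not change
print(pearson(x, y), pearson(x, y_cubed))    # different: P reacts to the new spacing

# A perfectly monotonic but non-linear pair: S = 1 while P < 1
cubes = [v ** 3 for v in x]
print(spearman(x, cubes), pearson(x, cubes))  # 1.0 and about 0.93
```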
Pros/Cons of S
It is less sensitive to bias due to outliers, and it does not require the data to be metrically scaled or normally distributed, nor does it assume a symmetric, Gaussian-like distribution. It can be applied to ordinal variables. Ties must be factored into the computations, which makes hand calculation tedious.
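Why ties need care: the familiar shortcut formula S = 1 - 6Σd²/(n(n² - 1)) is exact only when there are no ties. The sketch below (standard library only; all helper names are hypothetical) assigns tied values their average rank and compares the shortcut with P computed on those ranks, which remains correct when ties are present.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    # 1-based ranks; ties share the mean of their positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman_shortcut(x, y):
    """Textbook 1 - 6*sum(d^2) / (n*(n^2 - 1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def spearman_tie_aware(x, y):
    """P on average ranks: correct with or without ties."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 1, 1, 2, 3]              # three-way tie in y
print(average_ranks(y))          # [2.0, 2.0, 2.0, 4.0, 5.0]
print(spearman_shortcut(x, y))   # about 0.9 (off because of the ties)
print(spearman_tie_aware(x, y))  # about 0.894 (= 2 / sqrt(5))
```

Without ties the two functions agree, which is why the shortcut is safe only for untied data.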
Pros/Cons of P
It is easy to compute. Its standard significance test assumes that both variables are normally distributed. It is sensitive to outliers.
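To illustrate the difference in outlier sensitivity, the sketch below (standard library only, hypothetical helpers) replaces one point of a perfectly linear series with an extreme value. Because this outlier is still the largest value, the ranks do not change and S stays at 1, while P drops sharply. An outlier that reordered the data would move S too, but only through the rank positions it shifts.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    # 1-based ranks; ties share the mean of their positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

x = list(range(1, 11))
y_clean = list(range(1, 11))          # perfectly linear
y_outlier = y_clean[:-1] + [100]      # last point replaced by an outlier

print(pearson(x, y_clean), spearman(x, y_clean))      # 1.0 and 1.0
print(pearson(x, y_outlier), spearman(x, y_outlier))  # about 0.59 and 1.0
```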
Important Notes on Correlation
A correlation coefficient ranges from -1 to +1. If it is zero, the variables show no linear (P) or monotonic (S) association, although they may still be related in some other way. If it is positive, they are positively correlated: one tends to increase when the other increases. If it is negative, they are negatively correlated: one tends to increase when the other decreases, and vice versa.
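A zero coefficient does not mean "unrelated". In this sketch (standard library only, hypothetical helpers) y is completely determined by x through y = x², yet both P and S come out exactly zero because the relationship is neither linear nor monotonic over a symmetric range.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    # 1-based ranks; ties share the mean of their positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]   # y is fully determined by x
print(pearson(x, y))     # 0.0
print(spearman(x, y))    # 0.0
```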
Correlation does not imply causation. It is just a measure of association that addresses whether variables covary. There is no need to designate either variable as dependent or independent before estimating correlation.
To determine whether the variables covary in a statistically significant fashion, we can apply a t-test to the correlation coefficient with n - 2 degrees of freedom (n being the number of paired observations) at a chosen confidence level, usually 95%.
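A minimal sketch of that test, assuming the usual statistic t = r·sqrt((n - 2)/(1 - r²)). The function names are hypothetical, and to stay within the standard library the critical value is passed in by the caller from a t-table rather than computed. For S this t-test is an approximation, generally considered adequate only for reasonably large n.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1:
        raise ValueError("|r| must be < 1 for a finite t")
    return r * math.sqrt((n - 2) / (1 - r * r))

def is_significant(r, n, t_critical):
    """Two-tailed test: reject 'no correlation' if |t| exceeds t_critical,
    which the caller looks up for n - 2 df at the chosen confidence level."""
    return abs(correlation_t_statistic(r, n)) > t_critical

# Example: r = 0.7 from n = 10 pairs, so 8 degrees of freedom.
# 2.306 is the two-tailed 95% critical value of t for 8 df (standard t-table).
t = correlation_t_statistic(0.7, 10)
print(round(t, 3))                     # 2.772
print(is_significant(0.7, 10, 2.306))  # True
```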
PS.
In a more recent post (https://irthoughts.wordpress.com/2008/10/29/similarity-pearson-and-spearman-coefficients/), I explained the connection between the Pearson and Spearman coefficients and cosine similarities and dot products, including a particular case in which all of these are equivalent.
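One well-known link of this kind, not necessarily the exact case the linked post discusses: P equals the cosine similarity (normalized dot product) of the two mean-centered vectors, and S is the same cosine computed on mean-centered average ranks. A standard-library sketch with hypothetical helper names:

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def center(v):
    """Subtract the mean from every component."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def pearson_from_moments(x, y):
    """P as covariance over the product of standard deviations (population form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def average_ranks(values):
    # 1-based ranks; ties share the mean of their positions
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

x = [2, 4, 4, 5, 7]
y = [1, 3, 2, 6, 8]
print(pearson_from_moments(x, y), cosine(center(x), center(y)))  # equal up to rounding

# S: the same identity applied to average ranks
rx, ry = average_ranks(x), average_ranks(y)
print(pearson_from_moments(rx, ry), cosine(center(rx), center(ry)))  # equal up to rounding
```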
A tutorial on the correct way of computing and analyzing correlation coefficients is available at http://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-correlation-coefficients.pdf
In addition, a response to SEOmoz’s “rebuttal” and alleged “knowledge” of statistics is now available at https://irthoughts.wordpress.com/2010/07/12/on-seomoz-knowledge-about-statistics/.
Dr. E. Garcia
Some useful links:
How to compute correlation coefficients with Excel:
https://irthoughts.wordpress.com/2010/06/10/on-spearmans-correlation-coefficients-with-excel/
Why you should not arithmetically add and average correlation coefficients:
https://irthoughts.wordpress.com/2011/01/07/on-the-non-additivity-of-correlation-coefficients/