I’ve been asked to explain the difference between Spearman (S) and Pearson (P) Correlation Coefficients. This is a good question, as these are frequently used in data mining studies.

I hope this helps.

S is equivalent to P computed on the variables after these have been transformed into rank-orders. In such a case, we can determine the magnitude of S from the coefficient of determination (D) of a linear regression on the ranks; the sign of S is the sign of the regression slope. For instance, if D = 0.49 and the slope is positive, S = 0.7. BTW, D = 0.49 means that 49% of the variation in the ranks can be explained by the regression model, but 51% cannot. Thus, to compute S with EXCEL, simply rank-order the variables, apply linear regression on a scatter plot of the ranks, take the square root of the coefficient of determination, and give the result the sign of the slope.
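The rank-then-regress recipe can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the small data set are made up for the example, and tied values are given the average of the ranks they span, which is one common convention.

```python
from math import copysign, sqrt

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def sums_of_squares(x, y):
    """Return Sxx, Syy and Sxy (sums of squared and cross deviations)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy

def pearson(x, y):
    sxx, syy, sxy = sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    # S is simply P computed on the ranks
    return pearson(average_ranks(x), average_ranks(y))

def spearman_via_regression(x, y):
    # Regress rank(y) on rank(x); S = sqrt(D) carrying the sign of the slope
    sxx, syy, sxy = sums_of_squares(average_ranks(x), average_ranks(y))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    return copysign(sqrt(r_squared), slope)

# Hypothetical data, chosen only for illustration
x = [10, 20, 30, 40, 50, 60]
y = [5, 3, 6, 1, 4, 2]
print(spearman(x, y))
print(spearman_via_regression(x, y))
```

Both calls print the same negative value (about -0.486), which shows why the slope matters: the square root of D alone would give only the magnitude.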

Any change in the original variables that does not affect the rank-order will not change S, but it may change P. For a given set of variables, if S is clearly larger in magnitude than P, we might conclude that the variables are consistently (monotonically) correlated, but not in a linear fashion. However, if S and P are very similar and different from zero, there is an indication of a linear relationship.
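To see this effect with numbers, here is a small Python sketch. The data are invented for the example, and the simple `ranks` helper assumes there are no tied values.

```python
from math import exp, sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    # Simple ranking for illustration; assumes no tied values
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman(x, y):
    return pearson(ranks(x), ranks(y))

# Hypothetical data: a straight line and a rank-preserving curve
x = [1, 2, 3, 4, 5, 6]
y_linear = [2.0 * v + 1.0 for v in x]
y_curved = [exp(v) for v in x]  # same rank-order as y_linear

print(pearson(x, y_linear), spearman(x, y_linear))
print(pearson(x, y_curved), spearman(x, y_curved))
```

For the straight line both coefficients are 1. For the exponential curve S is still 1, because the rank-order is untouched, while P drops to roughly 0.85.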

**Pros/Cons of S**

It is less sensitive to bias due to outliers and requires neither metrically scaled data nor normality assumptions; it only assumes that the relationship between the variables is monotonic. It can be applied to ordinal variables. Ties must be factored into the computations, and calculations by hand are tedious.

**Pros/Cons of P**

It is easy to compute. It captures only linear association, and its significance tests assume normality in both variables. It is sensitive to outliers.

**Important Notes on Correlation**

A correlation coefficient varies from +1 to -1. If it is zero, the variables are not linearly related (for S, not monotonically related), although they may still be related in some other way. If it is positive, these are positively correlated: one increases when the other increases. If it is negative, these are negatively correlated: one increases when the other decreases and vice versa.
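A zero coefficient deserves some caution, since it rules out a linear trend rather than every kind of dependence. As a small worked example (the numbers are made up for illustration), take x = -2, -1, 0, 1, 2 and y = x², so that y is completely determined by x. The means are 0 and 2, and the numerator of P is

$$
\sum_i (x_i - \bar{x})(y_i - \bar{y}) = (-2)(2) + (-1)(-1) + (0)(-2) + (1)(-1) + (2)(2) = 0
$$

so P = 0 even though the two variables are perfectly related.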

Correlation is not causality. It is just a measure of association between variables that addresses whether these covary. It is not necessary to prejudge these as dependent or independent before estimating correlation.

To determine whether these covary in a significant fashion, we can apply a t-test to the correlation coefficient with n – 2 degrees of freedom at a given confidence level, usually 95%.
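For reference, the usual form of this test statistic, for a coefficient r computed from n pairs, is

$$
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}
$$

As a worked example under an assumed sample size (n = 10 is hypothetical, chosen only for illustration), r = 0.7 gives

$$
t = 0.7 \sqrt{\frac{8}{0.51}} \approx 2.77
$$

which exceeds the two-tailed critical value of 2.306 for 8 degrees of freedom at the 95% level, so in that case the correlation would be judged significant.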

PS.

In a more recent post (http://irthoughts.wordpress.com/2008/10/29/similarity-pearson-and-spearman-coefficients/), I explained the connection between Pearson and Spearman coefficients, cosine similarities, and dot products, and a particular case wherein all these are equivalent.

**References**

https://www.msu.edu/~nurse/classes/summer2002/813/week8spearman.htm

http://www.chipst2c.org/lectures/Stat_lecture_correlation.pdf

http://www.statpac.com/statistics-calculator/correlation-regression.htm

E. Garcia said: A tutorial on the correct way of computing and analyzing correlation coefficients is available at http://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-correlation-coefficients.pdf

In addition, a response to the SEOmoz “rebuttal” and alleged “knowledge” on statistics is now available at http://irthoughts.wordpress.com/2010/07/12/on-seomoz-knowledge-about-statistics/.

Dr. E. Garcia

egarcia said: Some useful links:

How to compute correlation coefficients with Excel:

http://irthoughts.wordpress.com/2010/06/10/on-spearmans-correlation-coefficients-with-excel/

Why you should not arithmetically add and average correlation coefficients:

http://irthoughts.wordpress.com/2011/01/07/on-the-non-additivity-of-correlation-coefficients/