Today I updated my Tutorial on Correlation Coefficients to include a new section on the effect of sample size on the significance of correlation coefficients. This was motivated by some comments from search engine marketers on correlation strengths (http://searchenginewatch.com/3641002). The new material might help those interested in learning whether a reported correlation coefficient is statistically different from zero. It is given below. Enjoy it.
The problem with correlation strength scales is that they say nothing about how sample size affects the significance of a correlation coefficient. This important issue is now addressed.
Consider three different correlation coefficients: 0.50, 0.35, and 0.17. Assume that we want to test whether there is a significant relationship between the two variables at hand. The null hypothesis (H0) to be tested is that the population correlation is zero (rho = 0), i.e., that a given r value is not statistically different from zero. How to proceed?
As recommended by Stevens (17), for rho = 0, H0 can be tested using a two-tailed (i.e., two-sided) t-test at a given confidence level, usually 95%. If t_calculated ≥ t_table (comparing absolute values, since the test is two-tailed), H0 is rejected. However, if t_calculated < t_table, H0 is not rejected and there is no significant correlation between the variables.
Here t_calculated is computed as r/SE_r = r*SQRT[(n - 2)/(1 - r^2)], where SE_r = SQRT[(1 - r^2)/(n - 2)] is the standard error of r, while t_table values are obtained from the literature (http://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values). Table 2 summarizes the results of testing the null hypothesis at different sample sizes.
Table 2. H0 tests at different sample sizes; two-tailed, 95% confidence.
| n | df = n - 2 | r | SE_r | t (calc) | t (0.95) | Reject H0: rho = 0? |
The table shows at which sample size a given r value becomes high enough to be statistically significant.
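The test behind each row of Table 2 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names are made up for illustration, and instead of looking up t_table it computes the two-tailed p-value by numerically integrating the Student t density. Rejecting H0 when p ≤ 0.05 is the same decision as rejecting when t_calculated ≥ t_table at 95% confidence.

```python
import math

def t_statistic(r, n):
    """Return (t_calculated, SE_r), with SE_r = sqrt((1 - r^2) / (n - 2))."""
    se_r = math.sqrt((1.0 - r * r) / (n - 2))
    return r / se_r, se_r

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value, using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0.0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3.0          # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_area)

def check_rho_zero(r, n, alpha=0.05):
    """Test H0: rho = 0. Returns (t_calculated, SE_r, p_value, reject_H0)."""
    t_calc, se_r = t_statistic(r, n)
    p = two_tailed_p(t_calc, n - 2)
    return t_calc, se_r, p, p <= alpha

if __name__ == "__main__":
    for n in (14, 30):
        for r in (0.50, 0.35, 0.17):
            t_calc, se_r, p, reject = check_rho_zero(r, n)
            print(f"n={n} r={r:.2f} SE_r={se_r:.3f} t={t_calc:.3f} "
                  f"p={p:.3f} reject H0: {reject}")
```

Running the script prints one line per combination of n and r, so the decisions can be compared with the statements that follow.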
For n = 14, all three r values (0.50, 0.35, and 0.17) are not statistically different from zero.
For n = 30, r = 0.50 is statistically different from zero while r = 0.35 and r = 0.17 are not.
Put differently, r = 0.50 is not statistically different from zero when n is less than or equal to 14, while r = 0.35 is not statistically different from zero when n is less than or equal to 30.
Finally, r = 0.17 is not statistically different from zero at any of the sample sizes tested.
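Another way to read these results is to invert the t formula and ask how large r must be at a given n before H0 is rejected. Solving t = r*SQRT[(n - 2)/(1 - r^2)] for r gives r_critical = t_table/SQRT[(n - 2) + t_table^2]. The sketch below assumes the two-tailed 95% values from a standard t table (2.179 for df = 12 and 2.048 for df = 28); the function name critical_r is hypothetical and used only for illustration.

```python
import math

# Two-tailed 95% critical values from a standard t table (assumed inputs),
# keyed by degrees of freedom df = n - 2.
T_TABLE_95 = {12: 2.179, 28: 2.048}

def critical_r(n, t_table):
    """Smallest |r| that rejects H0: rho = 0 at sample size n.

    Obtained by solving t = r * sqrt((n - 2) / (1 - r^2)) for r.
    """
    df = n - 2
    return t_table / math.sqrt(df + t_table ** 2)

if __name__ == "__main__":
    for n in (14, 30):
        r_crit = critical_r(n, T_TABLE_95[n - 2])
        print(f"n = {n}: critical r = {r_crit:.3f}")
        for r in (0.50, 0.35, 0.17):
            verdict = "reject H0" if r >= r_crit else "do not reject H0"
            print(f"  r = {r:.2f}: {verdict}")
```

With these table values the critical r is about 0.532 for n = 14 and about 0.361 for n = 30, which is why all three coefficients fail the test at n = 14 and only r = 0.50 passes it at n = 30.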