Today I updated my Tutorial on Correlation Coefficients to include a new section on the effect of sample size on the significance of correlation coefficients. This was motivated by some comments from search engine marketers on correlation strengths (http://searchenginewatch.com/3641002). The new material might help those interested in learning whether a reported correlation coefficient is statistically different from zero. It is given below. Enjoy it.

The problem with correlation strength scales is that they say nothing about how sample size affects the significance of a correlation coefficient. This important issue is addressed here.

Consider three different correlation coefficients: 0.50, 0.35, and 0.17. Assume that we want to test that there is no significant relationship between the two variables at hand. The null hypothesis (H0) to be tested is that these r values are not statistically different from zero (rho = 0). How to proceed?

As recommended by Stevens (17), for rho = 0, H0 can be tested using a two-tailed (i.e., two-sided) t-test at a given confidence level, usually 95%. If t_{calculated} ≥ t_{table}, H0 is rejected and the correlation is statistically different from zero. However, if t_{calculated} < t_{table}, H0 is not rejected and there is no significant correlation between the variables.

Here t_{calculated} is computed as r/SE_{r} = r*SQRT[(n – 2)/(1 – r^{2})], where SE_{r} = SQRT[(1 – r^{2})/(n – 2)] is the standard error of r and n – 2 is the number of degrees of freedom (df). The t_{table} values are obtained from the literature (http://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values). Table 2 summarizes the results of testing the null hypothesis at different sample sizes.
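As a quick check of the formula above, here is a minimal Python sketch. The function names `standard_error` and `t_calculated` are my own and chosen only for illustration; the arithmetic is just the expression for r/SE_{r} given above.

```python
import math


def standard_error(r: float, n: int) -> float:
    """Standard error of r under H0: rho = 0, using df = n - 2."""
    return math.sqrt((1.0 - r * r) / (n - 2))


def t_calculated(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, i.e. r / SE_r."""
    return r / standard_error(r, n)


# Example: r = 0.50 observed on n = 20 pairs of observations.
print(f"SE_r = {standard_error(0.50, 20):.2f}")
print(f"t(calc) = {t_calculated(0.50, 20):.3f}")
```

The n = 20, r = 0.50 case can be compared against the corresponding row of Table 2.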

Table 2. H_{0} tests at different sample sizes; two-tailed, 95% confidence.

| n | df = n – 2 | r | SE_{r} | t(calc) | t(0.95) | Reject H_{0}: rho = 0? |
|---|---|---|---|---|---|---|
| 5 | 3 | 0.50 | 0.50 | 1.000 | 3.182 | don’t reject |
| 10 | 8 | 0.50 | 0.31 | 1.633 | 2.306 | don’t reject |
| 12 | 10 | 0.50 | 0.27 | 1.826 | 2.228 | don’t reject |
| 14 | 12 | 0.50 | 0.25 | 2.000 | 2.179 | don’t reject |
| 20 | 18 | 0.50 | 0.20 | 2.449 | 2.101 | reject |
| 30 | 28 | 0.50 | 0.16 | 3.055 | 2.048 | reject |
| 40 | 38 | 0.50 | 0.14 | 3.559 | 2.024 | reject |
| 50 | 48 | 0.50 | 0.13 | 4.000 | 2.011 | reject |
| 5 | 3 | 0.35 | 0.54 | 0.647 | 3.182 | don’t reject |
| 10 | 8 | 0.35 | 0.33 | 1.057 | 2.306 | don’t reject |
| 12 | 10 | 0.35 | 0.30 | 1.182 | 2.228 | don’t reject |
| 14 | 12 | 0.35 | 0.27 | 1.294 | 2.179 | don’t reject |
| 20 | 18 | 0.35 | 0.22 | 1.585 | 2.101 | don’t reject |
| 30 | 28 | 0.35 | 0.18 | 1.977 | 2.048 | don’t reject |
| 40 | 38 | 0.35 | 0.15 | 2.303 | 2.024 | reject |
| 50 | 48 | 0.35 | 0.14 | 2.589 | 2.011 | reject |
| 5 | 3 | 0.17 | 0.57 | 0.299 | 3.182 | don’t reject |
| 10 | 8 | 0.17 | 0.35 | 0.488 | 2.306 | don’t reject |
| 12 | 10 | 0.17 | 0.31 | 0.546 | 2.228 | don’t reject |
| 14 | 12 | 0.17 | 0.28 | 0.598 | 2.179 | don’t reject |
| 20 | 18 | 0.17 | 0.23 | 0.732 | 2.101 | don’t reject |
| 30 | 28 | 0.17 | 0.19 | 0.913 | 2.048 | don’t reject |
| 40 | 38 | 0.17 | 0.16 | 1.063 | 2.024 | don’t reject |
| 50 | 48 | 0.17 | 0.14 | 1.195 | 2.011 | don’t reject |
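For readers who want to reproduce Table 2, the following Python sketch applies the decision rule to every (r, n) pair. The critical values in `T_TABLE` are copied from the t(0.95) column of the table; the helper name `check_h0` is hypothetical. Taking the absolute value of t is my own addition so that negative correlations are handled by the same two-tailed rule.

```python
import math

# Two-tailed critical t values at the 95% level, keyed by df = n - 2.
# Copied from the t(0.95) column of Table 2.
T_TABLE = {3: 3.182, 8: 2.306, 10: 2.228, 12: 2.179,
           18: 2.101, 28: 2.048, 38: 2.024, 48: 2.011}


def check_h0(r: float, n: int) -> tuple[float, bool]:
    """Return (t_calculated, reject) for H0: rho = 0."""
    df = n - 2
    t_calc = r * math.sqrt(df / (1.0 - r * r))
    # Assumption: compare |t| so the rule also works for negative r.
    return t_calc, abs(t_calc) >= T_TABLE[df]


for r in (0.50, 0.35, 0.17):
    for n in (5, 10, 12, 14, 20, 30, 40, 50):
        t_calc, reject = check_h0(r, n)
        verdict = "reject" if reject else "don't reject"
        print(f"n={n:2d}  df={n - 2:2d}  r={r:.2f}  t(calc)={t_calc:.3f}  {verdict}")
```

Only sample sizes whose df appears in `T_TABLE` are supported; other values of n would need critical values from a full t table.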

The table shows at which sample size a given r value becomes large enough to be statistically significant.

For n = 14, none of the three r values (0.50, 0.35, and 0.17) is statistically different from zero.

For n = 30, r = 0.50 is statistically different from zero while r = 0.35 and r = 0.17 are not.

Conversely, among the sample sizes tested, r = 0.50 is not statistically different from zero when n is equal to or less than 14, while r = 0.35 is not different from zero when n is equal to or less than 30.

Finally, r = 0.17 is not statistically different from zero at any of the sample sizes tested.
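The same test can be turned around to ask how large r must be at a given sample size. Setting t_{calculated} equal to t_{table} and solving for r gives r = t/SQRT(t^{2} + df). This rearrangement is not part of the tutorial section above; it is only a sketch that follows from the t formula, and the function name `critical_r` is hypothetical.

```python
import math

# Two-tailed critical t values at the 95% level, keyed by df = n - 2
# (same values as the t(0.95) column of Table 2).
T_TABLE = {3: 3.182, 8: 2.306, 10: 2.228, 12: 2.179,
           18: 2.101, 28: 2.048, 38: 2.024, 48: 2.011}


def critical_r(n: int) -> float:
    """Smallest |r| that rejects H0: rho = 0 at the tabulated level."""
    df = n - 2
    t = T_TABLE[df]
    # From t = r * sqrt(df / (1 - r^2)), solved for r.
    return t / math.sqrt(t * t + df)


for n in (5, 10, 12, 14, 20, 30, 40, 50):
    print(f"n={n:2d}  minimum significant |r| = {critical_r(n):.3f}")
```

Any r in Table 2 marked "reject" should exceed the value printed for its n, and any r marked "don't reject" should fall below it.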


GvSparx said: Can you tell me what is SEr and how is it calculated?

egarcia said: Standard Error of r, where r is a correlation coefficient. BTW all tutorials are offline but are accessible from the Vault section of the site.