ICC reliability hypothesis testing

Expected ICC:
Minimum acceptable ICC:
Number of raters (k):
Desired power (1 – β):
Significance level (α, two-sided):
Required sample size (n):

Calculates the approximate number of subjects required to test an intraclass correlation coefficient (ICC) in a one-way ANOVA model with the desired power.
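For reference, the closed-form approximation published by Walter, Eliasziw and Donner (1998) is shown below. We assume, without being able to confirm it from this page, that the calculator implements this formula; plugging in the values of the worked example does reproduce its 49 (two-sided) and 39 (one-sided) subjects. Here ρ0 is the minimum acceptable ICC, ρ1 the expected ICC, k the number of raters, z_β the standard normal quantile at 1 − β, and z_α the one-sided critical value, replaced by z_{α/2} for a two-sided test.

```latex
n = 1 + \frac{2k\,(z_{\alpha} + z_{\beta})^{2}}{(\ln C_0)^{2}\,(k-1)},
\qquad
C_0 = \frac{1 + k\,\theta_0}{1 + k\,\theta_1},
\qquad
\theta_i = \frac{\rho_i}{1 - \rho_i}
```

The result is rounded up to the next whole subject.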

The calculator can be used to estimate the approximate sample size (n) required for inter-rater or test-retest reliability studies. For example, for a reliability study with two raters (or two repeated measurements), an expected ICC of 0.8 and a desired power of 0.8 (80%), 49 subjects are needed to show that this ICC is significantly different from a minimum acceptable ICC of 0.6 at a two-sided significance level of 0.05.
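A minimal Python sketch of this calculation follows, assuming the approximation of Walter et al. (1998). The function name `icc_sample_size` and its parameters are hypothetical and not part of the calculator. It uses only the standard library (`statistics.NormalDist` for normal quantiles, Python 3.8+) and reproduces the 49 subjects of the example.

```python
import math
from statistics import NormalDist


def icc_sample_size(rho1, rho0, k, power=0.8, alpha=0.05, two_sided=True):
    """Approximate number of subjects to test H0: ICC = rho0 against H1: ICC = rho1.

    Hypothetical helper sketching the closed-form approximation of
    Walter, Eliasziw and Donner (1998) for a one-way ANOVA model
    with k ratings per subject.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    # theta = rho / (1 - rho); C0 compares the two hypothesised ICC values
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + 2 * k * (z_alpha + z_beta) ** 2 / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)  # round up to a whole number of subjects


# Example from the text: two raters, expected ICC 0.8, minimum ICC 0.6
print(icc_sample_size(rho1=0.8, rho0=0.6, k=2))  # 49
```

Rounding up with `math.ceil` keeps the achieved power at or above the requested value.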

Note: In most cases a one-sided hypothesis test makes more sense, because the aim is usually to show that the ICC exceeds the minimum acceptable value. For a one-sided test, enter twice the desired significance level (e.g., α = 0.05 × 2 = 0.10). In the example above, a one-sided test requires 39 subjects.
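The doubling rule works because a one-sided test at level α uses the same critical value, z at 1 − α, as a two-sided test at level 2α. The short sketch below checks this with Python's `statistics.NormalDist` and recomputes the one-sided example, again assuming the Walter et al. (1998) approximation; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

# Critical value of a one-sided test at alpha = 0.05 ...
z_one_sided = z(1 - 0.05)
# ... equals that of a two-sided test at alpha = 0.05 * 2 = 0.10
z_two_sided_doubled = z(1 - 0.10 / 2)
assert math.isclose(z_one_sided, z_two_sided_doubled)

# Example above: k = 2 raters, expected ICC 0.8, minimum ICC 0.6, power 0.8
k, rho1, rho0 = 2, 0.8, 0.6
c0 = (1 + k * rho0 / (1 - rho0)) / (1 + k * rho1 / (1 - rho1))
n = 1 + 2 * k * (z_one_sided + z(0.8)) ** 2 / (math.log(c0) ** 2 * (k - 1))
print(math.ceil(n))  # 39
```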

Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):101-10.

Calculator as Shiny app