Power Analysis for Mixed Models in R

Everything so far has assumed observations are independent. That breaks down the moment your data has a natural grouping: students nested in classrooms, patients nested in clinics, measurements nested in subjects. Ignoring that structure inflates your effective sample size and overstates power. A [[concepts/mixed-effects|mixed effects]] model fixes this by adding a random intercept that absorbs between-group variation.

What's new on this rung

family = "lme" in the constructor selects the mixed-effects estimator — fit by maximum likelihood, so the summary header reports estimator: MLE;
the random-intercept term goes directly in the formula as (1|classroom);
set_cluster tells the engine about the grouping structure — the cluster variable name, its ICC, and the number of groups.

Everything else — find_power, target_test, verbose = FALSE, summary() — works exactly as before.

The model

We study a classroom experiment: does a new teaching_method lift score, controlling for prior_gpa? Students are nested in 30 classrooms.

family = "lme" selects the mixed estimator;
(1|classroom) in the formula adds a random intercept for classroom;
model$set_cluster("classroom", ICC = 0.15, n_clusters = 30) supplies the two numbers that define the grouping: the ICC (intraclass correlation — the fraction of total variance that lies between classrooms; 0.15 means 15% of score variation is between classrooms rather than between students) and n_clusters (how many distinct classrooms exist);
effects follow the same effect-size benchmarks as OLS — 0.10 / 0.25 / 0.40 for small / medium / large.

Note

Mixed models default to 800 simulations rather than the 1,600 used for OLS, because each mixed-model fit is more expensive. Confidence intervals are correspondingly a little wider; increase n_sims explicitly if you need tighter Monte-Carlo uncertainty.

How much power at n = 300?

library(mcpower)

model <- MCPower$new("score ~ teaching_method + prior_gpa + (1|classroom)", family = "lme")
model$set_variable_type("teaching_method=binary")
model$set_effects("teaching_method=0.4, prior_gpa=0.18")
model$set_cluster("classroom", ICC = 0.15, n_clusters = 30)

result <- model$find_power(sample_size = 300, target_test = "teaching_method, prior_gpa", verbose = FALSE)
print(summary(result))

==================================================
  MCPower · Power Analysis
==================================================
formula: score ~ teaching_method + prior_gpa + (1|classroom)
estimator: MLE  N=300  sims=800  α=0.05  target=80%
effects: teaching_method=0.40, prior_gpa=0.18

Per-test power
───────────────────────────────────
Test                 Power   Target
───────────────────────────────────
teaching_method      90.9%      80%
prior_gpa            85.8%      80%
───────────────────────────────────

Power & 95% CI
───────────────────────────────────────────
Test                 Power           CI 95%
───────────────────────────────────────────
teaching_method      90.9%   [88.7%, 92.7%]
prior_gpa            85.8%   [83.2%, 88.0%]
───────────────────────────────────────────
95% CIs are Monte-Carlo (Wilson), n_sims=800.

Joint significance distribution
────────────────────────
k     Exactly   At least
────────────────────────
0        1.6%       100%
1       20.1%      98.4%
2       78.2%      78.2%
────────────────────────

Estimator details
  tau_estimate: nan
  boundary_hits: 0
  joint_uncorrected_rate: 0.9812
  joint_corrected_rate: 0.9812
  singular_fit_rate: 0
  boundary_rate_per_component: [0.0]

Plots: plot(result) to view, plot(result, 'chart.png') to save.

Both targets are comfortably above 80%: teaching_method lands at 90.9% and prior_gpa at 85.8%. The joint significance distribution shows a 78.2% chance of detecting both effects in the same study — close to the 80% target, so N = 300 is a reasonable budget when both effects matter.

The estimator details block reports boundary_hits: 0 — no simulation hit a variance boundary during fitting. Boundary hits show up more often in smaller samples, where mixed-model fitting is numerically demanding, so seeing none here is reassuring at this sample size.

Tip

ICC values are rarely known precisely in advance. Run a few scenarios with ICC = 0.05, ICC = 0.15, and ICC = 0.30 to see how sensitive your power estimate is to that assumption — the same way you would stress-test effect sizes.

Going further

Repeated measures

A repeated-measures (within-subjects) design — each participant measured across many trials — is a clustered design with the participant as the cluster. The catch is the sample size: n counts measurements, not participants, so n = participants × trials-per-participant. Sixty participants on 100 trials each is sample_size = 6000 with n_clusters = 60 — passing sample_size = 60 would model one measurement per person:

# 60 participants × 100 trials = 6000 measurements
model <- MCPower$new("rt ~ condition + (1|participant)", family = "lme")
model$set_cluster("participant", ICC = 0.30, n_clusters = 60)
result <- model$find_power(sample_size = 6000, target_test = "condition", verbose = FALSE)

Extra trials sharpen a within-participant predictor (one rotated trial by trial, estimated against the within-person residual) but do nothing for a between-participant one (group, age — constant within a person). Mark between-participant predictors as cluster-level so the engine uses the participant count as their effective sample size:

# `group` is between-participant; `condition` stays within-participant
model <- MCPower$new("rt ~ condition + group + (1|participant)", family = "lme")
model$set_cluster("participant", ICC = 0.30, n_clusters = 60,
                  cluster_level_vars = c("group"))

See Repeated measures for the within- vs between-participant power discussion.

Cluster-level predictors

If your primary predictor is assigned at the group level — a treatment applied to all students in a classroom, a policy rolled out site-wide — declare it in set_cluster so the engine uses n_clusters (not total N) as the effective sample size:

model$set_cluster("site", ICC = 0.15, n_clusters = 30,
                  cluster_level_vars = c("treatment"))

Halving n_clusters at fixed total N roughly doubles the SE of a cluster-level predictor while barely affecting within-cluster predictors. See Cluster-level predictors for the design-effect discussion.

Crossed and nested groupings

For crossed designs (e.g. participants × stimuli) or nested designs (classrooms within schools), name every grouping factor in the formula and call set_cluster once per factor:

# Crossed: 24 subjects × 12 items — formula "rt ~ frequency + (1|subject) + (1|item)"
model$set_cluster("subject", ICC = 0.20, n_clusters = 24)
model$set_cluster("item", ICC = 0.15, n_clusters = 12)

For crossed groupings, N must be a multiple of the design atom (n_subjects × n_items); for nested, call set_cluster once for the outer factor and once for the "school:classroom" child with n_per_parent. See Multiple groupings.

Random slopes

A random intercept lets each group start at a different baseline; a random slope lets the effect itself vary across groups. Declare it in the formula as (1 + x|group) and give it a variance in set_cluster — the extra variability widens the SE of the average effect, so treating a varying slope as fixed overstates power:

model <- MCPower$new("rt ~ dose + (1 + dose|subject)", family = "lme")
model$set_cluster("subject", ICC = 0.20, n_clusters = 30,
                  random_slopes = list(list(predictor = "dose", variance = 0.15)))

Random-slope fits hit variance boundaries more often than intercept-only ones, so watch the boundary-hit rate in summary(). See Random slopes.

Clustered logistic (GLMM)

A binary outcome with clustering is a logistic GLMM — combine family = "logit" with a (1|group) term, where the ICC is read on the log-odds (latent) scale. This composes with the cluster-level trick above: a treatment randomised at the clinic level in a binary-outcome trial is just family = "logit" plus cluster_level_vars:

model <- MCPower$new("recovered ~ treatment + baseline_severity + (1|clinic)", family = "logit")
model$set_baseline_probability(0.3)
model$set_cluster("clinic", ICC = 0.10, n_clusters = 40, cluster_level_vars = c("treatment"))

See Clustered logistic (GLMM).

Scenario stress-testing for mixed models

Mixed models support three scenario knobs: random_effect_dist (normal / heavy_tailed / right_skewed), random_effect_df, and icc_noise_sd. Use them to probe how sensitive power is to non-Gaussian random effects or an uncertain ICC:

model$set_scenario_configs(list(
    re_stress = list(random_effect_dist = "heavy_tailed",
                     random_effect_df = 5, icc_noise_sd = 0.07)
))
model$find_power(sample_size = 600, target_test = "teaching_method",
                 scenarios = c("optimistic", "re_stress"))

See Mixed-model scenario knobs.

next → ANOVA & post-hoc