Power Analysis for Mixed Models - Python

Everything so far has assumed observations are independent. That breaks down the moment your data has a natural grouping: students nested in classrooms, patients nested in clinics, measurements nested in subjects. Ignoring that structure inflates your effective sample size and overstates power. A [[concepts/mixed-effects|mixed effects]] model fixes this by adding a random intercept that absorbs between-group variation.

What's new on this rung

family="lme" in the constructor selects the mixed-effects estimator — fit by maximum likelihood, so the summary header reports estimator: MLE;
the random-intercept term goes directly in the formula as (1|classroom);
set_cluster tells the engine about the grouping structure — the cluster variable name, its ICC, and the number of groups.

Everything else — find_power, target_test, verbose=False, .summary() — works exactly as before.

The model

We study a classroom experiment: does a new teaching_method lift score, controlling for prior_gpa? Students are nested in 30 classrooms.

family="lme" selects the mixed estimator;
(1|classroom) in the formula adds a random intercept for classroom;
set_cluster("classroom", ICC=0.15, n_clusters=30) supplies the two numbers that define the grouping: the ICC (intraclass correlation — the fraction of total variance that lies between classrooms; 0.15 means 15% of score variation is between classrooms rather than between students) and n_clusters (how many distinct classrooms exist);
effects follow the same effect-size benchmarks as OLS — 0.10 / 0.25 / 0.40 for small / medium / large.

Note

Mixed models default to 800 simulations rather than the 1,600 used for OLS, because each mixed-model fit is more expensive. Confidence intervals are correspondingly a little wider; increase n_sims explicitly if you need tighter Monte-Carlo uncertainty.

How much power at n = 300?

from mcpower import MCPower

model = MCPower("score = teaching_method + prior_gpa + (1|classroom)", family="lme")
model.set_variable_type("teaching_method=binary")
model.set_effects("teaching_method=0.4, prior_gpa=0.18")
model.set_cluster("classroom", ICC=0.15, n_clusters=30)

result = model.find_power(sample_size=300, target_test="teaching_method, prior_gpa", verbose=False)
print(result.summary())

==================================================
  MCPower · Power Analysis
==================================================
formula: score = teaching_method + prior_gpa + (1|classroom)
estimator: MLE  N=300  sims=800  α=0.05  target=80%
effects: teaching_method=0.40, prior_gpa=0.18

Per-test power
───────────────────────────────────
Test                 Power   Target
───────────────────────────────────
teaching_method      90.9%      80%
prior_gpa            85.8%      80%
───────────────────────────────────

Power & 95% CI
───────────────────────────────────────────
Test                 Power           CI 95%
───────────────────────────────────────────
teaching_method      90.9%   [88.7%, 92.7%]
prior_gpa            85.8%   [83.2%, 88.0%]
───────────────────────────────────────────
95% CIs are Monte-Carlo (Wilson), n_sims=800.

Joint significance distribution
────────────────────────
k     Exactly   At least
────────────────────────
0        1.6%       100%
1       20.1%      98.4%
2       78.2%      78.2%
────────────────────────

Estimator details
  tau_estimate: nan
  boundary_hits: 0
  joint_uncorrected_rate: 0.9812
  joint_corrected_rate: 0.9812
  singular_fit_rate: 0
  boundary_rate_per_component: [0.0]

Plots: result.plot() to view, result.plot('chart.png') to save.

Both targets are comfortably above 80%: teaching_method lands at 90.9% and prior_gpa at 85.8%. The joint significance distribution shows a 78.2% chance of detecting both effects in the same study — close to the 80% target, so N = 300 is a reasonable budget when both effects matter.

The estimator details block reports boundary_hits: 0 — no simulation hit a variance boundary during fitting. Boundary hits show up more often in smaller samples, where mixed-model fitting is numerically demanding, so seeing none here is reassuring at this sample size.

Tip

ICC values are rarely known precisely in advance. Run a few scenarios with ICC=0.05, ICC=0.15, and ICC=0.30 to see how sensitive your power estimate is to that assumption — the same way you would stress-test effect sizes.

Going further

Repeated measures

A repeated-measures (within-subjects) design — each participant measured across many trials — is a clustered design with the participant as the cluster. The catch is the sample size: n counts measurements, not participants, so n = participants × trials-per-participant. Sixty participants on 100 trials each is sample_size=6000 with n_clusters=60 — passing sample_size=60 would model one measurement per person:

# 60 participants × 100 trials = 6000 measurements
model = MCPower("rt = condition + (1|participant)", family="lme")
model.set_cluster("participant", ICC=0.30, n_clusters=60)
result = model.find_power(sample_size=6000, target_test="condition", verbose=False)

Extra trials sharpen a within-participant predictor (one rotated trial by trial, estimated against the within-person residual) but do nothing for a between-participant one (group, age — constant within a person). Mark between-participant predictors as cluster-level so the engine uses the participant count as their effective sample size:

# `group` is between-participant; `condition` stays within-participant
model = MCPower("rt = condition + group + (1|participant)", family="lme")
model.set_cluster("participant", ICC=0.30, n_clusters=60,
                  cluster_level_vars=["group"])

See Repeated measures for the within- vs between-participant power discussion.

Cluster-level predictors

If your primary predictor is assigned at the group level — a treatment applied to all students in a classroom, a policy rolled out site-wide — declare it in set_cluster so the engine uses n_clusters (not total N) as the effective sample size:

model.set_cluster("site", ICC=0.15, n_clusters=30,
                  cluster_level_vars=["treatment"])

Halving n_clusters at fixed total N roughly doubles the SE of a cluster-level predictor while barely affecting within-cluster predictors. See Cluster-level predictors for the design-effect discussion.

Crossed and nested groupings

For crossed designs (e.g. participants × stimuli) or nested designs (classrooms within schools), name every grouping factor in the formula and call set_cluster once per factor:

# Crossed: 24 subjects × 12 items — formula "rt ~ frequency + (1|subject) + (1|item)"
model.set_cluster("subject", ICC=0.20, n_clusters=24)
model.set_cluster("item", ICC=0.15, n_clusters=12)

For crossed groupings, N must be a multiple of the design atom (n_subjects × n_items); for nested, call set_cluster once for the outer factor and once for the "school:classroom" child with n_per_parent. See Multiple groupings.

Random slopes

A random intercept lets each group start at a different baseline; a random slope lets the effect itself vary across groups. Declare it in the formula as (1 + x|group) and give it a variance in set_cluster — the extra variability widens the SE of the average effect, so treating a varying slope as fixed overstates power:

model = MCPower("rt = dose + (1 + dose|subject)", family="lme")
model.set_cluster("subject", ICC=0.20, n_clusters=30,
                  random_slopes=["dose"], slope_variance=0.15)

Random-slope fits hit variance boundaries more often than intercept-only ones, so watch the boundary-hit rate in .summary(). See Random slopes.

Clustered logistic (GLMM)

A binary outcome with clustering is a logistic GLMM — combine family="logit" with a (1|group) term, where the ICC is read on the log-odds (latent) scale. This composes with the cluster-level trick above: a treatment randomised at the clinic level in a binary-outcome trial is just family="logit" plus cluster_level_vars:

model = MCPower("recovered = treatment + baseline_severity + (1|clinic)", family="logit")
model.set_baseline_probability(0.3)
model.set_cluster("clinic", ICC=0.10, n_clusters=40, cluster_level_vars=["treatment"])

See Clustered logistic (GLMM).

Scenario stress-testing for mixed models

Mixed models support three scenario knobs: random_effect_dist (normal / heavy_tailed / right_skewed), random_effect_df, and icc_noise_sd. Use them to probe how sensitive power is to non-Gaussian random effects or an uncertain ICC:

model.set_scenario_configs({
    "re_stress": {"random_effect_dist": "heavy_tailed",
                  "random_effect_df": 5, "icc_noise_sd": 0.07}
})
model.find_power(sample_size=600, target_test="teaching_method",
                 scenarios=["optimistic", "re_stress"])

See Mixed-model scenario knobs.

next → ANOVA & post-hoc