What this report shows

MCPower’s scenario analysis deliberately stresses a planned study: instead of simulating the idealised design you specified, each scenario perturbs it — slopes wobble between observations, residual variance trends with the predictors, correlations differ from the plan, predictor and residual distributions get swapped, group allocation becomes random. Each perturbation is controlled by a knob in configs/scenarios.json, and each knob documents a precise statistical law for the disturbance it injects.

This report checks that every knob generates exactly its documented law — no more, no less. It is the L5 layer of the validation charter: the L3 reports prove the optimistic (unperturbed) generator faithful against the spec as a point oracle; here the spec value is deliberately randomised, so the oracle is the perturbation law itself (including its documented distortions: the ±0.8 correlation clamp, PSD repair, the censored t(3) table, the stale heteroskedasticity anchor).

The gate doctrine. Every gate is set == get on the realised magnitude: the knob’s value, recovered from the generated data by a probe, must match the documented law within an SE-of-mean z-band of 4. Each case draws K independent perturbation blocks; a mis-scaled knob lands tens of σ out, while chance alone exceeds the band fewer than once in 10,000 gates (two-sided P(|z| > 4) ≈ 6.3 × 10⁻⁵). Two checks are explicitly not gates: monotone power across presets (a readable summary that washes out real faults) and a global error-variance invariant (heterogeneity is supposed to inflate it). The β̂-unbiasedness backstop (B1), including the intercept, is kept as a cheap mean-leak tripwire: a mis-centred swapped distribution lands in β̂₀ while every effect estimate stays clean.
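The gate arithmetic fits in a few lines. This is a minimal illustration. It assumes each gate's z is the mean of the K per-block estimates minus the law, divided by the sample standard error of that mean. `gate_z` and `false_alarm_rate` are hypothetical helpers, not MCPower functions.

```python
import math

def gate_z(block_values, law):
    """SE-of-mean z: (mean of K per-block estimates - law) / (sd / sqrt(K))."""
    k = len(block_values)
    mean = sum(block_values) / k
    var = sum((v - mean) ** 2 for v in block_values) / (k - 1)  # sample variance
    return (mean - law) / math.sqrt(var / k)

def false_alarm_rate(band=4.0):
    """Two-sided P(|Z| > band) for a standard-normal z under a correct law."""
    return math.erfc(band / math.sqrt(2))

rate = false_alarm_rate(4.0)
print(f"P(|z| > 4) = {rate:.2e}, about 1 gate in {1 / rate:,.0f}")
```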

Results at a glance

72 of 72 gates pass. Golden reproducibility: all cases reproduce.

Case Verdict Golden check
fg_glm_flip all PASS reproduces
fg_glm_ident all PASS reproduces
scen_b0 all PASS reproduces
scen_b1_glm all PASS reproduces
scen_b1_ols all PASS reproduces
scen_b2 all PASS reproduces
scen_b3 all PASS reproduces
scen_b4 all PASS reproduces
scen_b5 all PASS reproduces
scen_co_high all PASS reproduces
scen_co_low all PASS reproduces
scen_co_psd all PASS reproduces
scen_fa_mle all PASS reproduces
scen_fa_mle_fp all PASS reproduces
scen_fa_ols all PASS reproduces
scen_he all PASS reproduces
scen_hs all PASS reproduces
scen_px all PASS reproduces
scen_px_t3 all PASS reproduces
scen_re all PASS reproduces
scen_re_replace all PASS reproduces

Tier A — one knob at a time

Each case turns on a single knob on an otherwise-optimistic design, draws K independent perturbation blocks, and recovers the knob’s magnitude with a probe matched to its law.

Heterogeneity (slope wobble)

Effects vary per observation: βⱼ + N(0, (h·βⱼ)²). The probe regresses squared true-β residuals on each squared predictor — the slope recovers h²βⱼ² per predictor, separating He from anything that only moves pooled moments.

Case Gate Measured Law z Verdict
scen_he jitter slope on x1² = h²β² (h=0.4) 0.02491 0.0256 -0.25106 PASS
scen_he jitter slope on x2² = h²β² (h=0.4) 0.00830 0.0100 -1.07315 PASS

Heteroskedasticity (residual-variance trend)

Residual variance follows Var(εᵢ) = σ²·exp(γzᵢ)/exp(γ²/2) with γ = ln(λ)/4, z the standardised driver. log e² is then linear in z with slope exactly γ — shape-blind, so the same probe serves B3. The realised ±2σ ratio is exp(4γ̂).

Case Gate Measured Law z Verdict
scen_hs log-e² slope γ = ln(λ)/4 0.34797 0.34657 0.60302 PASS

Realised λ̂ = exp(4γ̂) = 4.022 (set: 4); the raw ±2σ binned variance ratio reads 3.95 (reported only — finite bins make its law approximate; the slope is the gate).

Correlation noise

Per block the off-diagonals get symmetrised Gaussian noise — symmetrisation halves the variance, so the per-block ρ law is N(ρ, s²/2) censored at ±0.8, plus finite-n sampling noise in quadrature. The low-ρ case gates the exact law; the high-ρ case sits against the clamp, gating the censored-normal truncation law itself (mean visibly below the naive ρ).

Case Gate Measured Law z Verdict
scen_co_low block-r mean = ±0.8-clamp censored law 0.29620 0.30000 -0.68832 PASS
scen_co_low block-r SD = √(s²/2 + (1−ρ²)²/n) 0.11048 0.10704 0.88024 PASS
scen_co_high block-r mean = ±0.8-clamp censored law 0.73059 0.72807 0.94848 PASS
scen_co_high block-r SD = √(s²/2 + (1−ρ²)²/n) 0.07522 0.07873 -1.86440 PASS

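At ρ = 0.75, s = 0.15, the censored law predicts a mean of 0.7281. The −0.0219 shift below the nominal ρ is the documented clamp truncation, and the realised mean lands on it. The high-ρ SD law likewise uses the censored variance in place of s²/2, which is why its predicted SD (0.0787) sits well below s/√2 ≈ 0.106.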
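The censored-normal law behind the Co mean gate has a closed form. For X ~ N(μ, τ²) clipped to [−0.8, 0.8], the mean and SD follow from standard truncated-normal identities. Setting τ = s/√2 (the symmetrised noise) reproduces the quoted 0.7281 and the −0.0219 shift.

The sketch treats "censored" as clipping to the bounds, which is the reading that reproduces those numbers. It omits the finite-n sampling term. `clamped_normal_moments` is a hypothetical helper.

```python
import math

def _phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def _Phi(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def clamped_normal_moments(mu, tau, lo=-0.8, hi=0.8):
    """Mean and SD of clip(X, lo, hi) for X ~ N(mu, tau^2)."""
    a, b = (lo - mu) / tau, (hi - mu) / tau
    # first and second moments of clip(Z, a, b), Z standard normal
    m1 = a * _Phi(a) + (_phi(a) - _phi(b)) + b * (1 - _Phi(b))
    m2 = (a * a * _Phi(a) + (_Phi(b) - _Phi(a)) + a * _phi(a) - b * _phi(b)
          + b * b * (1 - _Phi(b)))
    return mu + tau * m1, tau * math.sqrt(m2 - m1 * m1)

rho, s = 0.75, 0.15
mean, sd = clamped_normal_moments(rho, s / math.sqrt(2))  # symmetrised noise
print(f"mean {mean:.4f}  shift {mean - rho:+.4f}  censored SD {sd:.4f}  "
      f"vs s/sqrt(2) {s / math.sqrt(2):.4f}")
```

The same function also returns the censored SD, which is below s/√2 whenever the clamp bites.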

PSD repair (3 predictors, high ρ)

Repair cannot fire at p = 2 (any clamped 2×2 is PD), so a 3-variable all-0.6 design under s = 0.3 is where eigenvalue-floor + diagonal-renormalisation distortion lives. There is no closed-form law; the empirical per-pair moments are frozen as MCPower-golden (table below) and re-checked on every run.

r12_mean r13_mean r23_mean r12_sd r13_sd r23_sd
0.58466 0.56899 0.55664 0.17938 0.18598 0.19204
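Sylvester's criterion shows why repair only appears from p = 3 upward:
- A unit-diagonal 2×2 matrix is positive definite iff 1 − r² > 0, and every |r| ≤ 0.8 satisfies that.
- A 3×3 matrix additionally needs 1 + 2·r₁₂r₁₃r₂₃ − r₁₂² − r₁₃² − r₂₃² > 0.

The noisy matrix below is an illustrative draw around the all-0.6 design, not one taken from scen_co_psd. The sketch only detects the need for repair; it does not reproduce MCPower's eigenvalue-floor repair. `is_pd_corr` is a hypothetical name.

```python
def is_pd_corr(r):
    """Sylvester's criterion for a unit-diagonal 2x2 or 3x3 correlation matrix."""
    d2 = 1 - r[0][1] ** 2                             # leading 2x2 minor
    if len(r) == 2:
        return d2 > 0
    a, b, c = r[0][1], r[0][2], r[1][2]
    d3 = 1 + 2 * a * b * c - a * a - b * b - c * c   # full 3x3 determinant
    return d2 > 0 and d3 > 0

# every clamped 2x2 (|r| <= 0.8) is positive definite
print(all(is_pd_corr([[1, r / 100], [r / 100, 1]]) for r in range(-80, 81)))

design = [[1, 0.6, 0.6], [0.6, 1, 0.6], [0.6, 0.6, 1]]   # the all-0.6 spec
noisy = [[1, 0.8, 0.8], [0.8, 1, 0.2], [0.8, 0.2, 1]]    # illustrative noisy draw
print(is_pd_corr(design), is_pd_corr(noisy))
```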

Distribution swaps (predictors)

Each continuous-normal column is swapped per block with probability q to a uniform pick from the pool; every pool candidate is standardised (mean 0, var 1), so a swap perturbs shape only — a mis-centred candidate would be a mean leak straight into β̂₀.

Case Gate Measured Law z Verdict
scen_px swap frequency = q 0.53000 0.50000 1.65979 PASS
scen_px pick share right_skewed = 1/3 0.37264 0.33333 1.71701 PASS
scen_px pick share left_skewed = 1/3 0.29245 0.33333 -1.78569 PASS
scen_px pick share uniform = 1/3 0.33491 0.33333 0.06868 PASS
scen_px right_skewed mean = 0 -0.00186 0.00000 -1.44183 PASS
scen_px right_skewed var = 1 0.99895 1.00000 -0.32336 PASS
scen_px left_skewed mean = 0 0.00075 0.00000 0.51932 PASS
scen_px left_skewed var = 1 0.99996 1.00000 -0.01066 PASS
scen_px uniform mean = 0 -0.00133 0.00000 -0.98269 PASS
scen_px uniform var = 1 1.00042 1.00000 0.34051 PASS

Block classifications: left_skewed 124, normal 376, right_skewed 158, uniform 142.

Custom pool: high_kurtosis

The only swappable marginal outside the presets. Its engine identity is a censored standardised t(3): a 2048-knot inverse-CDF table on percentiles [0.00121, 0.99879]. The table is normalised at build to its own SD (1.5958 raw, vs √3 ≈ 1.7321 for the full t(3)), so the marginal has exactly unit variance, excess kurtosis ≈ 6.39, and support ±6.0 SD. The censoring is deliberate: it bounds every synthetic marginal at ±6 SD while keeping this the heaviest-tailed shape (t(3)’s own kurtosis is infinite). v1, and this engine until 2026-06, divided by √3 instead. That standardises the full t(3) and left the censored marginal at var ≈ (1.5958/1.7321)² ≈ 0.849: a silent ~15% variance (effect-size) shrink for every high-kurtosis predictor. The L5 gate below caught it, and the table is now normalised by construction.

Case Gate Measured Law z Verdict
scen_px_t3 every block swapped (q = 1) 1.00000 1 NA PASS
scen_px_t3 t3 mean = 0 -0.00194 0 -1.67897 PASS
scen_px_t3 t3 var = 1 (table-normalized) 0.99882 1 -0.37682 PASS
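The quoted figures for the high_kurtosis marginal can be checked analytically. Under the substitution t = √3·tan θ, the t(3) density becomes (2/π)cos²θ. That gives closed forms for the moments of t(3) clamped at its 0.00121 and 0.99879 quantiles.

The sketch makes two assumptions:
- "censored" means draws beyond the end knots are clamped to those knots;
- the table can be treated as continuous, ignoring the 2048-knot discretisation.

It should therefore land near 1.5958 and 6.39, not exactly on them. The function names are hypothetical.

```python
import math

def t3_cdf(t):
    """Closed-form CDF of Student's t with 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (math.atan(u) + u / (1 + u * u)) / math.pi

def t3_quantile(p, lo=0.0, hi=1e3):
    """Upper-half quantile of t(3) by bisection (p > 0.5)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if t3_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clamped_t3_moments(p_hi=0.99879):
    """Knot c, E[X^2], E[X^4] for t(3) clamped to [-c, c], c = t(3) quantile at p_hi."""
    c = t3_quantile(p_hi)
    theta = math.atan(c / math.sqrt(3))
    tail = 1 - t3_cdf(c)                   # mass piled onto each end knot
    sc = math.sin(theta) * math.cos(theta)
    m2 = (6 / math.pi) * (theta - sc) + 2 * c ** 2 * tail
    m4 = (36 / math.pi) * (math.tan(theta) - 1.5 * theta + sc / 2) + 2 * c ** 4 * tail
    return c, m2, m4

c, m2, m4 = clamped_t3_moments()
sd = math.sqrt(m2)
print(f"raw SD {sd:.4f}  excess kurtosis {m4 / m2 ** 2 - 3:.2f}  "
      f"support +/-{c / sd:.2f} SD  variance after dividing by sqrt(3): {m2 / 3:.3f}")
```

The last printed number is the variance the old √3 normalisation would have left the marginal with.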

Residual swaps

With probability q_r the block’s residual distribution is replaced — distribution and df — by a pool pick (t(df)·√((df−2)/df) or (χ²(df)−df)/√(2df)). The shape laws (skew = √(8/df), excess kurtosis = 6/(df−4)) recover the df, proving the scenario’s df is carried, not the spec’s.

Case Gate Measured Law z Verdict
scen_re swap frequency = q_r 0.51000 0.50000 0.39958 PASS
scen_re pick share heavy_tailed = 1/2 0.51471 0.50000 0.42008 PASS
scen_re pick share skewed = 1/2 0.48529 0.50000 -0.42008 PASS
scen_re skewed: skew = √(8/df), df = 10 0.88878 0.89443 -0.85425 PASS
scen_re heavy: excess kurtosis = 6/(df−4), df = 10 1.04341 1.00000 1.59313 PASS
scen_re_replace pin holds: residual stays high_kurtosis (right_skewed swap inert) 0.98000 1.00000 NA PASS
scen_re_replace symmetric: skew = 0 (pinned t(6), not the χ² swap) -0.01593 0.00000 -0.95569 PASS
scen_re_replace mean = 0 0.00140 0.00000 1.20001 PASS
scen_re_replace var = 1 (t(6) standardized) 0.99766 1.00000 -0.92347 PASS

The pinned case (scen_re_replace) verifies the swap-eligibility rule: it pins the spec residual with set_residual_distribution("high_kurtosis"), then configures a forced right_skewed swap (q_r = 1). Because pick_residual only swaps an unpinned default-normal residual, the swap is inert — every draw keeps the pinned, symmetric censored-t3 high_kurtosis residual (skew ≈ 0, table-normalised to var 1), never the χ²(6) the config asks for. The skew = 0 gate is the tripwire: a fired swap would force skew = √(8/6) ≈ 1.15.
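Simulation illustrates the shape laws for the residual pool:
1. Standardise a χ²(10) draw and a t(10) draw as described above.
2. Measure the skew of the first and the excess kurtosis of the second.
3. Invert skew = √(8/df) and excess kurtosis = 6/(df − 4) to recover the df.

This is a sketch using Python's standard `random` module, with χ²(df) drawn as Gamma(df/2, scale 2); it is not the scen_re probe. `residual_shapes` and `std_moments` are hypothetical names. The kurtosis estimate is noticeably noisier than the skew.

```python
import math
import random

def std_moments(xs):
    """Sample skewness and excess kurtosis (moment estimators)."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]
    v = sum(x * x for x in d) / n
    skew = sum(x ** 3 for x in d) / n / v ** 1.5
    exkurt = sum(x ** 4 for x in d) / n / v ** 2 - 3
    return skew, exkurt

def residual_shapes(df=10, n=200_000, seed=3):
    """Skew of standardised chi2(df) and excess kurtosis of standardised t(df)."""
    rng = random.Random(seed)
    # (chi2(df) - df) / sqrt(2 df), using chi2(df) = Gamma(shape=df/2, scale=2)
    chi = [(rng.gammavariate(df / 2, 2) - df) / math.sqrt(2 * df) for _ in range(n)]
    # t(df) * sqrt((df - 2) / df), with t = Z / sqrt(chi2(df) / df)
    t = [rng.gauss(0, 1) / math.sqrt(rng.gammavariate(df / 2, 2) / df)
         * math.sqrt((df - 2) / df) for _ in range(n)]
    return std_moments(chi)[0], std_moments(t)[1]

skew, exkurt = residual_shapes()
print(f"skew {skew:.3f} (law {math.sqrt(8 / 10):.3f}) -> df {8 / skew ** 2:.1f}")
print(f"excess kurtosis {exkurt:.3f} (law {6 / (10 - 4):.3f}) -> df {4 + 6 / exkurt:.1f}")
```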

Factor-proportion sampling

sampled_factor_proportions = FALSE (the optimistic default) assigns factor levels by a deterministic largest-remainder walk — counts are a pure function of (n, p), identical across draws, each within 1 of n·p. TRUE draws levels per row: counts are Binomial(n, p) with variance n·p(1−p).

Case Gate Measured Law z Verdict
scen_fa_ols fixed: counts identical across draws (no RNG) 1.0000 1 NA PASS
scen_fa_ols fixed: max |count − n·p| ≤ 1 (largest remainder) 0.0000 1 NA PASS
scen_fa_ols sampled: Var(count) level 1 = n·p(1−p) 243.7875 250 -0.3935 PASS
scen_fa_ols sampled: Var(count) level 2 = n·p(1−p) 182.2775 210 -2.1712 PASS
scen_fa_ols sampled: Var(count) level 3 = n·p(1−p) 175.0250 160 1.3196 PASS
scen_fa_mle fixed: counts identical across draws (no RNG) 1.0000 1 NA PASS
scen_fa_mle fixed: max |count − n·p| ≤ 1 (largest remainder) 0.0000 1 NA PASS
scen_fa_mle sampled: Var(count) level 1 = n·p(1−p) 220.4100 250 -1.3942 PASS
scen_fa_mle sampled: Var(count) level 2 = n·p(1−p) 194.0300 210 -0.9094 PASS
scen_fa_mle sampled: Var(count) level 3 = n·p(1−p) 158.4900 160 -0.1044 PASS
scen_fa_mle_fp find_power accepts Fa toggle under estimator = Mle 1.0000 1 NA PASS

Fixed counts at n = 1000: 500/300/200 against expected 500/300/200. The MLE rows exercise the one scenario knob the mixed-model estimator admits. Every other knob is rejected by the engine’s estimator gate, a deterministic L1 assertion owned by the engine test suite and not re-tested here.
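The two allocation modes can be sketched as follows. `largest_remainder` floors n·p for each level and hands the leftover units to the largest fractional remainders. Every count therefore stays within 1 of n·p, and no randomness is involved. `binomial_count_var` is the variance n·p(1−p) under per-row sampling.

Both names are hypothetical. The tie-breaking rule (level order) is an assumption that MCPower's walk may not share.

```python
import math

def largest_remainder(n, props):
    """Deterministic level counts: floor(n*p), then give the leftover units to
    the largest fractional remainders (ties broken by level order)."""
    raw = [n * p for p in props]
    counts = [math.floor(r) for r in raw]
    leftover = n - sum(counts)
    order = sorted(range(len(props)), key=lambda j: raw[j] - counts[j], reverse=True)
    for j in order[:leftover]:
        counts[j] += 1
    return counts

def binomial_count_var(n, p):
    """Var(count) when levels are drawn per row (sampled_factor_proportions = TRUE)."""
    return n * p * (1 - p)

print(largest_remainder(1000, [0.5, 0.3, 0.2]))
print([round(binomial_count_var(1000, p), 6) for p in (0.5, 0.3, 0.2)])
```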

Tier B — knob interactions

The shipped presets co-vary every knob, so B2–B4 isolate single interactions through custom scenario pairs on shared seeds (P1 pairing: two scenarios at the same seed draw the same raw noise streams, so cross-scenario deltas are knob-attributable).

B0 — Optimistic ≡ baseline

Case Gate Measured Law z Verdict
scen_b0 optimistic ≡ non-scenario path (bit-identical PowerResult) 1 1 NA PASS

The optimistic member of a three-scenario paired call is bit-identical to the plain non-scenario call — the DGP-level companion to the orchestrator’s call-seed test (power here: 0.948, 0.28).

B1 — β̂ unbiased across the presets (leak backstop)

Case Gate Measured Law z Verdict
scen_b1_ols optimistic: β̂[intercept] unbiased -0.0025 0.0000 -2.1096 PASS
scen_b1_ols optimistic: β̂[x1] unbiased 0.3994 0.4000 -0.4687 PASS
scen_b1_ols optimistic: β̂[x2] unbiased 0.2499 0.2500 -0.1007 PASS
scen_b1_ols realistic: β̂[intercept] unbiased 0.0001 0.0000 0.0560 PASS
scen_b1_ols realistic: β̂[x1] unbiased 0.3952 0.4000 -1.0430 PASS
scen_b1_ols realistic: β̂[x2] unbiased 0.2500 0.2500 0.0007 PASS
scen_b1_ols doomer: β̂[intercept] unbiased 0.0009 0.0000 0.7089 PASS
scen_b1_ols doomer: β̂[x1] unbiased 0.4034 0.4000 0.3641 PASS
scen_b1_ols doomer: β̂[x2] unbiased 0.2496 0.2500 -0.0609 PASS
scen_b1_glm optimistic: β̂[intercept] = per-study pseudo-true -0.4050 -0.4055 0.1720 PASS
scen_b1_glm optimistic: β̂[x1] = per-study pseudo-true 0.5007 0.5000 0.2342 PASS
scen_b1_glm optimistic: β̂[x2] = per-study pseudo-true 0.2988 0.3000 -0.4579 PASS
scen_b1_glm realistic: β̂[intercept] = per-study pseudo-true -0.3905 -0.4055 1.1979 PASS
scen_b1_glm realistic: β̂[x1] = per-study pseudo-true 0.5066 0.5000 1.0446 PASS
scen_b1_glm realistic: β̂[x2] = per-study pseudo-true 0.2992 0.3000 -0.1913 PASS
scen_b1_glm doomer: β̂[intercept] = per-study pseudo-true -0.3944 -0.4055 0.4703 PASS
scen_b1_glm doomer: β̂[x1] = per-study pseudo-true 0.4939 0.5004 -0.5737 PASS
scen_b1_glm doomer: β̂[x2] = per-study pseudo-true 0.2983 0.3002 -0.2780 PASS

OLS rows gate on the z-band, intercept included: that is the mean-leak tripwire, since linear averaging keeps OLS β̂ exactly unbiased under every knob. Logit rows need a different law. The heterogeneity β-jitter is drawn once per study, so each study’s data is a clean logit at its own β_eff and the MLE recovers it. Averaged over the K studies, the fitted coefficient → E[β_eff], not the attenuated coefficient that a per-observation, population-averaged marginal would show. The clip toward zero (s_j = h·|β_j|) nudges each slope’s magnitude up by a factor Φ(1/h) + h·φ(1/h) ≈ 1.0008 at doomer’s h = 0.4. The symmetric, unclipped intercept jitter leaves β_0 unchanged. The logit law is therefore this per-study pseudo-true value (glm_perstudy_beta), gated on the absolute band ±0.02. Beyond B1, GLM calibration is covered by the Tier A gates and the flip-rate gate below.
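The clip factor can be checked numerically. Assume the clip sets a jittered slope that would cross zero to exactly zero. The per-study mean of β(1 + hZ) is then β·E[max(1 + hZ, 0)] = β(Φ(1/h) + h·φ(1/h)), the usual mean of a normal censored at zero.

The sketch evaluates that closed form and checks it by Monte Carlo. `clip_factor` and `clip_factor_mc` are hypothetical helpers. The slopes 0.5 and 0.3 are the GLM values from the B1 table.

```python
import math
import random

def clip_factor(h):
    """E[max(1 + h*Z, 0)], Z ~ N(0, 1): mean multiplier of a slope jittered by
    N(0, (h*beta)^2) and clipped at zero so it cannot change sign."""
    a = 1 / h
    Phi = 0.5 * (1 + math.erf(a / math.sqrt(2)))
    phi = math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)
    return Phi + h * phi

def clip_factor_mc(h, n=400_000, seed=4):
    """Monte Carlo check of clip_factor."""
    rng = random.Random(seed)
    return sum(max(1 + h * rng.gauss(0, 1), 0.0) for _ in range(n)) / n

f = clip_factor(0.4)
print(f"factor {f:.4f} (MC {clip_factor_mc(0.4):.4f}); beta_eff {0.5 * f:.4f}, {0.3 * f:.4f}")
```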

B2 — He × Hs separation

The β-jitter variance ∝ xᵢⱼ²βⱼ² must be present at λ = 1 and unchanged by a λ toggle (λ is driven by the clean linear predictor, never the jittered one). The λ driver is pinned to x2, so the x1²-decomposed jitter variance is uncontaminated by the λ channel’s even cosh component; the paired Δ across {h = 0.4, λ = 1} vs {h = 0.4, λ = 4} isolates the interaction.

Case Gate Measured Law z Verdict
scen_b2 x1 jitter slope at λ=1 = h²β₁² 0.02174 0.0256 -1.77201 PASS
scen_b2 x1 jitter slope λ-invariant (paired Δ = 0) 0.00002 0.0000 0.04726 PASS

The driver column shows why the pin matters: x2²-slope reads 0.0095 at λ = 1 (law h²β₂² = 0.01) but inflates to 0.0689 at λ = 4 — the cosh(γz) contamination the naive probe would misread as an He × Hs interaction.

B3 — Hs × Re preservation

Case Gate Measured Law z Verdict
scen_b3 γ = ln(λ)/4 under forced high_kurtosis residual 0.34263 0.34657 -1.62754 PASS
scen_b3 γ = ln(λ)/4 under forced right_skewed residual 0.35076 0.34657 1.69386 PASS

Under a forced t(10) or χ²(10) residual the log-e² slope still reads ln(λ)/4 — the multiplicative variance trend amplifies the swapped tails without bending the ratio.

B4 — Co × Px (NORTA)

Correlation is induced on the latent normals. A swapped marginal transforms them, so the realised Pearson r follows the NORTA law, not the latent spec value. The oracle is computed numerically in R by Gauss–Hermite quadrature over the latent bivariate normal. That quadrature matches two closed forms to 7 digits: (e^ρ−1)/(e−1) for a lognormal pair and (6/π)·asin(ρ/2) for a uniform pair.

Case Gate Measured Law z Verdict
scen_b4 right_skewed pair: r = NORTA 0.4541 (latent ρ = 0.50) 0.45459 0.45410 0.41461 PASS
scen_b4 uniform pair: r = NORTA 0.4826 (latent ρ = 0.50) 0.48296 0.48258 0.44450 PASS
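A few lines of quadrature approximate the NORTA oracle. The sketch uses a plain tensor-product trapezoid rule over two independent standard normals instead of Gauss–Hermite. The trapezoid rule is also exponentially accurate for these smooth, Gaussian-weighted integrands.

The sketch checks itself against the two closed forms cited above. This report does not state the right_skewed pool's transform, so only the uniform and lognormal marginals appear. All names are hypothetical.

```python
import math

def norta_r(g, rho, step=0.1, half_width=10.0):
    """Pearson r of g(Z1), g(Z2) for a standard bivariate normal (Z1, Z2) with
    correlation rho; g must be standardised (mean 0, variance 1).
    Tensor-product trapezoid rule over two independent standard normals."""
    k = int(round(half_width / step))
    nodes = [i * step for i in range(-k, k + 1)]
    weights = [step * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) for z in nodes]
    c = math.sqrt(1 - rho * rho)
    total = 0.0
    for z1, w1 in zip(nodes, weights):
        inner = sum(w2 * g(rho * z1 + c * z2) for z2, w2 in zip(nodes, weights))
        total += w1 * g(z1) * inner
    return total

def g_uniform(z):
    """Phi(Z) rescaled to mean 0, variance 1."""
    return (0.5 * (1 + math.erf(z / math.sqrt(2))) - 0.5) * math.sqrt(12)

LN_SD = math.sqrt(math.e * (math.e - 1))

def g_lognormal(z):
    """exp(Z) rescaled to mean 0, variance 1."""
    return (math.exp(z) - math.exp(0.5)) / LN_SD

rho = 0.5
print(norta_r(g_uniform, rho), 6 / math.pi * math.asin(rho / 2))
print(norta_r(g_lognormal, rho), (math.exp(rho) - 1) / (math.e - 1))
```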

B5 — Heteroskedasticity-anchor drift (P6), measured and bounded

het_coeffs (the driver moments that standardise the λ driver) is computed once from the base spec; correlation noise moves the realised driver SD σ′ per block while the anchor σ₀ stays put, so the realised ±2σ′ ratio drifts to λ′ = exp(4γσ′/σ₀). This is an accepted approximation — the gates prove the mechanism and bound the drift; they do not fail on the drift itself.

Case Gate Measured Law z Verdict
scen_b5 slope magnitude = γ/σ₀ (stale spec anchor) 0.65938 0.65206 1.59849 PASS
scen_b5 staleness: lm(λ̂′ ~ λ′ predicted) slope = 1 0.90100 1.00000 -0.95987 PASS
scen_b5 drift bounded: max λ′ ≤ clamp-range bound 5.04465 5.39376 NA PASS

Preset λ set λ′ 5% λ′ median λ′ 95%
realistic 2 1.900 1.988 2.080
doomer 4 3.343 3.960 4.627

The staleness gate is the discriminating one: per block, the measured ratio λ̂′ tracks the moment-predicted λ′ = exp(4γσ̂′/σ₀) with regression slope ≈ 1. (A per-block recompute of the anchor would pin λ̂′ at λ, slope ≈ 0 — ~11σ away.) Note the mean slope alone cannot tell the two designs apart (γ·E[1/σ′] ≈ γ/σ₀ to 0.1%); the per-block tracking is what evidences the stale anchor. The preset table quantifies the documented drift: under realistic/doomer the effective λ′ wobbles around the set λ by the quantiles shown — second-order next to the perturbations themselves.
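Because exp(4γσ′/σ₀) = λ^(σ′/σ₀), the drift law is a power of the set λ. For λ > 1, λ′ is monotone in σ′/σ₀, so each quantile in the preset table maps back to the matching quantile of the driver-SD ratio.

The sketch assumes the table reports the moment-predicted λ′. `drifted_ratio` and `implied_sd_ratio` are hypothetical names.

```python
import math

def drifted_ratio(lam, sd_ratio):
    """lambda' = exp(4 * gamma * sigma'/sigma0) with gamma = ln(lam)/4,
    which equals lam ** sd_ratio."""
    gamma = math.log(lam) / 4
    return math.exp(4 * gamma * sd_ratio)

def implied_sd_ratio(lam, lam_prime):
    """Invert the drift law: the sigma'/sigma0 that yields lam_prime."""
    return math.log(lam_prime) / math.log(lam)

for preset, lam, q05, q95 in (("realistic", 2, 1.900, 2.080), ("doomer", 4, 3.343, 4.627)):
    lo, hi = implied_sd_ratio(lam, q05), implied_sd_ratio(lam, q95)
    print(f"{preset}: sigma'/sigma0 from {lo:.3f} (5%) to {hi:.3f} (95%)")
```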

Family gating

Knobs must be live or inert exactly per family.

Case Gate Measured Law z Verdict
fg_glm_ident bit-identity under logit: optimistic ≡ glm_l4 1.00000 1 NA PASS
fg_glm_ident bit-identity under logit: glm_l4 ≡ glm_l4re 1.00000 1 NA PASS
fg_glm_flip h-toggle flip rate = MC prediction (paired Δ = 0) 0.00234 0 0.6365 PASS
  • GLM λ / residual swaps — inert by bit-identity. apply_hsk requires a continuous outcome and consumes no RNG, so a λ toggle is an exact no-op under logit; a forced residual swap only consumes scenario-stream draws after every other consumer, leaving the Bernoulli outcomes bit-identical.
  • GLM heterogeneity — live, latent. The per-row log-odds jitter is hidden behind a single Bernoulli draw, so the observable is the paired h-toggle flip rate: X and the uniforms are drawn before the jitter normals, the pair shares them bit-identically, and P(flip) = E|p_h − p₀| — predicted by numerical integration row by row. Realised flip rate 0.0827 vs predicted 0.0804. The Jensen mean-rate shift (+0.0051) is expected and reported, not gated — median latent-rate invariance is unobservable.
  • MLE — only sampled_factor_proportions is live (see the Fa section above); every other knob is rejected by the engine’s estimator gate.
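The flip-rate prediction rests on pairing through shared uniforms. With the same U in both members, the outcome flips exactly when U falls between p₀ and p_h, so P(flip) = E|p_h − p₀|.

The sketch checks that identity for a single row. It uses illustrative log-odds η = −0.4 and jitter SD s = 0.4; these are hypothetical values, not the fg_glm_flip design.
- `predicted_flip` integrates over the jitter with a trapezoid rule.
- `simulated_flip` draws paired Bernoulli outcomes.

```python
import math
import random

def logistic(x):
    return 1 / (1 + math.exp(-x))

def predicted_flip(eta, s, step=0.01, half_width=8.0):
    """E|p_h - p0| over jitter Z ~ N(0, 1), p_h = logistic(eta + s*Z); trapezoid rule."""
    p0 = logistic(eta)
    k = int(round(half_width / step))
    total = 0.0
    for i in range(-k, k + 1):
        z = i * step
        weight = step * math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
        total += weight * abs(logistic(eta + s * z) - p0)
    return total

def simulated_flip(eta, s, n=200_000, seed=5):
    """Paired Bernoulli draws sharing one uniform per row; count disagreements."""
    rng = random.Random(seed)
    p0 = logistic(eta)
    flips = 0
    for _ in range(n):
        u = rng.random()                             # shared by both pair members
        p_h = logistic(eta + s * rng.gauss(0, 1))    # jittered member only
        flips += (u < p0) != (u < p_h)
    return flips / n

pred, sim = predicted_flip(-0.4, 0.4), simulated_flip(-0.4, 0.4)
print(f"predicted {pred:.4f}  simulated {sim:.4f}")
```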

Verdict table — every gate

Case Gate Measured Law z Verdict
scen_he jitter slope on x1² = h²β² (h=0.4) 0.0249 0.0256 -0.2511 PASS
scen_he jitter slope on x2² = h²β² (h=0.4) 0.0083 0.0100 -1.0731 PASS
scen_hs log-e² slope γ = ln(λ)/4 0.3480 0.3466 0.6030 PASS
scen_co_low block-r mean = ±0.8-clamp censored law 0.2962 0.3000 -0.6883 PASS
scen_co_low block-r SD = √(s²/2 + (1−ρ²)²/n) 0.1105 0.1070 0.8802 PASS
scen_co_high block-r mean = ±0.8-clamp censored law 0.7306 0.7281 0.9485 PASS
scen_co_high block-r SD = √(s²/2 + (1−ρ²)²/n) 0.0752 0.0787 -1.8644 PASS
scen_co_psd PSD-repaired r ∈ [0.45, 0.60] (shrinks below ρ=0.6 input, stays positive) 0.5566 0.4500 NA PASS
scen_px swap frequency = q 0.5300 0.5000 1.6598 PASS
scen_px pick share right_skewed = 1/3 0.3726 0.3333 1.7170 PASS
scen_px pick share left_skewed = 1/3 0.2925 0.3333 -1.7857 PASS
scen_px pick share uniform = 1/3 0.3349 0.3333 0.0687 PASS
scen_px right_skewed mean = 0 -0.0019 0.0000 -1.4418 PASS
scen_px right_skewed var = 1 0.9989 1.0000 -0.3234 PASS
scen_px left_skewed mean = 0 0.0007 0.0000 0.5193 PASS
scen_px left_skewed var = 1 1.0000 1.0000 -0.0107 PASS
scen_px uniform mean = 0 -0.0013 0.0000 -0.9827 PASS
scen_px uniform var = 1 1.0004 1.0000 0.3405 PASS
scen_px_t3 every block swapped (q = 1) 1.0000 1.0000 NA PASS
scen_px_t3 t3 mean = 0 -0.0019 0.0000 -1.6790 PASS
scen_px_t3 t3 var = 1 (table-normalized) 0.9988 1.0000 -0.3768 PASS
scen_re swap frequency = q_r 0.5100 0.5000 0.3996 PASS
scen_re pick share heavy_tailed = 1/2 0.5147 0.5000 0.4201 PASS
scen_re pick share skewed = 1/2 0.4853 0.5000 -0.4201 PASS
scen_re skewed: skew = √(8/df), df = 10 0.8888 0.8944 -0.8543 PASS
scen_re heavy: excess kurtosis = 6/(df−4), df = 10 1.0434 1.0000 1.5931 PASS
scen_re_replace pin holds: residual stays high_kurtosis (right_skewed swap inert) 0.9800 1.0000 NA PASS
scen_re_replace symmetric: skew = 0 (pinned t(6), not the χ² swap) -0.0159 0.0000 -0.9557 PASS
scen_re_replace mean = 0 0.0014 0.0000 1.2000 PASS
scen_re_replace var = 1 (t(6) standardized) 0.9977 1.0000 -0.9235 PASS
scen_fa_ols fixed: counts identical across draws (no RNG) 1.0000 1.0000 NA PASS
scen_fa_ols fixed: max |count − n·p| ≤ 1 (largest remainder) 0.0000 1.0000 NA PASS
scen_fa_ols sampled: Var(count) level 1 = n·p(1−p) 243.7875 250.0000 -0.3935 PASS
scen_fa_ols sampled: Var(count) level 2 = n·p(1−p) 182.2775 210.0000 -2.1712 PASS
scen_fa_ols sampled: Var(count) level 3 = n·p(1−p) 175.0250 160.0000 1.3196 PASS
scen_fa_mle fixed: counts identical across draws (no RNG) 1.0000 1.0000 NA PASS
scen_fa_mle fixed: max |count − n·p| ≤ 1 (largest remainder) 0.0000 1.0000 NA PASS
scen_fa_mle sampled: Var(count) level 1 = n·p(1−p) 220.4100 250.0000 -1.3942 PASS
scen_fa_mle sampled: Var(count) level 2 = n·p(1−p) 194.0300 210.0000 -0.9094 PASS
scen_fa_mle sampled: Var(count) level 3 = n·p(1−p) 158.4900 160.0000 -0.1044 PASS
scen_fa_mle_fp find_power accepts Fa toggle under estimator = Mle 1.0000 1.0000 NA PASS
scen_b0 optimistic ≡ non-scenario path (bit-identical PowerResult) 1.0000 1.0000 NA PASS
scen_b1_ols optimistic: β̂[intercept] unbiased -0.0025 0.0000 -2.1096 PASS
scen_b1_ols optimistic: β̂[x1] unbiased 0.3994 0.4000 -0.4687 PASS
scen_b1_ols optimistic: β̂[x2] unbiased 0.2499 0.2500 -0.1007 PASS
scen_b1_ols realistic: β̂[intercept] unbiased 0.0001 0.0000 0.0560 PASS
scen_b1_ols realistic: β̂[x1] unbiased 0.3952 0.4000 -1.0430 PASS
scen_b1_ols realistic: β̂[x2] unbiased 0.2500 0.2500 0.0007 PASS
scen_b1_ols doomer: β̂[intercept] unbiased 0.0009 0.0000 0.7089 PASS
scen_b1_ols doomer: β̂[x1] unbiased 0.4034 0.4000 0.3641 PASS
scen_b1_ols doomer: β̂[x2] unbiased 0.2496 0.2500 -0.0609 PASS
scen_b1_glm optimistic: β̂[intercept] = per-study pseudo-true -0.4050 -0.4055 0.1720 PASS
scen_b1_glm optimistic: β̂[x1] = per-study pseudo-true 0.5007 0.5000 0.2342 PASS
scen_b1_glm optimistic: β̂[x2] = per-study pseudo-true 0.2988 0.3000 -0.4579 PASS
scen_b1_glm realistic: β̂[intercept] = per-study pseudo-true -0.3905 -0.4055 1.1979 PASS
scen_b1_glm realistic: β̂[x1] = per-study pseudo-true 0.5066 0.5000 1.0446 PASS
scen_b1_glm realistic: β̂[x2] = per-study pseudo-true 0.2992 0.3000 -0.1913 PASS
scen_b1_glm doomer: β̂[intercept] = per-study pseudo-true -0.3944 -0.4055 0.4703 PASS
scen_b1_glm doomer: β̂[x1] = per-study pseudo-true 0.4939 0.5004 -0.5737 PASS
scen_b1_glm doomer: β̂[x2] = per-study pseudo-true 0.2983 0.3002 -0.2780 PASS
scen_b2 x1 jitter slope at λ=1 = h²β₁² 0.0217 0.0256 -1.7720 PASS
scen_b2 x1 jitter slope λ-invariant (paired Δ = 0) 0.0000 0.0000 0.0473 PASS
scen_b3 γ = ln(λ)/4 under forced high_kurtosis residual 0.3426 0.3466 -1.6275 PASS
scen_b3 γ = ln(λ)/4 under forced right_skewed residual 0.3508 0.3466 1.6939 PASS
scen_b4 right_skewed pair: r = NORTA 0.4541 (latent ρ = 0.50) 0.4546 0.4541 0.4146 PASS
scen_b4 uniform pair: r = NORTA 0.4826 (latent ρ = 0.50) 0.4830 0.4826 0.4445 PASS
scen_b5 slope magnitude = γ/σ₀ (stale spec anchor) 0.6594 0.6521 1.5985 PASS
scen_b5 staleness: lm(λ̂′ ~ λ′ predicted) slope = 1 0.9010 1.0000 -0.9599 PASS
scen_b5 drift bounded: max λ′ ≤ clamp-range bound 5.0446 5.3938 NA PASS
fg_glm_ident bit-identity under logit: optimistic ≡ glm_l4 1.0000 1.0000 NA PASS
fg_glm_ident bit-identity under logit: glm_l4 ≡ glm_l4re 1.0000 1.0000 NA PASS
fg_glm_flip h-toggle flip rate = MC prediction (paired Δ = 0) 0.0023 0.0000 0.6365 PASS

All 72 gates pass and every case reproduces its golden baseline. Each scenario knob generates its documented law: the realised slope wobble, variance trend, correlation noise (with its clamp truncation), swap frequencies, pool moments, allocation behaviour, NORTA coupling, and heteroskedasticity-anchor drift all land on their predicted values, and the β̂ backstop shows no mean leak.

scen_re_multi — multi-grouping RE knobs (M2)

This case checks the uniformity law on an extra grouping crossed with the primary one: random_effect_dist / random_effect_df apply to every grouping’s draw, and icc_noise_sd jitters every grouping’s τ² independently. Measurement mirrors scen_re’s level-mean-variance recovery (solver-free); β̂ unbiased is the leak tripwire. The jitter-presence check is paired against a no-jitter (optimistic) control: the realised τ̂² spread under the knob must exceed the control’s Monte-Carlo floor on the extra grouping too.

scen_re_multi: set == get per grouping, independent jitter present, beta unbiased — PASS

How this was produced

item value
Report generated 21 June 2026
R version R version 4.5.3 (2026-03-11)
mcpower 1.0.0
Gate band (SE-of-mean z) 4
Golden tolerance 1e-09
Cases 21
Gates 72

Cases live in mcpower/validation/formulas.R (SCENARIO_CASES), probes in mcpower/validation/common.R, gates in mcpower/validation/tolerances.R (SCENARIO_TOL). Golden baseline: mcpower/validation/data/scenario_golden.rds (delete it to re-freeze after a deliberate DGP change). To reproduce, from the repository root:

rmarkdown::render("mcpower/validation/validation_scenarios.rmd",
                  output_dir = "mcpower/web/documentation/validation")