MCPower — Validating Scenario Perturbations
What this report shows
MCPower’s scenario analysis deliberately stresses a planned study:
instead of simulating the idealised design you specified, each scenario
perturbs it — slopes wobble between observations, residual variance
trends with the predictors, correlations differ from the plan, predictor
and residual distributions get swapped, group allocation becomes random.
Each perturbation is controlled by a knob in configs/scenarios.json,
and each knob documents a precise statistical law for the disturbance
it injects.
This report checks that every knob generates exactly its documented law, no more and no less. It is the L5 layer of the validation charter. The L3 reports prove the optimistic (unperturbed) generator faithful against the spec as a point oracle; here the spec value is deliberately randomised, so the oracle is the perturbation law itself, including its documented distortions: the ±0.8 correlation clamp, PSD repair, the censored t(3) table, and the stale heteroskedasticity anchor.
The gate doctrine. Every gate is set == get on the realised magnitude: the knob’s value, recovered from the generated data by a probe, must match the documented law within an SE-of-mean z-band of 4. Each case draws K independent perturbation blocks; a mis-scaled knob lands tens of σ out, while chance alone exceeds a two-sided 4σ band about once in 16,000 gates. Two checks are explicitly not gates: monotone power across presets (a readable summary that washes out real faults) and a global error-variance invariant (heterogeneity is supposed to inflate it). The β̂-unbiasedness backstop (B1), intercept included, is kept as a cheap mean-leak tripwire: a mis-centred swapped distribution lands in β̂₀ while every effect estimate stays clean.
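The gate arithmetic can be sketched as follows, assuming z is the deviation of the block mean from the law divided by the standard error of that mean. The helper name `z_gate` and the simulated block estimates are illustrative only and are not part of MCPower.

```r
# Hypothetical helper: gate a knob on the SE-of-mean z-band.
# `block_estimates` holds one probe estimate per perturbation block.
z_gate <- function(block_estimates, law, band = 4) {
  k  <- length(block_estimates)
  se <- sd(block_estimates) / sqrt(k)        # SE of the mean over K blocks
  z  <- (mean(block_estimates) - law) / se
  list(measured = mean(block_estimates), z = z, pass = abs(z) <= band)
}

# Normal-approximation chance that a correct knob leaves a two-sided band of 4
false_alarm <- 2 * pnorm(-4)

set.seed(1)
ok  <- z_gate(rnorm(400, mean = 0.30, sd = 0.10), law = 0.30)  # law matches the data
bad <- z_gate(rnorm(400, mean = 0.30, sd = 0.10), law = 0.36)  # law is mis-scaled
```

With 400 blocks the standard error is about 0.005, so a law that is off by 0.06 sits roughly twelve standard errors out, while `1 / false_alarm` gives the expected number of gates per chance failure.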
Results at a glance
72 of 72 gates pass. Golden reproducibility: all cases reproduce.
| Case | Verdict | Golden check |
|---|---|---|
| fg_glm_flip | all PASS | reproduces |
| fg_glm_ident | all PASS | reproduces |
| scen_b0 | all PASS | reproduces |
| scen_b1_glm | all PASS | reproduces |
| scen_b1_ols | all PASS | reproduces |
| scen_b2 | all PASS | reproduces |
| scen_b3 | all PASS | reproduces |
| scen_b4 | all PASS | reproduces |
| scen_b5 | all PASS | reproduces |
| scen_co_high | all PASS | reproduces |
| scen_co_low | all PASS | reproduces |
| scen_co_psd | all PASS | reproduces |
| scen_fa_mle | all PASS | reproduces |
| scen_fa_mle_fp | all PASS | reproduces |
| scen_fa_ols | all PASS | reproduces |
| scen_he | all PASS | reproduces |
| scen_hs | all PASS | reproduces |
| scen_px | all PASS | reproduces |
| scen_px_t3 | all PASS | reproduces |
| scen_re | all PASS | reproduces |
| scen_re_replace | all PASS | reproduces |
Tier A — one knob at a time
Each case turns on a single knob on an otherwise-optimistic design, draws K independent perturbation blocks, and recovers the knob’s magnitude with a probe matched to its law.
Heterogeneity (slope wobble)
Effects vary per observation: βⱼ + N(0, (h·βⱼ)²). The probe regresses squared true-β residuals on each squared predictor — the slope recovers h²βⱼ² per predictor, separating He from anything that only moves pooled moments.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_he | jitter slope on x1² = h²β² (h=0.4) | 0.02491 | 0.0256 | -0.25106 | PASS |
| scen_he | jitter slope on x2² = h²β² (h=0.4) | 0.00830 | 0.0100 | -1.07315 | PASS |
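The probe can be illustrated with a small simulation. This is a sketch under stated assumptions, not the MCPower probe itself: the predictors are independent standard normals, the residual SD of 0.5 is arbitrary, and β = (0.4, 0.25) is the pair implied by the tabulated laws 0.0256 and 0.0100 at h = 0.4.

```r
set.seed(42)
n    <- 200000
beta <- c(x1 = 0.4, x2 = 0.25)   # implied by the laws h^2 * beta^2 in the table
h    <- 0.4
x1 <- rnorm(n)
x2 <- rnorm(n)

# Per-observation slopes: beta_j + N(0, (h * beta_j)^2)
b1 <- beta[["x1"]] + rnorm(n, sd = h * beta[["x1"]])
b2 <- beta[["x2"]] + rnorm(n, sd = h * beta[["x2"]])
y  <- b1 * x1 + b2 * x2 + rnorm(n, sd = 0.5)   # residual SD is an assumption

# Residual against the true (unjittered) beta, then squared-on-squared regression
e      <- y - (beta[["x1"]] * x1 + beta[["x2"]] * x2)
probe  <- coef(lm(I(e^2) ~ I(x1^2) + I(x2^2)))
slopes <- unname(probe[2:3])
law    <- unname((h * beta)^2)   # 0.0256 and 0.0100
```

Because Var(e | x) = σ² + h²β₁²x₁² + h²β₂²x₂², each slope isolates one predictor’s jitter variance, which is why the probe separates heterogeneity from effects that only shift pooled moments.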
Heteroskedasticity (residual-variance trend)
Residual variance follows Var(εᵢ) = σ²·exp(γzᵢ)/exp(γ²/2) with γ = ln(λ)/4, z the standardised driver. log e² is then linear in z with slope exactly γ — shape-blind, so the same probe serves B3. The realised ±2σ ratio is exp(4γ̂).
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_hs | log-e² slope γ = ln(λ)/4 | 0.34797 | 0.34657 | 0.60302 | PASS |
Realised λ̂ = exp(4γ̂) = 4.022 (set: 4); the raw ±2σ binned variance ratio reads 3.95 (reported only — finite bins make its law approximate; the slope is the gate).
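A minimal simulation of this law, assuming σ² = 1, a standard-normal driver, and Gaussian residuals (all three are choices made for the sketch), shows the log-e² regression recovering γ:

```r
set.seed(7)
n      <- 200000
lambda <- 4
gamma  <- log(lambda) / 4                     # 0.34657 for lambda = 4
z      <- rnorm(n)                            # standardised driver
v      <- exp(gamma * z) / exp(gamma^2 / 2)   # Var(e_i) with sigma^2 = 1
e      <- rnorm(n, sd = sqrt(v))

# log(e^2) = gamma * z + const + noise, so the slope on z estimates gamma
gamma_hat  <- unname(coef(lm(log(e^2) ~ z))[2])
lambda_hat <- exp(4 * gamma_hat)              # realised +/-2 sigma variance ratio
```

The divisor exp(γ²/2) equals E[exp(γz)] for a standard-normal z, so the average variance stays at σ² and only the trend changes.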
Correlation noise
Per block the off-diagonals get symmetrised Gaussian noise — symmetrisation halves the variance, so the per-block ρ law is N(ρ, s²/2) censored at ±0.8, plus finite-n sampling noise in quadrature. The low-ρ case gates the exact law; the high-ρ case sits against the clamp, gating the censored-normal truncation law itself (mean visibly below the naive ρ).
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_co_low | block-r mean = ±0.8-clamp censored law | 0.29620 | 0.30000 | -0.68832 | PASS |
| scen_co_low | block-r SD = √(s²/2 + (1−ρ²)²/n) | 0.11048 | 0.10704 | 0.88024 | PASS |
| scen_co_high | block-r mean = ±0.8-clamp censored law | 0.73059 | 0.72807 | 0.94848 | PASS |
| scen_co_high | block-r SD = √(s²/2 + (1−ρ²)²/n) | 0.07522 | 0.07873 | -1.86440 | PASS |
At ρ = 0.75, s = 0.15: the censored law predicts mean 0.7281 — the −0.0219 shift below the nominal ρ is the documented clamp truncation, and the realised mean lands on it.
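The predicted mean follows from the standard formula for a normal variable censored at two bounds. The function below is an illustrative sketch (its name is not an MCPower identifier); it applies that formula to N(ρ, s²/2) with the clamp at ±0.8.

```r
# Mean of a normal N(mu, sd^2) censored (clamped) to [lo, hi]
censored_normal_mean <- function(mu, sd, lo, hi) {
  a <- (lo - mu) / sd
  b <- (hi - mu) / sd
  lo * pnorm(a) +                            # mass piled up at the lower clamp
    hi * pnorm(b, lower.tail = FALSE) +      # mass piled up at the upper clamp
    mu * (pnorm(b) - pnorm(a)) -             # interior part: location term
    sd * (dnorm(b) - dnorm(a))               # interior part: truncation term
}

rho   <- 0.75
s     <- 0.15
m     <- censored_normal_mean(rho, s / sqrt(2), lo = -0.8, hi = 0.8)
shift <- m - rho
```

Here `m` evaluates to about 0.7281 and `shift` to about −0.0219. The upper clamp sits less than half a per-block SD above ρ, so a sizeable share of draws is pulled down to 0.8.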
PSD repair (3 predictors, high ρ)
Repair cannot fire at p = 2 (any clamped 2×2 is PD), so a 3-variable all-0.6 design under s = 0.3 is where eigenvalue-floor + diagonal-renormalisation distortion lives. There is no closed-form law; the empirical per-pair moments are frozen as MCPower-golden (table below) and re-checked on every run.
| r12_mean | r13_mean | r23_mean | r12_sd | r13_sd | r23_sd |
|---|---|---|---|---|---|
| 0.58466 | 0.56899 | 0.55664 | 0.17938 | 0.18598 | 0.19204 |
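A generic eigenvalue-floor repair followed by diagonal renormalisation can be sketched as below. This is a textbook version for illustration: the floor value, the function name `repair_psd`, and the test matrix are assumptions, and the engine’s own repair may differ in detail.

```r
# Hypothetical repair: floor the eigenvalues, rebuild, rescale to unit diagonal
repair_psd <- function(R, floor = 1e-6) {    # floor value is an assumption
  eig   <- eigen(R, symmetric = TRUE)
  vals  <- pmax(eig$values, floor)           # lift negative eigenvalues
  fixed <- eig$vectors %*% diag(vals) %*% t(eig$vectors)
  cov2cor(fixed)                             # renormalise the diagonal to 1
}

# A clamped 3x3 matrix that is not positive semi-definite
R_bad <- matrix(c(1.0,  0.8,  0.8,
                  0.8,  1.0, -0.8,
                  0.8, -0.8,  1.0), nrow = 3, byrow = TRUE)
R_fix <- repair_psd(R_bad)
```

In this example the repair shrinks every off-diagonal from magnitude 0.8 to about 0.5, the same direction of distortion (realised r below the input ρ) that the frozen moments above record.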
Distribution swaps (predictors)
Each continuous-normal column is swapped per block with probability q to a uniform pick from the pool; every pool candidate is standardised (mean 0, var 1), so a swap perturbs shape only — a mis-centred candidate would be a mean leak straight into β̂₀.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_px | swap frequency = q | 0.53000 | 0.50000 | 1.65979 | PASS |
| scen_px | pick share right_skewed = 1/3 | 0.37264 | 0.33333 | 1.71701 | PASS |
| scen_px | pick share left_skewed = 1/3 | 0.29245 | 0.33333 | -1.78569 | PASS |
| scen_px | pick share uniform = 1/3 | 0.33491 | 0.33333 | 0.06868 | PASS |
| scen_px | right_skewed mean = 0 | -0.00186 | 0.00000 | -1.44183 | PASS |
| scen_px | right_skewed var = 1 | 0.99895 | 1.00000 | -0.32336 | PASS |
| scen_px | left_skewed mean = 0 | 0.00075 | 0.00000 | 0.51932 | PASS |
| scen_px | left_skewed var = 1 | 0.99996 | 1.00000 | -0.01066 | PASS |
| scen_px | uniform mean = 0 | -0.00133 | 0.00000 | -0.98269 | PASS |
| scen_px | uniform var = 1 | 1.00042 | 1.00000 | 0.34051 | PASS |
Block classifications: left_skewed 124, normal 376, right_skewed 158, uniform 142.
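The swap frequency and pick shares in the table can be recomputed directly from these block counts. The snippet only redoes that arithmetic; the variable names are illustrative.

```r
counts <- c(left_skewed = 124, normal = 376, right_skewed = 158, uniform = 142)

blocks  <- sum(counts)                       # total classified blocks
swapped <- blocks - counts[["normal"]]       # blocks that left the normal default

swap_freq   <- swapped / blocks              # compared against q
pick_shares <- counts[names(counts) != "normal"] / swapped   # compared against 1/3 each
```

The shares are conditional on a swap having happened, which is why they are divided by the number of swapped blocks and not by the total.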
Custom pool: high_kurtosis
The only swappable marginal outside the presets. Its engine identity is a censored standardised t(3): a 2048-knot inverse-CDF table on percentiles [0.00121, 0.99879], normalised at build to the censored table’s own SD (1.5958 raw, vs √3 ≈ 1.7321 for the full t(3)) — so the marginal has exactly unit variance, excess kurtosis ≈ 6.39, and support ±6.0 SD. The censoring is deliberate: it bounds every synthetic marginal at ±6 SD while keeping this the heaviest-tailed shape (t(3)’s own kurtosis is infinite). v1 — and this engine until 2026-06 — divided by √3, which standardises the full t(3) and left the censored marginal at var ≈ 0.858, a silent ~14% effect-size shrink for every high-kurtosis predictor; the L5 gate below caught it, and the table is now normalised by construction.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_px_t3 | every block swapped (q = 1) | 1.00000 | 1 | NA | PASS |
| scen_px_t3 | t3 mean = 0 | -0.00194 | 0 | -1.67897 | PASS |
| scen_px_t3 | t3 var = 1 (table-normalized) | 0.99882 | 1 | -0.37682 | PASS |
Residual swaps
With probability q_r the block’s residual distribution is replaced — distribution and df — by a pool pick (t(df)·√((df−2)/df) or (χ²(df)−df)/√(2df)). The shape laws (skew = √(8/df), excess kurtosis = 6/(df−4)) recover the df, proving the scenario’s df is carried, not the spec’s.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_re | swap frequency = q_r | 0.51000 | 0.50000 | 0.39958 | PASS |
| scen_re | pick share heavy_tailed = 1/2 | 0.51471 | 0.50000 | 0.42008 | PASS |
| scen_re | pick share skewed = 1/2 | 0.48529 | 0.50000 | -0.42008 | PASS |
| scen_re | skewed: skew = √(8/df), df = 10 | 0.88878 | 0.89443 | -0.85425 | PASS |
| scen_re | heavy: excess kurtosis = 6/(df−4), df = 10 | 1.04341 | 1.00000 | 1.59313 | PASS |
| scen_re_replace | pin holds: residual stays high_kurtosis (right_skewed swap inert) | 0.98000 | 1.00000 | NA | PASS |
| scen_re_replace | symmetric: skew = 0 (pinned t(6), not the χ² swap) | -0.01593 | 0.00000 | -0.95569 | PASS |
| scen_re_replace | mean = 0 | 0.00140 | 0.00000 | 1.20001 | PASS |
| scen_re_replace | var = 1 (t(6) standardized) | 0.99766 | 1.00000 | -0.92347 | PASS |
The pinned case (scen_re_replace) verifies the swap-eligibility rule:
it pins the spec residual with
set_residual_distribution("high_kurtosis"), then configures a forced
right_skewed swap (q_r = 1). Because pick_residual only swaps an
unpinned default-normal residual, the swap is inert — every draw
keeps the pinned, symmetric censored-t3 high_kurtosis residual (skew ≈
0, table-normalised to var 1), never the χ²(6) the config asks for. The
skew = 0 gate is the tripwire: a fired swap would force skew = √(8/6) ≈
1.15.
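The shape laws used by these gates are standard moments of the χ² and t distributions, and they can be inverted to read a df back from a measured statistic. The helper names below are illustrative, not MCPower functions.

```r
chisq_skew    <- function(df) sqrt(8 / df)   # skewness of chi-square(df)
t_excess_kurt <- function(df) 6 / (df - 4)   # excess kurtosis of t(df), needs df > 4

# Invert the laws: which df does a measured shape statistic imply?
df_from_skew <- function(skew) 8 / skew^2
df_from_kurt <- function(kurt) 6 / kurt + 4

law_skew_10 <- chisq_skew(10)                # law for the skewed pick
law_kurt_10 <- t_excess_kurt(10)             # law for the heavy-tailed pick
tripwire    <- chisq_skew(6)                 # skew a fired chi-square(6) swap would show

implied_df_skew <- df_from_skew(0.88878)     # from the measured skew in the table
implied_df_kurt <- df_from_kurt(1.04341)     # from the measured kurtosis in the table
```

Both measured statistics imply a df close to 10 (about 10.1 and 9.75), which is the sense in which the shape gates show the scenario’s df being carried.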
Factor-proportion sampling
sampled_factor_proportions = FALSE (the optimistic default) assigns
factor levels by a deterministic largest-remainder walk — counts are a
pure function of (n, p), identical across draws, each within 1 of
n·p. TRUE draws levels per row: counts are Binomial(n, p) with
variance n·p(1−p).
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_fa_ols | fixed: counts identical across draws (no RNG) | 1.0000 | 1 | NA | PASS |
| scen_fa_ols | fixed: max \|count − n·p\| ≤ 1 (largest remainder) | 0.0000 | 1 | NA | PASS |
| scen_fa_ols | sampled: Var(count) level 1 = n·p(1−p) | 243.7875 | 250 | -0.3935 | PASS |
| scen_fa_ols | sampled: Var(count) level 2 = n·p(1−p) | 182.2775 | 210 | -2.1712 | PASS |
| scen_fa_ols | sampled: Var(count) level 3 = n·p(1−p) | 175.0250 | 160 | 1.3196 | PASS |
| scen_fa_mle | fixed: counts identical across draws (no RNG) | 1.0000 | 1 | NA | PASS |
| scen_fa_mle | fixed: max \|count − n·p\| ≤ 1 (largest remainder) | 0.0000 | 1 | NA | PASS |
| scen_fa_mle | sampled: Var(count) level 1 = n·p(1−p) | 220.4100 | 250 | -1.3942 | PASS |
| scen_fa_mle | sampled: Var(count) level 2 = n·p(1−p) | 194.0300 | 210 | -0.9094 | PASS |
| scen_fa_mle | sampled: Var(count) level 3 = n·p(1−p) | 158.4900 | 160 | -0.1044 | PASS |
| scen_fa_mle_fp | find_power accepts Fa toggle under estimator = Mle | 1.0000 | 1 | NA | PASS |
Fixed counts at n = 1000: 500/300/200 against expected 500/300/200. The MLE rows are the one scenario knob the mixed-model estimator admits (every other knob is rejected by the engine’s estimator gate — a deterministic L1 assertion owned by the engine test suite, not re-tested here).
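The two allocation modes can be contrasted with a short sketch. The `largest_remainder` function below is the textbook largest-remainder method, used here as a stand-in for the engine’s deterministic walk; p = (0.5, 0.3, 0.2) is the proportion vector implied by the fixed counts and the variance laws 250/210/160.

```r
# Textbook largest-remainder allocation (stand-in for the engine's walk)
largest_remainder <- function(n, p) {
  quota  <- n * p
  counts <- floor(quota + 1e-9)              # small guard against float error
  short  <- n - sum(counts)                  # rows still to hand out
  if (short > 0) {
    top <- order(quota - counts, decreasing = TRUE)[seq_len(short)]
    counts[top] <- counts[top] + 1
  }
  counts
}

n <- 1000
p <- c(0.5, 0.3, 0.2)
fixed_counts <- largest_remainder(n, p)      # deterministic, no RNG
binom_var    <- n * p * (1 - p)              # law for the sampled mode

# Sampled mode: draw a level per row, then count, over many replicates
set.seed(3)
sampled <- replicate(2000,
                     tabulate(sample.int(3, n, replace = TRUE, prob = p), nbins = 3))
sampled_var <- apply(sampled, 1, var)        # one variance per level
```

The fixed counts never move between calls, while the sampled counts scatter around n·p with variance close to n·p(1−p).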
Tier B — knob interactions
The shipped presets co-vary every knob, so B2–B4 isolate single interactions through custom scenario pairs on shared seeds (P1 pairing: two scenarios at the same seed draw the same raw noise streams, so cross-scenario deltas are knob-attributable).
B0 — Optimistic ≡ baseline
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_b0 | optimistic ≡ non-scenario path (bit-identical PowerResult) | 1 | 1 | NA | PASS |
The optimistic member of a three-scenario paired call is bit-identical to the plain non-scenario call — the DGP-level companion to the orchestrator’s call-seed test (power here: 0.948, 0.28).
B1 — β̂ unbiased across the presets (leak backstop)
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_b1_ols | optimistic: β̂[intercept] unbiased | -0.0025 | 0.0000 | -2.1096 | PASS |
| scen_b1_ols | optimistic: β̂[x1] unbiased | 0.3994 | 0.4000 | -0.4687 | PASS |
| scen_b1_ols | optimistic: β̂[x2] unbiased | 0.2499 | 0.2500 | -0.1007 | PASS |
| scen_b1_ols | realistic: β̂[intercept] unbiased | 0.0001 | 0.0000 | 0.0560 | PASS |
| scen_b1_ols | realistic: β̂[x1] unbiased | 0.3952 | 0.4000 | -1.0430 | PASS |
| scen_b1_ols | realistic: β̂[x2] unbiased | 0.2500 | 0.2500 | 0.0007 | PASS |
| scen_b1_ols | doomer: β̂[intercept] unbiased | 0.0009 | 0.0000 | 0.7089 | PASS |
| scen_b1_ols | doomer: β̂[x1] unbiased | 0.4034 | 0.4000 | 0.3641 | PASS |
| scen_b1_ols | doomer: β̂[x2] unbiased | 0.2496 | 0.2500 | -0.0609 | PASS |
| scen_b1_glm | optimistic: β̂[intercept] = per-study pseudo-true | -0.4050 | -0.4055 | 0.1720 | PASS |
| scen_b1_glm | optimistic: β̂[x1] = per-study pseudo-true | 0.5007 | 0.5000 | 0.2342 | PASS |
| scen_b1_glm | optimistic: β̂[x2] = per-study pseudo-true | 0.2988 | 0.3000 | -0.4579 | PASS |
| scen_b1_glm | realistic: β̂[intercept] = per-study pseudo-true | -0.3905 | -0.4055 | 1.1979 | PASS |
| scen_b1_glm | realistic: β̂[x1] = per-study pseudo-true | 0.5066 | 0.5000 | 1.0446 | PASS |
| scen_b1_glm | realistic: β̂[x2] = per-study pseudo-true | 0.2992 | 0.3000 | -0.1913 | PASS |
| scen_b1_glm | doomer: β̂[intercept] = per-study pseudo-true | -0.3944 | -0.4055 | 0.4703 | PASS |
| scen_b1_glm | doomer: β̂[x1] = per-study pseudo-true | 0.4939 | 0.5004 | -0.5737 | PASS |
| scen_b1_glm | doomer: β̂[x2] = per-study pseudo-true | 0.2983 | 0.3002 | -0.2780 | PASS |
OLS rows gate on the z-band, intercept included — the mean-leak
tripwire (linear averaging keeps OLS β̂ exactly unbiased under every
knob). Logit rows need a different law: the heterogeneity β-jitter is
drawn once per study, so each study’s data is a clean logit at its
own β_eff and the MLE recovers it — averaged over the K studies the
fitted coefficient → E[β_eff], not the attenuated coefficient a
per-observation population-averaged marginal would show. The clip toward
zero (s_j = h·|β_j|) nudges each slope’s magnitude up by
×(Φ(1/h)+h·φ(1/h)) ≈ ×1.0008 at doomer’s h = 0.4; the symmetric
unclipped intercept jitter leaves β_0 unchanged. So the logit law is
this per-study pseudo-true value (glm_perstudy_beta), gated on the
absolute band ±0.02. The GLM calibration gates remain Tier A and the
flip rate below.
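The clip factor can be checked numerically. The sketch assumes the clip means a jittered slope β(1 + hZ) is floored at zero so it cannot change sign, which gives E[max(1 + hZ, 0)] = Φ(1/h) + h·φ(1/h); that reading of the clip, and the helper name, are assumptions of this example.

```r
# Expected magnitude multiplier for a slope jittered by N(0, (h*beta)^2)
# and floored at zero (assumed meaning of "clip toward zero")
clip_factor <- function(h) pnorm(1 / h) + h * dnorm(1 / h)

beta       <- c(x1 = 0.5, x2 = 0.3)          # logit slopes from the table
law_doomer <- beta * clip_factor(0.4)        # per-study pseudo-true slopes at h = 0.4

# Monte Carlo cross-check of the closed form
set.seed(5)
z  <- rnorm(1e6)
mc <- mean(pmax(1 + 0.4 * z, 0))
```

At h = 0.4 the factor is about 1.0008, which reproduces the doomer laws 0.5004 and 0.3002 in the table; for smaller h the clip almost never binds and the factor is indistinguishable from 1.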
B2 — He × Hs separation
The β-jitter variance ∝ xᵢⱼ²βⱼ² must be present at λ = 1 and unchanged by a λ toggle (λ is driven by the clean linear predictor, never the jittered one). The λ driver is pinned to x2, so the x1²-decomposed jitter variance is uncontaminated by the λ channel’s even cosh component; the paired Δ across {h = 0.4, λ = 1} vs {h = 0.4, λ = 4} isolates the interaction.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_b2 | x1 jitter slope at λ=1 = h²β₁² | 0.02174 | 0.0256 | -1.77201 | PASS |
| scen_b2 | x1 jitter slope λ-invariant (paired Δ = 0) | 0.00002 | 0.0000 | 0.04726 | PASS |
The driver column shows why the pin matters: x2²-slope reads 0.0095 at λ = 1 (law h²β₂² = 0.01) but inflates to 0.0689 at λ = 4 — the cosh(γz) contamination the naive probe would misread as an He × Hs interaction.
B3 — Hs × Re preservation
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_b3 | γ = ln(λ)/4 under forced high_kurtosis residual | 0.34263 | 0.34657 | -1.62754 | PASS |
| scen_b3 | γ = ln(λ)/4 under forced right_skewed residual | 0.35076 | 0.34657 | 1.69386 | PASS |
Under a forced t(10) or χ²(10) residual the log-e² slope still reads ln(λ)/4 — the multiplicative variance trend amplifies the swapped tails without bending the ratio.
B4 — Co × Px (NORTA)
Correlation is induced on the latent normals; a swapped marginal transforms them, so the realised Pearson r follows the NORTA law, not the latent spec value. The oracle is computed numerically in R (Gauss–Hermite over the latent bivariate normal; it matches the closed forms (e^ρ−1)/(e−1) and (6/π)asin(ρ/2) to 7 digits).
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_b4 | right_skewed pair: r = NORTA 0.4541 (latent ρ = 0.50) | 0.45459 | 0.45410 | 0.41461 | PASS |
| scen_b4 | uniform pair: r = NORTA 0.4826 (latent ρ = 0.50) | 0.48296 | 0.48258 | 0.44450 | PASS |
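The two closed forms quoted for the oracle check are easy to evaluate, and the uniform one can be cross-checked by simulating the latent normals. This is an illustrative sketch, not the Gauss–Hermite oracle; the function names are made up for the example.

```r
# Closed-form Pearson r after transforming latent normals with correlation rho
norta_lognormal <- function(rho) (exp(rho) - 1) / (exp(1) - 1)   # both marginals exp(Z)
norta_uniform   <- function(rho) (6 / pi) * asin(rho / 2)        # both marginals uniform

rho <- 0.5
r_uniform <- norta_uniform(rho)

# Monte Carlo cross-check: correlate latent normals, then map to uniforms
set.seed(9)
n  <- 200000
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)
r_mc <- cor(pnorm(z1), pnorm(z2))
```

`r_uniform` evaluates to about 0.48258, the law in the uniform row. The tabulated right_skewed oracle (0.4541) comes from the numerical integration for that pool shape, not from the lognormal closed form, which gives about 0.3775 at ρ = 0.5.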
B5 — Heteroskedasticity-anchor drift (P6), measured and bounded
het_coeffs (the driver moments that standardise the λ driver) is
computed once from the base spec; correlation noise moves the
realised driver SD σ′ per block while the anchor σ₀ stays put, so the
realised ±2σ′ ratio drifts to λ′ = exp(4γσ′/σ₀). This is an accepted
approximation — the gates prove the mechanism and bound the drift;
they do not fail on the drift itself.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_b5 | slope magnitude = γ/σ₀ (stale spec anchor) | 0.65938 | 0.65206 | 1.59849 | PASS |
| scen_b5 | staleness: lm(λ̂′ ~ λ′ predicted) slope = 1 | 0.90100 | 1.00000 | -0.95987 | PASS |
| scen_b5 | drift bounded: max λ′ ≤ clamp-range bound | 5.04465 | 5.39376 | NA | PASS |
| Preset | λ set | λ′ 5% | λ′ median | λ′ 95% |
|---|---|---|---|---|
| realistic | 2 | 1.900 | 1.988 | 2.080 |
| doomer | 4 | 3.343 | 3.960 | 4.627 |
The staleness gate is the discriminating one: per block, the measured ratio λ̂′ tracks the moment-predicted λ′ = exp(4γσ̂′/σ₀) with regression slope ≈ 1. (A per-block recompute of the anchor would pin λ̂′ at λ, slope ≈ 0 — ~11σ away.) Note the mean slope alone cannot tell the two designs apart (γ·E[1/σ′] ≈ γ/σ₀ to 0.1%); the per-block tracking is what evidences the stale anchor. The preset table quantifies the documented drift: under realistic/doomer the effective λ′ wobbles around the set λ by the quantiles shown — second-order next to the perturbations themselves.
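Since 4γ = ln(λ), the drift law λ′ = exp(4γσ′/σ₀) is simply λ raised to the power σ′/σ₀. The sketch below evaluates it for a few illustrative SD ratios (the ratios 0.9 and 1.1 are made-up inputs) and inverts the tabulated bound to see which ratio it corresponds to.

```r
# Drift law: realised +/-2 sigma' ratio when the anchor sigma_0 is stale
lambda_prime <- function(lambda, sd_ratio) exp(log(lambda) * sd_ratio)

drift <- lambda_prime(4, c(0.9, 1.0, 1.1))   # illustrative sigma'/sigma_0 ratios

# Anchor implied by the gated slope gamma / sigma_0 = 0.65206 at lambda = 4
sigma0 <- (log(4) / 4) / 0.65206

# SD ratio at which lambda' reaches the tabulated clamp-range bound 5.39376
ratio_at_bound <- log(5.39376) / log(4)
```

A ratio of exactly 1 returns the set λ; the bound in the table corresponds to the realised driver SD running about 22% above the anchor.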
Family gating
Knobs must be live or inert exactly per family.
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| fg_glm_ident | bit-identity under logit: optimistic ≡ glm_l4 | 1.00000 | 1 | NA | PASS |
| fg_glm_ident | bit-identity under logit: glm_l4 ≡ glm_l4re | 1.00000 | 1 | NA | PASS |
| fg_glm_flip | h-toggle flip rate = MC prediction (paired Δ = 0) | 0.00234 | 0 | 0.6365 | PASS |
- GLM λ / residual swaps: inert by bit-identity. apply_hsk requires a continuous outcome and consumes no RNG, so a λ toggle is an exact no-op under logit; a forced residual swap only consumes scenario-stream draws after every other consumer, leaving the Bernoulli outcomes bit-identical.
- GLM heterogeneity: live, latent. The per-row log-odds jitter is hidden behind a single Bernoulli draw, so the observable is the paired h-toggle flip rate: X and the uniforms are drawn before the jitter normals, the pair shares them bit-identically, and P(flip) = E|p_h − p₀|, predicted by numerical integration row by row. Realised flip rate 0.0827 vs predicted 0.0804. The Jensen mean-rate shift (+0.0051) is expected and reported, not gated; median latent-rate invariance is unobservable.
- MLE: only sampled_factor_proportions is live (see the Fa section above); every other knob is rejected by the engine’s estimator gate.
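The flip-rate observable can be illustrated with a paired simulation. This is a sketch under assumptions: the jitter form β(1 + h·N(0, 1)) per row, the slopes 0.5 and 0.3, and the base rate 0.4 are choices for the example and are not taken from the fg_glm_flip configuration, so the numbers will not match the reported rates.

```r
set.seed(11)
n    <- 200000
b0   <- qlogis(0.4)                          # intercept for a 40% base rate
beta <- c(0.5, 0.3)
h    <- 0.4

x1 <- rnorm(n)
x2 <- rnorm(n)
u  <- runif(n)                               # shared uniforms, drawn before the jitter

eta0  <- b0 + beta[1] * x1 + beta[2] * x2
# Assumed per-row jitter on each slope: beta_j * (1 + h * N(0, 1))
eta_h <- b0 + beta[1] * (1 + h * rnorm(n)) * x1 +
              beta[2] * (1 + h * rnorm(n)) * x2

p0 <- plogis(eta0)
ph <- plogis(eta_h)
y0 <- u < p0                                 # outcome without jitter
yh <- u < ph                                 # outcome with jitter, same uniform

flip_rate <- mean(y0 != yh)                  # observed paired flip rate
predicted <- mean(abs(ph - p0))              # row-by-row prediction E|p_h - p_0|
```

Because both members of the pair threshold the same uniform, an outcome flips exactly when that uniform falls between p₀ and p_h, which has probability |p_h − p₀| for each row.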
Verdict table — every gate
| Case | Gate | Measured | Law | z | Verdict |
|---|---|---|---|---|---|
| scen_he | jitter slope on x1² = h²β² (h=0.4) | 0.0249 | 0.0256 | -0.2511 | PASS |
| scen_he | jitter slope on x2² = h²β² (h=0.4) | 0.0083 | 0.0100 | -1.0731 | PASS |
| scen_hs | log-e² slope γ = ln(λ)/4 | 0.3480 | 0.3466 | 0.6030 | PASS |
| scen_co_low | block-r mean = ±0.8-clamp censored law | 0.2962 | 0.3000 | -0.6883 | PASS |
| scen_co_low | block-r SD = √(s²/2 + (1−ρ²)²/n) | 0.1105 | 0.1070 | 0.8802 | PASS |
| scen_co_high | block-r mean = ±0.8-clamp censored law | 0.7306 | 0.7281 | 0.9485 | PASS |
| scen_co_high | block-r SD = √(s²/2 + (1−ρ²)²/n) | 0.0752 | 0.0787 | -1.8644 | PASS |
| scen_co_psd | PSD-repaired r ∈ [0.45, 0.60] (shrinks below ρ=0.6 input, stays positive) | 0.5566 | 0.4500 | NA | PASS |
| scen_px | swap frequency = q | 0.5300 | 0.5000 | 1.6598 | PASS |
| scen_px | pick share right_skewed = 1/3 | 0.3726 | 0.3333 | 1.7170 | PASS |
| scen_px | pick share left_skewed = 1/3 | 0.2925 | 0.3333 | -1.7857 | PASS |
| scen_px | pick share uniform = 1/3 | 0.3349 | 0.3333 | 0.0687 | PASS |
| scen_px | right_skewed mean = 0 | -0.0019 | 0.0000 | -1.4418 | PASS |
| scen_px | right_skewed var = 1 | 0.9989 | 1.0000 | -0.3234 | PASS |
| scen_px | left_skewed mean = 0 | 0.0007 | 0.0000 | 0.5193 | PASS |
| scen_px | left_skewed var = 1 | 1.0000 | 1.0000 | -0.0107 | PASS |
| scen_px | uniform mean = 0 | -0.0013 | 0.0000 | -0.9827 | PASS |
| scen_px | uniform var = 1 | 1.0004 | 1.0000 | 0.3405 | PASS |
| scen_px_t3 | every block swapped (q = 1) | 1.0000 | 1.0000 | NA | PASS |
| scen_px_t3 | t3 mean = 0 | -0.0019 | 0.0000 | -1.6790 | PASS |
| scen_px_t3 | t3 var = 1 (table-normalized) | 0.9988 | 1.0000 | -0.3768 | PASS |
| scen_re | swap frequency = q_r | 0.5100 | 0.5000 | 0.3996 | PASS |
| scen_re | pick share heavy_tailed = 1/2 | 0.5147 | 0.5000 | 0.4201 | PASS |
| scen_re | pick share skewed = 1/2 | 0.4853 | 0.5000 | -0.4201 | PASS |
| scen_re | skewed: skew = √(8/df), df = 10 | 0.8888 | 0.8944 | -0.8543 | PASS |
| scen_re | heavy: excess kurtosis = 6/(df−4), df = 10 | 1.0434 | 1.0000 | 1.5931 | PASS |
| scen_re_replace | pin holds: residual stays high_kurtosis (right_skewed swap inert) | 0.9800 | 1.0000 | NA | PASS |
| scen_re_replace | symmetric: skew = 0 (pinned t(6), not the χ² swap) | -0.0159 | 0.0000 | -0.9557 | PASS |
| scen_re_replace | mean = 0 | 0.0014 | 0.0000 | 1.2000 | PASS |
| scen_re_replace | var = 1 (t(6) standardized) | 0.9977 | 1.0000 | -0.9235 | PASS |
| scen_fa_ols | fixed: counts identical across draws (no RNG) | 1.0000 | 1.0000 | NA | PASS |
| scen_fa_ols | fixed: max \|count − n·p\| ≤ 1 (largest remainder) | 0.0000 | 1.0000 | NA | PASS |
| scen_fa_ols | sampled: Var(count) level 1 = n·p(1−p) | 243.7875 | 250.0000 | -0.3935 | PASS |
| scen_fa_ols | sampled: Var(count) level 2 = n·p(1−p) | 182.2775 | 210.0000 | -2.1712 | PASS |
| scen_fa_ols | sampled: Var(count) level 3 = n·p(1−p) | 175.0250 | 160.0000 | 1.3196 | PASS |
| scen_fa_mle | fixed: counts identical across draws (no RNG) | 1.0000 | 1.0000 | NA | PASS |
| scen_fa_mle | fixed: max \|count − n·p\| ≤ 1 (largest remainder) | 0.0000 | 1.0000 | NA | PASS |
| scen_fa_mle | sampled: Var(count) level 1 = n·p(1−p) | 220.4100 | 250.0000 | -1.3942 | PASS |
| scen_fa_mle | sampled: Var(count) level 2 = n·p(1−p) | 194.0300 | 210.0000 | -0.9094 | PASS |
| scen_fa_mle | sampled: Var(count) level 3 = n·p(1−p) | 158.4900 | 160.0000 | -0.1044 | PASS |
| scen_fa_mle_fp | find_power accepts Fa toggle under estimator = Mle | 1.0000 | 1.0000 | NA | PASS |
| scen_b0 | optimistic ≡ non-scenario path (bit-identical PowerResult) | 1.0000 | 1.0000 | NA | PASS |
| scen_b1_ols | optimistic: β̂[intercept] unbiased | -0.0025 | 0.0000 | -2.1096 | PASS |
| scen_b1_ols | optimistic: β̂[x1] unbiased | 0.3994 | 0.4000 | -0.4687 | PASS |
| scen_b1_ols | optimistic: β̂[x2] unbiased | 0.2499 | 0.2500 | -0.1007 | PASS |
| scen_b1_ols | realistic: β̂[intercept] unbiased | 0.0001 | 0.0000 | 0.0560 | PASS |
| scen_b1_ols | realistic: β̂[x1] unbiased | 0.3952 | 0.4000 | -1.0430 | PASS |
| scen_b1_ols | realistic: β̂[x2] unbiased | 0.2500 | 0.2500 | 0.0007 | PASS |
| scen_b1_ols | doomer: β̂[intercept] unbiased | 0.0009 | 0.0000 | 0.7089 | PASS |
| scen_b1_ols | doomer: β̂[x1] unbiased | 0.4034 | 0.4000 | 0.3641 | PASS |
| scen_b1_ols | doomer: β̂[x2] unbiased | 0.2496 | 0.2500 | -0.0609 | PASS |
| scen_b1_glm | optimistic: β̂[intercept] = per-study pseudo-true | -0.4050 | -0.4055 | 0.1720 | PASS |
| scen_b1_glm | optimistic: β̂[x1] = per-study pseudo-true | 0.5007 | 0.5000 | 0.2342 | PASS |
| scen_b1_glm | optimistic: β̂[x2] = per-study pseudo-true | 0.2988 | 0.3000 | -0.4579 | PASS |
| scen_b1_glm | realistic: β̂[intercept] = per-study pseudo-true | -0.3905 | -0.4055 | 1.1979 | PASS |
| scen_b1_glm | realistic: β̂[x1] = per-study pseudo-true | 0.5066 | 0.5000 | 1.0446 | PASS |
| scen_b1_glm | realistic: β̂[x2] = per-study pseudo-true | 0.2992 | 0.3000 | -0.1913 | PASS |
| scen_b1_glm | doomer: β̂[intercept] = per-study pseudo-true | -0.3944 | -0.4055 | 0.4703 | PASS |
| scen_b1_glm | doomer: β̂[x1] = per-study pseudo-true | 0.4939 | 0.5004 | -0.5737 | PASS |
| scen_b1_glm | doomer: β̂[x2] = per-study pseudo-true | 0.2983 | 0.3002 | -0.2780 | PASS |
| scen_b2 | x1 jitter slope at λ=1 = h²β₁² | 0.0217 | 0.0256 | -1.7720 | PASS |
| scen_b2 | x1 jitter slope λ-invariant (paired Δ = 0) | 0.0000 | 0.0000 | 0.0473 | PASS |
| scen_b3 | γ = ln(λ)/4 under forced high_kurtosis residual | 0.3426 | 0.3466 | -1.6275 | PASS |
| scen_b3 | γ = ln(λ)/4 under forced right_skewed residual | 0.3508 | 0.3466 | 1.6939 | PASS |
| scen_b4 | right_skewed pair: r = NORTA 0.4541 (latent ρ = 0.50) | 0.4546 | 0.4541 | 0.4146 | PASS |
| scen_b4 | uniform pair: r = NORTA 0.4826 (latent ρ = 0.50) | 0.4830 | 0.4826 | 0.4445 | PASS |
| scen_b5 | slope magnitude = γ/σ₀ (stale spec anchor) | 0.6594 | 0.6521 | 1.5985 | PASS |
| scen_b5 | staleness: lm(λ̂′ ~ λ′ predicted) slope = 1 | 0.9010 | 1.0000 | -0.9599 | PASS |
| scen_b5 | drift bounded: max λ′ ≤ clamp-range bound | 5.0446 | 5.3938 | NA | PASS |
| fg_glm_ident | bit-identity under logit: optimistic ≡ glm_l4 | 1.0000 | 1.0000 | NA | PASS |
| fg_glm_ident | bit-identity under logit: glm_l4 ≡ glm_l4re | 1.0000 | 1.0000 | NA | PASS |
| fg_glm_flip | h-toggle flip rate = MC prediction (paired Δ = 0) | 0.0023 | 0.0000 | 0.6365 | PASS |
All 72 gates pass and every case reproduces its golden baseline. Each scenario knob generates its documented law: the realised slope wobble, variance trend, correlation noise (with its clamp truncation), swap frequencies, pool moments, allocation behaviour, NORTA coupling, and heteroskedasticity-anchor drift all land on their predicted values, and the β̂ backstop shows no mean leak.
scen_re_multi — multi-grouping RE knobs (M2)
The uniformity law on a crossed extra beside the primary:
random_effect_dist / random_effect_df apply to every grouping’s
draw, and icc_noise_sd jitters every grouping’s τ² independently.
Measurement mirrors scen_re’s level-mean-variance recovery
(solver-free); β̂ unbiased is the leak tripwire. The jitter-presence
check is paired against a no-jitter (optimistic) control: the realised
τ̂² spread under the knob must exceed the control’s Monte-Carlo floor on
the extra grouping too.
scen_re_multi: set == get per grouping, independent jitter present, beta unbiased — PASS
How this was produced
| item | value |
|---|---|
| Report generated | 21 June 2026 |
| R version | R version 4.5.3 (2026-03-11) |
| mcpower | 1.0.0 |
| Gate band (SE-of-mean z) | 4 |
| Golden tolerance | 1e-09 |
| Cases | 21 |
| Gates | 72 |
Cases live in mcpower/validation/formulas.R (SCENARIO_CASES), probes
in mcpower/validation/common.R, gates in
mcpower/validation/tolerances.R (SCENARIO_TOL). Golden baseline:
mcpower/validation/data/scenario_golden.rds (delete it to re-freeze
after a deliberate DGP change). To reproduce, from the repository
root:
rmarkdown::render("mcpower/validation/validation_scenarios.rmd",
output_dir = "mcpower/web/documentation/validation")