MCPower — Validating get_effects_from_data
What this report validates
get_effects_from_data recovers standardized effect sizes from uploaded
pilot data by fitting the model the data came from. This report is the
acceptance gate for that recovery convention: for each estimator we
specify an effect of size s, let MCPower simulate a dataset from
it, then recover the effect from that dataset — and check the recovery
lands where the convention says it should.
The estimator is dispatched on the model family, and the recovery scale differs by estimator:
- OLS (continuous outcome) z-scores the outcome, so it recovers the standardized regression coefficient, s / sqrt(Σs² + 1), for independent continuous predictors with the engine’s default N(0, 1) residual. The shrinkage comes from the residual variance: z-scoring divides by the outcome’s total standard deviation, sqrt(Σs² + 1), which includes the residual. It is small for the small/medium effects benchmarked here and is the same approximation the shipped OLS recovery has always carried.
- GLM (logistic) and MLE (mixed) fit the native outcome (raw 0/1 for logit, the raw response for mixed), so they recover the coefficient directly: expected recovery is s itself. MLE recovery is fixed-effects-only and reads the grouping column from the uploaded data.
Scope: continuous main effects only — the settled convention. Factor-dummy and interaction scaling under non-continuous outcomes is a separate, still-open item and is intentionally not gated here.
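The convention-predicted values can be checked by hand. The sketch below is not MCPower code: `expected_ols()` and `expected_native()` are hypothetical helper names, and the sketch assumes unit-variance predictors and an N(0, 1) residual. The correlation of 0.5 used for the correlated case is an assumption back-solved from the ols_corr_a expected values in this report, not a documented setting.

```r
# Expected recovery under the convention, assuming unit-variance
# predictors and an N(0, 1) residual. Helper names are hypothetical.
expected_ols <- function(s, R = diag(length(s))) {
  # Var(y) = s' R s + 1, with R the predictor correlation matrix
  sd_y <- sqrt(drop(t(s) %*% R %*% s) + 1)
  s / sd_y
}

# GLM (logistic) and MLE (mixed) fit the native outcome: no shrinkage.
expected_native <- function(s) s

exp_simple <- expected_ols(0.25)              # one predictor
exp_two    <- expected_ols(c(0.25, 0.10))     # two independent predictors

# Assumed correlation of 0.5 (inferred, not taken from the engine).
R_corr   <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
exp_corr <- expected_ols(c(0.25, 0.10), R_corr)

print(round(exp_simple, 5))  # 0.24254
print(round(exp_two, 5))     # 0.24140 0.09656
print(round(exp_corr, 5))    # 0.23864 0.09545
```

With independent predictors the general form reduces to s / sqrt(Σs² + 1), because R is the identity matrix.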
How the check works
- Simulate. Generate a dataset from each case’s data-generating process at n = 4000 (LME scales the cluster count to match), via the engine’s own create_data(), the same DGP the power simulation uses.
- Recover. Rebuild the raw predictor frame (plus the outcome, plus the grouping column for LME), upload_data() it, and call get_effects_from_data().
upload_data()it, and callget_effects_from_data(). - Average. Repeat over 20 draws (seeds 2137…2156) and take the mean — single-draw noise is ~1/sqrt(n), so averaging isolates the convention from sampling scatter (the A↔B harness pattern).
- Gate. The mean recovered effect must land within 0.02 (absolute) of the convention-predicted value.
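The four steps above can be sketched in base R for the simplest OLS case (y ~ x1, s = 0.25). This is a stand-in, not the report's harness: `rnorm()` replaces create_data() and `lm()` on a z-scored outcome replaces upload_data() plus get_effects_from_data(), whose signatures are not shown in this report. `recover_once()` is a hypothetical helper name.

```r
# Stand-in round trip for y ~ x1 with s = 0.25 (base R only).
s     <- 0.25
n     <- 4000
seeds <- 2137:2156   # K = 20 draws
tol   <- 0.02        # local stand-in for the gate tolerance

recover_once <- function(seed) {
  set.seed(seed)
  # Simulate: unit-variance predictor, N(0, 1) residual (assumed DGP)
  x1 <- rnorm(n)
  y  <- s * x1 + rnorm(n)
  # Recover: z-score the outcome, then fit OLS
  y_z <- (y - mean(y)) / sd(y)
  unname(coef(lm(y_z ~ x1))["x1"])
}

# Average over the K draws
mean_recovered <- mean(vapply(seeds, recover_once, numeric(1)))

# Gate: compare against the convention-predicted value
expected <- s / sqrt(s^2 + 1)
gate_ok  <- abs(mean_recovered - expected) <= tol
print(c(expected = expected, mean_recovered = mean_recovered))
print(gate_ok)
```

Because the random streams differ from the engine's, the recovered mean will not match the reported 0.24393 digit for digit; only the gate outcome is comparable.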
The threshold
| Quantity | Allowed difference | Why |
|---|---|---|
| Mean recovered − expected | 0.02 absolute | K-draw mean vs the convention-predicted value; worst absolute error in this run 0.0112 (glm_simple_a) → ~1.8× headroom |
This is the GETEFFECTS_TOL gate from tolerances.R. A wrong
estimator, a sign flip, a missing/inverted standardization, or a 2×
scale would move the mean by 0.1–0.5 — far outside this band.
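As a small illustration of how the band separates a correct recovery from a gross error, the sketch below applies the 0.02 absolute tolerance to the ols_simple_a numbers reported in this document. `passes_gate()` is a hypothetical helper, and the tolerance is hard-coded here rather than read from tolerances.R.

```r
# Hypothetical gate helper; 0.02 is hard-coded as a stand-in for GETEFFECTS_TOL.
passes_gate <- function(mean_recovered, expected, tol = 0.02) {
  abs(mean_recovered - expected) <= tol
}

expected <- 0.24254   # convention-predicted value for ols_simple_a
reported <- 0.24393   # mean recovered value reported for ols_simple_a

ok_reported <- passes_gate(reported, expected)       # within the band
ok_flipped  <- passes_gate(-reported, expected)      # sign flip
ok_doubled  <- passes_gate(2 * reported, expected)   # 2x scale error

print(c(reported = ok_reported, flipped = ok_flipped, doubled = ok_doubled))
```

The sign flip moves the mean by about 0.49 and the doubled scale by about 0.25, both an order of magnitude outside the band.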
| Case | Estimator | Formula | n | K | Verdict |
|---|---|---|---|---|---|
| ols_simple_a | OLS (continuous) | y ~ x1 | 4000 | 20 | PASS |
| ols_two_a | OLS (continuous) | y ~ x1 + x2 | 4000 | 20 | PASS |
| ols_corr_a | OLS (continuous) | y ~ x1 + x2 | 4000 | 20 | PASS |
| glm_simple_a | GLM (logistic) | y ~ x1 | 4000 | 20 | PASS |
| glm_two_b | GLM (logistic) | y ~ x1 + x2 | 4000 | 20 | PASS |
| lme_simple_a | MLE (mixed) | y ~ x1 + (1|grp) | 3990 | 20 | PASS |
| lme_two_a | MLE (mixed) | y ~ x1 + x2 + (1|grp) | 3990 | 20 | PASS |
All estimators round-trip: every specified continuous main effect is recovered to within the gate of the convention-predicted value — OLS at the shrunk standardized scale, GLM and MLE at the native coefficient.
[Figure: round-trip recovery plot]
Solid line = exact recovery (identity); dotted lines = the ±0.02 absolute gate. Every term lands on the identity line, well inside the band.
ols_simple_a · OLS (continuous)
R formula y ~ x1 · n=4000 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.25 | 0.24254 | 0.24393 | 0.0014 | PASS |
ols_two_a · OLS (continuous)
R formula y ~ x1 + x2 · n=4000 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.25 | 0.24140 | 0.24286 | 0.00145 | PASS |
| x2 | 0.10 | 0.09656 | 0.09490 | 0.00166 | PASS |
ols_corr_a · OLS (continuous)
R formula y ~ x1 + x2 · n=4000 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.25 | 0.23864 | 0.24102 | 0.00238 | PASS |
| x2 | 0.10 | 0.09545 | 0.09343 | 0.00202 | PASS |
glm_simple_a · GLM (logistic)
R formula y ~ x1 · n=4000 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.5 | 0.5 | 0.48882 | 0.01118 | PASS |
glm_two_b · GLM (logistic)
R formula y ~ x1 + x2 · n=4000 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.8 | 0.8 | 0.79534 | 0.00467 | PASS |
| x2 | 0.5 | 0.5 | 0.50462 | 0.00462 | PASS |
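The logistic cases recover s on the native log-odds scale, with no shrinkage. A base R sketch of that behaviour for y ~ x1 with s = 0.5 follows. It is a stand-in rather than the engine's DGP: `rbinom()` and `glm()` replace create_data() and get_effects_from_data(), the zero intercept is an assumption, and `recover_logit()` is a hypothetical helper name.

```r
# Stand-in logistic round trip for y ~ x1 with s = 0.5 (base R only).
s     <- 0.5
n     <- 4000
seeds <- 2137:2156   # K = 20 draws

recover_logit <- function(seed) {
  set.seed(seed)
  x1 <- rnorm(n)
  # Raw 0/1 outcome; intercept of 0 is an assumption of this sketch
  y <- rbinom(n, size = 1, prob = plogis(s * x1))
  fit <- glm(y ~ x1, family = binomial())
  unname(coef(fit)["x1"])   # coefficient on the log-odds scale
}

mean_logit <- mean(vapply(seeds, recover_logit, numeric(1)))
print(c(specified = s, mean_recovered = mean_logit))
```

The logistic coefficient is noisier per draw than the OLS one at the same n, which is consistent with glm_simple_a showing the largest error in the tables.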
lme_simple_a · MLE (mixed)
R formula y ~ x1 + (1|grp) · n=3990 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.5 | 0.5 | 0.50154 | 0.00154 | PASS |
lme_two_a · MLE (mixed)
R formula y ~ x1 + x2 + (1|grp) · n=3990 · K=20 draws (seeds 2137–2156).
| Term | Specified (s) | Expected | Mean recovered | \|err\| | Verdict |
|---|---|---|---|---|---|
| x1 | 0.5 | 0.5 | 0.50152 | 0.00152 | PASS |
| x2 | 0.3 | 0.3 | 0.29892 | 0.00107 | PASS |
| item | value |
|---|---|
| generated | 2026-06-21 |
| R | R version 4.5.3 (2026-03-11) |
| mcpower | 1.0.0 |
| round-trip gate (mean abs) | 0.02 abs |
| n / K / seed0 | 4000 / 20 / 2137 |
Reproduce:
rmarkdown::render("mcpower/validation/validation_get_effects.rmd", output_dir = "mcpower/web/documentation/validation")