ML Calibration Audit

Independent back-test on 169 historical niches · 4-year outcomes (2022 → 2026) · Bootstrap 95% confidence intervals on every metric · Published April 2026

Headline Metrics

96.2% · NO-GO precision
97.8% · GO precision
169 · Historical niches back-tested with 4-year outcomes
2,710 · Ground-truth training labels
6,779 · Niches scored to date

What This Measures

Every RIDGE verdict carries a confidence level: "70% GO," "85% NO-GO," "60% HIDDEN GEM." A confidence number is only useful if it is calibrated — that is, if 70% GO predictions turn out to be correct 70% of the time in reality. Uncalibrated models can still rank well, but their probabilities are meaningless for decision-making.
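
To make the definition concrete, here is a minimal sketch of a calibration check: bin verdicts by their stated confidence and compare each bin's claimed probability with its realized hit rate. The function name, bin layout, and input format are illustrative assumptions, not RIDGE's actual audit code.

```python
import numpy as np

def reliability_table(stated_conf, verdict_correct, n_bins=5):
    """Bin verdicts by the model's stated confidence and compare each
    bin's claimed probability with its realized hit rate.

    stated_conf     : stated probabilities, e.g. 0.70 for a "70% GO"
    verdict_correct : 1 if the verdict matched the eventual outcome, else 0
    """
    conf = np.asarray(stated_conf, dtype=float)
    hit = np.asarray(verdict_correct, dtype=float)
    edges = np.linspace(0.5, 1.0, n_bins + 1)  # verdicts are majority-class calls
    rows = []
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        mask = (conf >= lo) & ((conf <= hi) if i == n_bins - 1 else (conf < hi))
        if mask.any():
            rows.append({
                "bin": (lo, hi),
                "n": int(mask.sum()),
                "claimed": float(conf[mask].mean()),  # what the model said
                "actual": float(hit[mask].mean()),    # what actually happened
            })
    return rows  # calibrated model: claimed ≈ actual in every row
```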

Calibration has two halves: the confidence the model states up front, and how often that confidence is borne out in reality. The audit on this page measures the second half: how often the verdict was right when applied to historical niches whose 4-year outcome we already know. The cohort is 169 Amazon FBA niches that entered the RIDGE pipeline in 2022–2023; their 2026 marketplace state is the ground truth. Every published metric is reported with a non-parametric bootstrap 95% confidence interval (2,000 resamples).
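
For readers who want to reproduce the interval arithmetic, the sketch below shows a standard percentile bootstrap on verdict precision with the stated parameters (2,000 resamples, 95% level). The audit's exact resampling code is not disclosed; the array format, seed, and function name here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility (illustrative)

def bootstrap_precision_ci(predicted_positive, actually_positive,
                           n_resamples=2_000, alpha=0.05):
    """Percentile bootstrap CI for precision, resampling the niche cohort
    with replacement. Both inputs are boolean flags over the same cohort
    (e.g. NO-GO verdict vs. DEAD outcome)."""
    pred = np.asarray(predicted_positive, dtype=bool)
    true = np.asarray(actually_positive, dtype=bool)
    stats = []
    for _ in range(n_resamples):
        idx = rng.integers(0, pred.size, size=pred.size)  # resample niches
        p, t = pred[idx], true[idx]
        if p.any():  # precision is undefined when a resample has no positives
            stats.append((p & t).sum() / p.sum())
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    point = (pred & true).sum() / pred.sum()
    return point, (lo, hi)
```

On a 169-row cohort, percentile intervals built this way will be visibly wide; publishing them anyway is the point of the audit.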

Headline Results

Metric                                   RIDGE                                       Appropriate baseline
NO-GO precision (verdict)                96.2%                                       46.2% (always-DEAD baseline)
GO precision (verdict)                   97.8%                                       53.8% (always-GO baseline)
HIDDEN GEM in-product signal precision   41%                                         20.1% positive prior
Outcome window                           4 years (2022 → 2026)                       Not published anywhere else
Confidence intervals                     Bootstrap, 2,000 resamples, every metric    Point estimates only (when disclosed at all)

On the trivial baselines: the 169-niche cohort prior is 46.2% DEAD / 53.8% GO. A classifier that labels every niche NO-GO would already achieve 46.2% NO-GO precision, and one that labels every niche GO would achieve 53.8% GO precision. That is why we publish the two precisions separately and lead with the bootstrap confidence interval rather than a point estimate: the headline 97.8% GO precision is only meaningful alongside its 95% interval and the baseline it must clear.
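
As a worked check of that arithmetic (illustrative numbers only, derived from the cohort prior quoted above):

```python
# Why the trivial baselines equal the cohort priors (illustrative arithmetic).
n_niches = 169
dead_prior = 0.462                     # fraction of the cohort that went DEAD
n_dead = round(n_niches * dead_prior)  # ~78 niches

# Always-NO-GO classifier: predicts NO-GO for every niche.
baseline_nogo_precision = n_dead / n_niches              # ~0.462, the DEAD prior

# Always-GO classifier: predicts GO for every niche.
baseline_go_precision = (n_niches - n_dead) / n_niches   # ~0.538, the GO prior
```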

The Methodology, at Category Level

Without exposing proprietary details:

Specific feature names, model architecture, and calibration parameters are not disclosed: they are RIDGE's defining trade secret, and publishing them would invite adversarial gaming. We publish the validation protocol and the back-test, not the gradient table. This is the standard disclosure stance for production ML systems.

Why No Competitor Shows This

Publishing a calibration audit requires (a) having a calibrated model, (b) having enough ground-truth labels to audit it, (c) accepting the marketing cost of every confidence interval that does not collapse to a tight point, and (d) committing to the version-over-time discipline that lets a back-test be reproduced rather than re-marketed. Most Amazon-research tools are not built on ML at all; they are heuristic point estimates. Those that are built on ML still do not publish the audit.

RIDGE publishes the niche list, the outcome dates, the bootstrap protocol, and every confidence interval. If a competitor wants to run the same back-test against their own scoring engine, the cohort is open.

Get a calibrated verdict on your niche

Order Analysis