RIDGE Open Benchmark

The first public held-out leaderboard for Amazon FBA niche outcome prediction.

169 ground-truth niches, each observed from its 2022–2023 entry window through its 2026 outcome. Any FBA research vendor or independent researcher may submit predictions and be added to the public leaderboard.

NO-GO precision: 96.2%
Accuracy: 97.8%
Niches back-tested: 169
Verification window: 4-year, verified 2022→2026

Leaderboard

Rank	Submitter	Binary accuracy	NO-GO precision	Conformal coverage	Date
1	RIDGE	97.8%	96.2%	90% (singleton on supermajority)	2026-04-25
—	Always-DEAD baseline	46.2%	— (n/a)	—	—
—	Helium 10	No published held-out benchmark as of 2026-04-25
—	Jungle Scout	No published held-out benchmark as of 2026-04-25
—	Viral Launch	No published held-out benchmark as of 2026-04-25
—	Data Dive	No published held-out benchmark as of 2026-04-25
—	SellerApp	No published held-out benchmark as of 2026-04-25

Honest disclosure: the 169-niche test set is DEAD-heavy; a baseline that labels every niche DEAD already scores 46.2% accuracy. Binary accuracy on a DEAD-heavy set is a weaker signal than NO-GO precision; we cite both, with bootstrap 95% confidence intervals on each, so the reader can compare against the trivial baseline directly. The baseline cannot abstain; RIDGE flags borderline niches as uncertain rather than guessing.

GO-side disclosure. The GO-side test cell on the 169-niche cohort is small. We publish NO-GO precision as the headline metric because it is the cell with sufficient samples to support a tight bootstrap interval. Cohort expansion is in progress so future versions can be stress-tested on a wider GO-side slice.
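
For readers who want to recompute the two headline metrics on their own predictions, here is a minimal sketch. It assumes an encoding of 0 = NO-GO (the niche ends up DEAD) and 1 = GO, which the page does not specify, and it uses invented toy labels rather than benchmark data. The function names are illustrative, not part of any RIDGE tooling.

```python
from typing import Optional, Sequence

# Assumed encoding for this sketch: 0 = NO-GO (niche ends up DEAD), 1 = GO.
NO_GO, GO = 0, 1


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of niches where the predicted label matches the outcome."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def no_go_precision(y_true: Sequence[int], y_pred: Sequence[int]) -> Optional[float]:
    """Of the niches called NO-GO, the fraction whose outcome was NO-GO."""
    flagged = [t for t, p in zip(y_true, y_pred) if p == NO_GO]
    if not flagged:
        return None  # no NO-GO calls were made, so precision is undefined
    return sum(t == NO_GO for t in flagged) / len(flagged)


# Toy labels, invented for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 0, 1, 1, 1, 1, 0, 0]
always_dead = [NO_GO] * len(y_true)  # the trivial baseline

model_accuracy = accuracy(y_true, y_pred)
model_no_go_precision = no_go_precision(y_true, y_pred)
baseline_accuracy = accuracy(y_true, always_dead)

print(model_accuracy, model_no_go_precision, baseline_accuracy)
```

The always-DEAD baseline's accuracy equals the share of DEAD niches in the set, which is why it serves as the floor any submitted accuracy should be compared against.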

Download the benchmark

niches.jsonl — 169 rows: keyword, category, entry-window ASINs, ground-truth outcome
predictions.jsonl — RIDGE out-of-fold scores, conformal sets, binary correctness
leaderboard.md — raw markdown leaderboard (re-generated on each benchmark run)
README.md — how the labels were observed and how to reproduce

Production catalog scale

Beyond the 169-niche held-out leaderboard, RIDGE has scored a wider production catalog of 28,922 niches across 4 marketplaces using the same v10 multi-class classifier (DEAD / ALIVE / THRIVING). We publish the per-market distribution because the marketplace gap is real and we will not paper over it.

Marketplace	Niches scored	DEAD	ALIVE	THRIVING
amazon.com (US)	19,156	3.7%	87.2%	9.1%
amazon.de (DE)	2,939	0.0% *	99.5%	0.5%
amazon.co.uk (UK)	3,441	0.0% *	99.4%	0.6%
amazon.co.jp (JP)	3,386	0.0% *	99.0%	1.0%
Total	28,922	2.5%	91.3%	6.3%

* Honest disclosure on the DE/UK/JP DEAD rate. The current sampling window for DE/UK/JP catalog pulls reaches recent BSR data only and does not yet have enough trajectory depth to surface the DEAD class, which is structurally rarer (≈3–4% of US niches) and requires a longer back-window than ALIVE detection. We surface the catalog under each market's BSR scale via the routing layer, but until verified-cohort counts reach at least 200 ASINs per marketplace we will not claim native per-market DEAD detection. This is a sampling gap, not a model gap, and the reader should not interpret 0% DEAD as "no risky niches in DE/UK/JP."
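
The Total row can be cross-checked from the per-market rows as a count-weighted average. The sketch below copies the figures from the table above; because the per-market percentages are already rounded to one decimal, the recomputed totals are approximate, and the variable names are illustrative.

```python
# (niches scored, DEAD %, ALIVE %, THRIVING %), copied from the table above.
markets = {
    "US": (19156, 3.7, 87.2, 9.1),
    "DE": (2939, 0.0, 99.5, 0.5),
    "UK": (3441, 0.0, 99.4, 0.6),
    "JP": (3386, 0.0, 99.0, 1.0),
}

total_niches = sum(row[0] for row in markets.values())


def pooled_share(column: int) -> float:
    """Count-weighted average of one percentage column across marketplaces."""
    weighted = sum(row[0] * row[column] for row in markets.values())
    return weighted / total_niches


pooled = {
    name: round(pooled_share(column), 1)
    for column, name in enumerate(("DEAD", "ALIVE", "THRIVING"), start=1)
}

print(total_niches, pooled)
```

This reproduces the 28,922 total and the 2.5% / 91.3% / 6.3% split. Those three shares sum to 100.1% only because each is rounded independently.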

Conformal prediction — why this matters

Every other vendor gives you a single confidence number, or nothing. RIDGE ships a coverage-guaranteed prediction set with every verdict. At a 90% coverage target we measure 90% empirical coverage under cross-validation: the model returns a singleton (decisive) verdict on a supermajority of niches and an explicit abstain flag on the rest. When the model cannot confidently decide, it says so instead of guessing.
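
To make the idea of a prediction set concrete, here is a minimal sketch of textbook split conformal prediction for a classifier. It is a generic illustration, not RIDGE's implementation, which the page does not describe. The calibration probabilities are invented, and the function names and the two-class [NO-GO, GO] layout are assumptions.

```python
import math
from typing import List, Sequence


def conformal_threshold(true_class_probs: Sequence[float], alpha: float = 0.10) -> float:
    """Threshold on the score 1 - p(true class), from a held-out calibration set."""
    scores = sorted(1.0 - p for p in true_class_probs)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample quantile
    if k > n:
        return 1.0  # too few calibration points: every class stays in the set
    return scores[k - 1]


def prediction_set(class_probs: Sequence[float], qhat: float) -> List[int]:
    """Indices of every class whose score 1 - p falls within the threshold."""
    return [c for c, p in enumerate(class_probs) if 1.0 - p <= qhat]


# Invented calibration data: probability the model gave to the true class.
calibration = [0.95, 0.90, 0.85, 0.80, 0.92, 0.70, 0.88, 0.40, 0.97, 0.75]
qhat = conformal_threshold(calibration, alpha=0.10)

decisive = prediction_set([0.90, 0.10], qhat)    # one class survives
borderline = prediction_set([0.55, 0.45], qhat)  # both classes survive

print(qhat, decisive, borderline)
```

A singleton set corresponds to a decisive verdict; a set with more than one class is the case where a system would abstain or flag the niche as uncertain. Under the usual exchangeability assumption, sets built this way contain the true class at least 1 − alpha of the time on average.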

Cohort discipline — the model does not drift

Cross-validation inside a fixed dataset proves calibration but does not prove stability as new labels arrive. To check that, RIDGE keeps the 169-niche cohort completely held out: no niche from this cohort is ever used for training. Headline numbers come from this slice only. Bootstrap 95% confidence intervals (2,000 resamples) bound every metric we publish; we report no point estimate without a CI.
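
A bootstrap interval of this kind can be sketched as follows. This is a plain percentile bootstrap with 2,000 resamples; the page does not say which bootstrap variant RIDGE uses, so treat the percentile method, the seed, and the toy correctness vector as assumptions.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    correct: List[int],
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of a 0/1 correctness vector."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the niches with replacement and recompute the metric each time.
    stats = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples))
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * n_resamples)
    hi_idx = min(n_resamples - 1, int((1.0 - tail) * n_resamples))
    return stats[lo_idx], stats[hi_idx]


# Toy data, invented for illustration: 90 correct calls out of 100.
correct = [1] * 90 + [0] * 10
point = sum(correct) / len(correct)
low, high = bootstrap_ci(correct)

print(f"accuracy = {point:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

The same function applies to NO-GO precision if the correctness vector is restricted to the niches that received a NO-GO call. The interval widens as that subset shrinks, which is the practical reason a small GO-side cell cannot support a tight interval.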

How to submit competing numbers

1. Download niches.jsonl and run your tool or classifier on the same keywords and entry-window ASINs.
2. Produce a JSONL with one row per niche: {"keyword":"...", "binary_pred":0|1, "confidence":0.0-1.0}.
3. Email the JSONL plus a short methodology description to research@ridgeworldwide.com.
4. We re-score the submission against held-out ground truth and add the row to the leaderboard with method description, submission date, and a link back to the submitter's methodology.
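
The row format in the steps above can be produced and sanity-checked before emailing with a short script like this one. It is a sketch: the three field names come from the format shown above, while the helper names and example keywords are hypothetical. The page does not state which binary_pred value denotes GO, so check README.md before encoding your predictions.

```python
import json
from typing import Any, Dict, List


def validate_row(row: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one submission row (empty if it is fine)."""
    problems = []
    keyword = row.get("keyword")
    if not isinstance(keyword, str) or not keyword.strip():
        problems.append("keyword must be a non-empty string")
    pred = row.get("binary_pred")
    if type(pred) is not int or pred not in (0, 1):
        problems.append("binary_pred must be the integer 0 or 1")
    conf = row.get("confidence")
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append("confidence must be a number between 0.0 and 1.0")
    return problems


def to_jsonl(rows: List[Dict[str, Any]]) -> str:
    """Serialize rows as JSON Lines, refusing to emit an invalid row."""
    for i, row in enumerate(rows):
        problems = validate_row(row)
        if problems:
            raise ValueError(f"row {i}: {'; '.join(problems)}")
    return "".join(json.dumps(row) + "\n" for row in rows)


# Hypothetical example rows; real keywords come from niches.jsonl.
rows = [
    {"keyword": "example keyword a", "binary_pred": 1, "confidence": 0.82},
    {"keyword": "example keyword b", "binary_pred": 0, "confidence": 0.64},
]
submission_text = to_jsonl(rows)
print(submission_text, end="")
```

Writing submission_text to a file gives one JSON object per line, which is what a JSONL reader expects.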

Related research

Methodology & Validation — how RIDGE produces a verdict and what we publish vs. what we keep proprietary.
Calibration Audit — how we audit calibration and why we cite expected calibration error on held-out folds.
2026 Back-test Report — full 169-niche held-out report with bootstrap 95% confidence intervals.
Niche Database — 6,779 scored niches, 16 categories, 19 marketplaces.

Use the evidence, not adjectives

Order a RIDGE report: the same methodology that leads this leaderboard is applied to your niche. 48-hour delivery. 40+ sections. Cancel anytime.
