Methodology & Validation
How RIDGE produces a verdict, and how we validate it.
One-paragraph summary
RIDGE takes a niche keyword, collects evidence from 39 data sources (SERP, reviews, regulator registries, trade databases, listing snapshots, financial benchmarks), reduces that evidence to a feature vector, and feeds it into a calibrated machine-learning classifier trained on 2,710 ground-truth niche outcome labels. The classifier outputs a probability that the niche will be commercially viable for a new entrant; that probability is combined with a deterministic rule layer to produce the shipped GO / HIGH RISK / NO GO verdict. Independently back-tested on a separately held cohort of 169 historical Amazon FBA niches with four-year outcomes (2022 entry → 2026 result), the pipeline achieves 96.2% NO-GO precision and 97.8% GO precision, with bootstrap 95% confidence intervals on every metric rather than bare point estimates.
The five layers of a verdict
- Evidence collection. 39 data sources are queried for a fixed set of signals: SERP rankings, review density and authenticity, regulatory registries (FCC, CPSC, Comtrade), historical listing snapshots, financial benchmarks, supply-chain proxies. Every source has explicit fallback behaviour and freshness watermarks.
- Feature reduction. Raw evidence is reduced to a feature vector covering market structure, competitive intensity, regulatory exposure, financial viability, and seasonality. Quantitative features (Gini coefficient, coefficient of variation, log-mean) live alongside binary signals.
- Calibrated probability. A multi-class machine-learning classifier produces three probabilities — P(DEAD), P(ALIVE), P(THRIVING) — that sum to 1. The DEAD probability drives the NO GO branch, the THRIVING probability drives the GO branch, and ALIVE is the abstain-eligible middle. Each class probability is independently calibrated, so a 0.80 prediction empirically corresponds to roughly an 80% outcome rate within that class.
- Rule layer. A deterministic, auditable verdict rule combines the calibrated class probabilities with explicit gates (FCC-regulated → HIGH RISK; review-fraud cluster detected → HIGH RISK; saturated review distribution + low BSR → NO GO). Every gate fires a public evidence card on the report. A minimal sketch of this layer follows the list.
- Conformal abstain. Borderline niches where the model's probability is too uncertain to commit to a label are flagged "abstain" rather than forced into GO or NO GO. Abstain rate is published, not hidden.
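For illustration, here is how the probability branch, the gates, and the abstain flag can interact. This is a minimal sketch only: the threshold values, the gate ordering, and the HIGH RISK default for the ALIVE middle ground are our assumptions; the real rule-layer values are deliberately unpublished (see "What we do not publish" below).

```python
from dataclasses import dataclass

# Hypothetical thresholds; the production rule-layer values are not published.
T_NO_GO = 0.70   # assumed gate on P(DEAD)
T_GO = 0.70      # assumed gate on P(THRIVING)

@dataclass
class NicheSignals:
    p_dead: float                    # calibrated P(DEAD)
    p_thriving: float                # calibrated P(THRIVING)
    fcc_regulated: bool
    review_fraud_cluster: bool
    saturated_reviews_low_bsr: bool
    conformal_abstain: bool          # set by the conformal wrapper (layer 5)

def verdict(s: NicheSignals) -> str:
    # Deterministic gates fire first; each fired gate emits an evidence card.
    if s.saturated_reviews_low_bsr:
        return "NO GO"
    if s.fcc_regulated or s.review_fraud_cluster:
        return "HIGH RISK"
    # Borderline niches are flagged rather than forced into a verdict.
    if s.conformal_abstain:
        return "ABSTAIN"
    # Probability branch on the calibrated class probabilities.
    if s.p_dead >= T_NO_GO:
        return "NO GO"
    if s.p_thriving >= T_GO:
        return "GO"
    return "HIGH RISK"  # assumed default for the ALIVE middle ground
```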
Validation protocol
- Cross-validation: stratified k-fold, grouped by niche family to prevent leakage from sibling niches sharing the same ASIN pool; no family ever straddles a train/validation split (see the grouped-split sketch after this list).
- Label sourcing: the 2,710 ground-truth training labels are derived algorithmically from longitudinal Amazon trajectory signals (BSR drift, review velocity, listing alive/dead state, price stability) cross-referenced against 2022→2026 marketplace observations, not hand-annotated. Algorithmic derivation is reproducible from public Amazon data, avoids per-annotator subjectivity, and scales beyond what manual labeling could cover. A temporal cutoff is enforced so outcome-period features cannot leak into training.
- Held-out 169-cohort: a separate set of 169 historical niches with verified 2022→2026 outcomes, never used during training. The publishable headline numbers come from this cohort.
- Bootstrap CI: 2,000 resamples for every headline number, so 96.2% NO-GO precision is reported as a resampled bound, not a bare point estimate (see the bootstrap sketch below).
- Calibration: equal-count-binned expected calibration error, always computed on held-out folds and never in-sample (see the ECE sketch below).
- Conformal wrapper: inductive conformal prediction. At α=0.10 (90% coverage target) the model commits to a singleton label on the supermajority of niches and abstains on the rest. This guarantees marginal coverage assuming only exchangeability, with no parametric distributional assumptions (see the conformal sketch below).
- Cohort prior disclosure: the training distribution is 75.4% positive while the 169-cohort prior is 53.8% positive. Both numbers are public so the reader can see the gap between training prevalence and the gold-standard slice.
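Minimal sketches of four of these mechanics follow; all variable names are stand-ins, not RIDGE internals. First, the grouped split, using scikit-learn's StratifiedGroupKFold on toy data:

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

# Toy stand-ins: feature matrix, binary outcome, one family id per niche.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = rng.integers(0, 2, size=100)
families = rng.integers(0, 20, size=100)  # sibling niches share a family id

# Stratified on the label, grouped on the family: no niche family ever
# appears on both sides of a train/validation split.
cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in cv.split(X, y, groups=families):
    assert set(families[train_idx]).isdisjoint(families[val_idx])
```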
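Second, the bootstrap: the textbook percentile version, here for a single-class precision such as NO-GO.

```python
import numpy as np

def bootstrap_precision_ci(y_true, y_pred, positive, n_boot=2000, seed=0):
    """95% percentile-bootstrap CI for the precision of one class."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))  # resample rows
        t, p = y_true[idx], y_pred[idx]
        hit = p == positive
        if hit.any():  # skip degenerate resamples with no positive predictions
            stats.append((t[hit] == positive).mean())
    lo, hi = np.percentile(stats, [2.5, 97.5])
    return lo, hi
```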
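Third, calibration error: equal-count bins avoid the sparse-bin instability of fixed-width ECE when confidences cluster near 1.0.

```python
import numpy as np

def ece_equal_count(confidences, correct, n_bins=10):
    """Expected calibration error over equal-count (quantile) bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)  # 0/1 correctness indicators
    order = np.argsort(confidences)
    n = len(confidences)
    ece = 0.0
    for b in np.array_split(order, n_bins):  # each bin holds ~n/n_bins points
        gap = abs(confidences[b].mean() - correct[b].mean())
        ece += (len(b) / n) * gap
    return ece
```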
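Fourth, the conformal wrapper: standard split (inductive) conformal prediction, which scores a held-out calibration set, takes a conservative quantile, and commits at inference time only when the prediction set is a singleton. The 1 - p(true class) nonconformity score below is an assumption; the production score function is not published.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.10):
    """Split conformal: conservative quantile of calibration nonconformity."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]  # 1 - p(true class)
    q = np.ceil((n + 1) * (1 - alpha)) / n  # finite-sample correction
    return np.quantile(scores, min(q, 1.0), method="higher")

def predict_or_abstain(probs, qhat, classes=("DEAD", "ALIVE", "THRIVING")):
    """Commit only to a singleton prediction set; otherwise abstain."""
    pred_set = [c for c, p in zip(classes, probs) if 1.0 - p <= qhat]
    return pred_set[0] if len(pred_set) == 1 else "ABSTAIN"
```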
Cohort shift and prior calibration
The training pool and the held-out 169-cohort have different class priors: the training pool skews positive (75.4% viable) because Amazon-historical data over-represents niches that survived long enough to leave a usable signal trail, while the gold-standard 169-cohort is closer to a balanced 53.8% / 46.2% split. A naive classifier trained on the skewed pool would systematically over-predict GO when applied to the more balanced cohort.
We correct for this in two stages, both standard in the prior-shift literature:
- Prior-shift adjustment: the raw class probabilities are reweighted using the Saerens-Latinne-Decaestecker formula so the predictive distribution matches the cohort prior, not the training prior.
- Per-class isotonic calibration: the prior-adjusted probabilities are passed through an out-of-fold isotonic regression separately per class, so that a stated 0.80 confidence empirically delivers ~80% precision within that class on the held-out cohort.
After this two-stage correction, expected calibration error on the 169-cohort sits in the low single-digit-percent range, with the bootstrap CI we publish. The training-vs-cohort prior gap is a fact of the data, not a bug; it would only become a bug if we let it leak into the headline numbers without the correction layer. A minimal sketch of the correction follows.
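A minimal sketch of both stages, assuming out-of-fold probabilities and labels are available. The one-step reweighting shown is the closed-form core of the Saerens-Latinne-Decaestecker correction (the full method iterates it with EM); the priors in the illustration are the ones published on this page, collapsed to a binary viable / not-viable split.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def prior_shift_adjust(probs, train_prior, target_prior):
    # Reweight each class probability by target/training prior, then
    # renormalise rows: the one-step closed form at the core of SLD.
    w = np.asarray(target_prior) / np.asarray(train_prior)
    adjusted = np.asarray(probs) * w
    return adjusted / adjusted.sum(axis=1, keepdims=True)

def fit_per_class_isotonic(oof_probs, oof_labels, n_classes):
    # One isotonic regressor per class, fit on out-of-fold prior-adjusted
    # probabilities, mapping stated confidence to empirical frequency.
    models = []
    for k in range(n_classes):
        iso = IsotonicRegression(out_of_bounds="clip")
        iso.fit(oof_probs[:, k], (np.asarray(oof_labels) == k).astype(float))
        models.append(iso)
    return models

# Illustration with the priors from this page (binary collapse):
train_prior = np.array([0.246, 0.754])   # training pool: 75.4% viable
cohort_prior = np.array([0.462, 0.538])  # 169-cohort: 53.8% viable
raw = np.array([[0.15, 0.85]])           # a hypothetical raw prediction
print(prior_shift_adjust(raw, train_prior, cohort_prior))
```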
Honesty matters more than a clean marketing chart
A few facts we keep visible on the public site even when they cut against the headline:
- The 169-cohort prior is 46.2% DEAD / 53.8% GO, so a trivial always-DEAD classifier would already score 46.2% accuracy. We cite this baseline so the reader can see that "97.8% GO precision" is a real lift, and we report binary accuracy at our default 0.5 threshold separately (88.2%, bootstrap CI 82.8–92.9%), never as a single headline.
- HIDDEN GEM, an in-product opportunity tag we apply to a small slice of the 6,779-niche catalog, has 41% precision on the historical cohort: well above base rate, but we publish the number because it still means a majority of tagged niches are false positives. It is a screening tag, not a verdict.
- Where ML and the rule layer disagree, the report carries a "model vs. rule divergence" card so the reader can see the conflict instead of having one signal silently overrule the other.
- Bootstrap 95% confidence intervals are mandatory for every metric we publish. No point estimates without CI.
- "20 marketplaces" means: the BSR-normalisation layer and the routing infrastructure cover all 20 markets, but the underlying ML model is currently US-trained and routed per marketplace via BSR-normalised features. Native per-market models will ship as the verified-cohort count crosses ≥200 ASINs per marketplace; until then we surface the US-trained inference under each market's BSR scale and label it as such on each report.
What we do not publish
The architecture is public; the coefficients are not. Specifically:
- Feature definitions and the directional constraints we apply to them.
- Hyperparameters (model capacity, learning rate, regularisation).
- Exact weighting of the 39 data sources.
- The rule-layer threshold values.
Publishing those would let adversaries game individual RIDGE reports by inflating the signals that score positive and suppressing those that score negative. The academic discipline we do publish (calibration, validation protocol, held-out cohort, conformal coverage, bootstrap CI) is enough for any ML practitioner to verify the pipeline is not cooked. Nothing on this page prevents a reader from reproducing the methodology; it only prevents them from forging a RIDGE verdict.
See the same methodology applied to your niche
48-hour delivery, 40+ report sections, calibrated confidence with abstain on borderline niches.
Order Analysis