The Cart Knows First – Machine Learning Prediction In Consumer Markets
There’s a moment, weeks before any announcement, before a test, sometimes before conscious awareness, when a grocery basket quietly starts to change.
More ginger ale. Folic acid appearing for the first time. Organic produce creeping in. The wine that used to arrive every two weeks, gone.
Nobody instructed these changes. No algorithm nudged them. They’re driven by something far more fundamental: the body responding to a life in transition.
In a new peer-reviewed paper published in the Journal of Artificial Intelligence General Science, I set out to answer a question that’s been sitting at the edge of data science for over a decade: can machine learning detect a major life event from purchase sequences alone, before the consumer has announced it, or in some cases, before they even know it themselves?
The short answer is yes. And the implications are bigger than pregnancy prediction.
The Target Story You Already Know (And Why We Did It Differently)
In 2012, Forbes reported that Target’s data team had identified a cluster of products (unscented lotion, supplements, certain foods) that reliably predicted pregnancy among shoppers. In some cases, they knew before the family did.
It was a remarkable finding. It was also widely criticised, and rightly so, for the covert, proprietary way it was conducted.
Our approach is the opposite. Every dataset used in this research is publicly available. Every methodology is fully disclosed. No individual is identified. The goal isn’t surveillance; it’s understanding what’s possible within ethical, anonymised, transparent constraints, and making that capability accessible.
What We Built and How It Works
Using Skubl’s predictive intelligence engine, we trained a machine learning model (an XGBoost gradient-boosted classifier, for the technically inclined) on a labelled dataset of 1,000 consumer records with confirmed pregnancy status. We then applied it at scale to 3.4 million grocery orders from 206,000 users in the publicly available Instacart dataset.
Rather than looking at what people buy at a fixed point in time, we engineered features that capture how purchasing patterns change over time. Four key signals drive the model:
- Basket Composition Shift Index — how much is a user deviating from their own historical norms?
- New-Category Introduction Rate — are entirely new product types suddenly appearing?
- Reorder Suppression Score — are previously regular purchases quietly disappearing?
- Temporal Order Spacing Delta — is shopping frequency accelerating?
The combination of these signals, rather than any single product, is what makes the model powerful.
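To make the four signals concrete, here is a minimal Python sketch of how they could be computed for a single user. The exact formulas are assumptions made for illustration, not the definitions used in the paper: the function name `basket_features`, the "last three orders" window, the total-variation distance used for the shift index, and the "appears in at least half of past orders" rule for a regular purchase are all hypothetical choices.

```python
from collections import Counter
from statistics import mean


def category_shares(orders):
    """Share of basket items falling in each product category."""
    counts = Counter(cat for _, basket in orders for cat in basket)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


def basket_features(orders, recent_n=3):
    """Illustrative versions of the four signals for one user.

    `orders` is a list of (days_since_prior_order, [categories]) tuples,
    oldest first. All formulas here are assumptions, not the paper's.
    """
    history, recent = orders[:-recent_n], orders[-recent_n:]
    if not history or not recent:
        raise ValueError("need orders both before and inside the recent window")

    hist_share = category_shares(history)
    recent_share = category_shares(recent)
    hist_cats, recent_cats = set(hist_share), set(recent_share)

    # Basket Composition Shift Index: total variation distance between the
    # historical and recent category mix (0 = identical, 1 = no overlap).
    shift = 0.5 * sum(
        abs(hist_share.get(c, 0.0) - recent_share.get(c, 0.0))
        for c in hist_cats | recent_cats
    )

    # New-Category Introduction Rate: share of recent categories never bought before.
    new_rate = len(recent_cats - hist_cats) / len(recent_cats) if recent_cats else 0.0

    # Reorder Suppression Score: share of "regular" categories (present in at
    # least half of historical orders) that are absent from every recent order.
    per_order = Counter(c for _, basket in history for c in set(basket))
    regulars = {c for c, n in per_order.items() if n / len(history) >= 0.5}
    suppression = len(regulars - recent_cats) / len(regulars) if regulars else 0.0

    # Temporal Order Spacing Delta: change in mean days between orders.
    # Negative values mean the user is shopping more often than before.
    spacing_delta = mean([g for g, _ in recent]) - mean([g for g, _ in history])

    return {
        "composition_shift": shift,
        "new_category_rate": new_rate,
        "reorder_suppression": suppression,
        "spacing_delta_days": spacing_delta,
    }


# Made-up example: fortnightly wine disappears, ginger ale and folic acid arrive.
example_orders = [
    (14, ["wine", "bread", "milk"]),
    (14, ["wine", "bread", "milk"]),
    (14, ["wine", "bread", "milk"]),
    (14, ["wine", "bread", "milk"]),
    (7, ["ginger ale", "bread", "milk"]),
    (7, ["folic acid", "bread", "milk"]),
    (7, ["ginger ale", "bread", "milk"]),
]
print(basket_features(example_orders))
```

For this invented user, every signal moves at once: about a third of the basket mix has shifted, half of the recent categories are new, one of three regular purchases has vanished, and the gap between orders has halved. Each feature compares the user with their own history, so the comparison is never against a population average.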
What We Found
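The model achieved an AUC-ROC of 0.901 on validation. In practical terms, given one record from a pregnant consumer and one from a non-pregnant consumer chosen at random, the model ranks the pregnant one higher about 90% of the time, so it is highly effective at distinguishing life-stage transition signals from background noise.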
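For readers who want to see what an AUC-ROC figure measures, the sketch below computes it directly from its ranking definition: the fraction of positive/negative pairs in which the positive record receives the higher score, with ties counted as half. The function name `auc_roc` and the toy labels and scores are invented for illustration and are not taken from the paper's validation data.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = confirmed transition, 0 = no transition.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(auc_roc(toy_labels, toy_scores), 3))  # 8 of 9 pairs ranked correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of records, which is fine for a small validation set but would normally be replaced by a sort-based calculation at scale.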
One of the most commercially significant findings wasn’t what predicted pregnancy, but when the signals appeared:
The strongest early predictors aren’t the ones you’d expect.
Cutting out alcohol and stopping smoking are both associated with pregnancy, but they tend to reflect conscious behaviour change after the consumer already knows. These signals lag the transition.
The more powerful early signals are hormonally mediated: nausea-driven purchases like ginger ale, supplement introductions, dietary shifts. These changes happen before conscious decision-making. In some cases, they precede clinical confirmation.
That’s the window. That’s the commercial opportunity.
Applied to the Instacart cohort, the model flagged 4,776 users (19%) as probable early-stage pregnancy transitions — consistent with population-level prevalence rates adjusted for the demographic profile of online grocery shoppers.
It’s Not Just Pregnancy
Pregnancy is the clearest signal, but the same framework generalises. We trained equivalent models for:
- 🐾 New pet ownership
- 📦 Residential relocation
- 🕯️ Bereavement
Each life transition reshapes household consumption in structured, temporally predictable ways. The basket-sequence approach works across all of them.
This means what we’ve built isn’t a pregnancy detector. It’s a life-stage intelligence infrastructure — a reusable methodology for identifying consumers at the exact moment their world is changing.
Why This Matters for Brands
Life transitions are the highest-value moments in a consumer’s purchasing life. Brand loyalties are at their most fluid. New habits form. New products enter the basket. New relationships with retailers begin.
The brand that reaches a consumer at the onset of that transition — not after the announcement, not after the Amazon search, not after the first nappy purchase — holds a first-mover advantage that compounds over months and years.
Skubl operationalises this as a probabilistic enrichment layer on top of first-party data. No data science team required. No bespoke model builds. Life-stage probability scores, surfaced directly into existing marketing workflows.
A Note on Ethics
This work was conducted openly, on public datasets, with no individual identification, precisely because we believe this capability should be built on transparent, consent-based foundations.
The commercial application within Skubl runs on first-party data, with customer consent, as a statistical signal — not a surveillance mechanism. We’re advocates for industry-wide standards here, and we offer this research as a contribution to what’s possible within those standards, not around them.
The Takeaway
The grocery basket is a behavioural record. It captures not just what we need today, but who we’re becoming. Machine learning can now read that record in real time, at scale, and act on it — ethically, anonymously, and earlier than any conventional signal would allow.
The basket knows first.
This post is based on the peer-reviewed paper “Cart Knows First: Machine Learning Life-Stage Prediction from Large-Scale Consumer Purchase Data,” published in the Journal of Artificial Intelligence General Science (Vol. 9, Issue 1, 2026).
Cameron Batt
Founder @ Skubl and published machine learning researcher.