Ray Iyer
Co-founder, Anglera

Attribution done right: connecting product-data work to revenue

A defensible attribution model for product-data investment: holdouts, geo tests, staged rollouts, and matched pairs finance will actually accept.

Ask a merchandising team what better product data is worth and you'll get a shrug, a gut number, or a vague nod to "SEO." That's not a number finance can plan against. The good news is that product data behaves like any other operational lever — it can be tested with the same causal-inference toolkit marketers use to prove incrementality, and it produces cleaner signal than most marketing spend does, because you can gate exactly which SKUs get touched and when.

Why product-data attribution is genuinely hard

The core problem is confounding. If you enrich your top 500 SKUs this quarter and revenue on those SKUs goes up, you don't know how much of that lift came from richer content versus a merchandiser also fixing pricing, a category getting more paid traffic, or plain seasonality. Correlation between "we enriched this" and "sales went up" is not proof — it's the same trap marketers fell into with last-click attribution, where companies moving from single-touch to multi-touch models typically find 20-30% of perceived impact was misallocated to the wrong cause. Product data needs the same discipline: a counterfactual, not a before/after.

The fix is to borrow directly from incrementality testing, which has become the standard way marketers prove a channel or tactic actually caused a result rather than just correlating with one. As one 2025-2026 industry survey puts it, incrementality testing has moved from a niche practice to mainstream adoption, driven by pressure to prove that spend generates real business impact rather than just tracked activity. Product-data teams should hold themselves to the same bar.

Method 1: SKU-level holdouts

The simplest test: take a category or SKU list slated for enrichment, randomly split it in two, enrich one half now, and hold the other half back on a fixed delay (2-4 weeks is usually enough to see an effect at the product detail page, or PDP, level). Compare conversion rate, organic sessions, and on-site search click-through between the two groups over the same window. Because both groups sit on the same site, in the same season, with the same traffic mix, most confounders cancel out. This is the cleanest version of the holdout method used in ad incrementality testing, where the difference between treatment and control isolates the causal effect of the intervention rather than the ambient trend.

Requirements for it to hold up: the two groups need to be similar in baseline traffic and price point (don't hold back your best-sellers against your long tail), and the holdout has to be a real holdout — no manual fixes creeping into the control group because a merchandiser "just fixed one thing."
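One way to keep the two groups balanced is to randomize within strata instead of across the whole list. The sketch below is illustrative only: the field names (`sku`, `baseline_sessions`, `price`), the three traffic bands, and the median price cut are assumptions, not something prescribed by the method itself.

```python
import random
from collections import defaultdict
from statistics import median


def assign_holdout(skus, n_traffic_bands=3, seed=42):
    """Randomly split SKUs into treatment/control within traffic x price strata.

    `skus` is a list of dicts with hypothetical keys:
    'sku', 'baseline_sessions', 'price'.
    """
    if not skus:
        return {}
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    price_cut = median(s["price"] for s in skus)
    ranked = sorted(skus, key=lambda s: s["baseline_sessions"])

    # Group SKUs so best-sellers are only ever split against best-sellers,
    # and long-tail items against long-tail items.
    strata = defaultdict(list)
    for i, s in enumerate(ranked):
        traffic_band = i * n_traffic_bands // len(ranked)
        price_band = "high" if s["price"] >= price_cut else "low"
        strata[(traffic_band, price_band)].append(s["sku"])

    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for sku in members[:half]:
            assignment[sku] = "treatment"
        for sku in members[half:2 * half]:
            assignment[sku] = "control"
        if len(members) % 2:  # odd stratum: coin-flip the leftover SKU
            assignment[members[-1]] = rng.choice(["treatment", "control"])
    return assignment


# Illustrative catalog: 40 made-up SKUs with varied traffic and price.
catalog = [
    {"sku": f"SKU-{i:03d}", "baseline_sessions": 50 * (i + 1), "price": 20 + 7 * (i % 9)}
    for i in range(40)
]
groups = assign_holdout(catalog)
```

Storing the seed and the resulting assignment alongside the test makes it easier to audit later whether anything in the control group was touched by hand.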

Method 2: Geo and cohort tests

Geo testing is the standard method retailers and marketers use to prove causal lift when individual products can't be cleanly held out, for instance when enrichment ships site-wide but can be staggered by region, store cluster, or customer cohort. You apply the change to one set of matched regions or a matched customer segment and leave a comparable set untouched, then compare outcomes. The gold-standard version of this requires roughly 10-15 matched markets with 95%+ historical correlation, sized to detect the lift you actually expect (typically 2-5%), run for four to six weeks. For most mid-market retailers, a lighter version is enough to get directionally solid numbers without a data-science team: two or three matched designated market areas (DMAs), or a randomized customer-cohort split in email/on-site personalization.
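The correlation screen for matched markets can be sketched in a few lines. This is a minimal illustration, assuming you have a weekly revenue series per market; the market names and numbers are made up, and the 0.95 threshold simply mirrors the figure quoted above. It covers only the matching step, not test sizing.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series cannot be matched on trend
    return cov / math.sqrt(var_x * var_y)


def matched_market_pairs(history, min_corr=0.95):
    """Return (market_a, market_b, r) for pairs whose history correlates >= min_corr."""
    pairs = []
    for a, b in combinations(sorted(history), 2):
        r = pearson(history[a], history[b])
        if r >= min_corr:
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical weekly revenue per market (same weeks, same order).
history = {
    "market_a": [1, 2, 3, 4],
    "market_b": [2, 4, 6, 8],
    "market_c": [4, 1, 3, 2],
}
candidates = matched_market_pairs(history)
```

In practice the series would span many more weeks than this toy input, and one market from each qualifying pair would be assigned to treatment at random.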

Method 3: Staged rollouts

When a full holdout isn't practical — you want every SKU enriched eventually and can't justify permanently withholding fixes from some products — stage the rollout instead. Enrich category A in week 1, category B in week 3, category C in week 5, and use the not-yet-enriched categories as a rolling control while they wait their turn. Track the change in each category's own trend line the week it goes live, relative to categories still in queue. This sacrifices some statistical rigor versus a true randomized holdout, but it's far better than a single before/after comparison, and it has the practical benefit of never leaving revenue on the table by design.
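The "own trend line relative to categories still in queue" comparison is a simple difference-in-differences. The sketch below assumes a weekly metric per category stored as a list, 0-indexed week positions for go-live, and a two-week window on each side; all names and numbers are hypothetical.

```python
from statistics import mean


def staged_rollout_lift(weekly, go_live, category, window=2):
    """Change in `category` around its go-live week, minus the average change
    over the same weeks in categories that are still waiting in the queue.

    weekly:  {category: [metric for week 0, week 1, ...]}
    go_live: {category: index of the first enriched week}
    """
    start = go_live[category]
    if start - window < 0 or start + window > len(weekly[category]):
        raise ValueError("not enough weeks around go-live for this window")
    pre = range(start - window, start)
    post = range(start, start + window)

    def change(cat):
        return mean([weekly[cat][w] for w in post]) - mean([weekly[cat][w] for w in pre])

    # Only categories that go live after the post window ends count as controls.
    controls = [c for c, w in go_live.items() if c != category and w >= start + window]
    if not controls:
        raise ValueError("no un-enriched categories left to act as control")
    return change(category) - mean([change(c) for c in controls])


# Made-up conversion rates (%) for three categories over eight weeks.
weekly = {
    "A": [2.0, 2.0, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5],
    "B": [3.0, 3.0, 3.1, 3.1, 3.4, 3.4, 3.4, 3.4],
    "C": [1.0, 1.0, 1.1, 1.1, 1.1, 1.1, 1.3, 1.3],
}
go_live = {"A": 2, "B": 4, "C": 6}
lift_a = staged_rollout_lift(weekly, go_live, "A")
```

Note that the last category in the queue has no control left, which is one concrete reason this design is weaker than a randomized holdout.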

Method 4: Matched pairs

For catalogs too small or too heterogeneous for geo tests, matched-pair analysis works well: pair each enriched SKU with a similar un-enriched SKU (same subcategory, similar price band, similar traffic volume before the change) and compare the delta between pairs rather than absolute performance. This controls for the fact that a $40 accessory and a $400 appliance don't move the same way. It's the same logic as matched-market testing in geo experiments, just applied at the product level instead of the regional level.
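A rough sketch of the pairing and comparison, under stated assumptions: the field names (`subcategory`, `price`, `sessions`, `cr_before`, `cr_after`) are hypothetical, prices and sessions are assumed to be positive, and the 20% price and 30% traffic tolerances are arbitrary starting points rather than recommended values.

```python
def match_pairs(enriched, candidates, max_price_gap=0.2, max_traffic_gap=0.3):
    """Pair each enriched SKU with its closest un-enriched SKU in the same subcategory."""
    used = set()
    pairs = []
    for e in enriched:
        best, best_dist = None, None
        for c in candidates:
            if c["sku"] in used or c["subcategory"] != e["subcategory"]:
                continue
            price_gap = abs(c["price"] - e["price"]) / e["price"]
            traffic_gap = abs(c["sessions"] - e["sessions"]) / e["sessions"]
            if price_gap > max_price_gap or traffic_gap > max_traffic_gap:
                continue  # too different to be a fair comparison
            dist = price_gap + traffic_gap
            if best is None or dist < best_dist:
                best, best_dist = c, dist
        if best is not None:
            used.add(best["sku"])  # each control SKU is used at most once
            pairs.append((e, best))
    return pairs


def mean_paired_delta(pairs):
    """Average of (enriched change - control change) in conversion rate."""
    if not pairs:
        raise ValueError("no valid pairs to compare")
    deltas = [
        (e["cr_after"] - e["cr_before"]) - (c["cr_after"] - c["cr_before"])
        for e, c in pairs
    ]
    return sum(deltas) / len(deltas)


# Made-up SKUs: conversion rates before and after the enrichment date.
enriched = [
    {"sku": "E1", "subcategory": "cables", "price": 40.0, "sessions": 1000,
     "cr_before": 0.020, "cr_after": 0.026},
    {"sku": "E2", "subcategory": "appliances", "price": 400.0, "sessions": 500,
     "cr_before": 0.010, "cr_after": 0.012},
]
unenriched = [
    {"sku": "C1", "subcategory": "cables", "price": 42.0, "sessions": 950,
     "cr_before": 0.021, "cr_after": 0.022},
    {"sku": "C2", "subcategory": "appliances", "price": 390.0, "sessions": 520,
     "cr_before": 0.011, "cr_after": 0.011},
    {"sku": "C3", "subcategory": "cables", "price": 90.0, "sessions": 1000,
     "cr_before": 0.030, "cr_after": 0.031},
]
pairs = match_pairs(enriched, unenriched)
avg_delta = mean_paired_delta(pairs)
```

Enriched SKUs that find no acceptable partner are dropped instead of being forced into a poor match, so it is worth reporting how many were excluded.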

The measurement stack, side by side

| Method | Best for | What it isolates | Watch-out |
| --- | --- | --- | --- |
| SKU holdout | Enrichment projects on large catalogs | PDP conversion, on-site search CTR | Groups must be balanced on baseline traffic and price |
| Geo/cohort test | Site-wide or personalization changes | Organic sessions, revenue per visitor | Needs enough markets/cohorts for statistical power |
| Staged rollout | Full-catalog projects with no permanent holdout | Directional lift, time-to-impact | Weaker control than randomized holdout |
| Matched pairs | Small or highly varied catalogs | SKU-level lift controlling for category/price | Pair quality determines validity |

Avoiding the over-claim

The fastest way to lose finance's trust is to report a number that can't survive a follow-up question. Three guardrails help. First, report a range rather than a point estimate, and say how it was measured. Second, never attribute 100% of a revenue change to product data when paid spend, pricing, or seasonality moved in the same window. Marketing mix modeling exists precisely because single-method attribution overstates impact when multiple levers move at once, and the current industry consensus is to triangulate rather than rely on one model. Third, separate the metrics that are genuinely causal (holdout-tested conversion lift) from the ones that are merely correlated but still useful context (return-rate trend, support-ticket volume, AOV). Finance will accept "enrichment lifted PDP conversion 4-7% in a holdout test, with directional support from a 12% drop in spec-related returns" far more readily than a single blended ROI number with no method behind it.
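Reporting a range can be as simple as putting a confidence interval around the relative lift from a holdout. The sketch below uses a standard normal approximation on the log of the conversion-rate ratio. It assumes sessions are independent, which ignores clustering by SKU and will tend to make the interval somewhat too narrow; the function name and the counts are illustrative.

```python
import math
from statistics import NormalDist


def relative_lift_interval(conv_t, n_t, conv_c, n_c, confidence=0.95):
    """Relative conversion lift of treatment over control, with an approximate CI.

    conv_*: number of converting sessions, n_*: total sessions.
    Returns (point_estimate, low, high) as fractions, e.g. 0.05 for +5%.
    """
    if min(conv_t, conv_c) <= 0 or conv_t > n_t or conv_c > n_c:
        raise ValueError("need positive conversion counts no larger than sessions")
    rate_t, rate_c = conv_t / n_t, conv_c / n_c
    log_ratio = math.log(rate_t / rate_c)
    # Standard error of the log rate ratio (normal approximation).
    se = math.sqrt(1 / conv_t - 1 / n_t + 1 / conv_c - 1 / n_c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    low = math.exp(log_ratio - z * se) - 1
    high = math.exp(log_ratio + z * se) - 1
    return rate_t / rate_c - 1, low, high


# Made-up holdout totals: 200,000 sessions per group.
point, low, high = relative_lift_interval(10_600, 200_000, 10_000, 200_000)
summary = f"PDP conversion lift: {low:.1%} to {high:.1%} (holdout test, 95% interval)"
```

If the lower bound falls below zero, the accurate thing to report is that the test has not yet separated the effect from noise, and the usual remedy is a longer window or a larger group.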

None of this requires a data-science team — it requires discipline about what gets tested, what gets held out, and what gets reported as causal versus correlated. That discipline is only possible when the underlying data changes are trackable at the SKU level in the first place, which is the part most catalogs get wrong before they ever get to measurement. Anglera scores, gap-fills, and enriches product data continuously and keeps a record of what changed and when, so retailers running these tests have a clean, timestamped treatment group instead of a fuzzy "sometime last quarter" — the difference between a real experiment and a guess dressed up as one.

About the author

Ray Iyer, Co-founder, Anglera

Ray is a co-founder of Anglera, building the product-data infrastructure for agentic commerce — turning messy catalogs into structured, AI-readable data that buyers and answer engines can find. Previously product at Uber; Stanford CS.
