Ray Iyer
Co-founder, Anglera

The state of product data in Grocery & CPG retail (2026)

Grocery and CPG catalogs are still thin and inconsistent in 2026, and AI shopping agents now punish that instantly. Here is the real cost and what to fix first.


Grocery and CPG product data has never had more places to fail: PIMs, retailer portals, GDSN data pools, marketplace feeds, and now AI shopping agents that read a catalog once and decide whether a product exists. Most catalogs still aren't built for that. Here is what the data actually looks like heading into the back half of 2026, what the gaps cost, and why this year makes the problem harder to ignore.

The catalog is bigger and thinner at the same time

Grocery assortments keep expanding: private label lines, seasonal SKUs, better-for-you reformulations, and long-tail specialty items all get added faster than anyone documents them. One grocery data catalog, Foodgraph, now tracks well over a million distinct products once private label and regional variants are counted, and even Foodgraph acknowledges that every one of the dozens of sources it ingests has gaps.

The pattern repeats at the retailer level. A new item shows up with a UPC, a category, and a price. Ingredients, allergens, diet certifications, pack size logic, and country of origin arrive later, if at all, because nobody owns the follow-up. This is a chronic, industry-wide condition, not a one-retailer problem. GS1 makes the same case for its Global Data Model: incomplete and inconsistent product information is common enough across the GDSN network that GS1 built a new global attribute standard specifically to force alignment across trading partners.

Grocery has a second complication that packaged goods in other categories don't have: the data changes underneath the product. Reformulations, allergen updates, and package-size "shrinkflation" changes all require someone to re-sync the record, and in most catalogs that sync lags the physical change by weeks.
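One way to surface that lag is to compare when a product last changed physically against when its catalog record was last synced. The sketch below assumes each record carries two hypothetical dates, `last_spec_change` and `last_synced`, and uses an arbitrary 14-day grace period; real PIM or GDSN records may not expose a reliable change date at all, so treat this as an illustration of the check, not a ready-made audit.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SkuRecord:
    # Hypothetical record shape; real PIM, GDSN, or retailer records differ.
    sku: str
    last_spec_change: date  # when formulation, allergens, or pack size last changed
    last_synced: date       # when the catalog record was last updated


def stale_skus(records, as_of, tolerance=timedelta(days=14)):
    """Return SKUs that changed physically but were not re-synced within `tolerance`."""
    stale = []
    for r in records:
        # The record predates the latest physical change...
        unsynced = r.last_synced < r.last_spec_change
        # ...and that change is older than the allowed grace period.
        overdue = as_of - r.last_spec_change > tolerance
        if unsynced and overdue:
            stale.append(r.sku)
    return stale


records = [
    SkuRecord("sku-a", date(2026, 3, 1), date(2026, 2, 1)),   # changed, never re-synced
    SkuRecord("sku-b", date(2026, 3, 1), date(2026, 3, 5)),   # re-synced after the change
    SkuRecord("sku-c", date(2026, 3, 25), date(2026, 3, 1)),  # changed recently, still in grace period
]
print(stale_skus(records, as_of=date(2026, 4, 1)))  # ['sku-a']
```

Requiring both conditions keeps recently changed items out of the alert until the grace period expires, so the list points at genuinely overdue records rather than normal in-flight updates.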

What thin data actually costs

Missing or wrong attributes don't just look sloppy. They cost specific, measurable things.

  • Lost search. A shopper filtering for "gluten-free" or "no added sugar" never sees a product that qualifies but isn't tagged that way. The item isn't ranked lower — it's invisible.
  • Poor conversion. Grocery shoppers compare pack size, serving count, and price-per-unit constantly. When those fields are blank or wrong, carts get abandoned at the comparison step, not the checkout step.
  • Returns and complaints. Allergen and ingredient errors in grocery are not a UX problem; they're a safety and trust problem. A wrong "contains nuts" flag drives a return; a missing one drives something worse.
  • Retail media waste. Retail media networks are entirely dependent on clean, standardized product data to place ads against the right queries and attribute sales correctly. Dirty data means brands pay for placements that never had a chance to convert.

None of this shows up as a single line item on a P&L. It shows up as flat conversion rates, rising ad costs per click, and a returns line that never quite explains itself.

Why 2025-2026 raises the stakes

Two things changed the urgency calculus this year.

First, AI shopping agents went from experiment to infrastructure. ChatGPT, Google's AI Mode, Perplexity, and Copilot now field a meaningful share of product discovery, and they lean heavily on structured data (titles, attributes, availability, and images) rather than marketing copy. As one industry analysis put it, product data is becoming the new packaging: if an attribute like "sustainable packaging" or "high-protein" isn't tagged cleanly, an agent searching for it simply won't surface the product, no matter how good the product actually is.

Second, online grocery itself is no longer a side channel. According to the Food Industry Association, online grocery sales are projected to reach roughly $452 billion, or 25.5 percent of total grocery spending, by 2028, up from about 18 percent in 2024. The same report finds that most shoppers have now tried generative AI tools for food-related tasks. Every one of those digital and AI-assisted sessions runs entirely on the attribute data behind the SKU. There's no shelf label or in-store associate to compensate for a blank field.

Put those two together and thin grocery data stops being a quiet inefficiency. It becomes the reason a product doesn't get discovered at all, in the channel that's growing fastest.

What "complete" actually looks like

Here's a typical raw grocery feed record next to what an AI shopping agent — or a shopper filtering by diet — actually needs:

| Attribute | Raw feed | Enriched record |
| --- | --- | --- |
| Product name | Oat Milk Original 32oz | Organic Original Oat Milk, Unsweetened, 32 fl oz |
| Diet tags | (blank) | Vegan, Gluten-Free, Non-GMO |
| Allergens | (blank) | Contains: none listed; made in a facility that processes tree nuts |
| Ingredients | (blank) | Water, organic oats, sunflower oil, sea salt |
| Nutrition (per serving) | (blank) | 90 cal, 2.5g fat, 16g carbs, 3g protein |
| Pack size / unit price | 32oz | 32 fl oz (1 qt) / $0.12 per fl oz |
| Country of origin | (blank) | Product of USA |

Ask an AI shopping assistant to "recommend a vegan oat milk with no added sugar under $5" and it can only surface the enriched record. The raw record doesn't exist to the agent, even though both describe the same product sitting on the same physical shelf.
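To make the mechanism concrete, here is a minimal sketch of the kind of structured filter an agent or on-site search effectively applies. It is not how any particular assistant works internally. The field names are hypothetical, the "Unsweetened" claim stands in for "no added sugar", and the shelf price of $3.84 is an assumption derived from the table (32 fl oz at $0.12 per fl oz), given to both records since new items arrive with a price.

```python
def matches(record, diet, claim, max_price):
    """Return True only if every requested attribute is present and satisfied."""
    # A field that is missing can't be verified, so it fails the filter
    # outright instead of merely ranking lower.
    diet_tags = record.get("diet_tags") or []
    claims = record.get("claims") or []
    price = record.get("price")
    if price is None:
        return False
    return diet in diet_tags and claim in claims and price < max_price


raw = {"name": "Oat Milk Original 32oz", "price": 3.84}
enriched = {
    "name": "Organic Original Oat Milk, Unsweetened, 32 fl oz",
    "diet_tags": ["Vegan", "Gluten-Free", "Non-GMO"],
    "claims": ["Unsweetened"],  # hypothetical stand-in for "no added sugar"
    "price": 3.84,              # assumed: 32 fl oz x $0.12 per fl oz
}

results = [r["name"] for r in (raw, enriched) if matches(r, "Vegan", "Unsweetened", 5.00)]
print(results)  # ['Organic Original Oat Milk, Unsweetened, 32 fl oz']
```

The raw record is the same product at the same price, but with no diet tags there is nothing for the filter to match, so it drops out entirely.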

The fix isn't a rip-and-replace

Retailers and brands don't need a new system of record to fix this; most already have a perfectly good PIM or catalog platform. What's missing is the continuous work of scoring every SKU against what a modern shopper or AI agent actually needs, filling the gaps, and keeping the record in sync as formulations and packaging change. That's the layer Anglera adds on top of any PIM or commerce platform, without replacing it. It scores catalog completeness, gap-fills missing attributes, and keeps grocery and CPG product data accurate and machine-readable as it changes, so every SKU stays visible to shoppers and AI shopping agents alike.
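At its simplest, a completeness score is the share of required attributes that are actually populated. The sketch below is not Anglera's scoring method: it assumes a hypothetical, equally weighted requirement set taken from the rows of the oat milk table (with pack size and unit price split into separate fields), whereas a production scorer would likely weight attributes by category and by channel.

```python
# Hypothetical requirement set, taken from the rows of the oat milk table.
REQUIRED_ATTRIBUTES = (
    "name", "diet_tags", "allergens", "ingredients",
    "nutrition", "pack_size", "unit_price", "country_of_origin",
)


def is_populated(value):
    """Treat None, blank strings, and empty collections as missing."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) > 0
    return True


def completeness(record, required=REQUIRED_ATTRIBUTES):
    """Return (share of required attributes populated, sorted missing attributes)."""
    missing = sorted(a for a in required if not is_populated(record.get(a)))
    score = (len(required) - len(missing)) / len(required)
    return score, missing


raw = {"name": "Oat Milk Original 32oz", "pack_size": "32oz", "diet_tags": []}
score, missing = completeness(raw)
print(score)    # 0.25
print(missing)  # ['allergens', 'country_of_origin', 'diet_tags', 'ingredients', 'nutrition', 'unit_price']
```

Returning the missing list alongside the score matters more than the number itself: it is the work queue for gap-filling.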


About the author

Ray Iyer, Co-founder, Anglera

Ray is a co-founder of Anglera, building the product-data infrastructure for agentic commerce — turning messy catalogs into structured, AI-readable data that buyers and answer engines can find. Previously product at Uber; Stanford CS.
