Ray Iyer
Co-founder, Anglera

Building an attribute schema for Grocery & CPG that shoppers and AI can actually use

Grocery and CPG shoppers filter on allergens, diet, and pack size. Here's the attribute schema that keeps products visible in search and AI answers.

Grocery and CPG catalogs live or die on a small set of attributes that apparel and electronics never have to think about: allergens, dietary claims, pack configuration, net weight. Miss one on a single SKU and that product doesn't just rank poorly in filtered search or AI shopping answers; it disappears from them entirely. Here's the schema that keeps it visible, worked through with a cereal box example.

Grocery filters are pass/fail, not ranking signals

In apparel, a missing "material" attribute costs you a little relevance. In grocery, a missing "contains tree nuts" attribute costs you every shopper with a tree nut allergy, because faceted search and AI assistants treat allergen and diet fields as exclusion filters, not ranking boosts. A high "filtered to zero" rate is a red flag that facets aren't dynamic enough or product data is incomplete, and grocery is where that failure shows most clearly: a shopper filters for "gluten-free" or "vegan," and a product without a populated diet attribute never enters the result set, whether or not it actually qualifies.

A grocery attribute schema has to do two jobs at once: help a shopper narrow 40 cereal SKUs down to three, and help an AI agent answer "what's a low-sugar cereal without common allergens" without ever showing a list at all.
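
The pass/fail behavior can be sketched in a few lines. This is a minimal illustration, not any retailer's actual search logic: the `Product` record and `passes_diet_filter` function are hypothetical names, and the sketch assumes a diet attribute is stored as a set of claims, or as `None` when the field was never populated.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Product:
    sku: str
    # None means the field was never populated, which is different
    # from an empty set ("we checked, and no claims apply").
    diet_claims: Optional[FrozenSet[str]] = None


def passes_diet_filter(product: Product, required_claim: str) -> bool:
    """Exclusion filter: unknown data fails exactly like a known 'no'."""
    if product.diet_claims is None:
        return False
    return required_claim in product.diet_claims


catalog = [
    Product("oat-cereal", frozenset({"vegan", "gluten-free"})),
    Product("corn-flakes", frozenset()),  # populated: no claims apply
    Product("rice-puffs"),                # may qualify in reality; field is blank
]

visible = [p.sku for p in catalog if passes_diet_filter(p, "gluten-free")]
print(visible)  # ['oat-cereal']
```

There is no score to tune here. The blank SKU is not ranked lower; it is simply absent from the result set.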

The attributes that actually gate visibility

Grocery facets cluster around a short list that shows up on nearly every retailer's shelf-page filters, and it maps closely to what regulators and standards bodies already require:

| Attribute group | Examples | Why it gates search |
| --- | --- | --- |
| Allergens | milk, egg, peanut, tree nut, soy, wheat, fish, shellfish, sesame | Exclusion filter; a blank field reads as "unknown," which many systems treat as "unsafe, hide it" |
| Dietary claims | gluten-free, vegan, kosher, halal, organic, non-GMO | Exclusion filter, same failure mode as allergens |
| Nutrition facts | calories, sugar (g), sodium (mg), protein (g), serving size | Range filters ("under 10g sugar"); AI agents parse these directly to answer comparison questions |
| Pack & size | net weight, count per pack, unit size, case pack | Drives "size" facet and unit-price comparisons; also the field most often wrong across a retailer's own SKUs |
| Ingredients | full ingredient list, in descending order by weight | Backs allergen/diet claims and lets AI agents verify claims instead of just trusting a badge |
| Storage & prep | refrigerated, frozen, shelf-stable, cook time | Filters delivery/pickup eligibility and meal-planning queries |
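
One way to hold these attribute groups on a SKU is a typed record in which every field defaults to "unknown." The sketch below is an assumption about shape only: the class name, field names, and placeholder GTIN are illustrative and are not GDSN attribute names.

```python
from dataclasses import dataclass, fields
from typing import FrozenSet, List, Optional


@dataclass
class GrocerySku:
    gtin: str
    # Every attribute defaults to None ("unknown"), never to an empty
    # value, so a blank field cannot be mistaken for "no allergens".
    allergens_contains: Optional[FrozenSet[str]] = None
    dietary_claims: Optional[FrozenSet[str]] = None
    sugar_g_per_serving: Optional[float] = None
    sodium_mg_per_serving: Optional[float] = None
    net_weight_g: Optional[float] = None
    ingredients: Optional[List[str]] = None  # descending order by weight
    storage: Optional[str] = None            # e.g. "shelf-stable"


def missing_fields(sku: GrocerySku) -> List[str]:
    """Names of attributes that are still unknown on this SKU."""
    return [f.name for f in fields(sku) if getattr(sku, f.name) is None]


sku = GrocerySku(
    gtin="placeholder-gtin",  # hypothetical value, not a real identifier
    allergens_contains=frozenset({"wheat"}),
    sugar_g_per_serving=9.0,
)
print(missing_fields(sku))
# ['dietary_claims', 'sodium_mg_per_serving', 'net_weight_g', 'ingredients', 'storage']
```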

Since January 1, 2023, sesame has been the ninth major food allergen the FDA requires on packaged food labels, joining milk, eggs, fish, shellfish, tree nuts, peanuts, wheat, and soy. An allergen attribute list still stuck at eight fields makes every sesame-containing SKU technically mislabeled for search, even when the physical package is compliant.
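
A quick audit for this gap might look like the following. It is a sketch under the assumption that a schema exposes its allergen fields as a list of names; `missing_allergen_fields` and `legacy_schema` are hypothetical, and the allergen names follow the singular spellings used in the table above.

```python
FDA_MAJOR_ALLERGENS = frozenset({
    "milk", "egg", "fish", "shellfish", "tree nut",
    "peanut", "wheat", "soy", "sesame",
})


def missing_allergen_fields(schema_fields):
    """Return the major allergens that the schema has no field for."""
    return sorted(FDA_MAJOR_ALLERGENS - set(schema_fields))


# Hypothetical schema that predates the sesame requirement.
legacy_schema = ["milk", "egg", "fish", "shellfish",
                 "tree nut", "peanut", "wheat", "soy"]

print(missing_allergen_fields(legacy_schema))  # ['sesame']
```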

The underlying data model already exists. GS1's GDSN nutrition and allergen attribute group is built around this exact structure (ingredients, allergens, additives, nutrients, serving size), tied to a trade item at the lowest GTIN in the hierarchy. Retailers don't need a new taxonomy; they need to populate the one the industry already agreed on.

What AI shopping agents need beyond the filter bar

Ask an AI assistant to "recommend a whole-grain cereal under 8 grams of sugar with no tree nuts for a kid's lunch," and it isn't clicking checkboxes. It's reading structured fields (nutrition per serving, ingredient list, allergen flags) and reasoning across them. If any one of those fields is missing, the AI either drops the product from consideration or, worse, gives a confidently wrong answer because it inferred an allergen status from a category default.
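
The reasoning behind that query can be approximated as a three-way verdict rather than a yes/no. This is a simplified sketch, not how any particular assistant is implemented; the `Verdict` enum and `evaluate` function are hypothetical, and `allergens` is assumed to combine "contains" and "may contain" declarations.

```python
from enum import Enum
from typing import FrozenSet, Optional


class Verdict(Enum):
    RECOMMEND = "recommend"
    REJECT = "reject"
    UNKNOWN = "unknown"  # missing data: do not guess


def evaluate(sugar_g: Optional[float],
             allergens: Optional[FrozenSet[str]],
             whole_grain: Optional[bool]) -> Verdict:
    """Whole-grain cereal, under 8 g sugar per serving, no tree nuts."""
    # Each check is True, False, or None (field missing).
    checks = [
        whole_grain,
        None if sugar_g is None else sugar_g < 8,
        None if allergens is None else "tree nut" not in allergens,
    ]
    if any(c is False for c in checks):
        return Verdict.REJECT
    # Nothing failed, but something is missing. Filling the gap from a
    # category default is how confident wrong answers happen.
    if any(c is None for c in checks):
        return Verdict.UNKNOWN
    return Verdict.RECOMMEND


print(evaluate(6.0, frozenset({"wheat"}), True))  # Verdict.RECOMMEND
print(evaluate(6.0, None, True))                  # Verdict.UNKNOWN
print(evaluate(9.0, None, True))                  # Verdict.REJECT
```

Only the first product can be recommended. The second might be perfectly safe, but with no allergen data the honest answer is "unknown," and an unknown product does not make it into the response.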

This is now an explicit feed requirement, not a nice-to-have. OpenAI's product feed specification for ChatGPT shopping expects structured attributes, accepts refreshes as often as every 15 minutes, and treats the feed as the source of truth rather than a supplement to on-page content. Google's Merchant Center product data specification requires a GTIN for products that have one assigned and can disapprove listings with incorrect identifiers. Both are structurally allergic to blank or inferred fields in exactly the categories grocery cares about most.
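
Incorrect identifiers can be caught before a feed is submitted, because every GTIN ends in a check digit computed from the digits before it. The function below is a sketch of the standard GS1 mod-10 check; the function name is ours, and the sample numbers are widely used textbook values included only to exercise the check, not identifiers for any product in this article.

```python
def gtin_is_valid(gtin: str) -> bool:
    """Validate the check digit of a GTIN-8, GTIN-12, GTIN-13, or GTIN-14."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(c) for c in gtin]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting from the digit
    # immediately to the left of the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(gtin_is_valid("4006381333931"))  # True  (13 digits)
print(gtin_is_valid("036000291452"))   # True  (12 digits)
print(gtin_is_valid("036000291453"))   # False (last digit altered)
```

A passing check digit only shows the number is well formed. It does not prove the GTIN belongs to the product it is attached to.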

Cereal box, before and after

Here's a raw supplier feed row for a store-brand cereal, next to what a properly enriched attribute set looks like.

| Attribute | Raw feed (before) | Enriched (after) |
| --- | --- | --- |
| Title | "Cereal 18oz" | "Honey Toasted Oats Cereal, Whole Grain, 18 oz Box" |
| Net weight | missing | 18 oz (510 g) |
| Servings per container | missing | 17 |
| Sugar per serving | missing | 9g |
| Allergens | missing | contains: wheat; may contain: tree nuts |
| Dietary claims | missing | non-GMO; kosher |
| Ingredients | missing | whole grain oats, sugar, corn syrup, honey, salt, vitamin/mineral blend |
| Storage | missing | shelf-stable |

The "before" row can still show up in a plain keyword search for "cereal." It cannot show up in a "kosher cereal under 10g sugar" filter, and it cannot be recommended by an AI agent comparing sugar content across three cereals, because there's nothing to compare. The "after" row qualifies for both, and it is correctly excluded from a gluten-free filter because it declares wheat. It is built entirely from fields that already exist in GDSN and on the physical Nutrition Facts panel; someone just has to extract, normalize, and attach them to the SKU.
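
The "normalize" step is mostly unit handling. As a small sketch, here is how a raw net weight string could be converted to grams so that size facets and unit-price comparisons work across SKUs. The function name and the set of accepted units are assumptions, and "oz" is treated as an ounce of weight, not a fluid ounce.

```python
import re

# Conversion factors to grams (avoirdupois ounce and pound).
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "oz": 28.349523125, "lb": 453.59237}


def net_weight_grams(raw: str) -> float:
    """Parse strings like '18 oz' or '1.5 lb' into grams."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(g|kg|oz|lb)\s*", raw.lower())
    if match is None:
        # Refuse to guess: an unparseable weight stays a data gap.
        raise ValueError(f"unrecognized net weight: {raw!r}")
    value, unit = match.groups()
    return float(value) * GRAMS_PER_UNIT[unit]


print(round(net_weight_grams("18 oz")))  # 510
```

The result matches the "18 oz (510 g)" value in the enriched row above.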

Structuring the schema without boiling the ocean

Retailers don't need a 200-field grocery taxonomy on day one. A workable rollout order:

  1. Allergens and dietary claims first, since they're pass/fail filters with regulatory backing.
  2. Nutrition facts (sugar, sodium, calories, protein per serving), since they power range filters and AI comparison questions.
  3. Pack, size, and unit-of-measure, since size mismatches are the most common cause of duplicate or conflicting listings across a chain's own stores.
  4. Ingredients as free text, tied back to allergens so claims are auditable rather than asserted.
  5. Storage and prep attributes last, since they affect fulfillment eligibility more than discovery.

Finish each tier before starting the next. A catalog with allergens fully populated and nutrition half-done beats one that's 60% populated across all five tiers, because exclusion filters are what remove products from consideration entirely.
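
The tier-by-tier rule can be turned into a simple report: compute the fill rate for each tier in priority order and work on the first one that is not complete. The tier and field names below are illustrative assumptions, and each SKU is assumed to be a plain dictionary in which a missing key or `None` means "not populated."

```python
# Rollout tiers in priority order; field names are illustrative.
TIERS = [
    ("allergens_and_diet", ["allergens", "dietary_claims"]),
    ("nutrition", ["sugar_g", "sodium_mg", "calories", "protein_g"]),
    ("pack_and_size", ["net_weight", "count_per_pack"]),
    ("ingredients", ["ingredients"]),
    ("storage_and_prep", ["storage"]),
]


def tier_fill_rates(catalog):
    """Share of populated (SKU, field) slots for each tier."""
    rates = {}
    for name, tier_fields in TIERS:
        slots = [sku.get(f) is not None for sku in catalog for f in tier_fields]
        rates[name] = sum(slots) / len(slots) if slots else 0.0
    return rates


def next_tier_to_work_on(catalog):
    """First tier, in priority order, that is not fully populated."""
    for name, rate in tier_fill_rates(catalog).items():
        if rate < 1.0:
            return name
    return None


catalog = [
    {"allergens": ["wheat"], "dietary_claims": ["kosher"], "sugar_g": 9},
    {"allergens": [], "dietary_claims": [], "sugar_g": None},
]
print(tier_fill_rates(catalog)["nutrition"])  # 0.125
print(next_tier_to_work_on(catalog))          # nutrition
```

Note that an empty list counts as populated: "no allergens" is an answer, while `None` is not.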

Anglera plugs into whatever PIM or feed a grocery retailer already runs, or works without one, and continuously scores every SKU against a schema like this one: filling gaps in allergen, nutrition, and pack data from source documents and feeds, flagging conflicts between ingredient lists and allergen claims, and keeping attributes current as suppliers reformulate. Your PIM stores the data; Anglera does the work of making sure every field a shopper or an AI agent needs is actually there.
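
To make the idea of a conflict between an ingredient list and an allergen claim concrete, here is a deliberately naive sketch. It is a generic illustration and not a description of Anglera's implementation; the keyword map is a tiny hypothetical sample, and plain substring matching would need refinement in practice (for example, "buckwheat" contains "wheat" but is not wheat).

```python
# Illustrative keyword map; a real one would be far larger and curated.
INGREDIENT_TO_ALLERGEN = {
    "almond": "tree nut",
    "whey": "milk",
    "wheat": "wheat",
    "tahini": "sesame",
    "soy lecithin": "soy",
}


def undeclared_allergens(ingredients, declared):
    """Allergens implied by the ingredient list but absent from the claim."""
    text = ", ".join(ingredients).lower()
    implied = {allergen
               for keyword, allergen in INGREDIENT_TO_ALLERGEN.items()
               if keyword in text}
    return sorted(implied - set(declared))


print(undeclared_allergens(
    ["whole grain oats", "sugar", "almonds", "whey protein"],
    declared={"milk"},
))  # ['tree nut']
```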

About the author

Ray is a co-founder of Anglera, building the product-data infrastructure for agentic commerce — turning messy catalogs into structured, AI-readable data that buyers and answer engines can find. Previously product at Uber; Stanford CS.
