Ray Iyer
Co-founder, Anglera

What messy product data actually costs Beauty retailers

Beauty catalogs are full of missing shades, vague claims, and inconsistent INCI lists. Here's what that actually costs, and why AI shopping agents raise the stakes.

Beauty has the messiest product data of any retail category, and the numbers back it up: shade-driven returns, invisible SKUs, and inconsistent ingredient lists that quietly tax every channel a brand sells through. In 2025-2026, that mess stops being a merchandising nuisance and starts being a discovery problem, because the shopper asking for a recommendation is increasingly an AI agent that cannot see what your catalog doesn't say clearly.

The category that resists standardization

Beauty catalogs are structurally harder to keep clean than almost any other vertical. A single product can carry a dozen shade variants, an INCI ingredient list that changes with reformulation, marketing claims ("clean," "vegan," "reef-safe") that need substantiation, and skin-type or concern tags that shoppers actually search by. Most PIMs (product information management systems) have a field for all of this. Almost none of them enforce that every field gets filled in consistently across every variant, every retailer feed, and every reformulation.

The result is catalogs that are technically complete but functionally thin: a product page with a name, a price, and a photo, and not much else that a search engine, a shopper, or an AI agent can use to tell it apart from the next fifteen near-identical serums.

What thin data actually costs

The clearest evidence is in returns, which is where bad data on a beauty product page becomes a real dollar figure. Beauty's blended online return rate sits around 4.3 to 5 percent, and the leading driver in color cosmetics is shade mismatch, not product defects. As one former Sephora employee put it, foundation shade confusion made products like Giorgio Armani's Luminous Silk Foundation among the most-returned items in the store precisely because finding the right shade online was hard. Banuba's analysis points to the same root cause from a different angle: poor photos, inconsistent color reproduction, and incomplete or misleading descriptions create false expectations that a real product then fails to meet.

Here's what that looks like on an actual product page. This is a stripped-down version of a raw supplier feed next to what an enriched, shopper-ready listing needs to carry:

| Attribute | Raw feed | Enriched |
| --- | --- | --- |
| Shade name | "Shade 5" | "Shade 5 - Medium, Warm Undertone" |
| Undertone/depth | missing | Warm, Medium depth, matches Fenty 250-260 range |
| Finish | missing | Natural, buildable, satin finish |
| Coverage | "medium" | Medium to full, buildable without cakiness |
| Skin type fit | missing | Combination to oily; oil-control claim substantiated |
| Ingredient list | partial, non-INCI | Full INCI-standard list, allergen flags called out |
| Claims | "clean beauty" | Vegan (certified), fragrance-free, non-comedogenic |

The raw-feed column is a page. The enriched column is a page that an AI agent, a search engine, and a shopper trying to avoid a return can all actually use.
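The gap between the two columns can be checked mechanically. Here is a minimal sketch, assuming a listing is a plain Python dict. The attribute names and the `REQUIRED_ATTRIBUTES` list are hypothetical, chosen to mirror the table above; they are not any retailer's or PIM's actual schema.

```python
# Hypothetical attribute names mirroring the table above; not a real feed schema.
REQUIRED_ATTRIBUTES = [
    "shade_name", "undertone", "finish", "coverage",
    "skin_type_fit", "inci_list", "claims",
]

def missing_attributes(listing: dict) -> list[str]:
    """Return the required attributes that are absent or empty in a listing."""
    return [
        name for name in REQUIRED_ATTRIBUTES
        if not listing.get(name)  # None, "", and [] all count as missing
    ]

def completeness(listing: dict) -> float:
    """Fraction of required attributes that are filled in (0.0 to 1.0)."""
    filled = len(REQUIRED_ATTRIBUTES) - len(missing_attributes(listing))
    return filled / len(REQUIRED_ATTRIBUTES)

# A raw-feed style listing: only three of the seven attributes are present.
raw_feed = {
    "shade_name": "Shade 5",
    "coverage": "medium",
    "claims": ["clean beauty"],
}

print(missing_attributes(raw_feed))
print(round(completeness(raw_feed), 2))
```

Note that this only tests whether a value is present. It says nothing about whether "clean beauty" is a substantiated claim or whether "medium" is specific enough, which is a separate quality check.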

Search and conversion, not just returns

Returns are the visible cost. The invisible one is upstream: products that never surface because the attributes a shopper filters by (undertone, finish, skin concern) are missing or inconsistent. A shopper who searches "warm undertone medium coverage foundation for oily skin" will never see a listing that only says "Shade 5, medium." That's a lost session before the return even has a chance to happen.

This compounds at the catalog level. Beauty brands frequently discover a hero SKU has quietly dropped out of retailer search results for days at a time, while paid media keeps sending traffic to a listing nobody can find. Thin, inconsistent data is not a cosmetic problem in the beauty category; it is the mechanism behind lost search visibility, mismatched purchases, and returns that were preventable before the order shipped.

Why 2025-2026 raises the stakes

Two forces are converging on beauty catalogs right now, and both punish thin data harder than before.

First, AI shopping agents have become a real discovery channel, not a novelty. BeautyMatter's reporting notes that a typical ChatGPT skincare query now returns just a handful of results, often five or fewer, and that AI-guided beauty discovery already converts at two to three times the rate of standard browsing paths. Generative AI referrals aren't hypothetical either: ChatGPT already accounts for a meaningful share of Target's and Walmart's referral traffic. An agent building that shortlist reads product data literally: no undertone attribute, no INCI-standard ingredient list, no substantiated claim means the product effectively doesn't exist for that query. Ask an AI to recommend a fragrance-free retinol serum for sensitive skin under thirty dollars, and it can only shortlist products whose data actually says all four of those things.
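To make the "all four of those things" point concrete, here is a deliberately simplified sketch of a literal attribute filter. It is not how any particular AI agent works; the `Product` fields, the product names, and the matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Product:
    # Hypothetical structured attributes; real feeds will differ.
    name: str
    price_usd: float
    claims: set[str] = field(default_factory=set)
    key_ingredients: set[str] = field(default_factory=set)
    skin_types: set[str] = field(default_factory=set)

def matches_query(p: Product) -> bool:
    """True only if the data explicitly states all four constraints."""
    return (
        "fragrance-free" in p.claims
        and "retinol" in p.key_ingredients
        and "sensitive" in p.skin_types
        and p.price_usd < 30
    )

catalog = [
    Product("Serum A", 24.0, {"fragrance-free"}, {"retinol"}, {"sensitive"}),
    # May well be fragrance-free, but the data never says so.
    Product("Serum B", 22.0, set(), {"retinol"}, {"sensitive"}),
    # Skin-type tag is missing entirely.
    Product("Serum C", 28.0, {"fragrance-free"}, {"retinol"}, set()),
]

shortlist = [p.name for p in catalog if matches_query(p)]
print(shortlist)
```

Serum B and Serum C might be perfectly good answers in reality. Under a literal reading of the data, they are simply never considered.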

Second, marketplace pressure hasn't let up. Beauty brands increasingly sell across their own site, Amazon, Walmart, TikTok Shop, and specialty marketplaces simultaneously, each with its own feed requirements and attribute mapping. Every additional channel is another place for a shade name, an ingredient list, or a claim to drift out of sync with the source of truth. Inconsistency that used to just confuse one shopper on one page now propagates across every channel a brand touches.
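Drift of this kind is straightforward to detect once there is a single source of truth to compare against. A minimal sketch follows, assuming each channel feed can be reduced to a dict of attribute values; the channel names, attribute names, and the `find_drift` helper are hypothetical.

```python
def find_drift(source: dict, channel_feeds: dict[str, dict]) -> list[tuple]:
    """List (channel, attribute, source_value, channel_value) for every mismatch."""
    drift = []
    for channel, feed in channel_feeds.items():
        for attr, expected in source.items():
            actual = feed.get(attr)  # None if the channel feed omits the attribute
            if actual != expected:
                drift.append((channel, attr, expected, actual))
    return drift

source_of_truth = {
    "shade_name": "Shade 5 - Medium, Warm Undertone",
    "finish": "satin",
}

# Hypothetical per-channel feeds for the same product.
feeds = {
    "own_site": {"shade_name": "Shade 5 - Medium, Warm Undertone", "finish": "satin"},
    "marketplace_a": {"shade_name": "Shade 5", "finish": "satin"},
    "marketplace_b": {"shade_name": "Shade 5 - Medium, Warm Undertone"},
}

issues = find_drift(source_of_truth, feeds)
for channel, attr, expected, actual in issues:
    print(f"{channel}: {attr} is {actual!r}, expected {expected!r}")
```

An exact-match comparison is the simplest possible rule. In practice each channel has its own attribute mapping, so values would need to be normalized before comparing.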

Closing the gap

None of this requires ripping out a PIM. Your PIM stores the data; Anglera does the work of continuously scoring, gap-filling, and enriching it, so shade, finish, undertone, INCI ingredient lists, and substantiated claims stay complete and consistent across every channel and every reformulation. It plugs into whatever system already holds the catalog, or works without one, and flags what's thin before a shopper (or an AI agent) notices it.

About the author

Ray Iyer, Co-founder, Anglera

Ray is a co-founder of Anglera, building the product-data infrastructure for agentic commerce — turning messy catalogs into structured, AI-readable data that buyers and answer engines can find. Previously product at Uber; Stanford CS.
