Ray Iyer
Co-founder & CEO, Anglera

Scoring product-data quality so it improves instead of decaying

How to score product-data quality across completeness, consistency, accuracy, and richness, set a real bar, and keep catalogs improving instead of decaying.

Most catalog cleanups follow the same arc: a big enrichment push, a temporary spike in quality, then a slow slide back toward chaos as new SKUs, supplier updates, and marketplace exports pile back on. The problem isn't that teams don't care about data quality. It's that they measure it once, as a project, instead of scoring it continuously, as a metric. Here's how to build a scoring system that actually holds the line.

Why catalogs decay by default

Product data isn't static, even when nobody touches it. Suppliers revise spec sheets, categories get restructured, new attributes become mandatory for a channel, and last quarter's "complete" record quietly falls behind. This mirrors the broader pattern of data decay across business systems, where records erode continuously unless something actively maintains them (Object Edge). A PIM doesn't stop this. A PIM stores whatever was true (or good enough) on the day someone entered it. It has no opinion about whether that record still meets the bar six months later, or whether the bar itself has moved because a retailer or an AI answer engine now expects more.

That's the core distinction worth internalizing: your PIM is a system of record, not a system of quality. Scoring has to sit on top of it.

The four dimensions worth scoring

Data quality literature converges on a consistent set of dimensions, and for product data specifically, four map cleanly to buyer and channel needs (Atlan, GS1):

| Dimension | What it measures | Example failure |
| --- | --- | --- |
| Completeness | Are required and channel-specific fields populated? | Marketplace requires 8 bullet points; record has 3 |
| Consistency | Does the same attribute match across SKUs, categories, and systems? | "Voltage" stored as 24V, 24 volts, and 24-Volt in the same category |
| Accuracy | Do values match the true spec, not just something plausible? | Weight copied from a similar SKU during a rushed import |
| Richness | Is there enough structured, buyer-relevant detail to answer real questions? | Dimensions listed, but no material, load rating, or compatibility data |
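
To make the consistency row concrete, here is a minimal Python sketch of how the three voltage spellings could be collapsed to one canonical value and counted. The regex, the function names, and the "24 V" canonical form are assumptions chosen for illustration, not a standard or any particular product's behavior.

```python
import re
from typing import Dict, List, Optional

# Hypothetical pattern: a number followed by some spelling of "volt".
_VOLTAGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*-?\s*(?:volts|volt|v)\s*$", re.IGNORECASE)


def normalize_voltage(raw: str) -> Optional[str]:
    """Return a canonical form such as '24 V', or None if the value can't be parsed."""
    match = _VOLTAGE.match(raw)
    if match is None:
        return None
    return f"{float(match.group(1)):g} V"


def consistency_report(values: List[str]) -> Dict[str, int]:
    """Compare how many raw spellings exist against how many real values they encode."""
    normalized = [normalize_voltage(v) for v in values]
    return {
        "raw_variants": len(set(values)),
        "canonical_values": len({n for n in normalized if n is not None}),
        "unparseable": sum(n is None for n in normalized),
    }


report = consistency_report(["24V", "24 volts", "24-Volt"])
print(report)  # {'raw_variants': 3, 'canonical_values': 1, 'unparseable': 0}
```

Three raw variants mapping to one canonical value is the signature of a consistency defect: the data agrees, the spelling doesn't. Unparseable values are counted separately so they get reviewed rather than silently dropped.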

Retail data-quality programs, including GS1's, treat physical attributes and net content as high-stakes fields precisely because errors there cascade into returns, compliance issues, and even GTIN reassignment requirements (GS1 US). Distributors should treat their highest-return, highest-search categories the same way: score them harder than the long tail.

What a score actually looks like

A useful score is not a single number pulled from a vibe. It's a weighted composite per SKU, rolled up by category, brand, and supplier, so you can see where the catalog is actually weak instead of guessing.

For example, a mid-tier scoring model might weight completeness and accuracy higher for categories with high return rates or high search volume, and weight richness higher for categories where buyers compare technical specs before purchase (industrial components, electronics, safety equipment). The output isn't "94% complete" as a vanity metric. It's a ranked list: these 400 SKUs are below the bar, here's why, here's the fastest fix.
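
As a rough sketch of that idea, the Python below computes a weighted composite per SKU, rolls it up by category, and returns a worst-first list of SKUs below the bar. The weight profiles, the 0.85 bar, the SKU IDs, and the per-dimension scores are all made-up assumptions; a real model would tune them per category.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weight profiles; each sums to 1.0.
WEIGHTS = {
    "spec_heavy": {"completeness": 0.2, "consistency": 0.2, "accuracy": 0.2, "richness": 0.4},
    "high_return": {"completeness": 0.35, "consistency": 0.15, "accuracy": 0.35, "richness": 0.15},
}
BAR = 0.85  # illustrative threshold, not a recommendation


def composite(scores, profile):
    """Weighted sum of the four dimension scores (each in the range 0..1)."""
    weights = WEIGHTS[profile]
    return sum(weights[dim] * scores[dim] for dim in weights)


def below_bar(skus, bar=BAR):
    """Return (sku_id, score, weakest_dimension) for failing SKUs, worst first."""
    flagged = []
    for sku in skus:
        score = composite(sku["scores"], sku["profile"])
        if score < bar:
            weakest = min(sku["scores"], key=sku["scores"].get)
            flagged.append((sku["id"], round(score, 3), weakest))
    return sorted(flagged, key=lambda row: row[1])


def rollup(skus, key):
    """Average composite score grouped by any SKU field, e.g. category or supplier."""
    groups = defaultdict(list)
    for sku in skus:
        groups[sku[key]].append(composite(sku["scores"], sku["profile"]))
    return {name: round(mean(vals), 3) for name, vals in groups.items()}


skus = [
    {"id": "TW-100", "category": "hand tools", "profile": "spec_heavy",
     "scores": {"completeness": 1.0, "consistency": 1.0, "accuracy": 0.9, "richness": 0.9}},
    {"id": "TW-200", "category": "hand tools", "profile": "spec_heavy",
     "scores": {"completeness": 0.9, "consistency": 1.0, "accuracy": 0.8, "richness": 0.3}},
    {"id": "GL-300", "category": "gloves", "profile": "high_return",
     "scores": {"completeness": 0.6, "consistency": 1.0, "accuracy": 0.9, "richness": 0.8}},
]

for sku_id, score, weakest in below_bar(skus):
    print(f"{sku_id}: {score} (weakest dimension: {weakest})")
print(rollup(skus, "category"))
```

Reporting the weakest dimension alongside the score is what turns the number into a work queue: TW-200 needs richer attributes, while GL-300 needs missing required fields filled.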

Before and after: a torque wrench listing

Raw supplier feed description:

"Torque wrench 1/2 drive adjustable"

Enriched attribute table:

| Attribute | Value |
| --- | --- |
| Drive size | 1/2 in |
| Torque range | 10-150 ft-lb |
| Accuracy rating | ±4% |
| Handle type | Ergonomic, non-slip grip |
| Calibration | Factory-calibrated, ISO 6789 compliant |
| Case included | Yes, molded storage case |
| Use case | Automotive, HVAC, general maintenance |

The raw feed is a five-word string. The enriched version answers the questions a buyer, a distributor's search filter, and an AI answer engine all ask independently.

Ask an answer engine "what torque wrench works for automotive lug nuts and is ISO calibrated," and only the enriched record has the structured attributes to surface as a confident match. The raw description says nothing about calibration, ISO compliance, torque range, or intended use, so it's invisible to that query even if the product is the right one.
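
A toy Python sketch shows why. It treats the question as three structured requirements and checks each record against them. This is not how any particular answer engine works; the field names, the `evaluate` function, and the 100 ft-lb target are illustrative assumptions (the number is a placeholder, not a lug-nut specification).

```python
raw_record = {"description": "Torque wrench 1/2 drive adjustable"}

enriched_record = {
    "description": "Torque wrench 1/2 drive adjustable",
    "attributes": {
        "torque_min_ft_lb": 10,
        "torque_max_ft_lb": 150,
        "calibration_standard": "ISO 6789",
        "use_case": ["automotive", "hvac", "general maintenance"],
    },
}

# Hypothetical structured form of the buyer's question.
query = {"use_case": "automotive", "required_torque_ft_lb": 100, "calibration_standard": "ISO"}


def evaluate(record: dict, query: dict) -> str:
    """Return 'match', 'no match', or 'cannot confirm' for a record against a query."""
    attrs = record.get("attributes", {})
    checks = []  # each entry is True, False, or None when the record is silent

    use_cases = attrs.get("use_case")
    checks.append(None if use_cases is None else query["use_case"] in use_cases)

    low, high = attrs.get("torque_min_ft_lb"), attrs.get("torque_max_ft_lb")
    checks.append(
        None if low is None or high is None
        else low <= query["required_torque_ft_lb"] <= high
    )

    standard = attrs.get("calibration_standard")
    checks.append(None if standard is None else query["calibration_standard"] in standard)

    if any(c is False for c in checks):
        return "no match"
    if any(c is None for c in checks):
        return "cannot confirm"
    return "match"


print(evaluate(raw_record, query))       # cannot confirm
print(evaluate(enriched_record, query))  # match
```

The raw record isn't rejected because it's wrong. It's skipped because nothing in it can be checked.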

Setting a real bar, then holding it

A bar only works if it's specific and enforced at the point of ingestion, not discovered in a quarterly audit. Practical thresholds worth adopting:

  • Completeness: 95%+ on required fields for products actively selling (Atlan cites similar thresholds as standard practice across product data programs).
  • Consistency: zero tolerance on unit-of-measure and naming variance within a category, since this is the cheapest defect to catch and the most damaging to search and filtering.
  • Accuracy: values traceable to a source document, not inferred by analogy to a similar SKU.
  • Richness: a defined minimum attribute count per category, set by what buyers and retail requirements actually ask for, not by what's easy to fill in.

A record that falls below the bar should trigger action automatically, not sit in a dashboard. A record that clears the bar should still be revalidated on a cadence, because "passed once" and "still true" are different claims.
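
Here is one way an ingestion gate along those lines might look in Python. Everything specific is an assumption for illustration: the required-field list, the minimum of six attributes, the 90-day revalidation window, and applying the 95% completeness threshold per record rather than across a category.

```python
from datetime import date, timedelta

# All names and numbers below are illustrative assumptions.
REQUIRED_FIELDS = ["drive_size", "torque_range", "accuracy_rating", "calibration"]
MIN_COMPLETENESS = 0.95
MIN_ATTRIBUTE_COUNT = 6
REVALIDATE_AFTER = timedelta(days=90)


def gate(record: dict, today: date) -> dict:
    """Check one record at ingestion; return a status plus the reasons for any failure."""
    fields = record["fields"]
    sources = record.get("sources", {})
    failures = []

    # Completeness: share of required fields that are actually populated.
    filled = [f for f in REQUIRED_FIELDS if fields.get(f) not in (None, "")]
    completeness = len(filled) / len(REQUIRED_FIELDS)
    if completeness < MIN_COMPLETENESS:
        missing = sorted(set(REQUIRED_FIELDS) - set(filled))
        failures.append(f"completeness {completeness:.0%}, missing: {', '.join(missing)}")

    # Accuracy proxy: every populated required field must cite a source document.
    untraced = [f for f in filled if f not in sources]
    if untraced:
        failures.append(f"no source document for: {', '.join(untraced)}")

    # Richness: minimum number of populated attributes for the category.
    populated = sum(1 for v in fields.values() if v not in (None, ""))
    if populated < MIN_ATTRIBUTE_COUNT:
        failures.append(f"richness: {populated} attributes, need {MIN_ATTRIBUTE_COUNT}")

    if failures:
        return {"status": "below_bar", "failures": failures}

    # Passing once is not enough: stale passes go back into the queue.
    last = record.get("last_validated")
    if last is None or today - last > REVALIDATE_AFTER:
        return {"status": "revalidate", "failures": []}
    return {"status": "pass", "failures": []}


incoming = {
    "fields": {"drive_size": "1/2 in", "torque_range": "10-150 ft-lb",
               "accuracy_rating": "", "calibration": "ISO 6789"},
    "sources": {"drive_size": "spec-sheet.pdf", "torque_range": "spec-sheet.pdf"},
}
print(gate(incoming, date(2024, 6, 1)))
```

Traceability stands in for accuracy here because a script can't know the true spec. It can only check that someone pointed at a document that does.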

Continuous scoring instead of periodic cleanup

The reason cleanups don't stick is that they treat quality as a project with an end date. A scoring system that runs continuously catches drift as new SKUs land, suppliers push updates, or a category's requirements change, and it flags what actually fell below the bar instead of forcing a full re-audit. That's the difference between a catalog that improves and one that just gets cleaned periodically while decaying in between.
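
A minimal Python sketch of that drift check follows, assuming completeness is the only dimension scored and that a category has just made a third field mandatory. The SKU IDs, field values, and 0.95 bar are invented for illustration.

```python
from typing import Dict, List


def completeness(record: Dict[str, str], required: List[str]) -> float:
    """Share of required fields that are populated in the record."""
    filled = sum(1 for f in required if record.get(f))
    return filled / len(required)


def find_drift(previous: Dict[str, float], current: Dict[str, float], bar: float) -> List[str]:
    """SKUs below the bar now that were not below it on the previous run.

    New SKUs with no previous score are included if they land below the bar.
    """
    return sorted(
        sku for sku, score in current.items()
        if score < bar and previous.get(sku, bar) >= bar
    )


BAR = 0.95  # illustrative
catalog = {
    "TW-100": {"drive_size": "1/2 in", "torque_range": "10-150 ft-lb", "calibration": "ISO 6789"},
    "TW-200": {"drive_size": "3/8 in", "torque_range": "5-80 ft-lb"},
    "TW-300": {"drive_size": "1/4 in"},
}

old_required = ["drive_size", "torque_range"]
new_required = old_required + ["calibration"]  # the category's requirements changed

previous = {sku: completeness(rec, old_required) for sku, rec in catalog.items()}
current = {sku: completeness(rec, new_required) for sku, rec in catalog.items()}

print(find_drift(previous, current, BAR))  # ['TW-200']
```

Nobody edited TW-200, yet it fell below the bar because the bar moved. TW-300 was already flagged on the previous run, so it isn't reported again as new drift.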

This is the problem Anglera is built to solve. Your PIM stores the data; Anglera sits on top of it, scores it against completeness, consistency, accuracy, and richness continuously, gap-fills from real supplier and source documents rather than guessing, and keeps flagging what drifts below the bar as the catalog changes. It's additive to whatever PIM you already run, or to a flat file if you don't have one, and most teams see it working inside 30 days rather than committing to a multi-year systems overhaul.

About the author

Ray Iyer, Co-founder & CEO, Anglera

Ray is the co-founder and CEO of Anglera, building the product-data infrastructure for agentic commerce — turning messy catalogs into structured, AI-readable data that buyers and answer engines can find. Previously product at Uber; Stanford CS.
