Product data governance: who owns what, and how to keep it clean

Most product data problems are not technical. The PIM works, the ERP works, the feed pushes. What breaks is the part nobody put a name on: who decides what a "correct" SKU looks like, who is allowed to change it, and who notices when it rots. That is governance, and the absence of it is why a catalog that passed an audit last quarter is already drifting this one.

Governance gets a bad reputation because it's usually sold as a committee, a 40-page policy, and a tool license. None of that keeps data clean. What keeps data clean is a small number of clear decisions — which fields exist, who owns each one, what "good" means, and what happens automatically when something falls below the bar — enforced on every SKU, on every change, forever. This guide is about making those decisions concretely for a B2B catalog.

It's written for the people who actually live with the consequences: the catalog manager who gets blamed for the wrong category, the e-commerce lead whose conversion depends on attributes they don't control, the merchandiser who owns the assortment but not the data, and the IT owner who holds the system but not the content. We'll cover the ownership model, the golden record, the quality dimensions worth measuring, the operating cadence, and the failure modes that quietly undo all of it.

What product data governance actually is (and what it is not)

Product data governance is the set of decisions and rules that determine what your product records must contain, who is accountable for each part, and how quality is maintained as data flows in and out. It is the agreement layer above your systems. The PIM, ERP, DAM, and feed tools are where data lives; governance is the standard everything in them is held to.

It is easy to confuse governance with adjacent things, so be precise:

Governance is not your PIM. A PIM is a filing cabinet. It stores and structures records; it does not decide whether a value is correct, complete, or who is allowed to set it. You can run airtight governance in a spreadsheet and chaotic data in a six-figure PIM.
Governance is not a one-time data cleanup. A cleanse project ends. Governance is the standing rule that prevents the next mess. Without it, every cleanup decays back to baseline within a few quarters.
Governance is not a committee that approves things. A monthly data council that reviews exceptions is fine, but the work is the rules and the routing, not the meetings. If governance only happens when people are in a room, it doesn't scale past a few hundred SKUs.

The practical test: pick any field on any SKU and ask three questions. Who is accountable for this value? What is the rule that makes it right or wrong? What happens when it's wrong? If you can answer all three for the fields that matter, you have governance. If you can't, you have storage.

The ownership map: who owns what

The single most useful thing governance produces is a clear answer to who owns what — at the level of data domains, not just "the catalog team owns the catalog." Ownership has to be assigned per group of attributes, because the person who knows the right answer differs by field. A merchandiser cannot author a UNSPSC code; an engineer cannot write benefit-led copy.

Use a simple distinction borrowed from data management practice:

Data owner — accountable for the standard and the outcome for a domain. Usually a director-level role. Owns the definition of correct, signs off on the rules, and answers for quality in their domain. One owner per domain.
Data steward — responsible for the day-to-day: authoring, correcting, and approving values against the standard. The hands on the data.
Contributor — supplies raw input (a supplier, an engineer, a vendor PDF) but is not accountable for fitness.
Consumer — uses the data downstream (e-commerce, sales, marketplaces) and is entitled to raise defects but does not edit the master.

A workable ownership map for a B2B catalog looks roughly like this:

Data domain	Example fields	Owner	Steward
Identity & keys	SKU, GTIN/UPC, MPN, supplier part #	Master data / IT	Catalog ops
Classification	Category, taxonomy node, UNSPSC/ETIM class	Merchandising	Catalog ops
Technical attributes	Dimensions, material, voltage, compatibility	Product/engineering	Category specialist
Commercial	Price, UOM, MOQ, lead time, pack/case qty	Pricing / sales ops	Pricing analyst
Marketing content	Title, description, features, keywords	E-commerce / marketing	Content steward
Digital assets	Images, spec sheets, CAD, video	Brand / DAM owner	DAM steward
Compliance	Prop 65, RoHS, hazmat, certifications	Compliance / regulatory	Compliance analyst

Two rules make this stick. First, every field maps to exactly one owning domain — shared ownership means no ownership, and "description" sitting half with marketing and half with engineering is how you get titles nobody will defend. Second, the owner of a field is the only role that can change its standard; stewards work the data, but they don't get to redefine what good looks like. Write the map down, publish it, and put a name (not a team) next to every domain.
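
The "exactly one owning domain" rule can be checked mechanically once the map is written down as data. The following is a minimal sketch, assuming the ownership map is kept as a plain dictionary; the `OWNERSHIP` structure, the field names, and the `find_ownership_conflicts` helper are illustrative and not part of any particular PIM.

```python
from collections import defaultdict

# Hypothetical ownership map: domain -> (owner, steward, fields).
# Roles are shown here; in practice the owner should be a named person.
OWNERSHIP = {
    "identity": ("Master data / IT", "Catalog ops", ["sku", "gtin", "mpn"]),
    "classification": ("Merchandising", "Catalog ops", ["category", "unspsc"]),
    "technical": ("Product/engineering", "Category specialist", ["voltage", "material"]),
    "marketing": ("E-commerce / marketing", "Content steward", ["title", "description"]),
}


def find_ownership_conflicts(ownership, all_fields):
    """Return (fields with no owning domain, fields claimed by several domains)."""
    claimed_by = defaultdict(list)
    for domain, (_owner, _steward, fields) in ownership.items():
        for field in fields:
            claimed_by[field].append(domain)
    unowned = sorted(f for f in all_fields if f not in claimed_by)
    shared = {f: domains for f, domains in claimed_by.items() if len(domains) > 1}
    return unowned, shared


unowned, shared = find_ownership_conflicts(
    OWNERSHIP, ["sku", "gtin", "category", "voltage", "description", "price"]
)
print("No owner:", unowned)
print("Shared ownership:", shared)
```

Run against the real field list, a check like this surfaces both gaps (a field nobody claimed) and overlaps (a field two domains both claim) before they turn into arguments.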

Define the golden record and the data standard

Ownership is meaningless without a target. The target is the golden record: the single, authoritative version of a product that every channel inherits from. Defining it means answering two questions for each attribute.

1. Where does the master value live, and how is it reconciled? Product data arrives from many places — supplier feeds, ERP, engineering, manufacturer PDFs, distributor portals. When two sources disagree on net weight, governance has to pre-decide who wins. Document a source-of-truth precedence per field. For example: GTIN comes from the manufacturer feed and overrides everything; price comes from ERP; category is assigned internally and is never taken from a supplier. Survivorship rules like these turn merge conflicts from a debate into a lookup.

2. What does "complete and correct" mean for this field? This is the data standard, and it should be specific enough to validate automatically. For each attribute define:

Format — units, allowed values, pattern (e.g. GTIN is stored as 14 digits, with shorter GTIN-8/12/13 values left-padded with zeros, and passes the check-digit test; dimensions are in inches to one decimal).
Requiredness — is it mandatory for all SKUs, mandatory for a category, or optional? Requiredness usually varies by category, so anchor it to the taxonomy, not the whole catalog.
Validation rule — the machine-checkable test (voltage is one of a controlled list; description is 150–300 words; at least one image ≥ 1000px).
Provenance — where the current value came from and when, so you can trust or re-verify it.
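
To show what "specific enough to validate automatically" can look like, here is a small sketch of three of the rules named above. The GTIN check uses the standard GS1 mod-10 check digit; the voltage list and the function names are assumptions made for illustration, and the word-count bounds are the ones from the example rule.

```python
import re

# Hypothetical controlled list; the real one comes from the field's owner.
ALLOWED_VOLTAGES = {"120V", "240V", "277V", "480V"}


def normalize_gtin(value: str) -> str:
    """Left-pad a shorter GTIN (8, 12, or 13 digits) with zeros to 14 characters."""
    return value.strip().zfill(14)


def is_valid_gtin14(value: str) -> bool:
    """True if value is exactly 14 digits and the last digit is the right check digit."""
    if not re.fullmatch(r"[0-9]{14}", value):
        return False
    digits = [int(c) for c in value]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... starting at the digit next to the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def is_valid_voltage(value: str) -> bool:
    """True if the value is one of the controlled list entries."""
    return value in ALLOWED_VOLTAGES


def is_valid_description(text: str) -> bool:
    """Word-count rule from the example standard: 150 to 300 words."""
    return 150 <= len(text.split()) <= 300


print(is_valid_gtin14(normalize_gtin("4006381333931")))
print(is_valid_voltage("240 volts"))
```

Each rule is a plain function that returns true or false, which is what lets the same checks run at intake, on edit, and on a schedule.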

A short, real artifact beats a long abstract one. Build a field dictionary: one row per attribute with owner, source precedence, format, requiredness rule, and validation. This single sheet is the contract the whole operation runs against, and it's what makes "clean" objective instead of a matter of opinion.
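
As a sketch of how a field dictionary can drive survivorship, the example below stores a precedence list per field and resolves conflicts by lookup. The `FieldSpec` class, the source names, and the `resolve` function are hypothetical; the three precedence rules mirror the GTIN, price, and category examples given earlier.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """One row of a hypothetical field dictionary."""
    name: str
    owner: str
    source_precedence: Tuple[str, ...]  # most trusted source first
    format: str


FIELD_DICTIONARY = {
    "gtin": FieldSpec("gtin", "Master data / IT",
                      ("manufacturer_feed", "erp", "supplier_feed"), "14 digits"),
    "price": FieldSpec("price", "Pricing / sales ops", ("erp",), "decimal, 2 places"),
    "category": FieldSpec("category", "Merchandising", ("internal",), "taxonomy node id"),
}


def resolve(field: str, candidates: dict) -> Optional[Tuple[object, str]]:
    """Return (value, source) for the first non-empty candidate in precedence order.

    Sources that are not listed for the field are ignored, so a supplier
    can never set a field that is assigned internally.
    """
    for source in FIELD_DICTIONARY[field].source_precedence:
        value = candidates.get(source)
        if value not in (None, ""):
            return value, source
    return None


print(resolve("gtin", {"supplier_feed": "00012345678905",
                       "manufacturer_feed": "04006381333931"}))
print(resolve("category", {"supplier_feed": "Breakers"}))
```

Returning the winning source alongside the value keeps provenance attached to the record at the moment the conflict is resolved.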

The dimensions of quality worth measuring

"Keep it clean" is too vague to manage. Break quality into dimensions you can score per SKU and roll up per category, so you can see exactly where the catalog is weak. The six that matter most for product data:

Completeness — what share of required-for-category fields are populated. The most predictive single metric; a clean record with five attributes still loses to a complete one with thirty.
Accuracy — do values match reality? A voltage that's wrong is worse than a voltage that's blank, because the wrong value wins the order and then comes back as a return. Accuracy is the hardest dimension to measure without a trusted reference, so sample-audit against manufacturer spec sheets.
Consistency — same concept, same representation everywhere. "in" vs "inch," "Hubbell" vs "Hubbell Inc.," duplicated SKUs for one product. Consistency failures are what feed and search engines choke on.
Validity — does the value conform to its rule? Malformed GTINs, out-of-range numbers, free text in a controlled field.
Uniqueness — one product, one master record. Duplicates inflate counts, split reviews, and corrupt analytics.
Timeliness — is the record current? Discontinued items still listed, prices stale, a compliance flag that changed last quarter and never propagated.

The move that makes this operational is a per-SKU quality score that weights these dimensions and rolls up to a category and catalog level. Don't chase a single vanity number; track completeness-by-category and validity separately, because they have different owners and different fixes. The goal of measurement isn't a report — it's routing. A SKU that scores 40% on completeness in the "circuit breakers" category is a work ticket for that category's steward, not a line in a dashboard nobody reads.
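
A minimal sketch of the routing idea, limited to the completeness dimension: score each SKU against its category's required fields, roll up by category, and turn low scores into tickets. The `REQUIRED` lists, the sample records, and the 0.8 threshold are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical required-field lists, anchored to category.
REQUIRED = {
    "circuit breakers": ["gtin", "voltage", "amperage", "poles", "primary_image"],
    "conduit": ["gtin", "material", "trade_size"],
}


def is_filled(value) -> bool:
    return value not in (None, "")


def completeness(record: dict) -> float:
    """Share of required-for-category fields that are populated (0.0 to 1.0)."""
    required = REQUIRED[record["category"]]
    return sum(is_filled(record.get(f)) for f in required) / len(required)


def rollup_by_category(records) -> dict:
    """Average completeness per category."""
    scores = defaultdict(list)
    for r in records:
        scores[r["category"]].append(completeness(r))
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}


def work_tickets(records, threshold: float = 0.8) -> list:
    """One ticket per SKU below threshold, listing what the steward must fill."""
    tickets = []
    for r in records:
        if completeness(r) < threshold:
            missing = [f for f in REQUIRED[r["category"]] if not is_filled(r.get(f))]
            tickets.append({"sku": r["sku"], "category": r["category"], "missing": missing})
    return tickets


records = [
    {"sku": "CB-100", "category": "circuit breakers", "gtin": "04006381333931", "voltage": "240V"},
    {"sku": "CB-200", "category": "circuit breakers", "gtin": "00036000291452",
     "voltage": "120V", "amperage": "20A", "poles": 2, "primary_image": "cb200.jpg"},
]
print(rollup_by_category(records))
print(work_tickets(records))
```

The ticket carries the list of missing fields rather than just the score, because the steward needs to know what to fill, not only that something is thin.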

The operating cadence: how clean data stays clean

Catalogs don't stay clean because someone cleaned them; they stay clean because three loops run continuously. Most governance fails by treating data quality as a project with an end date. Build these as standing processes instead.

Loop 1 — Intake (new SKUs and onboarding). Every new product enters through a gate, not a side door. New SKUs arrive thin and inconsistent from suppliers; the intake rule is that nothing publishes until it meets the data standard for its category. Make the standard a publish gate in the workflow, with a clear owner for filling gaps before go-live. This is where most thinness is born, so it's where the highest-leverage enforcement lives.

Loop 2 — Change control (edits to existing data). Define which fields can be edited by whom, and which require approval. Price and compliance flags usually need a second set of eyes; a typo fix in a description does not. Log every change with who, when, and from what source, so provenance survives. The point isn't bureaucracy — it's that uncontrolled edits are how a clean field silently becomes wrong with no trail back to the cause.

Loop 3 — Decay detection (the data you already published). Even untouched records rot: suppliers change specs, channels add required fields, products get discontinued, regulations shift. Run scheduled re-validation against the standard and flag records that have fallen below threshold. Watch for the common decay triggers: a new marketplace requirement, a supplier feed format change, a taxonomy update, a discontinued line. The teams that stay ahead treat re-validation as a recurring job, not a fire drill.
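
The intake gate and the decay check can share one set of rules. The sketch below assumes a per-category standard expressed as field-level checks; `STANDARD`, the 90-day re-validation interval, and the record shape (including a `last_validated` date) are hypothetical choices, not a prescribed design.

```python
from datetime import date, timedelta

# Hypothetical per-category standard: field -> check function.
STANDARD = {
    "circuit breakers": {
        "gtin": lambda v: isinstance(v, str) and len(v) == 14 and v.isdigit(),
        "voltage": lambda v: v in {"120V", "240V", "480V"},
        "primary_image": lambda v: bool(v),
    }
}


def failures(record: dict) -> list:
    """Fields that are missing or fail their rule for the record's category."""
    rules = STANDARD[record["category"]]
    return [field for field, ok in rules.items() if not ok(record.get(field))]


def can_publish(record: dict) -> bool:
    """Loop 1, the intake gate: nothing publishes with open failures."""
    return not failures(record)


def due_for_revalidation(records, today: date, max_age_days: int = 90) -> list:
    """Loop 3: SKUs whose last validation is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["sku"] for r in records if r["last_validated"] < cutoff]


thin = {"sku": "CB-100", "category": "circuit breakers",
        "gtin": "04006381333931", "voltage": "240 volts",
        "last_validated": date(2024, 1, 15)}
print(can_publish(thin), failures(thin))
print(due_for_revalidation([thin], today=date(2024, 6, 1)))
```

Because `failures` is the same function in both loops, a change to the standard (say, a new marketplace requirement) takes effect at intake and in the next scheduled re-validation without separate maintenance.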

Fix data upstream, at the source of truth, not at the feed. A correction made in a channel feed patches the printout and never writes back, so you redo it on every channel forever and the master keeps rotting while the dashboard looks green. Enrich and correct the golden record; let the channels inherit.

Metrics, SLAs, and a governance scorecard

Governance you can't see, you can't run. Put a small number of metrics in front of owners on a regular cadence and tie them to accountability.

Quality metrics (the state of the catalog):

Completeness rate by category (% of required fields populated) — the headline number.
Validity rate (% of values passing their format/rule checks).
Duplicate rate and orphan rate (records with no category, no owner, or no source).
Coverage of critical fields — GTIN, primary image, category — since these gate discovery and channel eligibility.

Process metrics (is the machine working):

Time-to-publish for new SKUs (intake loop health).
Open data defects by domain and age (are consumers' complaints getting resolved).
% of changes with complete provenance (change-control health).
Re-validation backlog (decay loop health).

Attach SLAs to the owners, not the tool: new SKUs reach standard within X days of intake; reported defects in compliance fields close within Y days. The owner of each domain reports their numbers; the scorecard makes ownership visible and gives the data council something concrete to act on instead of opinions.
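
An SLA on defect age is straightforward to check once defects are logged with a domain and an open date. This is a sketch under assumed names: the `SLA_DAYS` targets stand in for the X and Y above, and the defect record shape is illustrative.

```python
from datetime import date

# Hypothetical SLA targets in days per domain; set these with each owner.
SLA_DAYS = {"compliance": 5, "commercial": 10, "marketing": 30}


def sla_breaches(defects, today: date) -> list:
    """Open defects older than their domain's SLA, oldest first."""
    breaches = []
    for d in defects:
        if d["closed"] is not None:
            continue  # resolved defects do not count against the SLA here
        age = (today - d["opened"]).days
        if age > SLA_DAYS[d["domain"]]:
            breaches.append({**d, "age_days": age})
    return sorted(breaches, key=lambda b: b["age_days"], reverse=True)


defects = [
    {"id": 1, "domain": "compliance", "opened": date(2024, 5, 20), "closed": None},
    {"id": 2, "domain": "marketing", "opened": date(2024, 5, 20), "closed": None},
    {"id": 3, "domain": "compliance", "opened": date(2024, 4, 1), "closed": date(2024, 4, 3)},
    {"id": 4, "domain": "commercial", "opened": date(2024, 5, 1), "closed": None},
]
for breach in sla_breaches(defects, today=date(2024, 6, 1)):
    print(breach["id"], breach["domain"], breach["age_days"])
```

Grouping the output by domain gives each owner their own list, which is the form the scorecard needs.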

A caution worth stating plainly: don't over-instrument. A governance program that produces twelve dashboards and no fixed defects is theater. Pick the three or four numbers that change behavior — completeness by category, critical-field coverage, defect age, intake time — and let everything else be available but not in the weekly conversation.

Common pitfalls (and how to avoid them)

The same failure modes recur across distributors, brands, and retailers. Recognizing them early saves quarters of wasted effort.

Diffuse ownership. "The catalog team owns it" means no one owns the voltage field specifically. Fix: assign per-domain owners by name, in the field dictionary.
Governance as a policy document. A 40-page PDF nobody enforces is not governance. Fix: encode the standard as validation rules and publish gates, so the system enforces it, not goodwill.
Confusing clean with complete. A spotless record with five attributes passes a tidiness audit and still loses placement. Fix: measure completeness against category-specific requirements, not just error counts.
Fixing data at the feed. Channel-level corrections never write back; the master rots while the feed looks fine. Fix: correct the golden record upstream.
One-time cleanup, no maintenance loop. The audit came back green and then drifted. Fix: schedule re-validation; treat decay as expected.
Standards with no teeth in intake. New SKUs bypass the standard to hit a launch date, and the backlog of thin records grows forever. Fix: make the standard a publish gate.
Boiling the ocean. Trying to govern every field on every SKU at once stalls. Fix: start with the critical fields that gate discovery and compliance (GTIN, category, primary image, required technical attributes), prove the loop, then expand.
Owning the rule but not the labor. A common trap is that the steward is handed 50,000 thin SKUs to fill by hand and simply can't. Governance defines what good looks like and who's accountable; it does not, by itself, do the enrichment work. That gap is where programs quietly stall, and it's worth planning for explicitly.

Where tooling and enrichment fit

Governance defines the standard and the ownership; it still needs hands (or automation) to actually gather, fill, and correct data at catalog scale. Be clear about what each layer does so you don't expect the wrong thing from a tool.

The PIM stores and structures the golden record and can enforce some validation and workflow. It is the system of record, not the workforce. It will happily store thin, stale data in a tidy schema.
Data quality / validation tools check values against rules and surface defects. They tell you where you're below standard. They don't fill the gap.
Feed tools translate and deliver to channels. Useful for format mapping; the wrong place to author or correct master data, because fixes there don't persist.
The enrichment layer is what does the actual work the standard implies: gathering missing attributes from manufacturer sources, cleaning and standardizing values, classifying to your taxonomy, scoring each SKU against the standard, and writing the result back to the source of truth.

That enrichment layer is where Anglera sits. Your PIM stores the data; Anglera does the work — it fills the thin SKUs, reconciles conflicting sources, scores every record against your data standard (and against how buyers actually search and compare), and writes the clean, complete result back to your PIM or ERP. It is not a PIM and not a governance committee; it's the labor that turns a governance standard from a document into a catalog that actually meets it, typically inside about 30 days. Governance decides what "correct" means and who's accountable. Something still has to make 50,000 SKUs correct — and keep them that way as they drift.

Step-by-step checklist

Build a field dictionary: one row per attribute with owner, source-of-truth precedence, format, requiredness rule, and validation test
Assign a named owner to every data domain (identity, classification, technical, commercial, content, assets, compliance) — no shared ownership
Separate roles explicitly: owner (accountable for the standard), steward (works the data), contributor (supplies input), consumer (raises defects)
Define the golden record and survivorship rules so source conflicts resolve by lookup, not debate
Make requiredness category-specific and anchor it to your taxonomy, not the whole catalog
Score every SKU on completeness, accuracy, consistency, validity, uniqueness, and timeliness; roll up by category
Run three standing loops: an intake publish gate, change control with provenance logging, and scheduled decay re-validation
Fix and enrich data upstream in the source of truth, never at the channel feed
Put 3-4 metrics in front of owners weekly (completeness by category, critical-field coverage, defect age, time-to-publish) with SLAs tied to owners
Start with critical fields that gate discovery and compliance (GTIN, category, primary image, required technical attributes), then expand
Plan for the labor, not just the rules — decide how thin and stale SKUs actually get filled at scale, not who is to blame for them

Frequently asked questions

Who should own product data governance overall?

Governance needs one accountable executive sponsor (often the head of e-commerce, merchandising, or master data), but day-to-day ownership is distributed by domain. The sponsor owns that the program exists and that owners are named; each domain owner (e.g. merchandising for classification, compliance for regulatory flags) owns the standard and outcome for their fields. Avoid putting one team in charge of every field — they won't have the expertise to define correct values across technical, commercial, and compliance domains.

What's the difference between a data owner and a data steward?

The owner is accountable for the standard and the outcome of a data domain — they define what 'correct' means and answer for quality. There's one owner per domain, usually a director-level role. The steward is responsible for the hands-on work: authoring, correcting, and approving values against that standard. Owners set the rules; stewards work the data within them. Keeping the two distinct prevents the common failure where the person doing data entry is also (impossibly) accountable for catalog-wide quality.

How do we keep product data clean after the initial cleanup?

Treat it as three continuous loops rather than a project. Intake: every new SKU passes a publish gate that enforces the category standard before go-live. Change control: edits are logged with who/when/source, and sensitive fields require approval. Decay detection: scheduled re-validation flags records that fall below standard as suppliers change specs, channels add fields, or products discontinue. The teams that stay clean run these as standing jobs, because catalogs drift whether or not anyone touches them.

Doesn't a PIM handle governance for us?

A PIM stores and structures the golden record and can enforce some workflow and validation, but it doesn't decide what correct means, assign accountability, or fill missing data. It will happily store thin, stale records in a tidy schema. Governance is the agreement layer above your systems — the field dictionary, ownership map, standards, and metrics. You can run strong governance with a modest PIM and weak governance with an expensive one.

What should we measure first?

Start with completeness by category (share of required-for-category fields populated) and coverage of critical fields that gate discovery and channel eligibility — GTIN, primary image, category. Add validity (values passing their format rules) and defect age. These four change behavior. Resist building a wall of dashboards; a governance program is judged by defects fixed, not reports produced.

Where does enrichment fit relative to governance?

Governance defines the standard and who's accountable; enrichment is the labor that makes records actually meet it. After you've defined the golden record and validation rules, something still has to gather missing attributes, reconcile sources, classify to your taxonomy, score each SKU against the standard, and write the result back to the source of truth at scale. That's the enrichment layer — where Anglera sits alongside (not inside) your PIM. The standard says what good looks like; enrichment makes thousands of SKUs good and keeps them that way as they drift.