How to structure product attributes and attribute values
Most catalog problems trace back to a handful of early decisions about how attributes and their values are defined. Get them right and your data flows cleanly into a PIM, onto every channel, and through search and AI answer engines without constant cleanup. Get them wrong and you spend the next three years reconciling "Color: Blue" against "color: blue (navy)" across 80,000 SKUs.
This guide is about the structure underneath the catalog: what an attribute actually is, how it differs from a value, how to constrain values so they stay consistent, and how to organize all of it by category so each product carries the right fields and nothing extra. It's written for distributors, manufacturers, retailers, and brands who are standing up a new attribute schema, cleaning up an inherited mess, or preparing data for a PIM migration.
We'll stay practical: concrete naming rules, a reusable attribute-definition template, unit and vocabulary conventions, the single-vs-multi-value decision, and the governance that keeps it all from rotting. Where enrichment tools (including Anglera) fit, we'll say so plainly — but the schema decisions here are yours to make regardless of what tool fills the data.
Get the data model right: attribute, value, and group
Before naming a single field, agree on the vocabulary, because teams that blur these terms build schemas that fight back later.
- Attribute: the property being described. A definition, not a value. Examples: `voltage_rating`, `material`, `connector_type`. An attribute has a data type, a unit (if numeric), a value constraint, and an owner.
- Attribute value: the data filled in for a specific SKU, such as `120V`, `316 stainless steel`, or `RJ45`. Values are instances of an attribute on a product.
- Allowed value (or value list): the controlled vocabulary an attribute can draw from. `material` might allow `{aluminum, brass, 304 stainless steel, 316 stainless steel, ...}`. This is the thing that keeps `316 SS`, `316ss`, and `Stainless 316` from all coexisting.
- Attribute group: a display/organizational bucket (`Dimensions`, `Electrical`, `Compliance`) that holds related attributes. Groups are for humans and layout, not for logic.
- Category (or product type): the node in your taxonomy that decides which attributes apply. A `Circuit Breaker` requires `amperage` and `poles`; a `Work Glove` does not.
The relationships matter: categories govern which attributes apply; attributes constrain which values are legal; values are what actually ships to channels. Most messy catalogs collapse two of these layers — usually they store free-text values with no allowed-value list, or they attach every attribute to every product instead of scoping by category.
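The three layers can be sketched in a few lines of Python. This is a minimal illustration, not a PIM schema: the class names, the `check_product` helper, and the `Construction` group are assumptions made up for the example, while the attribute and category names come from the definitions above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """A property definition, not a value."""
    name: str                                  # machine name, e.g. "material"
    data_type: str                             # "enum", "integer", ...
    group: str                                 # display bucket only; no logic depends on it
    allowed_values: frozenset = frozenset()    # empty means unconstrained


@dataclass
class Category:
    """Taxonomy node that decides which attributes apply."""
    name: str
    required: set = field(default_factory=set)


def check_product(category, attributes, values):
    """Return a list of problems for one SKU's attribute values."""
    problems = []
    # Layer 1: the category governs which attributes must be present.
    for name in sorted(category.required - set(values)):
        problems.append(f"missing required attribute: {name}")
    # Layer 2: the attribute constrains which values are legal.
    for name, value in values.items():
        attr = attributes.get(name)
        if attr is None:
            problems.append(f"unknown attribute: {name}")
        elif attr.allowed_values and value not in attr.allowed_values:
            problems.append(f"illegal value for {name}: {value!r}")
    return problems


MATERIAL = Attribute(
    "material", "enum", "Construction",
    frozenset({"aluminum", "brass", "304 stainless steel", "316 stainless steel"}),
)
AMPERAGE = Attribute("amperage", "integer", "Electrical")
POLES = Attribute("poles", "integer", "Electrical")
ATTRIBUTES = {a.name: a for a in (MATERIAL, AMPERAGE, POLES)}

breaker = Category("Circuit Breaker", required={"amperage", "poles"})

# A SKU that skips a required field and uses an uncontrolled spelling.
problems = check_product(breaker, ATTRIBUTES, {"amperage": 20, "material": "316ss"})
```

Both classic failures surface here: `poles` is missing for the category, and `316ss` is not in the allowed-value list for `material`.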
Build the taxonomy first, then map attributes to it
Attributes don't live in a flat pile — they hang off a category tree. Define the tree first, because it decides what's required where.
- Build a category (product type) hierarchy that reflects how buyers narrow down, not how your warehouse is organized. Aim for leaf categories specific enough that the same set of attributes applies to everything inside them. `Fasteners > Bolts > Hex Bolts` is a good leaf; `Hardware` is not.
- Separate universal from category-specific attributes. Universal attributes apply to every SKU (`manufacturer`, `mpn`, `gtin`, `country_of_origin`, `unit_of_measure`). Category-specific attributes apply only to a leaf or branch (`thread_pitch` for bolts, `lumens` for light fixtures).
- Create an attribute-to-category map. A simple matrix (categories down the side, attributes across the top, cells marked Required / Optional / Not Applicable) is the single most useful artifact in the whole project. It tells enrichment what to fill and validation what to enforce.
- Mark requiredness per category, not globally. `amperage` is required for breakers, irrelevant for gloves. Global "required" flags force teams to stuff `N/A` everywhere, which destroys the signal that an attribute is genuinely missing.
- Reuse attribute definitions across categories. `material` should be one attribute with one allowed-value list used by fasteners, fittings, and fixtures alike, not three near-duplicate fields. Define once, attach many times.
A practical sanity check: if two products in the same leaf category would naturally need different attribute sets, the leaf is too broad. Split it.
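The attribute-to-category matrix described above can live in a spreadsheet, but it is easy to picture as a nested mapping. The sketch below is illustrative only: the `Lighting > Fixtures` leaf, the `Req` enum, and the helper names are assumptions, and it treats every universal attribute as required everywhere, which your own schema may not.

```python
from enum import Enum


class Req(Enum):
    REQUIRED = "R"
    OPTIONAL = "O"
    NOT_APPLICABLE = "-"


# Universal attributes apply to every SKU (assumed required here).
UNIVERSAL = ["manufacturer", "mpn", "gtin", "country_of_origin", "unit_of_measure"]

# Rows are leaf categories; columns are category-specific attributes.
MATRIX = {
    "Fasteners > Bolts > Hex Bolts": {
        "thread_pitch": Req.REQUIRED,
        "material": Req.REQUIRED,
        "lumens": Req.NOT_APPLICABLE,
    },
    "Lighting > Fixtures": {          # hypothetical leaf for illustration
        "thread_pitch": Req.NOT_APPLICABLE,
        "material": Req.OPTIONAL,
        "lumens": Req.REQUIRED,
    },
}


def attributes_for(category, level):
    """Category-specific attributes at one requiredness level."""
    row = MATRIX[category]
    return sorted(name for name, req in row.items() if req is level)


def required_for(category):
    """Universal attributes plus the category's own required ones."""
    return sorted(set(UNIVERSAL) | set(attributes_for(category, Req.REQUIRED)))
```

Because `material` appears once as a column and is attached to both rows, the definition is shared while its requiredness differs per category.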
Define every attribute with a spec sheet, not just a name
A bare field name (size) is the root of a thousand inconsistencies. Each attribute deserves a one-row definition. Keep these in an attribute dictionary — a spreadsheet or PIM screen with one row per attribute and these columns:
- Machine name: stable, lowercase snake_case, e.g. `thread_pitch`. Never changes once published.
- Display label: human-facing, e.g. "Thread Pitch." Can be localized; the machine name cannot.
- Definition: one sentence stating exactly what it captures and excludes. "Distance between threads in millimeters; metric only. For imperial, use `threads_per_inch`."
- Data type: `text`, `enum` (single-select), `multi-enum`, `integer`, `decimal`, `boolean`, `date`, `measure` (number + unit).
- Unit / unit family: if numeric, the canonical unit (`mm`) and whether alternates are allowed and how they normalize.
- Allowed values: the controlled list, or the rule for free-text (rare).
- Validation: min/max, regex, required-by-category.
- Example value: a real, correctly formatted instance.
- Owner: the team or person who approves new values.
The payoff: enrichment (human or automated) fills against an unambiguous target, validation can be automated, and onboarding a new merchandiser takes an afternoon instead of tribal knowledge. The single most common failure here is an attribute whose definition lives only in someone's head — which guarantees two people fill it two ways.
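To make "validation can be automated" concrete, here is one way a dictionary row might look as a record, with a small validator reading its rules. Treat it as a sketch: the `AttributeDef` fields mirror the columns above, but the min/max bounds, the `finish` value list, and the owner name are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDef:
    machine_name: str
    display_label: str
    definition: str
    data_type: str                      # "decimal", "enum", "text", ...
    unit: Optional[str] = None
    allowed_values: tuple = ()
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    pattern: Optional[str] = None       # regex for text-like values
    example: object = None
    owner: str = ""


def validate(defn, value):
    """Return an error message, or None when the value passes."""
    name = defn.machine_name
    if defn.data_type == "decimal":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return f"{name}: expected a number, got {value!r}"
        if defn.min_value is not None and value < defn.min_value:
            return f"{name}: {value} is below minimum {defn.min_value}"
        if defn.max_value is not None and value > defn.max_value:
            return f"{name}: {value} is above maximum {defn.max_value}"
    elif defn.data_type == "enum":
        if value not in defn.allowed_values:
            return f"{name}: {value!r} is not an allowed value"
    if defn.pattern is not None and re.fullmatch(defn.pattern, str(value)) is None:
        return f"{name}: {value!r} does not match {defn.pattern}"
    return None


THREAD_PITCH = AttributeDef(
    machine_name="thread_pitch",
    display_label="Thread Pitch",
    definition="Distance between threads in millimeters; metric only.",
    data_type="decimal",
    unit="mm",
    min_value=0.2,                      # hypothetical bounds
    max_value=6.0,
    example=1.25,
    owner="fasteners-merchandising",    # hypothetical team
)

FINISH = AttributeDef(
    "finish", "Finish", "Surface finish applied to the part.", "enum",
    allowed_values=("zinc", "black oxide", "plain"),   # hypothetical list
)
```

A useful habit is to run `validate(defn, defn.example)` for every row, so the dictionary itself is checked for self-consistency.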
Control your values: vocabularies, units, and normalization
Consistent values are what make a catalog searchable and filterable. This is where most of the durable quality lives.
Use controlled vocabularies (enums) wherever the real-world set is finite. Color, material, connector type, certification, and finish should all be pick-lists, not free text. Free text is acceptable only for genuinely open fields like description or model_name.
Normalize units to one canonical unit per attribute. Pick a storage unit (e.g., store all weights in grams, all lengths in millimeters) and convert on input and display. Storing `5 lb`, `2.27 kg`, and `2270 g` in the same column records roughly the same weight three different ways, which breaks range filters and comparisons.
Separate the value from the unit when the number must be computed on. length_mm = 50 (numeric) beats length = "50mm" (text). The first sorts, filters, and ranges; the second is a string. Keep the unit in the attribute definition or a dedicated unit field.
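As a rough sketch of convert-on-input, the function below parses a supplier string and returns a plain number in the canonical unit. The name `to_grams`, the supported unit list, and the two-decimal rounding are assumptions for the example; the pound and ounce factors are the standard avoirdupois definitions.

```python
import re

# Conversion factors to the canonical storage unit (grams).
TO_GRAMS = {
    "g": 1.0,
    "kg": 1000.0,
    "oz": 28.349523125,
    "lb": 453.59237,
}

# A number, optional whitespace, then a unit made of letters.
_MEASURE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([a-zA-Z]+)\s*$")


def to_grams(raw):
    """Parse a string like '5 lb' and return the weight in grams."""
    match = _MEASURE.match(raw)
    if match is None:
        raise ValueError(f"cannot parse measure: {raw!r}")
    number = float(match.group(1))
    unit = match.group(2).lower()
    if unit not in TO_GRAMS:
        raise ValueError(f"unknown weight unit: {unit!r}")
    return round(number * TO_GRAMS[unit], 2)


# Once every row is a number in grams, range filters are ordinary comparisons.
supplier_values = ["5 lb", "2.27 kg", "2270 g"]
weight_g = [to_grams(v) for v in supplier_values]
under_2269 = [v for v, g in zip(supplier_values, weight_g) if g < 2269]
```

Rejecting unknown units with an error, instead of storing the raw string, keeps unparseable input out of the numeric column.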
Standardize value formatting rules and write them down:
- Case: pick one (Title Case for display values like materials, or store a code + label).
- Abbreviations: ban silent variants. Use one of `Stainless Steel` or `SS`, never both.
- Booleans: `true`/`false`, not `Yes`/`Y`/`1`/`X`.
- Ranges: decide structure up front. `operating_temp_min` and `operating_temp_max` as two numeric attributes beats `"-40 to 85C"` as text.
- Null vs. zero vs. "N/A": empty means unknown; `0` means measured zero; a category mapping of "not applicable" means the attribute doesn't apply. Conflating these corrupts completeness reporting.
Map synonyms instead of forbidding them. Suppliers will send 316SS, 316 S/S, Stainless 316. Don't reject — maintain a synonym map that resolves all of them to the canonical 316 stainless steel. This is the difference between a brittle import and a resilient one.
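A synonym map can be as simple as a dictionary keyed on a normalized spelling. The sketch below assumes a hypothetical `resolve_material` helper and adds 304-grade variants purely for illustration; unknown values return `None` so an import could route them to review instead of rejecting the row or inventing a new value.

```python
import re


def _key(raw):
    """Collapse case, spaces, and punctuation so near-variants share a key."""
    return re.sub(r"[^a-z0-9]", "", raw.lower())


# Canonical value -> supplier spellings seen so far (304 entries are illustrative).
SYNONYM_SOURCE = {
    "316 stainless steel": ["316SS", "316 S/S", "Stainless 316"],
    "304 stainless steel": ["304SS", "304 S/S", "Stainless 304"],
}

# Build the lookup once; the canonical spelling always resolves to itself.
SYNONYMS = {}
for canonical, variants in SYNONYM_SOURCE.items():
    for spelling in [canonical, *variants]:
        SYNONYMS[_key(spelling)] = canonical


def resolve_material(raw):
    """Return the canonical value, or None if the spelling is unknown."""
    return SYNONYMS.get(_key(raw))
```

Normalizing the key before lookup means `316ss`, `316 s/s`, and `316-SS` all resolve without each needing its own entry.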
Single-value, multi-value, and when to split a field
Cardinality decisions are easy to get wrong and painful to reverse.
Single-value (one value per SKU): most measures and classifications — weight, voltage_rating, primary_material. Use single-select enums or a single numeric field.
Multi-value (a set of values per SKU): certifications (UL + CSA + RoHS), compatible_models, available_colors at a parent level. Model these as multi-enum, not as comma-separated text in one cell. Comma-joined values look fine until something needs to filter on one of them.
Split a field when it carries more than one fact. "50mm M8 Hex Bolt, Zinc" packs four attributes into a name. Decompose into length_mm=50, thread_size=M8, head_type=hex, finish=zinc. The rule: one attribute = one fact, one unit, one value type. If you'd ever want to filter, sort, or compare on a piece of it independently, it deserves its own attribute.
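For the bolt title above, decomposition might look like the following. This is a deliberately narrow sketch: the regular expression only understands titles shaped like the example, and real catalogs usually need one pattern per leaf category or a more robust parser.

```python
import re

# Matches titles shaped like "50mm M8 Hex Bolt, Zinc" (comma optional).
TITLE = re.compile(
    r"^(?P<length_mm>\d+(?:\.\d+)?)mm\s+"
    r"(?P<thread_size>M\d+(?:\.\d+)?)\s+"
    r"(?P<head_type>\w+)\s+Bolt,?\s+"
    r"(?P<finish>\w+)$",
    re.IGNORECASE,
)


def decompose(title):
    """Split a packed bolt title into one fact per attribute, or return None."""
    match = TITLE.match(title.strip())
    if match is None:
        return None
    return {
        "length_mm": float(match["length_mm"]),     # numeric, unit lives in the name
        "thread_size": match["thread_size"].upper(),
        "head_type": match["head_type"].lower(),
        "finish": match["finish"].lower(),
    }


fields = decompose("50mm M8 Hex Bolt, Zinc")
```

Returning `None` for titles that do not fit lets the caller queue them for manual review instead of guessing.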
Watch the variant boundary. Attributes that vary across a product family (size, color) are variant axes and belong on the child SKU; attributes shared by the family (brand, series, material) belong on the parent. Putting a variant axis on the parent — or a shared attribute on every child — creates redundancy and drift. Decide, per attribute, whether it's variant-defining before you load data.
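One way to enforce the variant boundary at load time is to check where each attribute is set before merging parent and child. The sketch below is an assumption about how such a check could work; the brand, series, and material values are placeholders.

```python
# Shared attributes live on the parent (values are placeholders).
PARENT = {"brand": "Acme", "series": "ProGrip", "material": "nitrile"}

# Attributes declared variant-defining for this product family.
VARIANT_AXES = {"size", "color"}


def effective_attributes(parent, child, variant_axes):
    """Merge parent and child values, rejecting attributes on the wrong level."""
    on_parent = variant_axes & set(parent)
    if on_parent:
        raise ValueError(f"variant axes set on parent: {sorted(on_parent)}")
    on_child = set(child) - variant_axes
    if on_child:
        raise ValueError(f"shared attributes set on child: {sorted(on_child)}")
    return {**parent, **child}


sku = effective_attributes(PARENT, {"size": "L", "color": "blue"}, VARIANT_AXES)
```

A child that tries to carry its own `brand` fails fast here, which is cheaper than discovering the drift after the family has been syndicated.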
Naming conventions and identifiers that survive migrations
Names are forever in practice, because every integration, feed, and report references them. Set the rules before anyone creates fields.
Machine names:
- Lowercase `snake_case`, ASCII only, no spaces or punctuation: `country_of_origin`.
- Prefix or group by domain when helpful: `dim_length_mm`, `dim_width_mm`, or rely on attribute groups.
- Encode the unit in the name when it removes ambiguity: `weight_g`, `length_mm`. It saves a lookup and prevents unit drift.
- No display words, no localization, no version numbers. The machine name is an identity, not a label.
- Never rename a published machine name — deprecate and add a new one, then migrate.
Stable identifiers matter more than pretty ones. Give each attribute and each allowed value a stable internal ID or code (mat_316ss) separate from its display label. Then you can rename "316 Stainless Steel" to "316 Stainless (Marine Grade)" for display without rewriting a million product rows or breaking a channel mapping.
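A tiny illustration of why the indirection pays off: product rows store the code, and the label is looked up at display time. The `mat_brass` code and the SKUs are hypothetical; `mat_316ss` and both labels come from the paragraph above.

```python
# Allowed values keyed by stable code; labels are free to change.
MATERIAL_LABELS = {
    "mat_316ss": "316 Stainless Steel",
    "mat_brass": "Brass",
}

# Product rows store the code, never the label.
PRODUCTS = [
    {"sku": "HB-M8-50", "material": "mat_316ss"},
    {"sku": "FT-12-BR", "material": "mat_brass"},
]


def render(product, labels):
    """Resolve a product's stored code to its current display label."""
    return f'{product["sku"]}: {labels[product["material"]]}'


before = [render(p, MATERIAL_LABELS) for p in PRODUCTS]

# Rename for display only: one dictionary entry changes, no product rows do.
MATERIAL_LABELS["mat_316ss"] = "316 Stainless (Marine Grade)"

after = [render(p, MATERIAL_LABELS) for p in PRODUCTS]
```

The rename touches one entry in the label table, and every product that references `mat_316ss` picks up the new wording on the next render.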
Don't overload identifiers. Keep `gtin`, `mpn`, `sku`, and internal `product_id` as distinct attributes with their own validation (length and check digit for GTINs, the manufacturer's exact string for MPNs). Collapsing them is a frequent and costly mistake; see our note on why missing GTINs quietly remove you from search.
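GTIN validation is a good example of identifier-specific rules. The sketch below implements the standard GS1 mod-10 check digit, which weights digits 3, 1, 3, 1, ... starting from the digit next to the check digit; the function names are ours, and it checks only length and check digit, not whether the number is actually assigned.

```python
def gtin_check_digit(body):
    """Compute the check digit for a GTIN given all digits except the last."""
    total = 0
    # Walk right to left: the digit nearest the check digit gets weight 3.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(gtin):
    """True if the string is a GTIN-8/12/13/14 with a correct check digit."""
    if not (gtin.isascii() and gtin.isdigit()):
        return False
    if len(gtin) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(gtin[:-1]) == int(gtin[-1])
```

Because the weights are counted from the right, left-padding a GTIN-13 with a zero to make a GTIN-14 leaves the check digit unchanged. Keep GTINs as strings so leading zeros survive.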
Governance: keep the schema from rotting
A clean schema decays the moment real data and real people touch it. The teams whose catalogs stay clean run governance as a standing process, not a one-time cleanup.
- Assign an owner per attribute and per value list. Someone approves new allowed values. Without this, every supplier import silently invents new "colors."
- Define an intake path for new values and attributes. A request → review → add flow (even a lightweight one) beats merchandisers creating fields ad hoc. The goal is that adding `material: titanium` is a 10-minute approval, not a free-for-all and not a quarter-long ticket.
- Validate at ingestion, not at the feed. Enforce data type, unit, allowed-value, and required-by-category rules when data enters the system. Catching a bad value at import is cheap; catching it after it's syndicated to six channels is not.
- Measure completeness by category. Track "% of required attributes filled" per leaf category. This surfaces thin spots that averages hide.
- Version the schema and log changes. When you add, deprecate, or remap an attribute, record it. Migrations and audits depend on knowing what changed and when.
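Completeness by category is straightforward to compute once requiredness is scoped per leaf. The following is a sketch under simple assumptions: products are dictionaries with a `category` and a `values` mapping, the required lists are invented for the example, and empty means `None` or an empty string, while `0` counts as filled.

```python
from collections import defaultdict

# Required attributes per leaf category (illustrative lists).
REQUIRED = {
    "Hex Bolts": ["thread_pitch", "length_mm", "finish"],
    "Work Gloves": ["size", "material"],
}


def is_filled(value):
    """Empty means unknown; 0 and False are real values and count as filled."""
    return value is not None and value != ""


def completeness_by_category(products):
    """Percent of required attributes filled, per leaf category."""
    filled = defaultdict(int)
    expected = defaultdict(int)
    for product in products:
        category = product["category"]
        for name in REQUIRED[category]:
            expected[category] += 1
            if is_filled(product["values"].get(name)):
                filled[category] += 1
    return {c: round(100 * filled[c] / expected[c], 1) for c in expected}


PRODUCTS = [
    {"category": "Hex Bolts",
     "values": {"thread_pitch": 1.25, "length_mm": 50, "finish": "zinc"}},
    {"category": "Hex Bolts",
     "values": {"thread_pitch": 1.5, "length_mm": None}},
    {"category": "Work Gloves",
     "values": {"size": "L", "material": "nitrile"}},
]

report = completeness_by_category(PRODUCTS)
```

In this toy data the catalog-wide figure is 6 of 8 required fields filled (75%), which hides the fact that all of the gaps sit in one leaf.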
Getting structure right defines the target; keeping it filled and accurate against that target is continuous work. This is where Anglera fits: it gathers, normalizes, and scores values against your attribute definitions and allowed-value lists, then writes them back to your PIM — so the schema you designed here actually stays populated as SKUs and suppliers churn. The structure is yours to own; the ongoing fill is what we automate.
Step-by-step checklist
- Define attribute vs. value vs. allowed value vs. group, and write the definitions down where everyone can see them
- Build a leaf-level category tree first; leaves should be narrow enough that one attribute set fits everything inside
- Maintain an attribute-to-category matrix marking each attribute Required / Optional / Not Applicable per category
- Keep an attribute dictionary: machine name, label, definition, data type, unit, allowed values, validation, example, owner
- Use controlled vocabularies (enums) for every attribute with a finite real-world value set; reserve free text for descriptions
- Store one canonical unit per numeric attribute and normalize all inputs to it; keep the number and unit separable
- Enforce one fact per attribute — decompose packed titles like '50mm M8 Hex Bolt Zinc' into discrete fields
- Choose single- vs. multi-value per attribute, and decide which attributes are variant axes (child) vs. shared (parent)
- Use lowercase snake_case machine names with units encoded (weight_g); give attributes and values stable IDs separate from display labels
- Keep gtin, mpn, sku, and internal product_id as distinct, individually validated identifiers
- Distinguish empty (unknown), 0 (measured zero), and not-applicable; maintain a synonym map to resolve supplier variants
- Assign owners, validate at ingestion not at the feed, and track completeness by category
Frequently asked questions
What is the difference between a product attribute and an attribute value?
An attribute is the property definition — `voltage_rating`, `material`, `connector_type` — with a data type, unit, and value constraints. An attribute value is the data filled in for a specific SKU, like `120V` or `316 stainless steel`. One attribute (defined once) holds many values across your catalog. The allowed-value list, a controlled vocabulary attached to the attribute, is what keeps those values consistent.
Should I use controlled vocabularies or free text for attribute values?
Use controlled vocabularies (enums/pick-lists) for any attribute whose real-world set of values is finite: color, material, finish, connector type, certifications. Free text is appropriate only for genuinely open fields like descriptions or model names. Controlled values are what make filtering, faceted search, and AI answer engines work; free text fragments into dozens of near-duplicate spellings that no filter can collapse.
How specific should my product categories be before I assign attributes?
Specific enough that the same attribute set applies to everything in a leaf category. `Fasteners > Bolts > Hex Bolts` is a good leaf because every product needs thread size, length, head type, and finish. `Hardware` is too broad. The test: if two products in one leaf would naturally need different attributes, split the leaf. Requiredness should be set per category, not globally, so you never force `N/A` into fields that don't apply.
How do I handle units consistently across numeric attributes?
Pick one canonical storage unit per attribute (e.g., all lengths in millimeters, all weights in grams) and convert on input and display. Store the number as a numeric type, not a string like `50mm`, so it sorts and ranges correctly. Encode the unit in the machine name (`length_mm`, `weight_g`) to prevent drift, and maintain conversion rules so supplier data sent in pounds or inches normalizes automatically instead of coexisting with metric values.
When should an attribute be multi-value versus single-value?
Single-value for properties with exactly one answer per SKU — weight, voltage rating, primary material. Multi-value for sets, like certifications (UL + CSA + RoHS) or compatible models. Model multi-value attributes as proper multi-select fields, never as comma-separated text in a single cell, because comma-joined strings can't be filtered on individual values. Also decide whether an attribute is a variant axis (size, color — belongs on child SKUs) or shared (brand, series — belongs on the parent).
How do I keep an attribute schema clean over time?
Treat it as a standing process, not a one-time build. Assign an owner to each attribute and value list, define an intake path so new values are approved rather than invented during imports, and validate data type, unit, allowed-value, and required-by-category rules at ingestion instead of at the feed. Track completeness per leaf category, version the schema, and maintain a synonym map so supplier variants resolve to your canonical values automatically.