How to migrate from one PIM to another without breaking your catalog
Most PIM migrations don't fail at the import step. They fail three weeks later, when a buyer searches for a part that used to show up and doesn't, when a marketplace feed rejects 8,000 SKUs because a required attribute went null, or when someone notices the new system says a product is "complete" while half its specs quietly disappeared in the move. The data loaded fine. The catalog still broke.
The reason is that a PIM migration is never a copy. Two PIMs almost never model products the same way — different attribute schemas, different taxonomy depth, different rules for variants, assets, units, and localization. Moving between them is a translation problem, and translation is where meaning leaks. The job isn't to move bytes; it's to preserve what each record means to the buyer and to every downstream channel that consumes it.
This guide is the sequence we'd actually run for a B2B distributor, retailer, brand, or manufacturer with anywhere from 5,000 to 5 million SKUs. It covers what to audit before you touch anything, how to map two data models honestly, how to choose a cutover strategy that matches your risk tolerance, how to validate so you know nothing dropped, and how to keep a rollback path open until you're certain. It also flags the specific failure modes that bite catalog teams, because "without breaking your catalog" is the whole point.
First, understand why migrations break catalogs (so you can defend against it)
Before planning, name the enemy. PIM migrations damage catalogs in a small number of predictable ways. Every step later in this guide exists to neutralize one of these:
- Data-model mismatch. The old PIM stored "Material" as free text; the new one wants a controlled value from a list. Anything that doesn't match the list silently lands as blank, "Other," or an import error you didn't read.
- Relationship loss. Parent/child variants, kits and bundles, accessories, replacement-part links, and cross-references are stored differently in every PIM. These links break far more often than scalar fields because they depend on IDs resolving on both sides.
- Asset de-referencing. Images, spec sheets, and CAD files are usually references, not payloads. Move the records but not the asset URLs (or move assets to a new DAM with new paths) and every product loses its media while still looking fine in a spreadsheet.
- Completeness illusion. The old PIM called a record 100% complete against its rules. The new PIM has different required fields, so the same record is now 70% complete — but nobody notices until a channel readiness gate fails.
- Encoding and formatting rot. UTF-8 vs. Latin-1, smart quotes, HTML embedded in descriptions, decimal/comma locale issues, leading-zero GTINs read as numbers. These corrupt quietly and at scale.
- Identifier drift. If your join key (SKU, GTIN, MPN, internal ID) isn't stable and unique across both systems, records merge, split, or duplicate. This is the single most catastrophic failure because everything downstream is keyed on it.
If you can prove, post-migration, that none of these six happened, your catalog survived. The rest of the playbook is how you earn that proof.
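Of the six, formatting rot is the easiest to show in a few lines. The following is a minimal Python sketch, assuming identifiers arrive as strings from an export file; the helper names `normalize_gtin` and `gtin_check_digit_ok` are illustrative, not part of any PIM's API. It zero-pads a GTIN to 14 digits and verifies the GS1 check digit, so numeric-cast debris is rejected loudly instead of loading quietly.

```python
def gtin_check_digit_ok(gtin14: str) -> bool:
    """Return True if a 14-digit string carries a valid GS1 check digit."""
    payload, check = gtin14[:-1], int(gtin14[-1])
    # Weights alternate 3, 1, 3, ... starting from the rightmost payload digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10 == check


def normalize_gtin(raw: str) -> str:
    """Zero-pad a GTIN-8/12/13/14 to 14 digits; raise if it is not a clean digit string."""
    value = raw.strip()
    if not (value.isascii() and value.isdigit()) or len(value) > 14:
        # Catches "3.6000291452E10", "36000291452.0", and similar cast artifacts.
        raise ValueError(f"not a clean GTIN string: {raw!r}")
    gtin14 = value.zfill(14)
    if not gtin_check_digit_ok(gtin14):
        raise ValueError(f"check digit mismatch: {raw!r}")
    return gtin14


# A UPC-A whose leading zero was dropped by a numeric cast still normalizes.
print(normalize_gtin("36000291452"))   # 00036000291452
print(normalize_gtin("036000291452"))  # 00036000291452
```

Because GTINs are right-aligned, a dropped leading zero is recoverable by padding; a value that passed through a float or scientific notation is not, which is why the sketch raises rather than guessing.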
Audit and freeze: know exactly what you have before you move it
You cannot validate a migration against a baseline you never captured. Spend real time here.
- Take a full, dated export of the source PIM — every product, every attribute, every locale, every asset reference, every relationship. This export is your ground truth and your rollback reference. Store it read-only.
- Profile the data, don't just look at it. For each attribute, capture fill rate, distinct values, max length, data type, whether it is single- or multi-valued, and which values are out-of-vocabulary for the target. A column that's 95% empty or 80% "N/A" is a decision to make, not data to move.
- Inventory the structures, not just the fields: the category/taxonomy tree (depth, node count, products per node), variant models and their axes, bundles/kits, asset count and total size, locales and per-locale completeness, and all downstream syndication targets (GDSN/GS1, Google, Amazon, distributor portals, your own site).
- Identify your golden SKUs. Pick 50–200 records that represent the hard cases: deepest variant families, most attributes, every locale, multiple assets, kits, regulated items. These become your manual validation set at every checkpoint.
- Freeze authorship at the right moment. Decide on an edit-freeze window for the source PIM during cutover, and plan a delta pass to catch records changed between the freeze and go-live. For large catalogs, a hard freeze is often unrealistic, so design delta migration from the start rather than bolting it on.
The deliverable from this phase is a written baseline: counts, fill rates, structure maps, and the golden-SKU list. If you skip it, you'll have no objective way to answer "did anything drop?"
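As a rough illustration of the profiling step, here is a small Python sketch that computes fill rate, distinct values, and max length for one attribute. It assumes the export has been loaded as a list of dicts; the function name `profile_attribute` and the `EMPTY_MARKERS` set are hypothetical and should be tuned to your own data.

```python
from collections import Counter

# Assumption: values that count as "not really filled" in this catalog.
EMPTY_MARKERS = {"", "n/a", "na", "none", "null"}


def profile_attribute(records, attribute):
    """Summarize one attribute across an exported list of product dicts."""
    values = []
    for record in records:
        raw = record.get(attribute)
        values.append("" if raw is None else str(raw).strip())
    filled = [v for v in values if v.lower() not in EMPTY_MARKERS]
    counts = Counter(filled)
    return {
        "attribute": attribute,
        "fill_rate": len(filled) / len(values) if values else 0.0,
        "distinct_values": len(counts),
        "max_length": max((len(v) for v in filled), default=0),
        "top_values": counts.most_common(3),
    }


records = [
    {"sku": "A1", "material": "Stainless Steel"},
    {"sku": "A2", "material": "SS"},
    {"sku": "A3", "material": "N/A"},
    {"sku": "A4"},
]
profile = profile_attribute(records, "material")
print(profile["fill_rate"], profile["distinct_values"], profile["max_length"])
# 0.5 2 15
```

Running this per attribute and saving the results alongside the dated export gives the baseline numbers that later reconciliation compares against.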
Decide what NOT to migrate — and resist restructuring mid-move
A migration tempts you to do three projects at once: move systems, clean the data, and redesign the taxonomy/attribute model. Doing all three together is how a 30-day project becomes a 9-month one with a broken catalog in the middle.
Separate the decisions:
- Move vs. retire. Discontinued SKUs, one-off test products, and dead variants don't need a seat in the new system. Archive them in the source export; don't carry them forward. Migrating less is migrating safer.
- Migrate-as-is vs. fix-before vs. fix-after. For each known data problem, pick one lane explicitly. Blank required attributes, inconsistent units, and out-of-vocabulary values are usually cheaper to fix before load (so they land clean) than to chase down in the new system later.
- Lift-and-shift vs. remodel. If you genuinely need a new taxonomy or attribute schema, strongly prefer: migrate faithfully into a structure that mirrors the old one, prove the catalog is intact, then remodel as a separate, reversible project. Remodeling during the move means you can't tell whether a discrepancy came from the migration or the redesign.
One nuance specific to PIMs: don't haul forward rot just because it's there. A migration is the rare moment you're already touching every record, so it's the most efficient time to close real gaps — missing materials, thin descriptions, absent compliance fields, attributes buyers actually filter on. The discipline is to treat enrichment as a defined, scoped lane (fix-before or fix-after) with its own acceptance criteria, not as ad-hoc edits smuggled into the mapping. This is where a layer like Anglera fits honestly: it sits alongside the new PIM, fills and scores the gaps against how buyers actually search and compare, and writes clean records back into the source of truth — so you arrive in the new system with a better catalog instead of a faithfully-migrated mess. But keep it a separate, gated step from the move itself.
Map the two data models honestly (this is the real work)
The mapping document is the heart of the migration. Build it as a living spreadsheet, source attribute by source attribute, and don't let any field reach "unmapped" status by default.
For every source attribute, record: target attribute, type transformation, value transformation (including vocabulary lookups), default/fallback, multi-value handling, and what happens to unmappable values.
Watch these specific traps:
- Free text → controlled vocabulary. Build an explicit value-mapping table (e.g., "SS", "Stainless", "304 SS" all → "Stainless Steel"). Unmapped values need a destination, not a silent null. Review the long tail; that's where wrong mappings hide.
- Units and numbers. Normalize units (in vs. mm, lb vs. kg), separate magnitude from unit if the target stores them apart, and protect against locale decimal/thousands separators. Re-validate that GTINs/UPCs keep leading zeros and never get cast to numbers.
- Variants and relationships. Map the variant model deliberately: which attributes are variant axes, how parent and child SKUs relate, and how the target represents it. Migrate relationship links by stable identifier, and load parents before children so references resolve.
- Assets. Decide whether asset URLs carry over as-is or assets re-host in a new DAM. If paths change, you need a URL rewrite map and a job to verify every reference resolves (HTTP 200) post-load. Carry alt text, asset roles (hero vs. spec), and ordering, not just the file.
- Localization. Map every locale and its fallback rules. A field that fell back to a default locale in the old PIM may be genuinely empty in the new one — measure per-locale completeness, not just global.
- Reference/linked data. Manufacturers, suppliers, brands, and other linked entities must exist in the target before products reference them, or links fail on import.
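The free-text-to-controlled-vocabulary trap is worth a concrete sketch. The Python below assumes records are dicts and uses a hypothetical `MATERIAL_MAP` table; the `__unmapped_source` side field is an illustrative convention, not a feature of any particular PIM. The point is that an unmapped value keeps its original text and is counted for review instead of vanishing.

```python
from collections import Counter

# Hypothetical value-mapping table, keyed on a normalized source value.
MATERIAL_MAP = {
    "ss": "Stainless Steel",
    "stainless": "Stainless Steel",
    "304 ss": "Stainless Steel",
    "stainless steel": "Stainless Steel",
}


def apply_mapping(records, attribute, mapping):
    """Map one attribute; return (new_records, Counter of unmapped source values)."""
    out, unmapped = [], Counter()
    for record in records:
        raw = record.get(attribute)
        # Case- and whitespace-insensitive lookup key.
        key = " ".join(("" if raw is None else str(raw)).lower().split())
        new = dict(record)
        if not key:
            new[attribute] = None
        elif key in mapping:
            new[attribute] = mapping[key]
        else:
            # Preserve the original text in a side field and queue it for review.
            new[attribute] = None
            new[f"{attribute}__unmapped_source"] = raw
            unmapped[raw] += 1
        out.append(new)
    return out, unmapped


records = [
    {"sku": "A1", "material": "304 SS"},
    {"sku": "A2", "material": "  stainless "},
    {"sku": "A3", "material": "Galv."},
]
mapped, review_queue = apply_mapping(records, "material", MATERIAL_MAP)
print(review_queue)  # Counter({'Galv.': 1})
```

Sorting the review queue by count surfaces the common misses first; the long tail of one-off values is what still needs a human pass.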
Pick a stable join key and commit to it. GTIN is ideal where it exists; otherwise an internal SKU/product ID that's unique and unchanging across both systems. Everything in validation depends on being able to match a source record to its target twin.
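One way to test a candidate join key before committing to it is to check uniqueness on each side and coverage across both. This is a sketch under the assumption that you can extract the key column from each system as a plain list; `check_join_key` is a hypothetical helper name.

```python
from collections import Counter


def check_join_key(source_keys, target_keys):
    """Report duplicates on either side and keys that fail to match across systems."""
    src, tgt = Counter(source_keys), Counter(target_keys)
    return {
        "duplicate_in_source": sorted(k for k, n in src.items() if n > 1),
        "duplicate_in_target": sorted(k for k, n in tgt.items() if n > 1),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
    }


report = check_join_key(
    source_keys=["A1", "A2", "A3", "A3"],
    target_keys=["A1", "A2", "a3", "A4"],
)
print(report["duplicate_in_source"])  # ['A3']
print(report["missing_in_target"])    # ['A3']
```

In the example, the lowercase `a3` in the target shows up as both a missing and an unexpected key, which is the typical signature of a key that was reformatted in transit rather than truly lost.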
Choose a cutover strategy that matches your risk
There's no single right answer; pick based on catalog size, channel coupling, and how much downtime your commerce stack can absorb.
- Big bang. Freeze, migrate everything, validate, flip downstream connections to the new PIM, go live. Simplest to reason about; highest blast radius if something's wrong. Works best for smaller catalogs (under ~50k SKUs) with a tolerable freeze window and few real-time integrations.
- Phased / incremental. Migrate by category, brand, or business unit across several waves. Lower risk per wave and you learn from early ones, but you must run both PIMs as authoritative for different slices simultaneously and prevent overlap. Best for large or multi-business-unit catalogs.
- Parallel run. Stand the new PIM up alongside the old, sync both, and reconcile continuously until you trust the new one, then cut authorship over. Lowest risk, highest cost and complexity, and you must define which system is the writer for each field to avoid split-brain. Reserve for catalogs where a bad publish has direct revenue impact.
Whatever you choose, sequence it as Extract → Transform → Load → Validate → (only then) Cut downstream over. A frequent mistake is repointing channels and feeds at the new PIM before validation passes. Keep syndication pointed at the old, proven source until the new one is verified, then switch feeds deliberately and watch the first publish cycle closely. Run a delta pass to capture anything edited during the migration window before you declare go-live.
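The delta pass can be sketched as a fingerprint comparison between the frozen baseline export and a fresh export taken just before go-live. This is one possible approach, not the only one (a reliable last-modified timestamp in the source PIM would also work); the function names and the `sku` key are assumptions for illustration.

```python
import hashlib
import json


def record_fingerprint(record):
    """Stable hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compute_delta(baseline, current, key="sku"):
    """Return keys added, removed, or changed between two exports."""
    base = {r[key]: record_fingerprint(r) for r in baseline}
    now = {r[key]: record_fingerprint(r) for r in current}
    return {
        "added": sorted(now.keys() - base.keys()),
        "removed": sorted(base.keys() - now.keys()),
        "changed": sorted(k for k in base.keys() & now.keys() if base[k] != now[k]),
    }


baseline = [{"sku": "A1", "title": "Hex bolt"}, {"sku": "A2", "title": "Washer"}]
current = [
    {"sku": "A1", "title": "Hex bolt M8"},
    {"sku": "A2", "title": "Washer"},
    {"sku": "A3", "title": "Nut"},
]
print(compute_delta(baseline, current))
# {'added': ['A3'], 'removed': [], 'changed': ['A1']}
```

Only the keys in `added` and `changed` need to go back through the transform and load steps, which keeps the final pass small even for a large catalog.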
Validate like an auditor, not a spot-checker
"It looks fine" is not validation. You need objective, automated reconciliation between your frozen source baseline and the loaded target, plus human review of the hard cases.
Automated checks (run on the full catalog):
- Record counts match by total and by category/brand. Investigate every delta; a clean migration explains its own numbers (e.g., "2,140 discontinued SKUs intentionally archived").
- Field-level completeness diff. Compare fill rates per attribute, source vs. target. A field that was 90% populated and is now 60% is a mapping bug until proven otherwise.
- Value fidelity on key fields. Hash or directly compare critical attributes (identifiers, price-relevant specs, compliance flags, titles) per record. Flag mismatches.
- Reference integrity. Every variant child resolves to a parent; every asset URL returns 200; every linked manufacturer/supplier exists. Zero dangling references is the bar.
- Completeness against the NEW rules. Score records against the target's required-field rules and channel readiness gates, so the "completeness illusion" surfaces before a channel rejects you.
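Two of these checks, the completeness diff and reference integrity, can be sketched in a few lines of Python. The sketch assumes source and target records are dicts whose attribute names have already been aligned through the mapping; the field names `sku` and `parent_sku` and the 0.01 tolerance are illustrative assumptions.

```python
def fill_rate(records, attribute):
    """Fraction of records where the attribute is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return filled / len(records)


def fill_rate_regressions(source, target, attributes, tolerance=0.01):
    """Attributes whose fill rate dropped by more than `tolerance` after load."""
    drops = {}
    for attr in attributes:
        before, after = fill_rate(source, attr), fill_rate(target, attr)
        if before - after > tolerance:
            drops[attr] = (round(before, 3), round(after, 3))
    return drops


def dangling_parents(records, key="sku", parent_field="parent_sku"):
    """Child records whose parent reference does not resolve to a loaded record."""
    known = {r[key] for r in records}
    return sorted(r[key] for r in records
                  if r.get(parent_field) and r[parent_field] not in known)


source = [
    {"sku": "P1", "material": "Steel"},
    {"sku": "A1", "parent_sku": "P1", "material": "Steel"},
    {"sku": "A2", "parent_sku": "P1", "material": "Brass"},
]
target = [
    {"sku": "P1", "material": "Steel"},
    {"sku": "A1", "parent_sku": "P1", "material": "Steel"},
    {"sku": "A2", "parent_sku": "P9", "material": ""},
]
print(fill_rate_regressions(source, target, ["material"]))  # {'material': (1.0, 0.667)}
print(dangling_parents(target))                             # ['A2']
```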
Human checks: walk your 50–200 golden SKUs end to end in the new PIM — every locale, every variant, every asset, rendered correctly. Then round-trip a sample to each real downstream channel (GDSN, Google, Amazon, your site) in a test/preview mode and diff the output against what the old system produced. The channel is the real judge of whether your catalog is intact.
Define pass criteria before you run these, with thresholds (e.g., "0 dangling references, 0 identifier mismatches, completeness within 1% of baseline or explained"). Migration isn't done when data loads; it's done when reconciliation passes.
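Writing the thresholds down as data, before any check runs, makes them hard to renegotiate afterwards. A minimal sketch, assuming your reconciliation jobs produce a summary dict with the three keys shown; the class and field names are hypothetical, and "within 1%" is interpreted here as a drop of at most 0.01 in fill-rate terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PassCriteria:
    max_dangling_references: int = 0
    max_identifier_mismatches: int = 0
    max_completeness_drop: float = 0.01  # assumption: absolute drop vs. baseline


def evaluate(results, criteria=PassCriteria()):
    """Return a list of human-readable failures; an empty list means the gate passes."""
    failures = []
    if results["dangling_references"] > criteria.max_dangling_references:
        failures.append(f"dangling references: {results['dangling_references']}")
    if results["identifier_mismatches"] > criteria.max_identifier_mismatches:
        failures.append(f"identifier mismatches: {results['identifier_mismatches']}")
    if results["completeness_drop"] > criteria.max_completeness_drop:
        failures.append(f"completeness drop: {results['completeness_drop']:.3f}")
    return failures


clean = {"dangling_references": 0, "identifier_mismatches": 0, "completeness_drop": 0.004}
broken = {"dangling_references": 3, "identifier_mismatches": 0, "completeness_drop": 0.004}
print(evaluate(clean))   # []
print(evaluate(broken))  # ['dangling references: 3']
```

Explained deltas, such as intentionally archived SKUs, are best subtracted before the summary is built, so the gate itself stays a plain threshold comparison.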
Keep a rollback path and decommission only after clean publish cycles
Don't burn the boats. Until the new PIM has proven itself in production, the old system and your dated export are your safety net.
- Keep the source PIM read-only and intact through go-live and beyond. If validation reveals a problem you can't fix forward quickly, you can repoint downstream channels back to the old, known-good source while you correct the new one.
- Define explicit rollback triggers in advance: identifier corruption, mass null-out of required fields, dangling references above threshold, or a channel feed rejecting more than X% of records. Decide who can call it and how, before emotions are running high during cutover.
- Watch the first few publish cycles like a hawk. Feed acceptance rates, search/findability on your own site for high-traffic SKUs, marketplace error queues, and any spike in "can't find the product" support tickets are your early-warning system. A catalog usually breaks visibly within the first one to three publish cycles, not at load time.
- Decommission the old PIM only after multiple clean cycles — typically two to four full publish/syndication rounds with no regressions — and after you've confirmed the new system is authoritative for every field and channel. Archive the final source export permanently regardless.
The goal isn't a flawless migration with no surprises; it's a migration where every surprise is detectable, reversible, and fixed before a buyer ever notices.
Step-by-step checklist
- Take a full, dated, read-only export of the source PIM as your baseline and rollback reference before touching anything
- Profile every attribute (fill rate, distinct values, type, multi-value, out-of-vocabulary values) — don't just eyeball the data
- Map the taxonomy, variant models, kits/bundles, assets, locales, and every downstream syndication target, not just scalar fields
- Pick 50–200 golden SKUs (deepest variants, all locales, multiple assets, regulated items) as your manual validation set
- Choose a stable, unique join key (GTIN, MPN, or internal SKU) that resolves on both systems before mapping anything
- Build a field-by-field mapping with explicit value-translation tables; no attribute allowed to default to 'unmapped' or silent null
- Decide per data problem: migrate-as-is, fix-before-load, or fix-after — and keep taxonomy/schema remodeling as a separate, later project
- Load reference entities and variant parents before children so relationships and links resolve on import
- Choose a cutover strategy (big bang, phased, or parallel run) matched to catalog size and channel coupling; plan a delta pass for the freeze window
- Keep syndication pointed at the proven old source until validation passes, then switch feeds deliberately and watch the first publish
- Validate as an auditor: count reconciliation, completeness diffs, value-fidelity hashes, zero dangling references, and completeness against the NEW rules
- Keep the old PIM read-only with defined rollback triggers; decommission only after 2–4 clean publish cycles
Frequently asked questions
How long does a PIM migration take?
It depends on catalog size, data-model distance between the two systems, and how clean the source data is — not on the import tooling, which is usually the fast part. A small, clean catalog (under ~50k SKUs) with a faithful lift-and-shift can go big-bang in a few weeks. Large or multi-business-unit catalogs, heavy localization, complex variants, or a simultaneous remodel push it to months. The biggest time sink is almost always mapping and validation, so budget more for those than for the load itself.
Should I clean up and enrich data before or after migrating?
Treat it as its own decision per problem. Fixing blanks, inconsistent units, and out-of-vocabulary values before load means they arrive clean and you don't chase them in the new system later. But a migration is also the most efficient moment to close real gaps, since you're touching every record anyway. The rule is to make enrichment a scoped, gated lane with its own acceptance criteria — never ad-hoc edits smuggled into the mapping — so you can always tell a migration discrepancy from a data change.
What's the most common way PIM migrations break a catalog?
Identifier drift and silent null-outs. If your join key isn't stable and unique across both systems, records duplicate, merge, or split, and everything downstream keyed on that ID breaks. The close second is free-text values that don't match the target's controlled vocabulary landing as blank or 'Other,' which doesn't error loudly but quietly empties fields buyers filter on. Both are invisible in a casual spreadsheet look and only surface at the channel.
Big bang or phased cutover — which is safer?
Phased and parallel cutovers lower the risk of each step, but they cost more and force you to run two systems at once: in a phased cutover each PIM is authoritative for its own slice with no overlap, and in a parallel run you must define which system writes each field. Big bang is simpler to reason about but has a larger blast radius. Decide based on catalog size and how tightly your channels are coupled: under ~50k SKUs with a tolerable freeze window often suits big bang; large, multi-unit, or revenue-critical catalogs justify phased or parallel. Whatever you pick, validate before repointing any downstream feed.
How do I know nothing was lost in the migration?
You prove it against a baseline you captured before the move. Run automated reconciliation on the full catalog — record counts by category, field-level completeness diffs, value-fidelity hashes on key fields, and zero-dangling-reference checks for variants and assets — plus completeness scoring against the new system's rules. Then manually walk your golden SKUs and round-trip a sample to each real channel and diff the output. Define pass thresholds before you run, not after.
When is it safe to shut off the old PIM?
After the new system has run multiple clean publish cycles — typically two to four full syndication rounds with no regressions — and you've confirmed it's authoritative for every field and channel. Keep the old PIM read-only as a rollback path until then, with explicit rollback triggers defined in advance (identifier corruption, mass null-outs, feed rejection above a threshold). Archive the final source export permanently regardless of when you decommission.