Launching thousands of SKUs: the catalog cold-start problem

Onboarding thousands of new SKUs at once breaks manual enrichment math. Here's why the catalog cold-start problem is operational, not creative, and how to fix it.

A new supplier line lands. You win a distribution agreement. A category buyer signs off on 6,000 SKUs for a spring reset. Whatever the trigger, the result is the same: a flat file, a deadline, and a catalog that isn't ready to sell. This is the catalog cold-start problem, and it's the least glamorous, most expensive moment in retail and distribution operations.

Why cold-start breaks the usual playbook

Most product-data processes are built for steady-state maintenance: a trickle of new items, a content team that reviews a few hundred SKUs a week, a PIM that holds everything neatly once it's in. Cold-start is the opposite. It's a batch shock. Thousands of SKUs, from one or many suppliers, all at once, all needing titles, attributes, categorization, and images before they can go live anywhere.

Two forces collide:

Supplier data is inconsistent by default. One industry analysis notes that a single SKU can carry more than 700 potential attributes, arriving in as many as 500 different supplier formats, spreadsheet layouts, and naming conventions. Nothing about that maps cleanly to your taxonomy.
Manual enrichment has no economies of scale. The per-SKU rate is roughly the same whether you have 50 items or 50,000, so total effort grows in direct proportion to the batch. If a person needs 30-45 minutes to research, gap-fill, and quality-check a single SKU's attributes, 5,000 new SKUs is 2,500-3,750 hours of work. That is more than a full year of one analyst's time, before a single item is live.
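The arithmetic behind that estimate can be checked in a few lines. This is a minimal sketch: the 30-45 minute range and the 5,000-SKU batch come from the text above, while the 2,000-hour working year (50 weeks at 40 hours) is an assumption added here for illustration.

```python
# Back-of-envelope effort estimate for a manual enrichment batch.
# HOURS_PER_ANALYST_YEAR is an assumption (50 weeks x 40 hours),
# not a figure from the article.
HOURS_PER_ANALYST_YEAR = 2000


def enrichment_hours(sku_count, minutes_per_sku):
    """Total hours of manual work at a fixed per-SKU rate."""
    return sku_count * minutes_per_sku / 60


low = enrichment_hours(5000, 30)
high = enrichment_hours(5000, 45)

print(f"{low:,.0f} to {high:,.0f} hours")
print(
    f"{low / HOURS_PER_ANALYST_YEAR:.2f} to "
    f"{high / HOURS_PER_ANALYST_YEAR:.2f} analyst-years"
)
```

Because the per-SKU rate is a constant, doubling `sku_count` doubles the result. Nothing in the function gets cheaper as the batch grows.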

That math is why cold-start batches routinely blow through launch dates. A global industrial manufacturer studied by Blue Meteor was taking 45 days to onboard new SKUs before centralizing the process — and manufacturers running catalogs above 10,000 SKUs see roughly 25% higher processing costs from manual inefficiency alone.

What "not ready" actually looks like

Cold-start catalogs don't fail because data is missing outright. They fail because it's thin, inconsistent, and unstructured. A typical raw supplier feed row:

SKU: WP-2240-BLK
Name: Widget Pro 2240 Black
Desc: Heavy duty widget for industrial use. Good quality. Black finish.

That's enough to import. It's not enough to sell, filter, or answer a buyer's question. Compare it to what a buyer actually needs to make a decision:

Attribute	Raw feed	Enriched
Title	Widget Pro 2240 Black	Widget Pro 2240 Heavy-Duty Industrial Widget, Black, 3/8 in
Material	(missing)	Cold-rolled steel, powder-coated
Dimensions	(missing)	3.75 in L x 1.2 in W x 0.85 in H
Load rating	(missing)	2,240 lb static
Compliance	(missing)	ANSI B18.2.1
Compatible with	(missing)	Widget Pro mounting bracket series 2200-2299

The Raw feed column is what gets dumped into the PIM on day one of a cold-start launch. The Enriched column is what site search, faceted filters, and comparison shopping actually run on, and it's the version that survives being asked a real question.
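One way to make the gap between the two versions measurable is a simple completeness score. The sketch below is illustrative only: the required-attribute list is a hypothetical stand-in for a real category taxonomy, and the scoring rule (share of required attributes with a non-empty value) is one reasonable choice among several, not a standard.

```python
# Hypothetical required-attribute list; a real taxonomy would define
# this per category.
REQUIRED_ATTRIBUTES = [
    "Title",
    "Material",
    "Dimensions",
    "Load rating",
    "Compliance",
    "Compatible with",
]

# The two versions of the record from the table above.
raw_feed = {"Title": "Widget Pro 2240 Black"}
enriched = {
    "Title": "Widget Pro 2240 Heavy-Duty Industrial Widget, Black, 3/8 in",
    "Material": "Cold-rolled steel, powder-coated",
    "Dimensions": "3.75 in L x 1.2 in W x 0.85 in H",
    "Load rating": "2,240 lb static",
    "Compliance": "ANSI B18.2.1",
    "Compatible with": "Widget Pro mounting bracket series 2200-2299",
}


def is_blank(value):
    """Treat None and whitespace-only strings as missing."""
    return value is None or not str(value).strip()


def completeness(record, required=REQUIRED_ATTRIBUTES):
    """Return (score between 0 and 1, list of missing attribute names)."""
    missing = [name for name in required if is_blank(record.get(name))]
    score = (len(required) - len(missing)) / len(required)
    return score, missing


raw_score, raw_missing = completeness(raw_feed)
enriched_score, enriched_missing = completeness(enriched)

print(f"raw feed: {raw_score:.0%} complete, missing {raw_missing}")
print(f"enriched: {enriched_score:.0%} complete, missing {enriched_missing}")
```

A score like this only says whether a value is present. Checking whether the value is correct or consistently formatted would need separate rules.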

The AI-search wrinkle nobody planned for

Cold-start data doesn't just need to satisfy a category page anymore. Google's Merchant Center now spans roughly 50 billion product listings, refreshed at up to 2 billion updates per hour, and the company has added dozens of new attributes specifically for conversational shopping — compatible accessories, substitute products, answers to common product questions — as part of the shift toward AI-driven and agentic shopping surfaces, according to reporting on Google's agentic commerce strategy. Queries inside AI Mode also run 2-3x longer than a typical keyword search, which means the data has to answer a fuller question, not just match a term.

Ask an answer engine "which black industrial widget rated for 2,240 lb fits a 2200-series mounting bracket" and the raw feed row above returns nothing useful. The enriched row is the only one that can be matched, cited, and recommended. A cold-start batch that ships thin is invisible to that traffic on day one — and stays invisible until someone circles back to fix it, which, at 30-45 minutes a SKU, is exactly the work nobody has time to do during a launch crunch.

Why the fix isn't "hire more analysts" or "wait longer"

The two default responses to a cold-start batch are throwing headcount at it or pushing the launch date. Both have real ceilings. Headcount is linear cost against a fixed per-SKU time budget — it doesn't change the 30-45 minute rate, it just buys more parallel copies of it, and quality still varies by whoever's working that day. Pushing the date delays revenue and cedes shelf space, physical or algorithmic, to whoever launched first.
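The headcount ceiling is easy to see with a rough staffing calculation. In this sketch the 8-week launch window and the 40 productive hours per analyst per week are assumptions chosen for illustration; only the 30-45 minute per-SKU range and the 5,000-SKU batch come from the article.

```python
import math


def analysts_needed(sku_count, minutes_per_sku, weeks_to_launch, hours_per_week=40):
    """Analysts required to finish a batch by the deadline at a fixed per-SKU rate.

    hours_per_week=40 is an assumed fully productive week, which is optimistic.
    """
    total_hours = sku_count * minutes_per_sku / 60
    capacity_per_analyst = weeks_to_launch * hours_per_week
    return math.ceil(total_hours / capacity_per_analyst)


# Assumed 8-week launch window.
best_case = analysts_needed(5000, 30, weeks_to_launch=8)
worst_case = analysts_needed(5000, 45, weeks_to_launch=8)
double_batch = analysts_needed(10000, 45, weeks_to_launch=8)

print(f"5,000 SKUs in 8 weeks: {best_case} to {worst_case} analysts")
print(f"10,000 SKUs in 8 weeks at 45 min each: {double_batch} analysts")
```

Adding people changes how many SKUs are worked in parallel, not the minutes each one takes, so the staffing requirement tracks batch size almost one for one.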

The operational fix is to change what "enrichment" means at the moment of intake: extract and normalize attributes from supplier docs automatically, score every SKU for completeness and consistency against your taxonomy, and route only the genuine judgment calls to a person. That's a fundamentally different throughput curve than a person reading a spec sheet and retyping it into a spreadsheet 5,000 times.
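The routing step described above can be sketched as a simple triage function. Everything named here is hypothetical: the `ExtractedValue` record, the 0-1 `confidence` field, and the 0.8 threshold are assumptions made for illustration, and this is not a description of how any particular product implements it.

```python
from dataclasses import dataclass

# Hypothetical cutoff; in practice it would be tuned against reviewed samples.
REVIEW_THRESHOLD = 0.8


@dataclass
class ExtractedValue:
    sku: str
    attribute: str
    value: str
    confidence: float  # hypothetical 0-1 score reported by the extraction step


def route(values, threshold=REVIEW_THRESHOLD):
    """Split extracted values into auto-accepted ones and ones needing a person."""
    auto_accept, needs_review = [], []
    for item in values:
        if item.value.strip() and item.confidence >= threshold:
            auto_accept.append(item)
        else:
            needs_review.append(item)
    return auto_accept, needs_review


# Illustrative values only; the confidence numbers are made up for the example.
extracted = [
    ExtractedValue("WP-2240-BLK", "Material", "Cold-rolled steel, powder-coated", 0.95),
    ExtractedValue("WP-2240-BLK", "Load rating", "2,240 lb static", 0.91),
    ExtractedValue("WP-2240-BLK", "Compliance", "ANSI B18.2.1", 0.55),
    ExtractedValue("WP-2240-BLK", "Dimensions", "", 0.0),
]

auto_accept, needs_review = route(extracted)
print("auto-accepted:", [item.attribute for item in auto_accept])
print("needs review:", [item.attribute for item in needs_review])
```

The design point is that human time is spent only on the `needs_review` queue, so the per-SKU review cost depends on how many values are uncertain rather than on how many SKUs arrived.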

What this means for how you plan a launch

A few practical implications for anyone staring at a cold-start batch:

Score before you build. Know which SKUs are missing which attributes before assigning work, so effort goes to actual gaps.
Treat the supplier flat file as the starting line, not the finish line. A spreadsheet from a new vendor is rarely launch-ready; budget the gap-fill step into onboarding, not as a post-launch fire drill.
Plan for AI-visible data, not just page-visible data. The attributes that make a product filterable are the same ones that make it answerable by AI shopping tools.
Decouple the timeline from headcount. If your launch date depends on how many analysts you can staff, the plan runs on the wrong unit economics.
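For the "score before you build" step, a gap report over the whole batch shows where the work actually is before anyone is assigned to it. This is a minimal sketch: the required-attribute list and the two `EX-` rows are hypothetical, added only so the example has something to compare against.

```python
from collections import Counter

# Hypothetical required attributes for this category.
REQUIRED = ["Material", "Dimensions", "Load rating"]

# WP-2240-BLK is the raw row from earlier; the EX- rows are made-up examples.
batch = [
    {"SKU": "WP-2240-BLK", "Material": "", "Dimensions": "", "Load rating": ""},
    {
        "SKU": "EX-0001",
        "Material": "Cold-rolled steel",
        "Dimensions": "",
        "Load rating": "2,240 lb static",
    },
    {
        "SKU": "EX-0002",
        "Material": "Cold-rolled steel",
        "Dimensions": "3.75 in L x 1.2 in W x 0.85 in H",
        "Load rating": "2,240 lb static",
    },
]


def gap_report(rows, required=REQUIRED):
    """Return (missing attributes per SKU, count of gaps per attribute)."""
    gaps_by_sku = {}
    gaps_by_attribute = Counter()
    for row in rows:
        missing = [name for name in required if not (row.get(name) or "").strip()]
        if missing:
            gaps_by_sku[row["SKU"]] = missing
            gaps_by_attribute.update(missing)
    return gaps_by_sku, gaps_by_attribute


gaps_by_sku, gaps_by_attribute = gap_report(batch)

for sku, missing in gaps_by_sku.items():
    print(f"{sku}: missing {', '.join(missing)}")
print("gaps per attribute:", gaps_by_attribute.most_common())
```

Sorting the work by attribute rather than by SKU can help when one supplier document (a spec sheet, say) fills the same gap across many items.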

None of this requires ripping out your PIM or standardizing every supplier before they'll work with you — both are multi-year fantasies for most distributors and retailers.

Anglera exists for exactly this moment. It plugs into whatever PIM you already run — Akeneo, Salsify, inriver, Stibo, Syndigo, Pimcore, Informatica, or none at all — and can start straight from a flat file. It scores every incoming SKU for completeness, extracts and quality-checks attribute values from the supplier's own documentation rather than guessing, and gap-fills at a pace no manual team can match, going live in about 30 days rather than a multi-quarter implementation. Your PIM still stores the data. Anglera is what turns a cold-start batch into a catalog that's actually ready to sell — and ready to be asked a question.
