Industrial distributor · $400M · Catalog · Data · Search
Rebuilding a catalog that could actually be searched
Moved from seven fragmented supplier feeds to a single canonical catalog — without freezing the business.
420k
SKUs normalized
7→1
Feed pipelines
18h→90s
Publish latency
+38%
Search conversion
Period
2023 — 2024 · 14 months
Stack
Postgres · Python · Algolia · Custom MDM
Problem
Seven supplier feeds with incompatible taxonomies, no canonical attributes, and no single source of truth for product identity.
Merch, digital, and operations each maintained their own spreadsheets; the storefront was a lagging index of whoever last updated what.
Approach
Started with identity — made SKU canonicalization a platform primitive, not a data-cleanup project.
Rolled changes department-by-department rather than a big-bang cutover, so each team could validate against their own workflow.
Built a review queue and audit trail from day one — the data story only holds if people can trust it.
Hindsight
I'd invest earlier in supplier-facing tooling. By the time the canonical model stabilized, we were still chasing upstream quality issues we could have caught at ingest.