An end-to-end AI system for identifying cost-saving opportunities across multi-vendor purchasing data — combining deterministic exact matching, FAISS semantic retrieval over 5.19 million vectors, and LLM equivalence reasoning with pack-size normalization across six million catalog rows from four distributors.
Large organizations that purchase from multiple vendors face a structural data problem: vendor A describes a product one way, vendor B describes the same product differently, quantities are in inconsistent units, and pack sizes are buried inside product codes. No common identifier exists to join across sources. Nobody has been systematically comparing prices.
The instinct when confronted with this is to reach for a visualization tool or drop everything into an AI assistant and ask it to find patterns. Both approaches fail for the same reason: the data isn't ready. You cannot virtualize your way out of a data quality problem.
This project was scoped and built to solve that problem end-to-end — from raw multi-vendor exports through a three-tier AI pipeline to a live price comparison tool and spending dashboard, deployable to Azure for use by researchers and procurement staff.
Which products are the same item sold at different prices by different vendors? And once you know that — what is the cheapest source, per unit, for each product?
Four distributors. Six million catalog rows. Two had prior-year purchase quantities; two were catalog-only. Part numbers sparsely populated. Pack sizes encoded in product descriptions or manufacturer codes. No clean join key across any source.
Lead with what is deterministic. Do everything you can without AI before using AI. Exact matching is faster, cheaper, and more reliable than inference — and it produces the most defensible savings estimates for building a business case.
The single biggest data quality challenge in any multi-vendor procurement dataset is pack size inconsistency. A product listed at $317 from one vendor and $14 from another looks like a massive savings opportunity. In reality, the $317 listing is for a case of 500 and the $14 listing is for a pack of 50. The per-unit price is nearly identical. Acting on the raw comparison would be wrong — and embarrassing to present to leadership.
The naive fix is to parse pack sizes from descriptions using regex. This works for well-formatted descriptions but fails on terse vendor abbreviations, which represent the majority of rows.
The real fix came from the source data itself. The procurement platform's export contained a field called qsu ("quantity sold as") with values like 1, 50, 100, 500, 1000 — populated for 99.9% of rows (6,068,599 of 6,068,888). Dividing catalog price by qsu produces the true per-unit price for every row.
23% of rows (1,395,917) required normalization (qsu > 1).
Raw savings estimate: $317,583
After unit-price normalization: $161,564
Pack-size inflation removed: $156,019 (49.1%)
Before building any comparison logic, find out if your data source has a quantity or pack-size field. It is often there, just not used. In this case, that single field transformed the entire analysis from unreliable to defensible — and changed the headline figure by nearly half.
Same manufacturer + same part number at a different vendor = identical product. Price difference is pure savings. This tier alone builds the business case — no AI required, no ambiguity.
The embedding model handles scale; the LLM handles reasoning. You never want AI searching 6M rows. The funnel narrows candidates to ~50 before any inference runs.
Extracts baseline unit price, checks product type/size/specs/concentration, normalizes pack sizes, rejects accessories. Returns structured JSON; savings calculated deterministically from output.
The LLM's one job is equivalence judgment — not price comparison. Price comparison is arithmetic. Do it in code after the AI runs. Keep the AI's scope narrow, give it explicit rules, and validate its output with spot checks.
Median unit-price spread across all 42,052 matched pairs: 20.7%. Mean: 30.1%. 90th percentile: 95% — meaning 10% of matched products have a price gap where one distributor charges nearly double the other for the same item.
The $161,564 figure covers only products where prior-year purchase quantities were available from two of four distributors. An additional 37,535 matched pairs have confirmed price gaps but unknown purchase volume — the true savings potential is larger and unlocked by obtaining quantity data from the remaining vendors.
Exact match: <2ms. Semantic retrieval: 75–110ms. Full AI pipeline: 5–8 seconds. LLM evaluated at 4/4 correctness and 4/4 consistency on a 30-query hand-curated eval set. Graceful degradation: Tier 1 always runs when LLM is offline.
Raw savings estimate before normalization: $317,583. After applying unit normalization: $161,564. The $156,019 difference was entirely pack-size inflation — identical products listed at different pack quantities making cheaper-per-pack options appear more expensive per unit.
| Lesson | What it means in practice | This project |
|---|---|---|
| Find the qty field first | Every multi-vendor dataset has a pack-size or quantity field with high coverage that sits unused. Find it, use it, normalize on it before anything else. | qsu field — 99.9% coverage — changed the headline savings figure by 49% |
| Funnel architecture is non-negotiable | You cannot run LLM inference on 6M rows. You can run it on 50. Design so that by the time a record reaches AI, it has been through at least two filtering steps. | FAISS narrows 6M → ~44 median candidates before any LLM call |
| Exact matching and semantic matching are complementary | Exact matching is fast, reliable, zero AI. Use it first. Semantic matching covers what exact matching misses. Never let semantic replace exact — the false positive rate is too high for a system people will act on. | Tier 1 runs in 2ms, always, even when LLM is offline |
| Normalize before presenting savings | A $300K savings figure that is 49% pack-size inflation is not a savings figure — it is a credibility problem. Document the methodology and show the corrected number. | $317K raw → $161K normalized, methodology fully documented |
| AI's job is equivalence judgment, not arithmetic | Price comparison is arithmetic — do it in code after the AI runs. The LLM's one job: look at ~50 product descriptions and say which ones are truly the same product as the baseline. | Savings calculated deterministically from LLM's structured JSON output |
| Build for graceful degradation | When the LLM is offline, exact-match results should still appear. When the vector index is cold, browse-by-category should still work. A tool that returns nothing when one component fails will not be used. | Three independent tiers, each fully functional without the others |
Python · pandas · numpy · pyarrow · Parquet · FAISS IVFFlat · bge-small-en-v1.5 (384-dim embeddings) · Sentence-Transformers
qwen2.5:7b via Ollama · FastAPI · Pydantic · uvicorn · JSON-structured LLM output · LLM response caching by prompt hash
Vanilla HTML/JS (no framework dependency) · works as a local file or deployed to Azure App Service B3 · Blob Storage for index files · ~$77/month estimated production cost