Enriching Company Lists with AI: A Workflow for Data Teams
Data teams spend a surprising amount of time on the same task: someone hands over a spreadsheet of 2,000 company names and asks "what do they actually do?" Researching each one by hand is impractical. Pulling website copy and feeding it to an LLM is the obvious solution, but doing it well at scale takes more care than a one-shot prompt.
What "enrichment" actually means
For most CRM and sales-ops use cases, you need three things added to each row:
- A products list — concrete things the company sells.
- A services list — what they do for customers (consulting, fulfilment, support).
- A short, structured description that a salesperson can scan in five seconds.
Resist the temptation to also ask the model for revenue, headcount or sector tags in the same call. Those are factual fields that belong to firmographic providers; LLMs will cheerfully hallucinate them.
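One simple way to hold that line in code is to give the enriched row exactly those three fields and refuse anything else the model sends back. The sketch below assumes the model's reply has already been parsed into a Python dict. EnrichedCompany, from_model_output and both key lists are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical list of firmographic fields the LLM should never supply.
FIRMOGRAPHIC_KEYS = {"revenue", "headcount", "employees", "sector", "industry"}
ALLOWED_KEYS = {"products", "services", "description"}


@dataclass
class EnrichedCompany:
    """The only three fields an enrichment call is allowed to produce."""
    products: List[str] = field(default_factory=list)
    services: List[str] = field(default_factory=list)
    description: str = ""


def from_model_output(data: dict) -> EnrichedCompany:
    """Build a record from parsed model output, refusing keys outside the contract."""
    extra = set(data) - ALLOWED_KEYS
    firmographic = sorted(extra & FIRMOGRAPHIC_KEYS)
    if firmographic:
        # These belong to a firmographic provider, not the model.
        raise ValueError(f"model returned firmographic fields: {firmographic}")
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    return EnrichedCompany(
        products=[str(p) for p in data.get("products", [])],
        services=[str(s) for s in data.get("services", [])],
        description=str(data.get("description", "")),
    )
```

Raising an error, rather than silently dropping the extra keys, makes it obvious in your logs when a prompt has drifted into asking for facts the model shouldn't be trusted with.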
The workflow
- Start from a clean key. Either a company number (most reliable) or company name + country. If your input is messy, dedupe and standardise first — see our fuzzy matching guide.
- Resolve a canonical website. Without a URL, the model is guessing from its training data — fine for famous brands, terrible for SMEs.
- Constrain the output schema. Force JSON with an explicit shape ({ products: string[], services: string[], description: string }). Free-text output is impossible to load into a database.
- Cap the lists. Allow 3–7 products and 3–7 services. Without a cap you'll get either one vague bullet or a wall of marketing copy.
- Always store the source. Keep the URL the model based its answer on, plus a timestamp. This is the single biggest predictor of whether the enriched data will survive an audit.
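The schema, cap and provenance steps can be folded into one validation function that runs on every model reply before anything reaches the database. This is a minimal sketch, assuming the model returns a JSON string and that you already know which URL it was given. validate_enrichment and the source_url and enriched_at column names are hypothetical. It rejects out-of-range lists rather than truncating them, so padded or near-empty answers surface for review instead of being quietly trimmed.

```python
import json
from datetime import datetime, timezone

MIN_ITEMS, MAX_ITEMS = 3, 7  # the 3–7 cap for products and services


def validate_enrichment(raw_reply: str, source_url: str) -> dict:
    """Parse a model reply and return a row ready to load, or raise ValueError."""
    # json.JSONDecodeError is a subclass of ValueError, so free text fails here.
    data = json.loads(raw_reply)
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    for key in ("products", "services"):
        items = data.get(key)
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{key} must be a list of strings")
        if not MIN_ITEMS <= len(items) <= MAX_ITEMS:
            raise ValueError(f"{key} has {len(items)} items; expected {MIN_ITEMS}-{MAX_ITEMS}")
    description = data.get("description")
    if not isinstance(description, str) or not description.strip():
        raise ValueError("description must be a non-empty string")
    return {
        "products": data["products"],
        "services": data["services"],
        "description": description.strip(),
        # Provenance: which page the answer was based on, and when it was produced.
        "source_url": source_url,
        "enriched_at": datetime.now(timezone.utc).isoformat(),
    }
```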
Quality checks that catch most failures
- Hallucination filter: if the model can't cite a source page, mark the row "needs review" rather than accept it.
- Generic-language filter: reject descriptions matching "innovative solutions provider" or "leading provider of". They almost always mean the model padded an empty result.
- Sector sanity check: a quick keyword match against expected industry terms catches "fintech" tags applied to a corner shop.
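The three checks above translate into a short routing function that labels each row. This is a sketch only: the phrase pattern comes straight from the generic-language filter, while the SECTOR_TERMS mapping and its keywords are hypothetical placeholders you would replace with terms for your own sectors. The sector tag itself is assumed to come from your firmographic provider, not from the model.

```python
import re
from typing import Optional

# Phrases from the generic-language filter; extend with your own.
GENERIC_PHRASES = re.compile(r"innovative solutions provider|leading provider of", re.IGNORECASE)

# Hypothetical mapping from a sector tag to terms you'd expect in the enriched text.
SECTOR_TERMS = {
    "fintech": {"payment", "lending", "banking", "card", "wallet"},
    "retail": {"shop", "store", "groceries", "retail"},
}


def review_status(row: dict, sector: Optional[str] = None) -> str:
    """Return 'accept' or 'needs review' for one enriched row."""
    # Hallucination filter: no cited source page, no trust.
    if not row.get("source_url"):
        return "needs review"
    # Generic-language filter: padding usually means an empty result.
    if GENERIC_PHRASES.search(row.get("description", "")):
        return "needs review"
    # Sector sanity check: at least one expected term should appear somewhere.
    terms = SECTOR_TERMS.get(sector or "", set())
    if terms:
        parts = [*row.get("products", []), *row.get("services", []), row.get("description", "")]
        text = " ".join(parts).lower()
        if not any(term in text for term in terms):
            return "needs review"
    return "accept"
```

Substring matching is deliberately crude; it is meant to catch gross mismatches like the corner shop tagged as fintech, not to classify companies.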
Doing it without building it
If you want this workflow without standing up your own pipeline, our Products & Services Generator does exactly this: upload a spreadsheet of companies and the tool returns clean, structured products and services descriptions for each one — ready to load into a CRM, prospecting list or supplier database.
What enrichment won't fix
Enrichment is downstream of identification. If the underlying list confuses "Acme Ltd" (Bristol cleaning company) with "Acme Limited" (Singapore software firm), the enriched fields will look polished and still be wrong. That's why we strongly recommend pairing AI enrichment with a registry-level identity check — covered in UK company due diligence for data teams.
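To see why name alone is a weak key, consider what typical name normalisation does to the two Acmes. The sketch below is not a registry check; it only illustrates that once legal suffixes are stripped, both names collapse to the same string, and that adding the country or, better, a company number keeps them apart. The suffix list, normalise_name, identity_key and the example company number are hypothetical.

```python
import re
from typing import Optional, Tuple

# A deliberately short, hypothetical list of trailing legal forms.
LEGAL_SUFFIXES = re.compile(r"\b(ltd|limited|plc|llc|inc|pte)$", re.IGNORECASE)


def normalise_name(name: str) -> str:
    """Strip punctuation and one trailing legal form, collapse spaces, lower-case."""
    cleaned = re.sub(r"[^\w\s]", "", name).strip()
    cleaned = LEGAL_SUFFIXES.sub("", cleaned).strip()
    return re.sub(r"\s+", " ", cleaned).lower()


def identity_key(name: str, country: str, company_number: Optional[str] = None) -> Tuple[str, str]:
    """Prefer (country, company number); fall back to (country, normalised name)."""
    if company_number:
        return (country.upper(), company_number.strip().upper())
    return (country.upper(), normalise_name(name))
```

With this, normalise_name("Acme Ltd") and normalise_name("Acme Limited") both return "acme", so a name-only join would merge the Bristol and Singapore companies. The country fallback separates them here, but only a registry number guarantees that two rows really are the same entity.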
