Three years into the LLM era (the Transformer architecture behind it all was introduced in the Attention Is All You Need paper), one pattern is clear across every client we work with: the model is the easy part.
The hard part — the part that produces real business outcomes — is the data that feeds the model, the workflow it sits inside, and the human decisions that surround it.
What LLMs are actually good at inside a business
When we run a business analysis with a new client, we look for workflows that satisfy three conditions:
- Structured-ish inputs, structured-ish outputs. Emails, documents, forms, transcripts. Purely freeform creative work is a bad fit.
- High volume, repetitive shape. The workflow happens hundreds or thousands of times a month.
- Forgiving failure mode. If the model is wrong, a human catches it cheaply — or the worst case is a small inconvenience.
Examples that hit all three: customer support, invoice processing, document extraction, lead qualification, internal knowledge retrieval, compliance pre-check. Examples that do not: final legal sign-off, medical diagnosis, fully autonomous financial trading.
The LLM stack inside a modern business
A production-grade LLM deployment is almost always four layers, not one:
- Retrieval. Getting the right context in front of the model every time.
- Model routing. Using a cheap model for easy work and a frontier model for hard work — automatically.
- Evaluation. A running test suite that tells you when a change made things worse.
- Orchestration. Chaining steps, handing off to humans, retrying on failure.
The model itself — GPT, Claude, local — is a swappable component. The other three layers are where the business-specific value accumulates.
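Model routing, in particular, can start out very simple. Here is a minimal sketch in Python; the model names are placeholders (not real API identifiers) and the difficulty heuristic is deliberately crude, standing in for what would usually be a classifier or a cheap LLM call:

```python
def estimate_difficulty(task: str) -> float:
    """Crude heuristic: long tasks and certain keywords count as 'hard'.
    In production this would be a trained classifier or a cheap LLM call."""
    score = min(len(task) / 2000, 1.0)
    hard_keywords = ("analyze", "legal", "reconcile")  # illustrative only
    if any(k in task.lower() for k in hard_keywords):
        score = max(score, 0.8)
    return score

def route_model(task: str, threshold: float = 0.7) -> str:
    """Send easy work to a cheap model, hard work to a frontier model.
    'cheap-model' and 'frontier-model' are hypothetical names."""
    return "frontier-model" if estimate_difficulty(task) >= threshold else "cheap-model"
```

The point is not the heuristic itself but the shape: routing is a small, testable function that sits in front of the model, which is exactly why it can be improved independently of any vendor choice.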
On-prem, cloud, and the privacy question
Not every business can send data to a cloud model. In regulated environments, or when data residency is non-negotiable, we run models on-premises via tools like LM Studio. Open-weights performance typically lands somewhere between 80% and 95% of the frontier models, depending on the task. The trade-off is real, and it is the business owner's call, not the engineer's. The Hugging Face Open LLM Leaderboard is the quickest way to see which open-weights model is currently top of its size class.
A simple test for whether an LLM workflow is worth building
Ask four questions:
- How many hours per week does a human currently spend on this workflow?
- What is the cost per hour (fully loaded)?
- What is the marginal cost of an LLM call for this shape of task?
- What is the cost of a wrong answer?
If the annual saving is at least 5x the annual model spend and the cost of a wrong answer is bounded, you have an LLM workflow. If the saving is smaller than that, you probably do not.
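The arithmetic behind these four questions fits in a few lines. A minimal sketch with illustrative placeholder numbers (not benchmarks from any real engagement):

```python
def llm_workflow_roi(hours_per_week: float, cost_per_hour: float,
                     calls_per_year: float, cost_per_call: float,
                     min_multiple: float = 5.0):
    """Return (annual_saving, annual_model_spend, worth_building).

    worth_building is True when the saving clears the 5x threshold;
    the bounded-cost-of-error question still has to be answered separately.
    """
    annual_saving = hours_per_week * 52 * cost_per_hour
    annual_spend = calls_per_year * cost_per_call
    return annual_saving, annual_spend, annual_saving >= min_multiple * annual_spend

# Example: 20 h/week at $60/h fully loaded, 50,000 calls/year at $0.02/call
saving, spend, go = llm_workflow_roi(20, 60, 50_000, 0.02)
# saving = 62,400 per year against 1,000 of model spend: clears 5x easily
```

Note that the function only answers the first three questions; the fourth (cost of a wrong answer) is a judgement call that no spreadsheet captures.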
Where we think this goes next
Over the next two years, the winning teams will not be the ones with the largest model — they will be the ones with the cleanest data, the sharpest evals, and the best judgement about where to insert a human. That is why the QwertyBit process starts with business analysis, not with a model choice.
If you want a practical assessment of where LLMs would make the biggest dent in your business, book a call.