LLM engineering
LLM integration & deployment
We deliver production-grade LLM integrations — not prototypes. Whether you need Claude or GPT behind your cloud account, an on-prem open-weights deployment for a regulated industry, or a hybrid routing layer that uses the cheapest good-enough model for each call — we build it, harden it, and hand it over.
What "LLM integration" covers at QwertyBit
LLMs change the economics of work that involves unstructured language — reading, summarising, classifying, drafting, reasoning over documents. The value is real. Deploying them reliably into production is harder than most demos suggest. Our LLM integration service takes a business workflow from "we wonder if an LLM could help" to "it's live, it's measured, and we know what it costs".
What we deliver
- Scoped LLM-backed applications. Customer-support assistants, compliance-aware document workflows, contract review engines, internal knowledge agents, call transcription and action-item extraction — integrated into your existing tools with proper APIs and audit trails.
- RAG pipelines that actually work at scale. Chunking strategy, embedding model selection, retrieval reranking, eval harness for retrieval quality, graceful degradation when the knowledge base changes.
- On-prem LLM deployments. Full open-weights model deployment on your infrastructure via LLM Studio — Llama, Qwen, Mistral, DeepSeek, Gemma. Hardware sizing, fine-tuning pipelines, observability, disaster recovery.
- Hybrid routing layers. A routing layer in front of multiple models (Claude for reasoning, GPT-4o for tool use, Llama on-prem for sensitive data, a small model for classification) so each call goes to the cheapest model that is good enough.
- Eval harnesses and observability. Every integration ships with an eval set that catches regressions before your users do, plus production monitoring for latency, cost, token usage, and downstream outcome quality.
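The routing idea above can be sketched in a few lines: pick the cheapest model whose capabilities cover the task, with a hard rule that sensitive data stays on-prem. This is a minimal illustration, not our production router; every model name, capability tag, and price in it is an invented placeholder.

```python
# Sketch of a hybrid routing layer: each call goes to the cheapest model
# that is good enough for its task class. All names and per-token costs
# below are illustrative placeholders, not real models or real pricing.
from dataclasses import dataclass, field

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float          # placeholder pricing
    capabilities: set = field(default_factory=set)

MODELS = [
    Model("small-classifier", 0.0001, {"classify"}),
    Model("on-prem-llama",    0.0005, {"classify", "extract", "sensitive"}),
    Model("cloud-tool-user",  0.0050, {"classify", "extract", "tool_use"}),
    Model("cloud-reasoner",   0.0150, {"classify", "extract", "tool_use", "reason"}),
]

def route(task: str, sensitive: bool = False) -> Model:
    """Return the cheapest model that can handle `task`, honouring the
    rule that sensitive data is only routed to on-prem deployments."""
    candidates = [m for m in MODELS if task in m.capabilities]
    if sensitive:
        candidates = [m for m in candidates if "sensitive" in m.capabilities]
    if not candidates:
        raise ValueError(f"no model can handle task {task!r} (sensitive={sensitive})")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

print(route("classify").name)                   # cheapest model that can classify
print(route("classify", sensitive=True).name)   # sensitive data: on-prem only
print(route("reason").name)                     # hard reasoning: most capable tier
```

In production this decision table is driven by the eval harness: a task class only moves down to a cheaper model once the evals show that model is good enough for it.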
The models and tools we use
- Anthropic Claude — our default for high-reasoning, long-context, tool-use heavy agents.
- OpenAI GPT — when function-calling ecosystem breadth matters.
- LLM Studio — on-prem open-weights models for regulated or sovereignty-sensitive clients.
- CrewAI and LangGraph — when multi-agent orchestration earns its complexity.
Where LLM integration meets the rest of QwertyBit
LLM integrations are rarely standalone — they live inside AI agents, business automation pipelines, and bespoke software. The LLM integration service is the engineering core that makes those other engagements reliable in production. Any of those services can bundle this in, or you can contract us for the LLM layer alone if your team will handle the application around it.
How to start
Book a scoping call with a specific use case in mind. We will tell you within one week whether an LLM is the right tool, which model to use, what it would cost to build, and what the monthly run cost looks like. If the honest answer is "you don't need an LLM here," that is what you'll hear.
Built with
Frontier LLMs
Anthropic
QwertyBit builds production AI agents on Anthropic Claude for high-reasoning, long-context, and compliance-aware workflows where steerability matters.
Local & on-prem LLMs
LLM Studio
QwertyBit deploys on-premise LLMs via LLM Studio for clients with strict data-residency requirements — Llama, Qwen, Mistral, Gemma, DeepSeek, fully on your hardware.
Multi-agent orchestration
CrewAI
QwertyBit builds multi-agent systems with CrewAI for workflows that need specialist agents planning, executing, and reviewing in sequence — not a single oversized prompt.
Services FAQ
What business owners ask before signing
Which models do you use?
Frontier cloud models — Claude (Anthropic), GPT (OpenAI), Gemini (Google) — for high-reasoning and long-context work. Open-weights models — Llama, Qwen, Mistral, Gemma, DeepSeek — for on-prem or cost-sensitive deployments via [LLM Studio](/tech-stack/llm-studio). Model selection happens in the feasibility phase based on the specific task, not by default. We routinely route between 2–3 models in a single application to balance cost and quality.
Ready to see where LLMs can take cost out of your business?
Tell us about the process you want to optimise. Vlad personally reviews every brief and replies within one business day.