- LLM
- Large language model — a broad class of generative text models trained (at least in part) on next-token prediction, usually at billions of parameters, and adapted with SFT, preference optimization, or tool use.
- RAG
- Retrieval-augmented generation — fetch grounded passages from your own stores, place them in context with the user query, then generate an answer with citations. The retrieval step is a product surface: chunking, ACLs, freshness, reranking, evals.
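A minimal sketch of the retrieve-then-generate shape described above, with a toy in-memory corpus and naive lexical-overlap scoring standing in for a real retriever (BM25 and/or dense vectors); all names and documents here are illustrative:

```python
import re
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str

# Toy corpus standing in for a real document store.
CORPUS = [
    Passage("kb-1", "Invoices are archived after 90 days."),
    Passage("kb-2", "Refunds require manager approval."),
]

def tokens(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))

def retrieve(query: str, k: int = 1) -> list[Passage]:
    """Naive lexical-overlap retrieval; a real stack uses BM25/dense search, ACLs, reranking."""
    return sorted(CORPUS, key=lambda p: len(tokens(query) & tokens(p.text)), reverse=True)[:k]

def build_prompt(query: str, passages: list[Passage]) -> str:
    """Place retrieved passages in context so the model can answer with citations."""
    context = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations."

query = "Do refunds require approval?"
prompt = build_prompt(query, retrieve(query))
```

The grounded prompt then goes to the model; everything upstream of that call (chunking, access control, freshness) is the product surface the entry mentions.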
- MCP
- Model Context Protocol — a structured way for agents to discover and call tools/data sources through a standard host ↔ server contract (often discussed alongside copilots that need filesystems, tickets, or internal APIs).
- Agent
- In LLM products, a loop where the model plans steps, calls tools, observes results, and continues until a stop condition — as opposed to a single prompt/response turn without tool use.
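The loop can be sketched with a stubbed policy in place of a real model call; the tool names and the one-step plan here are hypothetical:

```python
# A stubbed "model" that plans one tool call, then stops. A real agent calls an LLM here.
def stub_policy(observations: list) -> dict:
    if not observations:
        return {"action": "call_tool", "tool": "add", "args": (2, 3)}
    return {"action": "finish", "answer": f"The sum is {observations[-1]}"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(policy, max_steps: int = 5) -> str:
    """Plan -> call tool -> observe result -> continue until a stop condition."""
    observations = []
    for _ in range(max_steps):
        step = policy(observations)
        if step["action"] == "finish":          # model-chosen stop condition
            return step["answer"]
        observations.append(TOOLS[step["tool"]](*step["args"]))
    return "max steps reached"                  # hard stop to bound cost

answer = run_agent(stub_policy)
```

The `max_steps` guard is the part single-turn prompting doesn't need: once the model controls the loop, you bound it explicitly.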
- RLHF
- Reinforcement learning from human feedback — train or fine-tune a policy using human (or model-assisted) preferences, often after SFT, to better align outputs with usefulness and safety constraints.
- SFT
- Supervised fine-tuning — continue training on curated input/output pairs to teach a model a style, format, or domain prior to RLHF/DPO-style preference stages.
- DPO
- Direct preference optimization — a class of alignment objectives that adjust models from pairwise preferences without an explicit reward model in the loop (implementation details vary by paper and stack).
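The canonical form of the objective (per the original DPO paper; variants differ) optimizes the policy directly from preference pairs $(x, y_w, y_l)$ against a frozen reference model:

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ is the preferred response, $y_l$ the rejected one, and $\beta$ controls how far $\pi_\theta$ may drift from $\pi_{\text{ref}}$ — note there is no separately trained reward model in the loop.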
- LoRA
- Low-rank adaptation — parameter-efficient fine-tuning that adds small trainable matrices so you can specialize a base model without updating full weights.
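The core idea fits in a few lines: freeze $W$ and learn a rank-$r$ update $\frac{\alpha}{r} BA$. A toy sketch (dimensions and scaling factor illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2                 # rank r << d

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-initialized

def lora_forward(x, alpha: float = 4.0):
    """Adapted output: base path plus scaled low-rank update (alpha/r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapter starts as an exact no-op on the base model.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: r*(d_in + d_out) for the adapter vs d_in*d_out for a full update.
lora_params = A.size + B.size
full_params = W.size
```

Only `A` and `B` receive gradients, which is where the cost savings versus full fine-tuning comes from.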
- PEFT
- Parameter-efficient fine-tuning — umbrella for LoRA, adapters, and similar methods that cut training cost versus full-model updates.
- BPE
- Byte-pair encoding — a common subword tokenizer algorithm; “tokens” billed by APIs correspond to these pieces, not words.
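A minimal sketch of the merge step at the heart of BPE training — repeatedly fuse the most frequent adjacent pair into one symbol (real tokenizers work over byte sequences with pre-tokenization rules; this toy version is character-level):

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count adjacent symbol pairs across the corpus and return the most common."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of the pair with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")        # start from individual characters
for _ in range(2):                       # two merge rounds: (l,o) -> lo, (lo,w) -> low
    tokens = merge(tokens, most_frequent_pair(tokens))
```

After two merges, `low` is a single subword token — the kind of piece an API bills for, independent of word boundaries.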
- Eval / evals
- Evaluation harnesses for models or systems: offline benchmarks, golden sets, online A/B, human labeling, LLM-as-judge (with skepticism), and safety/red-team probes.
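The offline golden-set piece can be as small as this sketch; the golden cases and the stub system under test are hypothetical, and real harnesses swap in a model call plus richer metrics than exact match:

```python
# Hypothetical golden set: frozen inputs with expected outputs.
GOLDEN = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def system_under_test(prompt: str) -> str:
    """Stub standing in for the model/pipeline being evaluated."""
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")

def run_eval(sut, golden) -> float:
    """Exact-match accuracy; production evals add rubrics, judges, and online slices."""
    hits = sum(sut(case["input"]) == case["expected"] for case in golden)
    return hits / len(golden)

score = run_eval(system_under_test, GOLDEN)
```

Running this in CI against a frozen golden set is the cheapest regression signal before any A/B or human labeling.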
- HITL
- Human-in-the-loop — workflows where people approve, correct, or escalate model/tool actions before side effects commit.
- NFR
- Non-functional requirement — latency, availability, cost, privacy, observability targets that constrain architecture choices alongside features.
- SLA
- Service-level agreement — contractual or internal promise about uptime, latency, or support; design answers should tie SLAs to what you actually measure.
- PII
- Personally identifiable information — data subject to privacy controls, residency, redaction, and retention policies in RAG and logging paths.
- RBAC
- Role-based access control — assign permissions via roles/tenants; retrieval and tool execution must enforce the same view of data the user may access.
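A sketch of the enforcement point the entry describes — filtering retrieval candidates by the caller's role before scoring, so the model never sees documents the user couldn't. Roles, scopes, and documents here are invented for illustration:

```python
# Hypothetical role -> scope grants and a per-document ACL tag.
ROLE_GRANTS = {
    "support": {"kb:public", "kb:tickets"},
    "finance": {"kb:public", "kb:invoices"},
}
DOCS = [
    {"id": "d1", "scope": "kb:public",   "text": "Shipping takes 3-5 days."},
    {"id": "d2", "scope": "kb:invoices", "text": "Invoice net terms are 30 days."},
]

def visible_docs(role: str) -> list[dict]:
    """Apply the caller's grants before (or during) retrieval scoring."""
    scopes = ROLE_GRANTS.get(role, set())
    return [d for d in DOCS if d["scope"] in scopes]
```

The same filter must gate tool execution; enforcing RBAC only at the UI while retrieval reads everything is a common leak.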
- KV cache
- Key/value cache of attention states during autoregressive generation — critical for lowering repeated-prompt latency; vLLM-style continuous-batching servers manage it across requests.
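A single-head toy sketch of why the cache works: each step appends its K/V once, and attending against the cache gives exactly the same output as recomputing K/V for the whole prefix (dimensions and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    """Single-head softmax attention of one query against all cached keys/values."""
    w = np.exp(q @ K.T / np.sqrt(d))
    w /= w.sum()
    return w @ V

tokens = rng.normal(size=(5, d))     # embeddings of 5 generated tokens

# Incremental decoding: compute this step's K/V once, reuse the cache afterwards.
K_cache, V_cache, outs = [], [], []
for x in tokens:
    K_cache.append(x @ Wk)
    V_cache.append(x @ Wv)
    outs.append(attend(x @ Wq, np.array(K_cache), np.array(V_cache)))

# Recomputing K/V from scratch at the last step gives the same output.
K_full, V_full = tokens @ Wk, tokens @ Wv
```

The cache trades memory for compute, which is why serving stacks spend so much effort paging and sharing it across requests.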
- BM25
- Classic sparse retrieval scoring — often hybridized with dense vector search for RAG when lexical overlap matters (SKUs, IDs, rare tokens).
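A compact Okapi BM25 sketch over a toy corpus, showing why lexical scoring wins on rare tokens like SKUs (the `k1`/`b` defaults are conventional; the documents are invented):

```python
import math
from collections import Counter

DOCS = [
    "sku 12345 red widget",
    "blue widget with handle",
    "red gadget sku 99",
]

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    """Okapi BM25: sum over terms of idf * tf*(k1+1) / (tf + k1*(1 - b + b*len/avg_len))."""
    tokenized = [d.split() for d in docs]
    avg_len = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.split():
            df = sum(term in d for d in tokenized)   # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores

scores = bm25_scores("sku 12345", DOCS)
best = scores.index(max(scores))
```

A dense embedding model may smear "12345" into nothing useful; the exact-match term dominates here, which is the hybridization argument.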
- Reranker
- A second-stage model (often cross-encoder) that scores (query, passage) pairs to reorder top candidates from a first-pass retriever.
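The reordering stage can be sketched with a toy pair scorer standing in for a learned cross-encoder — the key property is that it inspects the query and passage jointly rather than embedding them separately:

```python
def toy_pair_score(query: str, passage: str) -> float:
    """Stand-in for a learned cross-encoder scoring one (query, passage) pair jointly."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p)           # Jaccard overlap as a crude proxy

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Score every first-pass candidate against the query and keep the best top_n."""
    return sorted(candidates, key=lambda c: toy_pair_score(query, c), reverse=True)[:top_n]

candidates = ["reset your password in settings", "billing faq", "how to reset a password"]
top = rerank("reset password", candidates)
```

Because the pair model runs once per candidate, it is applied only to the retriever's top-k, not the whole corpus — that cost asymmetry is the point of the two-stage design.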
- LCEL
- LangChain Expression Language — composable Runnable pipelines (prompt → model → parser) in modern LangChain-style stacks.
- Guardrail
- Policy layers around models and tools: schema validation, regex/topic blocks, allowlists, circuit breakers, and human escalation when confidence is low.
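A sketch of one such layer sitting in front of tool execution, combining an allowlist, a simple argument schema check, and escalation rather than failing open; tool names and thresholds are hypothetical:

```python
import re

ALLOWED_TOOLS = {"search_tickets", "read_kb"}    # hypothetical allowlist

def guard(tool_call: dict) -> dict:
    """Validate a proposed tool call before any side effect commits."""
    tool = tool_call.get("tool")
    args = tool_call.get("args", {})
    if tool not in ALLOWED_TOOLS:
        return {"status": "blocked", "reason": f"tool {tool!r} not on allowlist"}
    # Schema check: every arg must be a short string with no control characters.
    for key, value in args.items():
        if (not isinstance(value, str) or len(value) > 200
                or re.search(r"[\x00-\x1f]", value)):
            # Route to a human instead of silently allowing or hard-failing.
            return {"status": "escalate", "reason": f"suspicious arg {key!r}"}
    return {"status": "allow"}
```

Circuit breakers and topic blocks slot into the same shape: a pure decision function between the model's proposal and the executor.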