- LLM
- Large language model — a broad class of generative text models trained (at least in part) on next-token prediction, usually at billions of parameters, and adapted with SFT, preference optimization, or tool use.
- RAG
- Retrieval-augmented generation — fetch grounded passages from your own stores, place them in context with the user query, then generate an answer with citations. The retrieval step is a product surface: chunking, ACLs, freshness, reranking, evals.
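A minimal sketch of the retrieve-then-generate shape described above, with a toy in-memory corpus and naive lexical-overlap scoring standing in for a real retriever (BM25 and/or dense vectors); all names and documents here are illustrative:

```python
import re
from dataclasses import dataclass

@dataclass
class Passage:
    doc_id: str
    text: str

# Toy corpus standing in for a real document store.
CORPUS = [
    Passage("kb-1", "Invoices are archived after 90 days."),
    Passage("kb-2", "Refunds require manager approval."),
]

def tokens(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))

def retrieve(query: str, k: int = 1) -> list[Passage]:
    """Naive lexical-overlap retrieval; a real stack uses BM25/dense search, ACLs, reranking."""
    return sorted(CORPUS, key=lambda p: len(tokens(query) & tokens(p.text)), reverse=True)[:k]

def build_prompt(query: str, passages: list[Passage]) -> str:
    """Place retrieved passages in context so the model can answer with citations."""
    context = "\n".join(f"[{p.doc_id}] {p.text}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer with citations."

query = "Do refunds require approval?"
prompt = build_prompt(query, retrieve(query))
```

The grounded prompt then goes to the model; everything upstream of that call (chunking, access control, freshness) is the product surface the entry mentions.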
- MCP
- Model Context Protocol — a structured way for agents to discover and call tools/data sources through a standard host ↔ server contract (often discussed alongside copilots that need filesystems, tickets, or internal APIs).
- Agent
- In LLM products, a loop where the model plans steps, calls tools, observes results, and continues until a stop condition — as opposed to a single prompt/response turn without tool use.
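The loop can be sketched with a stubbed policy in place of a real model call; the tool names and the one-step plan here are hypothetical:

```python
# A stubbed "model" that plans one tool call, then stops. A real agent calls an LLM here.
def stub_policy(observations: list) -> dict:
    if not observations:
        return {"action": "call_tool", "tool": "add", "args": (2, 3)}
    return {"action": "finish", "answer": f"The sum is {observations[-1]}"}

TOOLS = {"add": lambda a, b: a + b}

def run_agent(policy, max_steps: int = 5) -> str:
    """Plan -> call tool -> observe result -> continue until a stop condition."""
    observations = []
    for _ in range(max_steps):
        step = policy(observations)
        if step["action"] == "finish":          # model-chosen stop condition
            return step["answer"]
        observations.append(TOOLS[step["tool"]](*step["args"]))
    return "max steps reached"                  # hard stop to bound cost

answer = run_agent(stub_policy)
```

The `max_steps` guard is the part single-turn prompting doesn't need: once the model controls the loop, you bound it explicitly.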
- RLHF
- Reinforcement learning from human feedback — train or fine-tune a policy using human (or model-assisted) preferences, often after SFT, to better align outputs with usefulness and safety constraints.
- SFT
- Supervised fine-tuning — continue training on curated input/output pairs to teach a model a style, format, or domain prior to RLHF/DPO-style preference stages.
- DPO
- Direct preference optimization — a class of alignment objectives that adjust models from pairwise preferences without an explicit reward model in the loop (implementation details vary by paper and stack).
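The canonical form of the objective (per the original DPO paper; variants differ) optimizes the policy directly from preference pairs $(x, y_w, y_l)$ against a frozen reference model:

$$
\mathcal{L}_{\text{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

where $y_w$ is the preferred response, $y_l$ the rejected one, and $\beta$ controls how far $\pi_\theta$ may drift from $\pi_{\text{ref}}$ — note there is no separately trained reward model in the loop.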
- LoRA
- Low-rank adaptation — parameter-efficient fine-tuning that adds small trainable matrices so you can specialize a base model without updating full weights.
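The core idea fits in a few lines: freeze $W$ and learn a rank-$r$ update $\frac{\alpha}{r} BA$. A toy sketch (dimensions and scaling factor illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 8, 2                 # rank r << d

W = rng.normal(size=(d_out, d_in))       # frozen base weight
A = rng.normal(size=(r, d_in))           # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-initialized

def lora_forward(x, alpha: float = 4.0):
    """Adapted output: base path plus scaled low-rank update (alpha/r) * B @ A @ x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapter starts as an exact no-op on the base model.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters: r*(d_in + d_out) for the adapter vs d_in*d_out for a full update.
lora_params = A.size + B.size
full_params = W.size
```

Only `A` and `B` receive gradients, which is where the cost savings versus full fine-tuning comes from.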
- PEFT
- Parameter-efficient fine-tuning — umbrella for LoRA, adapters, and similar methods that cut training cost versus full-model updates.
- BPE
- Byte-pair encoding — a common subword tokenizer algorithm; “tokens” billed by APIs correspond to these pieces, not words.
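A minimal sketch of the merge step at the heart of BPE training — repeatedly fuse the most frequent adjacent pair into one symbol (real tokenizers work over byte sequences with pre-tokenization rules; this toy version is character-level):

```python
from collections import Counter

def most_frequent_pair(tokens: list[str]) -> tuple[str, str]:
    """Count adjacent symbol pairs across the corpus and return the most common."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of the pair with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")        # start from individual characters
for _ in range(2):                       # two merge rounds: (l,o) -> lo, (lo,w) -> low
    tokens = merge(tokens, most_frequent_pair(tokens))
```

After two merges, `low` is a single subword token — the kind of piece an API bills for, independent of word boundaries.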
- Eval / evals
- Evaluation harnesses for models or systems: offline benchmarks, golden sets, online A/B, human labeling, LLM-as-judge (with skepticism), and safety/red-team probes.
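The offline golden-set piece can be as small as this sketch; the golden cases and the stub system under test are hypothetical, and real harnesses swap in a model call plus richer metrics than exact match:

```python
# Hypothetical golden set: frozen inputs with expected outputs.
GOLDEN = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def system_under_test(prompt: str) -> str:
    """Stub standing in for the model/pipeline being evaluated."""
    return {"2+2": "4", "capital of France": "Paris"}.get(prompt, "unknown")

def run_eval(sut, golden) -> float:
    """Exact-match accuracy; production evals add rubrics, judges, and online slices."""
    hits = sum(sut(case["input"]) == case["expected"] for case in golden)
    return hits / len(golden)

score = run_eval(system_under_test, GOLDEN)
```

Running this in CI against a frozen golden set is the cheapest regression signal before any A/B or human labeling.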
- HITL
- Human-in-the-loop — workflows where people approve, correct, or escalate model/tool actions before side effects commit.
- NFR
- Non-functional requirement — latency, availability, cost, privacy, observability targets that constrain architecture choices alongside features.
- SLA
- Service-level agreement — contractual or internal promise about uptime, latency, or support; design answers should tie SLAs to what you actually measure.
- PII
- Personally identifiable information — data subject to privacy controls, residency, redaction, and retention policies in RAG and logging paths.
- RBAC
- Role-based access control — assign permissions via roles/tenants; retrieval and tool execution must enforce the same view of data the user may access.
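A sketch of the enforcement point the entry describes — filtering retrieval candidates by the caller's role before scoring, so the model never sees documents the user couldn't. Roles, scopes, and documents here are invented for illustration:

```python
# Hypothetical role -> scope grants and a per-document ACL tag.
ROLE_GRANTS = {
    "support": {"kb:public", "kb:tickets"},
    "finance": {"kb:public", "kb:invoices"},
}
DOCS = [
    {"id": "d1", "scope": "kb:public",   "text": "Shipping takes 3-5 days."},
    {"id": "d2", "scope": "kb:invoices", "text": "Invoice net terms are 30 days."},
]

def visible_docs(role: str) -> list[dict]:
    """Apply the caller's grants before (or during) retrieval scoring."""
    scopes = ROLE_GRANTS.get(role, set())
    return [d for d in DOCS if d["scope"] in scopes]
```

The same filter must gate tool execution; enforcing RBAC only at the UI while retrieval reads everything is a common leak.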
- KV cache
- Key/value cache of attention states during autoregressive generation — critical for lowering repeated-prompt latency; vLLM-style continuous-batching servers manage it across requests.
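A single-head toy sketch of why the cache works: each step appends its K/V once, and attending against the cache gives exactly the same output as recomputing K/V for the whole prefix (dimensions and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def attend(q, K, V):
    """Single-head softmax attention of one query against all cached keys/values."""
    w = np.exp(q @ K.T / np.sqrt(d))
    w /= w.sum()
    return w @ V

tokens = rng.normal(size=(5, d))     # embeddings of 5 generated tokens

# Incremental decoding: compute this step's K/V once, reuse the cache afterwards.
K_cache, V_cache, outs = [], [], []
for x in tokens:
    K_cache.append(x @ Wk)
    V_cache.append(x @ Wv)
    outs.append(attend(x @ Wq, np.array(K_cache), np.array(V_cache)))

# Recomputing K/V from scratch at the last step gives the same output.
K_full, V_full = tokens @ Wk, tokens @ Wv
```

The cache trades memory for compute, which is why serving stacks spend so much effort paging and sharing it across requests.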
- BM25
- Classic sparse retrieval scoring — often hybridized with dense vector search for RAG when lexical overlap matters (SKUs, IDs, rare tokens).
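A compact Okapi BM25 sketch over a toy corpus, showing why lexical scoring wins on rare tokens like SKUs (the `k1`/`b` defaults are conventional; the documents are invented):

```python
import math
from collections import Counter

DOCS = [
    "sku 12345 red widget",
    "blue widget with handle",
    "red gadget sku 99",
]

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75):
    """Okapi BM25: sum over terms of idf * tf*(k1+1) / (tf + k1*(1 - b + b*len/avg_len))."""
    tokenized = [d.split() for d in docs]
    avg_len = sum(len(d) for d in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        s = 0.0
        for term in query.split():
            df = sum(term in d for d in tokenized)   # document frequency
            if df == 0:
                continue
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores

scores = bm25_scores("sku 12345", DOCS)
best = scores.index(max(scores))
```

A dense embedding model may smear "12345" into nothing useful; the exact-match term dominates here, which is the hybridization argument.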
- Reranker
- A second-stage model (often cross-encoder) that scores (query, passage) pairs to reorder top candidates from a first-pass retriever.
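The reordering stage can be sketched with a toy pair scorer standing in for a learned cross-encoder — the key property is that it inspects the query and passage jointly rather than embedding them separately:

```python
def toy_pair_score(query: str, passage: str) -> float:
    """Stand-in for a learned cross-encoder scoring one (query, passage) pair jointly."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p)           # Jaccard overlap as a crude proxy

def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Score every first-pass candidate against the query and keep the best top_n."""
    return sorted(candidates, key=lambda c: toy_pair_score(query, c), reverse=True)[:top_n]

candidates = ["reset your password in settings", "billing faq", "how to reset a password"]
top = rerank("reset password", candidates)
```

Because the pair model runs once per candidate, it is applied only to the retriever's top-k, not the whole corpus — that cost asymmetry is the point of the two-stage design.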
- LCEL
- LangChain Expression Language — composable Runnable pipelines (prompt → model → parser) in modern LangChain-style stacks.
- Guardrail
- Policy layers around models and tools: schema validation, regex/topic blocks, allowlists, circuit breakers, and human escalation when confidence is low.
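A sketch of one such layer sitting in front of tool execution, combining an allowlist, a simple argument schema check, and escalation rather than failing open; tool names and thresholds are hypothetical:

```python
import re

ALLOWED_TOOLS = {"search_tickets", "read_kb"}    # hypothetical allowlist

def guard(tool_call: dict) -> dict:
    """Validate a proposed tool call before any side effect commits."""
    tool = tool_call.get("tool")
    args = tool_call.get("args", {})
    if tool not in ALLOWED_TOOLS:
        return {"status": "blocked", "reason": f"tool {tool!r} not on allowlist"}
    # Schema check: every arg must be a short string with no control characters.
    for key, value in args.items():
        if (not isinstance(value, str) or len(value) > 200
                or re.search(r"[\x00-\x1f]", value)):
            # Route to a human instead of silently allowing or hard-failing.
            return {"status": "escalate", "reason": f"suspicious arg {key!r}"}
    return {"status": "allow"}
```

Circuit breakers and topic blocks slot into the same shape: a pure decision function between the model's proposal and the executor.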