LLM system safe for regulated industries (banking)
A top-tier bank wants an internal AI assistant for employees—not a consumer chatbot. Compliance is not a footer slide: the system must not invent regulatory numbers, must cite every claim, must not leak PII across sessions, and must produce an immutable audit trail for every answer.
Scenario
Design an LLM system that is safe for regulated industries.
A top-tier bank wants to deploy an internal AI assistant. It must never hallucinate regulatory figures, must cite sources for every claim, must never leak PII across user sessions, and must be auditable for every response it generates.
Design the full architecture including guardrails, audit logging, and citation enforcement.
What you should be able to do after reading:
- Frame the problem as controlled generation on top of governed retrieval—not “ChatGPT behind VPN.”
- Separate session isolation, input/output guardrails, structured regulatory facts, and post-generation verification.
- Explain how citation enforcement and numeric verification block answers before they reach the user.
- Describe an audit record complete enough for model risk and internal audit replay.
Step 0 — How to open the session
- Who uses it? Branch staff, compliance analysts, engineers—each has different data scope and risk.
- What is “regulatory figure”? Capital ratios, reserve requirements, fee caps, disclosure deadlines—define the taxonomy.
- What sources are authoritative? Only approved policy corpus + regulator bulletins, or also email/Slack?
- What does “never hallucinate” mean operationally? Abstain, or only speak from verified structured fields?
- Retention and subpoena: how long audit logs live, who can read them, redaction rules.
Step 1 — Clarifying questions
| Area | Question | Why |
|---|---|---|
| Jurisdiction | US-only, EU GDPR, multi-entity? | Data residency and log handling |
| Actions | Read-only Q&A or initiate workflows? | Tool use multiplies audit and fraud risk |
| Customer data | Can the assistant see live account PII? | Often no in v1—policy docs only |
| Model hosting | On-prem, private VPC, or vendor API? | Contractual no-training, log prohibition |
| Human review | When must compliance approve an answer template? | High-risk intents get human-in-the-loop |
| SLA | Latency vs safety: may the answer path block up to 30 s for verification? | Determines whether answers are returned synchronously or asynchronously |
Step 2 — The sixty-second answer
I would not let the LLM freestyle regulatory numbers. Authoritative limits and ratios live in a structured regulatory fact store (versioned, effective-dated). The assistant answers from RAG over approved documents plus lookup of structured facts for any number in the response.
Every user query runs through an orchestrator with session-scoped context only (no shared memory), input guardrails (PII detection, prompt injection, policy intent routing), then generation constrained to citation-tagged spans. A verification stage checks: each sentence has a citation, cited chunks support the claim, numeric tokens match the fact store, and outbound text is scrubbed for PII. Only then does the API return—and an append-only audit event captures prompts, retrieval ids, model versions, verifier results, and final text hash.
Phrase that lands well: “In banking, ‘safe LLM’ means deny by default—the default action when verification fails is no answer, not a softer guess.”
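The deny-by-default rule can be sketched as a release gate that runs the verifiers in order and returns the draft only if every one passes. This is a minimal illustration, not a reference implementation: `release_gate`, `GateResult`, and `SAFE_MESSAGE` are hypothetical names, and each check is assumed to be a function from the draft text to a boolean.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Generic message shown to the user when any check fails (wording is illustrative).
SAFE_MESSAGE = "Unable to verify answer."


@dataclass
class GateResult:
    released: bool
    text: str
    failed_check: Optional[str] = None


def release_gate(draft: str, checks: List[Tuple[str, Callable[[str], bool]]]) -> GateResult:
    """Run checks in order; release the draft only if all of them pass."""
    for name, check in checks:
        try:
            passed = check(draft)
        except Exception:
            # Fail closed: a crashing verifier counts as a failed verification.
            passed = False
        if not passed:
            return GateResult(released=False, text=SAFE_MESSAGE, failed_check=name)
    return GateResult(released=True, text=draft)
```

The point of the shape is that there is no code path that returns the draft after a failed or crashed check; the name of the failed check is kept so it can be written to the audit event.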
Step 3 — Non-negotiable requirements
| Requirement | Operational meaning | Architectural lever |
|---|---|---|
| No hallucinated regulatory figures | Rates, thresholds, dates must match approved source | Structured fact store + numeric verifier; abstain if no match |
| Cite every claim | No uncited sentences in user-visible answer | Structured output schema; citation linker; claim–evidence checker |
| No cross-session PII leak | User A’s customer data never influences User B | Stateless workers; per-session vault; no global chat memory |
| Full auditability | Replay any answer months later for exam | Immutable audit log + artifact store (chunks, hashes, versions) |
| Access control | Employee sees only what HR/role allows | Entitlements on retrieval; filter-first search |
| Data minimization | Logs useful for audit, not a second data lake of secrets | Tokenize PII in logs; store chunk ids not full customer rows |
Step 4 — High-level architecture
flowchart TB
subgraph client [Client]
UI[Internal assistant UI]
end
subgraph edge [Edge and identity]
GW[API gateway + mTLS]
IAM[Bank SSO / entitlements]
SESS[Session service - no shared memory]
end
subgraph guard_in [Input guardrails]
INJ[Injection / jailbreak filter]
PII_IN[PII detect - block or mask query]
ROUTE[Intent router - risk tier]
end
subgraph knowledge [Governed knowledge]
CORP[Approved document corpus]
FACT[Regulatory fact store]
RAG[ACL-aware hybrid retrieval]
end
subgraph gen [Controlled generation]
ORCH[Orchestrator]
LLM[Private LLM gateway]
STRUCT[Structured answer + citations]
end
subgraph guard_out [Output guardrails]
CITE[Citation completeness]
NUM[Numeric / date verifier]
PII_OUT[PII scrubber]
POL[Policy / tone classifier]
end
subgraph audit [Audit plane]
EVT[Append-only audit events]
ART[Artifact store - chunk snapshots]
SIEM[SIEM + compliance dashboards]
end
UI --> GW --> IAM --> SESS
SESS --> INJ --> PII_IN --> ROUTE
ROUTE --> ORCH
CORP --> RAG
FACT --> ORCH
RAG --> ORCH
ORCH --> LLM --> STRUCT
STRUCT --> CITE --> NUM --> PII_OUT --> POL
POL --> GW
ORCH --> EVT
RAG --> ART
STRUCT --> ART
EVT --> SIEM
Step 5 — Session isolation (PII must not cross users)
Design rules
- No global conversation store keyed only by user id that accumulates all past queries—compliance hates “the model remembered my client.”
- Session id per browser tab / shift; context window built from that session’s messages only, TTL 8–24 hours, encrypted at rest.
- Workers are stateless; session state in a dedicated store with strict ACL (session owner only).
- Retrieval is never conditioned on another user’s history. Optional “personal notes” feature is a separate encrypted bucket with explicit opt-in.
- Prompt assembly in a trusted service—not client-side—so users cannot inject another session id.
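The design rules above can be sketched as a session store that hands out context only to the session owner and only before the TTL expires. This is an in-memory stand-in under stated assumptions: `SessionStore` and its method names are hypothetical, the 8-hour default is the lower end of the TTL range above, and encryption at rest and SSO binding are left out.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Session:
    owner_id: str      # authenticated user who created the session
    expires_at: float  # epoch seconds
    messages: List[str] = field(default_factory=list)


class SessionStore:
    """In-memory stand-in for the dedicated session store (assumption)."""

    def __init__(self, ttl_seconds: float = 8 * 3600) -> None:
        self._ttl = ttl_seconds
        self._sessions: Dict[str, Session] = {}

    def create(self, session_id: str, owner_id: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._sessions[session_id] = Session(owner_id, now + self._ttl)

    def _load(self, session_id: str, user_id: str, now: Optional[float]) -> Session:
        now = time.time() if now is None else now
        session = self._sessions.get(session_id)
        if session is None or now >= session.expires_at:
            self._sessions.pop(session_id, None)  # purge expired sessions
            raise PermissionError("unknown or expired session")
        if session.owner_id != user_id:
            raise PermissionError("session belongs to another user")
        return session

    def append(self, session_id: str, user_id: str, message: str,
               now: Optional[float] = None) -> None:
        self._load(session_id, user_id, now).messages.append(message)

    def context_for(self, session_id: str, user_id: str,
                    now: Optional[float] = None) -> List[str]:
        # Context is built from this session's messages only.
        return list(self._load(session_id, user_id, now).messages)
```

Because `user_id` comes from the server-side identity and not from the client payload, a guessed session id alone is not enough to read another user's context.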
What you log vs what you keep in context
| Data | In LLM context | In audit log |
|---|---|---|
| Policy PDF chunks | Yes | Chunk ids + hash |
| Customer account number | Only if role allows; often blocked in v1 | Tokenized reference |
| Another user’s chat | Never | N/A |
Step 6 — Governed knowledge layer
Approved document corpus
- Only documents with `approval_status=published` and an `effective_date` enter the index.
- Versioning: Basel memo v3 vs v4 both exist; retrieval filters by `as_of` date from the query or a user-selected “policy as of.”
- Hybrid search (lexical + vector) with entitlements, following the same patterns as enterprise RAG.
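The corpus filter can be sketched as a function that keeps, per document, the version in force on the `as_of` date. The `DocVersion` type and the rule "latest published version effective on or before `as_of`" are assumptions for illustration; the actual selection rule would come from the bank's document governance.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class DocVersion:
    doc_id: str
    version: int
    approval_status: str
    effective_date: Optional[date]


def eligible_versions(docs: Iterable[DocVersion], as_of: date) -> List[DocVersion]:
    """Per doc_id, keep the latest published version effective on or before as_of."""
    best: Dict[str, DocVersion] = {}
    for doc in docs:
        # Unpublished or undated documents never enter the index.
        if doc.approval_status != "published" or doc.effective_date is None:
            continue
        if doc.effective_date > as_of:
            continue  # not yet in force on the requested date
        current = best.get(doc.doc_id)
        if current is None or doc.effective_date > current.effective_date:
            best[doc.doc_id] = doc
    return list(best.values())
```

Applying this filter before search, instead of after ranking, keeps superseded or draft versions out of the candidate set entirely.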
Regulatory fact store (anti-hallucination for numbers)
Extract or manually curate structured records:
RegulatoryFact {
fact_id, jurisdiction, regulator,
metric: "LCR_minimum", value: 1.0, unit: "ratio",
effective_from, effective_to,
source_doc_id, source_page, approval_workflow_id
}
When the model mentions a ratio, threshold, or deadline, the numeric verifier resolves the span against this store (and cited chunk text). Mismatch → block response.
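A minimal sketch of the fact-store lookup and numeric check follows, assuming an in-memory list of facts and a claimed value already extracted as a string. It uses a subset of the `RegulatoryFact` fields above; `lookup` and `verify_number` are hypothetical helper names, and `Decimal` is used so that `1.0` and `1.00` compare equal without float rounding.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import Iterable, Optional


@dataclass(frozen=True)
class RegulatoryFact:
    fact_id: str
    jurisdiction: str
    metric: str
    value: Decimal
    unit: str
    effective_from: date
    effective_to: Optional[date]  # None means still in force (assumption)
    source_doc_id: str


def lookup(facts: Iterable[RegulatoryFact], metric: str, jurisdiction: str,
           as_of: date) -> Optional[RegulatoryFact]:
    """Return the fact in force on as_of, or None if the store has no match."""
    for fact in facts:
        if (fact.metric == metric and fact.jurisdiction == jurisdiction
                and fact.effective_from <= as_of
                and (fact.effective_to is None or as_of <= fact.effective_to)):
            return fact
    return None


def verify_number(facts: Iterable[RegulatoryFact], metric: str, jurisdiction: str,
                  as_of: date, claimed: str) -> bool:
    """True only if the claimed value exactly matches the stored fact."""
    fact = lookup(facts, metric, jurisdiction, as_of)
    if fact is None:
        return False  # no fact on record: block, do not guess
    try:
        return Decimal(claimed) == fact.value
    except InvalidOperation:
        return False  # unparseable number: block
```

Every non-match path returns `False`, so a missing fact, an expired fact, and a wrong value all lead to the same blocked response.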
Step 7 — Guardrails (input, generation, output)
Input guardrails
| Check | Action |
|---|---|
| Prompt injection / exfiltration | Classifier + allowlist tools; strip “ignore previous instructions” patterns |
| PII in user query | Detect SSN/account patterns; refuse or mask before logging |
| High-risk intent | “Wire $1M”, “bypass AML”—route to block or human queue |
| Out-of-corpus questions | “What will the Fed do tomorrow?”—refuse; no open-web browse in v1 |
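The "PII in user query" row can be illustrated with a small masking pass that runs before anything is logged or sent to the model. The patterns here are assumptions: a US SSN-like shape and a hypothetical 10 to 12 digit account number. A production detector would typically combine patterns with a trained classifier.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; the account-number format is hypothetical.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{10,12}\b"),
}


def mask_pii(query: str) -> Tuple[str, List[str]]:
    """Replace each match with a [LABEL] placeholder and report which labels fired."""
    found: List[str] = []
    masked = query
    for label, pattern in PII_PATTERNS.items():
        masked, count = pattern.subn(f"[{label}]", masked)
        if count:
            found.append(label)
    return masked, found
```

The returned label list lets the orchestrator choose between refusing the query and continuing with the masked text, depending on the risk tier.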
Generation constraints
- Force JSON or XML answer schema: `claims[]` each with `text`, `citation_ids[]`, `confidence`.
- System prompt: “If no supporting chunk, output `abstain` with reason code.”
- Temperature low; no creative mode for regulatory tier.
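The answer schema can be sketched as a strict parser that turns anything unexpected into an abstain. The JSON wire format is an assumption: either `{"claims": [...]}` or `{"abstain": "<reason code>"}`; the reason codes `MALFORMED_OUTPUT` and `NO_CLAIMS` are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Claim:
    text: str
    citation_ids: List[str]
    confidence: float


@dataclass
class Answer:
    claims: List[Claim]
    abstain_reason: Optional[str] = None


def parse_answer(raw: str) -> Answer:
    """Parse model output; any schema violation becomes an abstain."""
    try:
        data = json.loads(raw)
        if not isinstance(data, dict):
            raise ValueError("top-level JSON value must be an object")
        if "abstain" in data:
            return Answer([], str(data["abstain"]))
        claims = [
            Claim(str(c["text"]), [str(i) for i in c["citation_ids"]], float(c["confidence"]))
            for c in data["claims"]
        ]
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError.
        return Answer([], "MALFORMED_OUTPUT")
    if not claims:
        return Answer([], "NO_CLAIMS")
    return Answer(claims)
```

Parsing into typed claims before verification means the downstream gates never see free text that bypassed the schema.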
Output guardrails (hard gate before user sees text)
- Citation completeness: every claim has ≥1 citation id present in retrieval set.
- Claim–evidence alignment: NLI or cross-encoder scores claim vs cited chunk; below threshold → drop claim or whole answer.
- Numeric verifier: regex + parser on amounts, %, dates; join to `RegulatoryFact` and cited tables.
- PII scrubber: block response if unexpected PII patterns appear in output.
- Policy classifier: no investment advice, no legal conclusions presented as fact.
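The "regex + parser" half of the numeric verifier might look like the sketch below, which pulls amounts and percentages out of a claim and normalizes them for comparison. The pattern is an assumption that covers only plain numbers, dollar amounts, and percentages; dates, ranges, and spelled-out numbers would need their own parsers. Percentages are normalized to ratios so that `100%` compares equal to a stored ratio of `1.0`.

```python
import re
from decimal import Decimal
from typing import List, Tuple

# The lookbehind skips digits that are part of an identifier such as "c12".
NUMERIC_SPAN = re.compile(r"(?<![\w.])\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def extract_numbers(text: str) -> List[Tuple[str, Decimal]]:
    """Return (raw span, normalized value) for each numeric token in text."""
    spans: List[Tuple[str, Decimal]] = []
    for match in NUMERIC_SPAN.finditer(text):
        raw = match.group(0)
        value = Decimal(raw.strip("$%").replace(",", ""))
        if raw.endswith("%"):
            value = value / Decimal(100)  # 100% -> 1 (ratio)
        spans.append((raw, value))
    return spans
```

Each extracted value then has to match either a fact-store record or a number in a cited chunk; a span that matches neither blocks the response.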
Step 8 — Citation enforcement (not optional formatting)
Citations are a release gate, not UI decoration.
Pipeline
- Retrieval returns chunks with stable `chunk_id`, `doc_version`, page, deep link.
- Model emits claims bound to `citation_ids` only from that set.
- Citation linker validates ids exist and were in the prompt context.
- Support checker verifies paraphrase against chunk text (entailment score).
- Renderer shows footnotes; audit stores the same mapping.
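The citation linker step can be sketched as a check that every claim cites at least one chunk and that every cited id was actually in the prompt context. Claims are assumed to be dictionaries with a `citation_ids` list; `check_citations` is a hypothetical name, and the failure codes are the ones used in the failure-modes table in this section. The entailment check is a separate stage and is not shown.

```python
from typing import Dict, List, Optional, Set


def check_citations(claims: List[Dict], context_chunk_ids: Set[str]) -> Optional[str]:
    """Return a failure code for the first bad claim, or None if all claims pass."""
    for claim in claims:
        ids = claim.get("citation_ids") or []
        if not ids:
            return "VERIFY_FAIL:UNCITED_CLAIM"
        # An id that was never retrieved for this prompt is treated as forged.
        if any(cid not in context_chunk_ids for cid in ids):
            return "VERIFY_FAIL:INVALID_CITE"
    return None
```

Validating against the ids that were in this prompt, not against the whole corpus, is what catches a model that cites a real but unretrieved chunk.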
Failure modes
| Failure | User experience | Audit |
|---|---|---|
| Missing citation | “Unable to verify answer”—generic safe message | VERIFY_FAIL:UNCITED_CLAIM |
| Citation id forged | Blocked | VERIFY_FAIL:INVALID_CITE |
| Supported but weak | Show answer with “low confidence” banner or human review queue | Score stored |
| Numeric mismatch | Blocked; offer link to source doc only | VERIFY_FAIL:NUMERIC_DRIFT |
Step 9 — Audit logging (every response, exam-ready)
One audit event per assistant turn, append-only (WORM storage or ledger table with hash chain).
AuditEvent {
event_id, timestamp_utc,
user_id_hash, session_id, entitlements_snapshot,
query_redacted, query_hash,
risk_tier, intent_label,
retrieval: { chunk_ids[], scores[], corpus_version },
model: { provider, model_id, prompt_template_version },
generation: { raw_structured_output_hash },
verification: { cite_pass, numeric_pass, pii_pass, scores[] },
response_redacted, response_hash,
latency_ms_per_stage
}
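The hash-chain option for append-only storage can be sketched as follows: each entry stores the hash of the previous entry, so editing any past event breaks verification from that point on. This is an in-memory illustration under stated assumptions; `AuditLog` is a hypothetical class, events are assumed to be JSON-serializable dictionaries, and a real deployment would persist entries to WORM storage or a ledger table.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder previous-hash for the first entry


def _digest(prev_hash: str, event: Dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self._entries: List[Dict] = []

    def append(self, event: Dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev, event)
        self._entries.append({"event": event, "prev_hash": prev, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True
```

Periodically anchoring the latest hash somewhere outside the log (for example, in a separate system of record) makes wholesale replacement of the chain detectable as well.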
Artifact store
- Snapshot retrieved chunk text at answer time (or content hash + pointer)—so policy updates do not rewrite history.
- Retain 7 years (typical banking archive policy—confirm with compliance).
- Access: internal audit, model risk, security—role-based, dual control for bulk export.
What you deliberately do not log
- Full customer PII in plaintext.
- Vendor “training” telemetry—contract must prohibit retention on their side too.
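One way to keep a tokenized reference in the log without the plaintext value is a keyed hash, sketched below. This is an assumption about how tokenization could be done, not a description of any specific vault product: `tokenize` is a hypothetical helper, the `tok_` prefix and 16-character truncation are arbitrary, and the key is assumed to live in a secrets manager that log readers cannot access.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Deterministic, keyed token for a PII value; same input gives the same token."""
    mac = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + mac[:16]
```

Because the token is deterministic, auditors can still correlate events that involve the same account; because it is keyed, someone with log access alone cannot confirm a guessed account number by hashing it.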
Step 10 — Risk tiers and human escalation
| Tier | Examples | Path |
|---|---|---|
| Low | “Where is the travel policy?” | Auto answer after verification |
| Medium | “Summarize this compliance memo” | Citations required; optional sample review |
| High | Capital treatment, AML scenarios, customer-specific advice | Human approval or hard abstain + link to source |
| Blocked | Fraud enablement, credential harvesting | Refuse; security alert |
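The tier table can be sketched as a lookup from intent label to tier to handling path. The intent labels and path names below are hypothetical placeholders, and defaulting unknown intents to the high tier is an assumption chosen to stay consistent with deny-by-default.

```python
from enum import Enum
from typing import Dict, Tuple


class Tier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    BLOCKED = "blocked"


# Hypothetical intent labels produced by the intent router.
INTENT_TIERS: Dict[str, Tier] = {
    "policy_lookup": Tier.LOW,
    "memo_summary": Tier.MEDIUM,
    "capital_treatment": Tier.HIGH,
    "aml_scenario": Tier.HIGH,
    "fraud_enablement": Tier.BLOCKED,
}

TIER_PATHS: Dict[Tier, str] = {
    Tier.LOW: "auto_answer_after_verification",
    Tier.MEDIUM: "answer_with_citations_and_sample_review",
    Tier.HIGH: "human_approval_or_abstain",
    Tier.BLOCKED: "refuse_and_alert_security",
}


def route(intent_label: str) -> Tuple[Tier, str]:
    """Map an intent to its tier and path; unknown intents get the high tier."""
    tier = INTENT_TIERS.get(intent_label, Tier.HIGH)
    return tier, TIER_PATHS[tier]
```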
Step 11 — Failure points and mitigations
| Failure | Impact | Mitigation |
|---|---|---|
| Stale policy in index | Wrong regulatory answer | Effective-dated corpus; nightly diff alerts; UI “as of” date |
| Verifier timeout | Pressure to ship unverified text | Fail closed; async “answer pending review” |
| Session fixation / IDOR | Cross-user data exposure | Bind session to SSO token; server-side session only |
| Prompt injection via doc body | “Ignore policies” in PDF | Sanitize ingest; treat docs as data not instructions |
| Over-logging | PII in SIEM | Redact + tokenize; separate security vs compliance views |
| Model upgrade | Regression in citation format | Golden-set eval gate; shadow traffic before promotion |
| Employee pastes customer PII | Leak to logs/vendor | Input block; train UI warnings; DLP on egress |
| “Almost right” number | Exam finding | Structured fact store; no numbers without verification pass |
Step 12 — Model risk and compliance hooks
- Model inventory: register the LLM, embedding model, and verifiers with owners and validation dates.
- Offline eval: hundreds of gold Q&A from compliance; metrics: citation accuracy, numeric exact match, abstain rate on unanswerable.
- Online monitoring: verifier fail rate, abstain rate, thumbs-down → weekly review with compliance.
- Incident response: kill switch per model version; replay audit events for affected window.
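The three offline metrics can be computed from a gold set roughly as sketched below. The metric definitions are assumptions that compliance would need to agree on: citation accuracy counts an answer as correct when it cites at least one chunk and only gold chunks, numeric exact match is string equality against the gold figure, and abstain rate is measured only on questions labeled unanswerable. `EvalCase` and `score` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class EvalCase:
    answerable: bool               # gold label: can the corpus answer this?
    gold_citations: Set[str]
    gold_number: Optional[str]     # None if the gold answer has no figure
    abstained: bool                # what the system did
    cited: Set[str]
    number: Optional[str]


def _rate(hits: int, total: int) -> float:
    return hits / total if total else 0.0


def score(cases: List[EvalCase]) -> Dict[str, float]:
    answered = [c for c in cases if c.answerable and not c.abstained]
    numeric = [c for c in answered if c.gold_number is not None]
    unanswerable = [c for c in cases if not c.answerable]
    return {
        "citation_accuracy": _rate(
            sum(bool(c.cited) and c.cited <= c.gold_citations for c in answered),
            len(answered)),
        "numeric_exact_match": _rate(
            sum(c.number == c.gold_number for c in numeric), len(numeric)),
        "abstain_rate_unanswerable": _rate(
            sum(c.abstained for c in unanswerable), len(unanswerable)),
    }
```

Running the same scorer on every candidate model version gives the golden-set gate a fixed, comparable set of numbers.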
Step 13 — MVP phasing (defensible scope)
- Phase 1: Policy Q&A only, no live customer data, citations + audit + abstain.
- Phase 2: Regulatory fact store + numeric verifier for ratios and deadlines.
- Phase 3: Role-scoped customer context with field-level tokenization and extra human review tier.
Shipping Phase 3 before Phase 1 verification exists is how banks get exam findings.
Step 14 — How to walk through this in a design session
- 3 min — requirements table (four non-negotiables).
- 5 min — architecture diagram: identity → guardrails → knowledge → verify → audit.
- 8 min — citation + numeric verification path (fail closed).
- 7 min — session isolation and what never goes in shared memory.
- 7 min — audit event schema and retention.
- 5 min — failure matrix + MVP phases.
- Close — “We optimize for provable correctness, not fluent prose.”
Step 15 — Goals → knobs
| Goal | Knob |
|---|---|
| Safer numbers | Stricter fact store match; block all free-form numerics |
| Higher coverage | More corpus sources—each adds approval workflow cost |
| Lower latency | Pre-verify common queries; cache verified answers with TTL |
| Stronger audit | Chunk snapshots; longer retention; hash-chained events |
| Less PII risk | No customer context in v1; aggressive input/output DLP |
The one line to remember
A bank-safe assistant is a verified reporting pipeline, not a chatbot: retrieve from approved sources, generate with mandatory citations, prove every number and claim before display, isolate every session, and log enough to replay the decision under audit—when verification fails, silence beats fluency.