Design AI Systems

Open-ended AI architecture prompts covering RAG, agents, inference, evaluation, safety, and production tradeoffs. Each guide starts from a blank whiteboard: requirements, napkin math, components, and what you would cut under pressure. For classic product interviews (URL shortener, feeds, payments), see "How would you Design?"

Guide
RAG for financial PDFs

You’ll defend hybrid search plus a structured numeric index, and explain why vector-only retrieval can miss a single exact figure, a miss that can cost millions.
Guide
Production RAG pipeline

You’ll design ingest through generation for 10M docs—PDF, Confluence, Slack, DB—and name every failure point and fix.
Guide
Real-time RAG sub-200ms

You’ll hit p95 under 200ms on 50M chunks—sharding, HNSW tuning, cache layers, and where you cut recall for speed.
Guide
Regulated LLM for banking

You’ll enforce citations, block unverified regulatory numbers, isolate sessions, and log every answer for audit replay.
Guide
Live knowledge · Trading desk

You’ll stream filings and news into hot indexes, invalidate caches by ticker, and never answer from a stale retrieval set.
Guide
Air-gapped defense assistant

You’ll run local LLMs and on-prem indexes at 99.9% uptime—signed update bundles, zero internet, classified RAG end to end.
