Treat realtime ingestion as bounded-latency ETL, not magic: upstream events (S3 ObjectCreated, Kafka doc topics, SaaS webhooks, SCIM-driven group changes) normalize into a single ingestion-job envelope with tenant_id, source revision, content hash, and ACL hints, so downstream steps never have to guess identity.
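A minimal sketch of that envelope as a Python dataclass, with one connector-specific normalizer; the S3 field paths are the standard event-notification shape, but the tenant-in-bucket-name convention and empty ACL lookup are assumptions for illustration:

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class IngestionJob:
    # One envelope shape for every source, so downstream stages never guess identity.
    tenant_id: str
    doc_id: str            # stable id within the source system
    source: str            # "s3", "kafka", "sharepoint", ...
    revision: str          # source-side version / etag / offset
    content_hash: str      # hash of the raw bytes, used later for dedupe
    acl_hints: list[str] = field(default_factory=list)  # groups/principals allowed to see this doc

def from_s3_event(event: dict, raw_bytes: bytes) -> IngestionJob:
    """Normalize an S3 ObjectCreated event into the common envelope."""
    record = event["Records"][0]
    return IngestionJob(
        tenant_id=record["s3"]["bucket"]["name"].split("-")[0],  # assumption: tenant encoded in bucket name
        doc_id=record["s3"]["object"]["key"],
        source="s3",
        revision=record["s3"]["object"].get("eTag", ""),
        content_hash=hashlib.sha256(raw_bytes).hexdigest(),
        acl_hints=[],  # filled in by a connector-specific ACL lookup
    )
```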
Queue discipline. Use partitioned streams per tenant or shard to preserve ordering for the same document id; cap depth and shed load with priority classes so one noisy customer cannot starve SLA tenants. Dead-letter failures and replay them once the connector is fixed.
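A sketch of the partitioning and shedding decision, independent of any particular broker client; the topic names, partition count, and depth threshold are assumptions:

```python
import hashlib

NUM_PARTITIONS = 32
PRIORITY_TOPICS = {"sla": "ingest.sla", "bulk": "ingest.bulk"}  # hypothetical topic names
MAX_BULK_DEPTH = 100_000                                        # illustrative shed threshold

def partition_for(job) -> int:
    # Same document always lands on the same partition, so updates replay in order.
    key = f"{job.tenant_id}:{job.doc_id}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % NUM_PARTITIONS

def route(job, tenant_tier: str, bulk_depth: int) -> str | None:
    # Priority classes: SLA tenants always enqueue; bulk traffic sheds when the queue is deep.
    if tenant_tier == "sla":
        return PRIORITY_TOPICS["sla"]
    if bulk_depth > MAX_BULK_DEPTH:
        return None  # shed now; rely on the source's replay / backfill path
    return PRIORITY_TOPICS["bulk"]
```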
Pipeline stages. Fetch → virus scan / size caps → parse (possibly multi-format) → dedupe → chunk → embed batch → upsert vectors + row metadata. Emit stage timestamps to a single “edit→searchable” lag histogram your dashboard actually reads.
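A sketch of the stage loop and the single end-to-end lag metric, assuming prometheus_client and caller-supplied stage functions (fetch, scan, parse, dedupe, chunk, embed, upsert):

```python
import time
from prometheus_client import Histogram

# The one histogram the dashboard actually reads: source edit time -> searchable time.
EDIT_TO_SEARCHABLE = Histogram(
    "edit_to_searchable_seconds",
    "Lag from source edit to vector visible in search",
    buckets=(1, 5, 15, 30, 60, 120, 300, 900),
)

def process(job, source_edit_ts: float, stages: dict) -> dict:
    """Run each stage in order; stages maps stage name -> callable(job) (hypothetical)."""
    timings = {}
    for name, stage in stages.items():
        start = time.monotonic()
        stage(job)
        timings[name] = time.monotonic() - start
    EDIT_TO_SEARCHABLE.observe(time.time() - source_edit_ts)
    return timings  # per-stage timings, returned for structured logging by the caller
```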
Backpressure. When embed workers saturate, throttle producers based on consumer lag, or push back on webhook senders (respond 429 with a Retry-After header so they retry later). Prefer degrading batch breadth before dropping correctness (never silently skip ACL writes).
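For webhook sources, that push-back can be as simple as returning 429 with Retry-After once consumer lag crosses a threshold; a minimal Flask sketch, with current_embed_lag and enqueue as hypothetical stand-ins:

```python
from flask import Flask, request

app = Flask(__name__)
MAX_EMBED_LAG = 5_000  # illustrative threshold: jobs queued ahead of the embed workers

def current_embed_lag() -> int:
    """Hypothetical: read consumer lag from the queue/broker; stubbed here."""
    return 0

def enqueue(job: dict) -> None:
    """Hypothetical: publish onto the partitioned ingestion stream."""
    ...

@app.post("/webhooks/docs")
def receive_webhook():
    if current_embed_lag() > MAX_EMBED_LAG:
        # Push back instead of buffering unboundedly; well-behaved senders honor Retry-After.
        return "", 429, {"Retry-After": "60"}
    enqueue(request.get_json())
    return "", 202
```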
Failure semantics. Distinguish transient vendor 429/5xx from permanent auth revocation; surface connector health in an ops UI so support can say “SharePoint token expired” not “search feels weird.”
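A sketch of the transient-vs-permanent split that drives retries and the connector-health UI; the specific status and error codes are illustrative assumptions, not any one vendor's contract:

```python
TRANSIENT_STATUSES = {429, 500, 502, 503, 504}

def classify_failure(status_code: int, error_code: str | None = None) -> str:
    """Rough split between retry-with-backoff and surface-to-support."""
    if status_code in TRANSIENT_STATUSES:
        return "transient"      # retry with backoff; connector stays green
    if status_code in (401, 403) or error_code in ("invalid_grant", "token_revoked"):
        return "auth_revoked"   # permanent until a human re-consents; flag the connector red in the ops UI
    return "permanent"          # e.g. document deleted at source; dead-letter, do not retry
```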
Stream ingestion

```mermaid
flowchart LR
  EV[Upload event] --> Q[Queue]
  Q --> W[Parse + embed]
  W --> V[(Vector DB)]
  W --> META[(Metadata)]
```