§16 — Governance & responsible AI
Fifteen staff-depth scenarios on keeping LLM systems lawful and trusted: enterprise AI governance RACI; MRM-style tiering; harm-focused incident response; production fairness monitoring; carbon accounting; IP posture across data and outputs; workforce-safe tool policies; vendor diligence scorecards; aggressive log minimization; adverse-action explainability; cross-border regulatory cells; insurance and indemnity patterns; concise board risk reporting; regulatory exam readiness; and open-weight vs API policy in high-assurance environments.
Interview stance. Governance questions separate architects who ship once from leaders who keep shipping lawfully. Show inventories, tiered controls, audit evidence, and honest residual risk, not a single 'ethics checklist' PDF.
Treat models like regulated software + data: version, validate, monitor, retire.
Fairness work starts with lawful definitions and slice metrics, not vibes.
Incidents need forensics and comms choreography, not only rollback.
Climate, IP, and labor are now board-visible; have an adult answer.
On this page
Q236 — AI governance body
Q237 — Model risk MRM
Q238 — Harm incident IR
Q239 — Bias fairness prod
Q240 — Carbon footprint
Q241 — IP ownership
Q242 — Workforce policy
Q243 — Vendor diligence
Q244 — Log minimization
Q245 — Explainability notices
Q246 — Cross-border AI
Q247 — Insurance liability
Q248 — Board reporting
Q249 — Exam readiness
Q250 — Open-weight policy
236. How would you design an enterprise AI governance program spanning legal, security, risk, and engineering?
RACI. Clear ownership: who approves new models, data sources, and customer-facing claims; escalation to exec risk committee for novel harms.
Policy library. Living documents for acceptable use, data classes, retention, human-review thresholds—version-controlled like code.
Inventory. Central registry of all models (vendor + fine-tuned), data flows, and DPIAs—no shadow fine-tunes.
Rhythm. Monthly risk review with metrics; quarterly tabletop exercises for incident response.
Enablement. Self-service checklist for product teams reduces ‘ticket to nowhere’ friction while keeping gates.
Governance layers
flowchart TB
P[Policies] --> R[Risk reviews]
R --> I[Inventory]
I --> O[Ops audits]
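The registry bullet above can be sketched as a minimal gate: no model enters the inventory without an accountable owner and a completed DPIA. All names here (ModelRecord, Registry) are hypothetical illustrations, not a real library.

```python
from dataclasses import dataclass

@dataclass
class ModelRecord:
    name: str
    version: str
    owner: str           # accountable team from the RACI
    data_classes: list   # data classes the model may touch
    dpia_done: bool      # DPIA completed before registration

class Registry:
    """Central inventory of vendor and fine-tuned models -- no shadow fine-tunes."""
    def __init__(self):
        self._models = {}

    def register(self, rec: ModelRecord):
        # Gate: registration fails until the DPIA exists.
        if not rec.dpia_done:
            raise ValueError(f"{rec.name}: DPIA required before registration")
        self._models[(rec.name, rec.version)] = rec
```

The point of the sketch is that the gate lives in code, so 'no shadow fine-tunes' is enforced rather than aspirational.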
237. How would you apply model risk management principles (similar to banking MRM) to large language models in production?
Tiers. Classify use cases by materiality—customer-facing credit decisions demand deeper controls than internal email assist.
Documentation. Model cards + validation reports + monitoring plan + contingency procedures for each tier.
Independence. Validation team separate from builders for high-tier models; periodic third-party review for critical paths.
Monitoring. Drift, misuse, and outcome fairness metrics with board reporting.
Limitations. Honest residual risk statements with compensating controls, not checkbox ‘approved.’
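The materiality tiering above can be made mechanical. A hedged sketch, with illustrative thresholds (real tiering criteria should come from your risk committee):

```python
def risk_tier(customer_facing: bool, regulated_domain: bool,
              automated_decision: bool) -> int:
    # Tier 1 = deepest controls (independent validation, board reporting);
    # Tier 3 = lightweight review. Automated decisions weigh double because
    # a human-in-the-loop is itself a compensating control.
    score = customer_facing + regulated_domain + 2 * automated_decision
    if score >= 3:
        return 1
    return 2 if score >= 1 else 3
```

A customer-facing credit decision (all three flags true) lands in Tier 1; an internal email assistant lands in Tier 3.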
238. How would you run incident response when an LLM causes real-world harm (toxic output, bad advice, data leak)?
Triage. Sev levels by impact; immediately disable the feature flag or model route if ongoing damage is possible.
Forensics. Immutable logs with trace ids; legal hold; PR+Legal joint comms review before external statements.
Customer care. Scripted responses, credits policy, regulatory notification timelines where required.
Root cause. Distinguish model regression vs data poisoning vs prompt injection vs deployment bug—different fixes.
Follow-up. Public postmortem summary when appropriate; update eval harness with new regression tests.
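The triage-and-kill-switch flow above can be sketched in a few lines. Severity rules and the flag name are hypothetical placeholders for your own incident taxonomy:

```python
from dataclasses import dataclass
from enum import Enum

class Sev(Enum):
    SEV1 = 1  # ongoing or irreversible harm: kill switch
    SEV2 = 2  # contained but user-visible
    SEV3 = 3  # minor, no active exposure

@dataclass
class Incident:
    trace_id: str   # immutable trace id for forensics
    ongoing: bool
    harm: str       # "toxic" | "bad_advice" | "data_leak"

def triage(incident: Incident) -> Sev:
    # Data leaks and any ongoing harm escalate straight to SEV1.
    if incident.harm == "data_leak" or incident.ongoing:
        return Sev.SEV1
    return Sev.SEV2 if incident.harm == "toxic" else Sev.SEV3

def respond(incident: Incident, flags: dict) -> dict:
    sev = triage(incident)
    if sev is Sev.SEV1:
        flags["llm_feature_enabled"] = False  # feature-flag kill switch
    return {"sev": sev.name, "trace_id": incident.trace_id, "flags": flags}
```

Keeping the trace id on the incident record is what lets forensics and legal hold start from the same artifact the responder used.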
239. How would you monitor and mitigate bias and fairness issues in LLM-powered decisions at scale?
Definition. Agree with legal/compliance on protected attributes and fairness metrics relevant to jurisdiction—avoid hand-wavy ‘de-bias the AI.’
Slicing. Automated eval dashboards per demographic proxy slices where ethically usable; watch disparity in escalation or task success.
Mitigations. Retrieval filters (inclusive sources), calibration layers, or human review for sensitive cohorts.
Feedback. User appeals channel with SLA; corrections feed training governance.
Limits. Document where automation is inappropriate—some decisions remain human-only by policy.
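The slice-metric dashboards above reduce to per-slice success rates plus a disparity check. A minimal sketch, assuming a four-fifths-style ratio rule; the actual metric and threshold must come from legal/compliance for your jurisdiction:

```python
from collections import defaultdict

def slice_success_rates(events):
    # events: iterable of (slice_label, success_bool)
    counts = defaultdict(lambda: [0, 0])  # slice -> [successes, total]
    for label, ok in events:
        counts[label][0] += int(ok)
        counts[label][1] += 1
    return {k: s / t for k, (s, t) in counts.items()}

def disparity_alert(rates, threshold=0.8):
    # Flag slices whose success rate falls below `threshold` x the best slice.
    best = max(rates.values())
    return sorted(k for k, r in rates.items() if r < threshold * best)
```

Flagged slices feed the mitigation ladder (retrieval filters, calibration, human review), not an automatic 'de-bias' step.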
240. How would you account for and reduce the carbon footprint of training and inference for LLM products?
Measurement. GPU-hour telemetry × region grid factors; separate training burst from steady inference burn.
Efficiency. Routing, caching, quantization, and right-sized models beat greenwashing offsets alone.
Procurement. Prefer low-carbon grids or PPAs for training clusters when brand promises exist.
Transparency. Publish methodology; avoid precise claims you cannot defend in audits.
Product. Offer ‘eco mode’ with slightly reduced quality if customers value sustainability KPIs.
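The measurement bullet above is a simple product of terms. A sketch with illustrative numbers; real grid factors and PUE come from your provider and region, not these placeholders:

```python
def emissions_kg(gpu_hours: float, avg_power_kw: float,
                 pue: float, grid_kg_per_kwh: float) -> float:
    """Attributable CO2e: GPU-hours x average draw x datacenter
    overhead (PUE) x regional grid emission factor."""
    return gpu_hours * avg_power_kw * pue * grid_kg_per_kwh

# e.g. a 1,000 GPU-hour fine-tune at 0.7 kW/GPU, PUE 1.2,
# on a 0.4 kg CO2e/kWh grid -> ~336 kg CO2e
```

Running training bursts and steady inference through the same function, with separate inputs, keeps the two burn profiles distinguishable in reporting.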
241. How would you navigate intellectual property issues for LLM inputs, outputs, and training corpora?
Contracts. Customer agreements specify who owns prompts, outputs, derivative fine-tunes; default posture explicit.
Copyright. Avoid ingesting unlicensed scraped content where jurisdiction unsettled; maintain takedown process.
Outputs. Disclose limitations on enforceability; indemnity posture aligned with legal appetite.
Open weights. Track license chains; attribute and comply with share-alike terms.
Process. Legal review for high-risk ingestion pipelines—not post-hoc panic.
242. How would you craft internal workforce policies for employee use of consumer LLM tools versus approved enterprise tools?
Guardrails. Block paste of secrets into unapproved tools; provide sanctioned alternative with logging.
Training. Quarterly security modules with realistic exfil scenarios—not punitive-only tone.
Detection. DLP signals on sensitive repos; compassionate enforcement, with manager coaching on the first strike.
Productivity. Approved internal copilot should be better than shadow ChatGPT to win adoption.
Labor. Discuss impacts on roles openly with HR; reskilling budgets tied to automation wins.
243. How would you perform due diligence on a new LLM API or open-weight vendor before enterprise adoption?
Security. SOC2, pen test summaries, data processing agreement, subprocessors, incident history.
Resilience. SLA credits, multi-region posture, uptime published honestly with exclusions spelled out.
Model lineage. Training data claims, eval transparency, known limitations documented.
Exit. Data export, model portability, notice period for breaking changes.
Scorecard. Quantitative rubric compared across vendors—avoids politics-only decisions.
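The scorecard bullet can be a weighted rubric in a few lines. Criteria and weights below are illustrative; set them with your risk committee before scoring any vendor:

```python
# Illustrative weights summing to 1.0 -- tune per procurement policy.
WEIGHTS = {"security": 0.30, "resilience": 0.25, "lineage": 0.25, "exit": 0.20}

def score(vendor_scores: dict) -> float:
    """Weighted vendor score from 0-5 diligence ratings per criterion."""
    assert set(vendor_scores) == set(WEIGHTS), "rate every criterion"
    return round(sum(WEIGHTS[c] * vendor_scores[c] for c in WEIGHTS), 2)
```

Comparing the same rubric across vendors is what turns the decision from politics into facts.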
244. How would you implement data minimization and retention policies specifically for LLM prompt/response logs?
TTL tiers. Hot logs for days, warm aggregates for months, cold archives for years only if legally necessary—default to aggressive trimming.
Redaction. Structured PII scrub on ingest; offer a metadata-only mode that logs no content.
Regional. EU vs US retention schedules differ; automate geo-routed pipelines.
Erasure. Tie into subject-rights workflows; prove deletion including embeddings referencing PII.
Audit. Retention config in terraform; drift detection if someone widens TTL silently.
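The TTL-tier and redact-on-ingest bullets above can be sketched together. The email pattern is a deliberately narrow example; a real scrub covers every PII class your data map names:

```python
import re
from typing import Optional

# Aggressive TTL defaults per tier (days) -- widen only with legal sign-off.
TTL_DAYS = {"hot": 7, "warm": 90, "cold": 365}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    # Structured PII scrub on ingest (email only here; extend per data class).
    return EMAIL.sub("[EMAIL]", text)

def retention_days(tier: str, legal_hold: bool) -> Optional[int]:
    # None = retain until the hold is lifted; otherwise trim at the tier TTL.
    return None if legal_hold else TTL_DAYS[tier]
```

Because the TTL table is plain config, it can live in infrastructure-as-code where drift detection catches anyone silently widening it.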
245. How would you handle explainability and adverse-action notice requirements when LLMs inform high-stakes decisions?
Scope. Legal defines which decisions require notices—credit, housing, employment varies by law.
Content. Provide principal reasons in plain language plus human appeal path; avoid raw logits as ‘explanation.’
Evidence. Tie decision to retrieved or structured inputs auditable later—no vibes.
Human. Override workflow with logging when agent disagrees with model suggestion.
Testing. Compliance simulations before launch in each regulated jurisdiction.
246. How would you architect cross-border AI systems to respect diverging AI Acts, executive orders, and data localization rules?
Regional cells. Inference + storage + logging co-located; control plane automation enforces routing pins.
Model allowlists. Some weights disallowed in regions—enforce in gateway, not tribal knowledge.
Transfers. SCCs, DPAs, impact assessments when analytics must aggregate globally—document residual risk.
Feature gaps. Product surfaces honest ‘unavailable here’ states; sales cannot promise parity.
Updates. Legal monitors regulatory diffs; engineering ships config templates not emergency hacks.
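The gateway-enforced allowlist above can be a small routing table. Region names, model ids, and the endpoint pattern are hypothetical:

```python
# Per-region model allowlist enforced in the gateway, not tribal knowledge.
ALLOWLIST = {
    "eu": {"model-a", "model-b"},
    "us": {"model-a", "model-b", "model-c"},
}

def route(region: str, model: str) -> str:
    """Return a region-pinned endpoint, or refuse if the model is
    not approved for that region."""
    if model not in ALLOWLIST.get(region, set()):
        raise PermissionError(f"{model} not approved in {region}")
    return f"{region}.inference.internal/{model}"
```

Refusing at routing time is what makes the 'unavailable here' product state honest: the gateway cannot serve what policy forbids.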
247. How would you approach insurance, indemnities, and liability limits for customer-facing LLM products?
Contract. Liability caps aligned with realistic loss scenarios; carve-outs for gross negligence or IP infringement as counsel advises.
Cyber insurance. Disclose AI-assisted services to carrier; coverage for privacy and outage events.
Vendor pass-through. Map vendor indemnity for model IP claims vs your obligations to customers.
Disclaimers. UX copy matches legal stance—marketing cannot outrun contract.
Scenarios. Annual tabletop loss-estimation exercises with finance.
248. How would you report AI risks and performance to a board of directors without overwhelming or understating?
Scorecards. Few KPIs: incidents, near-misses, fairness disparities, spend trajectory, key model changes.
Context. Compare quarter-over-quarter; spotlight emerging regulations affecting roadmap.
Materiality. Escalate new use cases crossing risk-tier thresholds before the press learns.
Education. Short primers on limits of metrics—boards need intuition not false precision.
Cadence. Pre-read plus focused Q&A; avoid monthly 200-slide dumps.
249. How would you prepare systems and documentation for regulatory examination or customer audit of AI practices?
Evidence packets. Prebuilt exports: data maps, model inventory, access logs sample, change management records.
Reproducibility. Frozen configs for production models with checksums; who approved promotions logged.
Drills. Internal mock audits quarterly; time-to-produce answers measured.
Ownership. Named audit DRI per domain—no scramble spreadsheets.
Gaps. Track findings in risk register with remediation SLAs.
250. How would you set corporate policy for use of open-weight models versus API-only vendors in regulated environments?
Risk tiering. In some environments cloud APIs are forbidden and air-gapped open weights are the only option—document this formally.
Supply chain. Verify checksums, provenance, and license; block unknown download mirrors.
Maintenance. Patching open-weight models is your problem—budget SRE + ML ops accordingly.
Red teams. Additional testing for unknown fine-tunes merged into community checkpoints.
Hybrid. Allow APIs for dev and self-hosted weights for prod (a common defense-sector pattern)—enforce consistently.
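The checksum-verification step in the supply-chain bullet is standard hashing. A minimal sketch using Python's stdlib; the published digest would come from the vendor's signed release notes:

```python
import hashlib

def verify_weights(path: str, expected_sha256: str) -> bool:
    """Stream a checkpoint file and compare against the published digest
    before any weights reach an air-gapped environment."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == expected_sha256
```

Streaming in chunks keeps memory flat for multi-gigabyte checkpoints; blocking unknown download mirrors means this check runs against a digest you trust, not one served alongside the file.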
Recap — this section
Q Takeaway
236 RACI + policy-as-code; model/data registry; recurring risk cadence; scalable self-service gates.
237 Materiality tiers; independent validation; comprehensive model dossiers; exec monitoring; residual risk honesty.
238 Severity-based kill switches; forensic traces; coordinated comms; taxonomy of root causes; public learning loop.
239 Jurisdiction-aware fairness definitions; sliced metrics; layered mitigations; appeals with SLA; automation limits.
240 Attributable energy accounting; efficiency first; credible renewable procurement; modest public claims; eco SKU.
241 Clear IP assignments; licensed corpora; output disclaimers; license-chain hygiene; proactive legal review.
242 Approved-tool superiority; DLP + education; humane enforcement; HR-aligned reskilling; kill shadow IT demand.
243 Security + SLA + lineage dossier; exit terms; comparative scorecard; facts over brand.
244 Aggressive TTL defaults; redact-on-write; geo schedules; erasure proofs; infra-as-code retention.
245 Lawyer-scoped notice triggers; human-readable principal reasons; auditable inputs; override audit trail.
246 Data-plane regionalization; gateway-enforced allowlists; lawful transfer paperwork; honest feature parity UX.
247 Counsel-aligned caps; cyber policy disclosure; vendor indemnity mapping; consistent UX/legal; loss drills.
248 Small credible KPI set; QoQ trajectory; proactive materiality escalations; board literacy investments.
249 Prepackaged audit artifacts; immutable promotion trail; mock exams; named DRIs; tracked remediations.
250 Risk-tier routing; supply-chain attestation; self-owned ops costs; extended red-team; environment-specific policy.