Fifteen staff-depth scenarios on shipping LLM products as businesses: outcome-weighted prioritization; north-star + guardrail metrics; usage-aware pricing; vendor-exit strategies; moats beyond the model; gated rollout trains; sales engineering truth; adoption and change management; executive expectation setting; build-vs-buy; OSS compliance; developer platform versioning; humane deprecation; regional GTM under divergent AI law; and institutional responsible-AI review.
Interview stance. Product questions reward candidates who connect model capabilities to business outcomes and organizational readiness. Roadmaps without eval gates, pricing without COGS, and GTM without security artifacts are fantasy diagrams.
Every bet needs a kill criterion—panels love intellectual honesty.
Differentiation is workflow + trust + economics, rarely the raw model name.
Executive alignment means teaching limits, not only celebrating wins.
Deprecations and regional laws are first-class releases, not footnotes.
221. How would you prioritize LLM features on a product roadmap against non-AI work?
Outcome link. Tie candidate features to measurable deltas (support deflection, sales cycle length, engineer merge time), not 'AI coolness.' Rank by confidence × impact ÷ (cost + risk); a scoring sketch follows the diagram below.
Dependencies. Data connectors, eval harnesses, and safety rails often block flashy demos—surface infrastructure bets explicitly.
Portfolio. Mix quick wins (prompt tweaks) with platform investments (routing, tracing); an all-AI quarter leaves no foundation.
Stakeholders. PM owns the outcome; ML platform owns feasibility; Legal flags early if the vertical requires a new policy pack.
Kill criteria. Predefine 'stop rules' for when offline evals or pilot metrics miss; this prevents the sunk-cost fallacy.
LLM roadmap phases
```mermaid
flowchart LR
  D[Discover] --> P[Pilot]
  P --> S[Scale]
  S --> O[Optimize]
```
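A minimal sketch of that ranking formula, assuming 0–10 normalized scales; the feature names, scores, and the epsilon guard are hypothetical illustrations, not a prescribed rubric.

```python
from dataclasses import dataclass

@dataclass
class Bet:
    name: str
    confidence: float  # 0-1: belief the outcome delta actually materializes
    impact: float      # expected delta in the chosen outcome metric, normalized 0-10
    cost: float        # build cost in engineer-quarters, normalized 0-10
    risk: float        # safety/legal/brand exposure, normalized 0-10

    def score(self) -> float:
        # Rank by confidence x impact / (cost + risk); epsilon avoids divide-by-zero.
        return (self.confidence * self.impact) / (self.cost + self.risk + 1e-9)

# Hypothetical roadmap candidates, AI and non-AI alike, scored on one scale.
bets = [
    Bet("support-deflection copilot", confidence=0.6, impact=8, cost=5, risk=3),
    Bet("prompt tweak: tone presets", confidence=0.9, impact=2, cost=1, risk=1),
    Bet("non-AI: billing revamp",     confidence=0.8, impact=6, cost=4, risk=1),
]
for b in sorted(bets, key=Bet.score, reverse=True):
    print(f"{b.score():5.2f}  {b.name}")
```

The exact scales matter less than the forcing function: AI and non-AI bets land on one comparable ladder instead of competing on demo appeal.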
222. How would you define north-star and guardrail metrics for a customer-facing copilot?
Primary. Task success rate or 'job done without a human,' depending on the product; avoid raw token volume as a vanity metric.
Guardrails. Harm rate, PII leakage incidents, cost per success, and escalation-to-human rate; a breach of any one blocks launch (see the gate sketch after this list).
Leading vs lagging. Retrieval zero-hit rate is a leading indicator of CSAT; instrument both.
Segmentation. New vs power users, enterprise vs SMB; aggregates hide broken cohorts.
Culture. Publish definitions so sales cannot cherry-pick demo metrics that differ from prod dashboards.
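A minimal launch-gate sketch, assuming the 'any one breach blocks launch' rule above; the metric names and thresholds are hypothetical placeholders for whatever the published prod-dashboard definitions specify.

```python
# Guardrail thresholds: breaching ANY single one blocks launch (hypothetical values).
GUARDRAILS = {
    "harm_rate":             0.001,  # max fraction of responses flagged harmful
    "pii_leak_incidents":    0,      # hard zero tolerance in the eval window
    "cost_per_success_usd":  0.40,   # unit-economics guardrail
    "human_escalation_rate": 0.15,   # copilot should mostly finish the job
}

def launch_gate(observed: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (ok_to_launch, list of breached guardrails)."""
    breaches = [
        f"{name}: {observed[name]} > {limit}"
        for name, limit in GUARDRAILS.items()
        if observed.get(name, float("inf")) > limit  # missing metric counts as a breach
    ]
    return (not breaches, breaches)

ok, breaches = launch_gate({
    "harm_rate": 0.0004,
    "pii_leak_incidents": 0,
    "cost_per_success_usd": 0.52,    # one breached metric is enough to stop the train
    "human_escalation_rate": 0.11,
})
print("launch" if ok else f"blocked: {breaches}")
```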
223. How would you design pricing and packaging for an LLM-heavy SaaS (seats, usage, tiers)?
Unit economics. Map COGS (inference, storage, support load from failures) and price so gross margin survives p95 heavy users; the sketch after this list works one example.
Dimensions. Seat + included token bucket + overages; optional ‘automation’ add-on for agents; enterprise commits for discounts.
Transparency. Customer usage dashboards reduce invoice disputes and accelerate upsell when they see value.
Anti-abuse. Rate limits align with the price tier; educate customers on batch vs interactive usage.
Experiments. Simulate revenue under viral adoption and guard against bankruptcy-by-whale scenarios.
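A back-of-envelope margin sketch for the seat + included token bucket + overage packaging above; every price and usage figure is hypothetical. The point it illustrates: check gross margin at p95 and whale usage, not at the median.

```python
# Hypothetical plan: $50/seat/month with 1M tokens included, $4 per extra 1M tokens.
SEAT_PRICE, INCLUDED_TOKENS, OVERAGE_PER_M = 50.0, 1_000_000, 4.0
COGS_PER_M_TOKENS = 2.5  # blended inference + storage + failure-driven support cost

def monthly_margin(tokens_used: int) -> float:
    """Gross margin for one seat at a given monthly token volume."""
    overage_m = max(0, tokens_used - INCLUDED_TOKENS) / 1_000_000
    revenue = SEAT_PRICE + overage_m * OVERAGE_PER_M
    cogs = (tokens_used / 1_000_000) * COGS_PER_M_TOKENS
    return (revenue - cogs) / revenue

# Margin compresses as usage grows: ~98% median, ~74% at p95, ~48% for the whale.
for label, tokens in [("median", 400_000), ("p95", 8_000_000), ("whale", 60_000_000)]:
    print(f"{label:>6}: {monthly_margin(tokens):.0%}")
```

If the whale row goes negative under your real COGS, the overage price or rate limits are wrong, not the customer.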
224. How would you manage strategic dependence on a single LLM API provider?
Abstraction. A provider-agnostic gateway with normalized request/response schemas keeps switching cost measured in engineer-weeks, not years; a minimal interface sketch follows.
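A minimal sketch of such a gateway, assuming a normalized chat schema with per-provider adapters behind it; the class names and stub providers are illustrative, and real adapters would translate this schema to each vendor's SDK.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class ChatRequest:
    # The normalized schema product code targets: never a vendor's native shape.
    messages: list[dict]  # e.g. [{"role": "user", "content": "..."}]
    max_tokens: int = 512
    temperature: float = 0.2

@dataclass
class ChatResponse:
    text: str
    input_tokens: int
    output_tokens: int
    provider: str

class Provider(Protocol):
    def complete(self, req: ChatRequest) -> ChatResponse: ...

class Gateway:
    """Routes to the primary provider; switching vendors = registering a new adapter."""
    def __init__(self, providers: dict[str, Provider], primary: str):
        self.providers, self.primary = providers, primary

    def complete(self, req: ChatRequest) -> ChatResponse:
        return self.providers[self.primary].complete(req)

# Stub adapters stand in for real vendor SDK calls in this sketch.
class StubProvider:
    def __init__(self, name: str): self.name = name
    def complete(self, req: ChatRequest) -> ChatResponse:
        return ChatResponse("ok", input_tokens=10, output_tokens=5, provider=self.name)

gw = Gateway({"vendor_a": StubProvider("vendor_a"),
              "vendor_b": StubProvider("vendor_b")}, primary="vendor_a")
print(gw.complete(ChatRequest(messages=[{"role": "user", "content": "hi"}])).provider)
```

Because all product code speaks ChatRequest/ChatResponse, a provider exit is a new adapter plus an eval re-run, not an application rewrite.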