This is a safety-critical control loop. The exciting demo is “GPT browses and runs Python”; the production question is how you stop exfiltration, cryptomining, prompt injection from a random webpage, or an infinite loop burning five figures in token spend overnight.
Architecture split. A planner LLM proposes structured actions; a separate policy-and-runtime service validates each JSON tool call against a JSON Schema, rate limits, and a risk class before anything executes. The LLM should never be the final authority on side effects.
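A minimal sketch of such a policy gate, assuming the jsonschema package; the tool names, schemas, rate limits, and risk classes are illustrative, not a prescribed catalog.

```python
# Policy gate between the planner and the runtime: schema check, rate limit,
# and risk class lookup before any tool call is allowed to execute.
import time
from jsonschema import validate, ValidationError

TOOL_POLICIES = {
    "browse": {
        "risk": "medium",
        "max_calls_per_minute": 10,
        "schema": {
            "type": "object",
            "properties": {"url": {"type": "string", "maxLength": 2048}},
            "required": ["url"],
            "additionalProperties": False,
        },
    },
    "run_python": {
        "risk": "high",
        "max_calls_per_minute": 3,
        "schema": {
            "type": "object",
            "properties": {"code": {"type": "string", "maxLength": 20000}},
            "required": ["code"],
            "additionalProperties": False,
        },
    },
}

_call_log = {}  # tool name -> recent call timestamps

def authorize(tool: str, args: dict) -> str:
    """Raise PermissionError if the call violates policy; return its risk class."""
    policy = TOOL_POLICIES.get(tool)
    if policy is None:
        raise PermissionError(f"unknown tool: {tool}")
    try:
        validate(instance=args, schema=policy["schema"])
    except ValidationError as exc:
        raise PermissionError(f"schema violation: {exc.message}") from exc
    now = time.monotonic()
    recent = [t for t in _call_log.get(tool, []) if now - t < 60]
    if len(recent) >= policy["max_calls_per_minute"]:
        raise PermissionError(f"rate limit exceeded for {tool}")
    _call_log[tool] = recent + [now]
    return policy["risk"]
```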
Browsing. Fetch through an HTTP proxy with domain allowlists or reputation checks; cap response bytes; strip scripts; convert HTML to safe text; store raw captures for forensics gated by retention policy. Treat all page text as untrusted instructions even if it looks like trivia.
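A sketch of that fetch path: allowlist the domain, cap response bytes, strip script and style markup, and return plain text. The domain list and byte cap are placeholders, and a production setup would also route the request through the proxy and reputation checks described above.

```python
# Allowlisted, size-capped fetch that returns plain text for the model.
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

ALLOWED_DOMAINS = {"en.wikipedia.org", "docs.python.org"}  # illustrative
MAX_BYTES = 1_000_000

def fetch_as_text(url: str) -> str:
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_DOMAINS:
        raise PermissionError(f"domain not allowlisted: {host}")
    resp = requests.get(url, timeout=10, stream=True)
    resp.raise_for_status()
    body = b""
    for chunk in resp.iter_content(chunk_size=8192):
        body += chunk
        if len(body) > MAX_BYTES:
            raise ValueError("response exceeds byte cap")
    soup = BeautifulSoup(body, "html.parser")
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()  # drop executable and presentation-only markup
    # Whatever survives is still untrusted input, never instructions.
    return soup.get_text(separator="\n", strip=True)
```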
Code execution. Run inside isolated microVMs/containers: no network by default, CPU+RAM+time quotas, tmpfs scratch only, stdout/stderr size caps. If network egress is required, use explicit egress profiles (PyPI mirror only, allowlisted APIs) per job class.
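The sketch below approximates those quotas with the Docker CLI; the image name, limits, and timeout are illustrative, and a hardened deployment would more likely use microVMs (e.g. Firecracker) or gVisor rather than plain containers.

```python
# Run untrusted code with no network, resource quotas, tmpfs scratch,
# a wall-time limit, and truncated output streams.
import subprocess

MAX_OUTPUT = 64_000  # bytes kept per stream

def run_untrusted(code: str, timeout_s: int = 30) -> dict:
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",          # no egress by default
        "--memory", "512m",
        "--cpus", "1",
        "--pids-limit", "128",
        "--read-only",
        "--tmpfs", "/tmp:size=64m",   # scratch space only
        "python:3.12-slim",
        "python", "-c", code,
    ]
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        return {
            "exit_code": proc.returncode,
            "stdout": proc.stdout[:MAX_OUTPUT].decode(errors="replace"),
            "stderr": proc.stderr[:MAX_OUTPUT].decode(errors="replace"),
        }
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "killed: wall-time quota"}
```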
Progress & kill switches. Persist state after each tool call so operators can resume or cancel. Enforce max tool steps, max wall time, max dollars per session. Emit structured traces for compliance (“who executed what code on which dataset”).
Human expectations. When the agent cannot satisfy policy, it should answer partially and document the blocked steps; that beats silent failure or sneaky workarounds.
Agent tool loop
```mermaid
flowchart TB
    U[User goal] --> PL[Planner LLM]
    PL --> T{Tool?}
    T -->|browse| B[Sandbox browser]
    T -->|code| C[Sandbox Python]
    T -->|answer| A[Final response]
    B --> PL
    C --> PL
```