AI Agent kubectl Safety: Sandboxed Execution for Production
Giving an AI agent kubectl access is an architecture decision. OWASP threats, k8s-sigs/agent-sandbox, gVisor, and Aurora's pod-isolated execution model.
Key Takeaways
- Giving an AI agent kubectl access is an architecture decision, not a permission flag. Per-permission gates fail under prompt injection.
- OWASP ranks "Excessive Agency" as LLM06 in the 2025 Top 10 for LLM Applications and "Tool Misuse and Exploitation" as ASI02 in the 2026 Top 10 for Agentic Applications.
- The Kubernetes ecosystem already has an answer: k8s-sigs/agent-sandbox provides a declarative API for isolated agent runtimes using gVisor or Kata Containers.
- Real precedent exists. EchoLeak (CVE-2025-32711), CVSS 9.3, was the first publicly documented zero-click prompt-injection data exfiltration in a production LLM system. The kubectl analogue would be cluster-wide.
- Aurora runs every kubectl command in a pod-isolated process via its terminal_run primitive, with an environment-variable allowlist that strips secrets, signature-matcher and LLM-judge guardrails, and per-invocation cloud credentials.
Of the 46+ products marketed as "AI SRE" in 2026, only a handful publicly document their kubectl execution architecture — and the gap between vendors that handle this well and vendors that handle it badly is the single largest unspoken risk in the category. AI agent kubectl safety is the architectural discipline of letting an AI agent run kubectl (or any cloud CLI) against production without inheriting cluster-wide blast radius if the agent is compromised. It is not the same as RBAC scoping, and it is not the same as a human approval prompt — both are necessary but neither is sufficient on its own.
When OWASP published its 2025 Top 10 for LLM Applications, it ranked Prompt Injection (LLM01) as the top risk and Excessive Agency (LLM06) as one of the most consequential — defining it across three root causes: excessive functionality, excessive permissions, and excessive autonomy. In December 2025, OWASP followed up with a dedicated Top 10 for Agentic Applications that names Tool Misuse and Exploitation (ASI02) and Identity and Privilege Abuse (ASI03) as primary attack surfaces.
Translation: if you give an AI agent the ability to run kubectl, aws, or gcloud commands against production, you have a security architecture problem — not a permissions problem. This guide walks through the threat model, the emerging Kubernetes sandboxing standard, and how to evaluate any AI SRE on its kubectl safety.
What can go wrong when AI agents run kubectl?
Any LLM-driven agent that executes commands inherits the security properties of the LLM, the harness, and the runtime. Three real-world precedents illustrate the failure modes:
- EchoLeak (CVE-2025-32711) — Microsoft 365 Copilot, CVSS 9.3 critical, patched in June 2025. Discovered by Aim Security, it was the first publicly documented zero-click indirect prompt-injection data exfiltration in a production LLM system. A crafted email sat in Outlook; when the user later asked Copilot for an unrelated summary, the email's hidden instructions fired and exfiltrated SharePoint, OneDrive, and Teams data. Research paper: arXiv:2509.10540.
- MITRE ATLAS prompt-injection techniques — MITRE ATLAS catalogues real-world adversary techniques against AI systems, including indirect prompt injection that turns an LLM with tool access into an attacker-controlled execution surface. The framework specifically documents techniques for exfiltration via AI agent tool invocation.
- Agent Session Smuggling — Palo Alto Unit 42 (November 2025) demonstrated rogue agents exploiting trust in the Agent-to-Agent (A2A) protocol with multi-turn manipulation. Documented in OWASP's Agentic Top 10.
None of these specifically targeted kubectl-running agents in production — but the class is the same and the blast radius would be larger. An agent that can run kubectl delete is one prompt-injection payload away from a cluster-wide outage.
The Four Attack Surfaces of Agentic kubectl
Most teams think of kubectl agent safety as a single problem ("can the agent be tricked?"). It's actually four distinct attack surfaces, each requiring its own mitigation.
| Surface | Failure mode | Why permission-scoping alone fails | Mitigation |
|---|---|---|---|
| 1. Prompt injection | Hidden instructions in logs, alerts, runbooks, or chat coerce the agent | Compromised agent acts within its granted permissions, which is exactly what permission-scoping permits | Sandboxed runtime; never trust LLM output derived from data the LLM read |
| 2. Credential leakage | Executed command reads AWS_SECRET_ACCESS_KEY, VAULT_TOKEN, KUBECONFIG from inherited env | Permissions live on credentials; if the credential leaks, the permission set leaks with it | Per-invocation short-lived credentials (STS, Service Principal); explicit env allowlist that strips secrets |
| 3. Blast radius escalation | Legitimate command runs against wrong namespace, region, or cluster | Permissions don't model "right action, wrong target" | Default read-only; dependency-graph awareness; human approval for destructive writes |
| 4. Audit trail gaps | Logs capture commands without the agent's reasoning | Permission systems audit "who ran what," not "why" | Per-investigation transcripts that link reasoning → tool calls → outputs |
Attack Surface 1: Prompt injection
The agent reads a log line, alert payload, runbook, or chat message that contains hidden instructions. The LLM cannot reliably distinguish data from instructions in the same channel — this is the fundamental property OWASP's LLM01 captures. Even frontier models do not eliminate it. Anthropic has publicly stated that "no browser agent is immune to prompt injection" and publishes defense benchmarks showing measurable but imperfect attack-prevention rates across computer-use, bash tool use, and MCP workflows. The implication for kubectl-running agents is clear: the LLM is not the security boundary. The runtime is.
Mitigation: never trust LLM output that originates from data the LLM also read. Sandbox the execution layer so even a successful injection has limited blast radius.
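To make the "data is not instructions" rule concrete, here is a minimal Python sketch of provenance tagging. The helper names (ContextItem, require_human_approval, run_in_sandboxed_pod) are hypothetical and not drawn from any specific product; the point is that reading untrusted data only adds scrutiny, and no provenance state ever unlocks direct in-process execution.

```python
from dataclasses import dataclass, field

@dataclass
class ContextItem:
    """One piece of context the LLM has read. Anything fetched from the
    outside world (logs, alert payloads, runbooks, chat) stays untrusted."""
    text: str
    untrusted: bool = True

@dataclass
class ProposedCommand:
    command: str
    derived_from: list = field(default_factory=list)  # ContextItems the LLM saw

def require_human_approval(command: str) -> None:
    # Placeholder: a real system would block on an approval workflow here.
    print(f"[approval required] {command}")

def run_in_sandboxed_pod(command: str) -> str:
    # Placeholder: a real system would execute inside an isolated runtime
    # (see the execution sketches later in this article), never in-process.
    return f"[sandboxed] {command}"

def dispatch(proposal: ProposedCommand) -> str:
    """Provenance decides how much scrutiny a command gets; it never decides
    whether the sandbox is used. The LLM is not the security boundary."""
    if any(item.untrusted for item in proposal.derived_from):
        require_human_approval(proposal.command)
    return run_in_sandboxed_pod(proposal.command)
```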
Attack Surface 2: Credential leakage
If the agent runs commands with credentials inherited from the host process environment (AWS_SECRET_ACCESS_KEY, KUBECONFIG, VAULT_TOKEN), a successful command-injection or shell escape exposes everything the agent process has access to. Long-lived static credentials make this catastrophic.
Mitigation: per-invocation credential scoping. AWS STS AssumeRole, Azure Service Principal sessions, GCP short-lived tokens. Strip everything else from the child process environment with an explicit allowlist.
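Both halves of that mitigation fit in a short sketch. It assumes boto3 is available; the allowlist contents, role ARN, and 15-minute session duration are illustrative choices, not any vendor's actual configuration.

```python
import os
import subprocess
import boto3

# Only these host variables reach the child process; everything else
# (VAULT_TOKEN, DATABASE_URL, long-lived cloud keys) is dropped.
SAFE_ENV_KEYS = {"PATH", "HOME", "USER", "SHELL", "TERM", "LANG", "TMPDIR"}

def sanitized_env() -> dict:
    return {k: v for k, v in os.environ.items() if k in SAFE_ENV_KEYS}

def run_with_scoped_aws_creds(cmd: list[str], role_arn: str) -> subprocess.CompletedProcess:
    """Mint per-invocation STS credentials and inject only those into an
    allowlisted child environment. Nothing long-lived is inherited."""
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName="agent-invocation",
        DurationSeconds=900,  # 15 minutes, the STS minimum
    )["Credentials"]
    env = sanitized_env()
    env.update({
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    })
    return subprocess.run(cmd, env=env, capture_output=True, text=True, timeout=120)
```

If the executed command is compromised, the worst it can read from its environment is a credential that expires in minutes and a handful of harmless locale variables.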
Attack Surface 3: Blast radius escalation
Even legitimate, non-injected commands can have outsized effects. kubectl delete pod on the wrong namespace. aws ec2 terminate-instances against a misidentified region. The agent doesn't need to be compromised — it just needs to be wrong.
Mitigation: read-only by default, write actions behind explicit human approval, and dependency-graph awareness so the agent can compute blast radius before acting. (This is the role of Aurora's Memgraph dependency graph.)
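A read-only-by-default gate can be as simple as a verb classifier in front of the executor. The verb lists below are assumptions for the sketch, not an exhaustive taxonomy, and unknown verbs are deliberately treated like writes.

```python
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain", "api-resources"}
DESTRUCTIVE_VERBS = {"delete", "drain", "cordon", "scale", "rollout",
                     "apply", "patch", "replace"}

def classify_kubectl(argv: list[str]) -> str:
    verb = argv[1] if len(argv) > 1 and argv[0] == "kubectl" else ""
    if verb in READ_ONLY_VERBS:
        return "read"
    if verb in DESTRUCTIVE_VERBS:
        return "destructive"
    return "unknown"  # unknown verbs are gated like writes, never like reads

def gate(argv: list[str], approved: bool = False) -> bool:
    """Read-only commands run immediately; anything else waits for explicit
    human approval, ideally alongside a blast-radius estimate computed from
    a dependency graph before the approval prompt is shown."""
    return classify_kubectl(argv) == "read" or approved
```

With this gate, kubectl get pods in a namespace runs immediately, while kubectl delete pod stays blocked until a human approves it.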
Attack Surface 4: Audit trail gaps
When an investigation runs across 20+ tool invocations, traditional audit systems (CloudTrail, Kubernetes audit logs) record what was run but not why. A reviewer six months later cannot tell whether a kubectl scale was a legitimate response to a load spike or an injected instruction.
Mitigation: structured per-investigation transcripts that capture agent reasoning alongside tool calls. The right log isn't "kubectl was run" — it's "in response to alert X, the agent hypothesized Y, ran kubectl Z, and observed W."
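A sketch of what such a transcript record could look like, with field names that are illustrative rather than any product's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TranscriptStep:
    """One reasoning -> action -> observation step. The reasoning and the
    tool call it produced live in the same record, so a reviewer can replay
    why each command ran, not just that it ran."""
    triggered_by: str        # the alert or message that started this step
    hypothesis: str          # why the agent wanted to act
    tool: str                # e.g. "kubectl"
    command: str             # exact command that was executed
    output_summary: str      # what came back
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class InvestigationTranscript:
    investigation_id: str
    steps: list[TranscriptStep] = field(default_factory=list)

    def record(self, **kwargs) -> None:
        self.steps.append(TranscriptStep(**kwargs))
```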
Why "human approval" alone is not enough
The most common safety story in the AI SRE space is "the agent suggests; humans approve." That is necessary but not sufficient.
The problem with approval gates as the only line of defense:
- Decision fatigue. An agent that handles 50 alerts a week generates dozens of approval prompts. Humans rubber-stamp.
- Approval ≠ understanding. Engineers approve commands they don't fully understand because the agent's reasoning sounds plausible.
- Injected intent looks legitimate. A prompt-injection payload can produce a recommendation that reads exactly like a normal RCA. The approver has no signal that the underlying instruction came from an attacker.
Approval gates are critical, but they need to sit on top of an already-sandboxed runtime — not be the only protection.
Permission scoping vs sandboxed execution: what's the difference?
These two terms get conflated. They aren't the same thing.
Permission scoping restricts what an agent's identity can do. RBAC roles, IAM policies, kubeconfig contexts. It's necessary, but it operates at the cluster-API layer — meaning a successful prompt injection can still use every permission the agent has.
Sandboxed execution isolates the runtime in which commands execute. If the agent's process is compromised, the sandbox limits what the compromised process can do regardless of the credentials it holds. The compromised process can't read other pods' files, can't reach other nodes, can't escalate to the host kernel.
The defensible architecture combines both: tight permission scoping (small RBAC role, short-lived credentials) + runtime isolation (sandboxed execution).
How sandboxed kubectl actually works
The Kubernetes ecosystem began converging on this pattern in 2025–2026.
k8s-sigs/agent-sandbox
k8s-sigs/agent-sandbox is a formal Kubernetes SIG Apps subproject that launched at KubeCon Atlanta in November 2025. It provides a declarative Kubernetes API for "isolated, stateful, singleton workloads" — built specifically for AI agent runtimes that may execute untrusted, LLM-generated code.
Core CRDs:
- Sandbox — an isolated pod-equivalent with stronger boundaries
- SandboxTemplate — reusable configuration
- SandboxClaim — request a sandbox for a workload
- SandboxWarmPool — pre-created sandboxes that bring cold-start under one second
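Creating one of these resources from an agent harness looks like any other custom-resource call. In the sketch below, the group/version, plural, and spec fields are assumptions based on the project's naming; check the k8s-sigs/agent-sandbox documentation for the exact schema before relying on them.

```python
from kubernetes import client, config

config.load_kube_config()

# Assumed schema for illustration only; consult the agent-sandbox docs for
# the real group/version and spec fields.
sandbox = {
    "apiVersion": "agents.x-k8s.io/v1alpha1",
    "kind": "Sandbox",
    "metadata": {"name": "ai-sre-terminal", "namespace": "ai-sre"},
    "spec": {
        "podTemplate": {
            "spec": {
                "runtimeClassName": "gvisor",  # runtime-level isolation backend
                "containers": [
                    {"name": "terminal", "image": "example.com/agent-terminal:latest"}
                ],
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="agents.x-k8s.io",
    version="v1alpha1",
    namespace="ai-sre",
    plural="sandboxes",
    body=sandbox,
)
```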
The Kubernetes blog post from March 2026 makes the architectural claim explicit: "Isolation achieved via runtime-level sandboxing (gVisor/Kata), not just container-level namespaces."
gVisor
gVisor is a Google-maintained user-space application kernel that provides kernel-level isolation without full virtualization. Architecture: Sentry (a kernel emulator written in Go) intercepts roughly 200 Linux syscalls; Gofer brokers filesystem access over 9P. The OCI runtime is runsc, drop-in compatible with runc.
gVisor runs in production at Google for App Engine standard, Cloud Functions, Cloud Run, and Cloud ML Engine. GKE Sandbox productizes it for GKE node pools. It is one of two named isolation backends in agent-sandbox (the other being Kata Containers, which uses lightweight VMs).
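On a cluster with gVisor installed, opting a single pod into the sandbox is a one-field change: set its RuntimeClass. On GKE Sandbox node pools that class is named gvisor; on self-managed clusters you first register a RuntimeClass whose handler is runsc. A minimal sketch using the official Python client, with placeholder image and namespace:

```python
from kubernetes import client, config

config.load_kube_config()

# Placeholder names throughout; the only load-bearing line is runtime_class_name.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="agent-terminal", namespace="ai-sre"),
    spec=client.V1PodSpec(
        runtime_class_name="gvisor",  # route this pod through runsc instead of runc
        restart_policy="Never",
        containers=[
            client.V1Container(name="terminal", image="example.com/agent-terminal:latest")
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="ai-sre", body=pod)
```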
Why this matters for AI SRE
An AI SRE that runs kubectl against production is exactly the kind of workload agent-sandbox was built for. It executes LLM-generated commands. It needs file system isolation, syscall isolation, and per-invocation credential scoping. It benefits enormously from a warm pool that reduces cold-start latency.
If you are evaluating an AI SRE in 2026, this is one of the right questions to ask: what isolation backend does the agent use when it executes commands?
How Aurora's pod-isolated execution works
Aurora's approach predates agent-sandbox and follows the same architectural principles.
When Aurora's agent runs a kubectl, aws, az, or gcloud command, it doesn't use subprocess.run() directly. It uses an internal primitive called terminal_run, defined in server/utils/terminal/terminal_run.py. The module's docstring is explicit:
Drop-in replacement for subprocess.run() that executes in terminal pods. This module provides a terminal_run() function that mimics subprocess.run() API but executes commands in isolated terminal pods via kubectl exec. Safety guardrails (signature matcher + LLM judge) run automatically unless the caller passes trusted=True for known-safe internal operations.
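For intuition, here is a heavily simplified sketch of what a subprocess.run-shaped wrapper over kubectl exec could look like. It is not Aurora's actual implementation; pod selection, guardrails, and environment handling are omitted and covered below.

```python
import shlex
import subprocess

def terminal_run_sketch(argv: list[str], *, terminal_pod: str, namespace: str,
                        timeout: int = 120) -> subprocess.CompletedProcess:
    """Run argv inside a separate terminal pod via kubectl exec, so the agent
    host never executes the command itself. Illustrative only."""
    exec_argv = [
        "kubectl", "exec", "-n", namespace, terminal_pod, "--",
        "sh", "-c", shlex.join(argv),
    ]
    return subprocess.run(exec_argv, capture_output=True, text=True, timeout=timeout)
```

The executed command inherits the terminal pod's filesystem and service account, not the agent's.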
Three properties matter:
1. Pod-isolated execution. When the ENABLE_POD_ISOLATION flag is set (the default in Kubernetes deployments), every external command runs inside a separate terminal pod via kubectl exec. The agent's own process never executes the command directly. A successful command-injection in the agent's reasoning loop does not give an attacker access to the agent host.
2. Two-stage safety guardrails. Before any non-trusted command runs, two checks fire automatically: a deterministic signature matcher that rejects known-dangerous patterns, and an LLM judge that evaluates the proposed command against the investigation context. The trusted=True flag bypasses both — used only for known-safe internal operations like configured connector calls. A minimal sketch of the two-stage pattern follows this list.
3. Sanitized environment allowlist. Aurora's terminal_exec_tool module defines an explicit _SAFE_ENV_KEYS set: PATH, HOME, USER, SHELL, TERM, LANG, TMPDIR, SSL_CERT_FILE, plus ENABLE_POD_ISOLATION itself. Everything else — including VAULT_TOKEN, DATABASE_URL, SECRET_KEY, and any cloud credentials — is stripped from the child process environment. A compromised command cannot read the agent's secrets via env.
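Here is that two-stage check in miniature, assuming an illustrative deny-pattern list and an LLM judge passed in as a plain callable; real pattern sets are larger and the judge prompt carries the full investigation context.

```python
import re
from typing import Callable

# Illustrative deny signatures; real sets are larger and operator-tunable.
DENY_SIGNATURES = [
    re.compile(r"kubectl\s+delete\s+(ns|namespace)\b"),
    re.compile(r"kubectl\s+.*--all\b"),
    re.compile(r"\brm\s+-rf\s+/"),
    re.compile(r"aws\s+ec2\s+terminate-instances"),
]

def signature_check(command: str) -> tuple[bool, str]:
    """Stage 1: cheap, deterministic, runs on every non-trusted command.
    A match is a hard reject and never reaches the LLM judge."""
    for pattern in DENY_SIGNATURES:
        if pattern.search(command):
            return False, f"matched deny signature: {pattern.pattern}"
    return True, "no signature match"

def guarded(command: str, llm_judge: Callable[[str], bool]) -> bool:
    ok, _reason = signature_check(command)
    if not ok:
        return False
    # Stage 2: the judge evaluates the command against investigation context.
    return llm_judge(command)
```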
Cloud credentials are handled separately. Aurora calls generate_contextual_access_token and generate_azure_access_token per invocation. AWS uses STS AssumeRole via cross-account roles (aurora-cross-account-role.yaml) — short-lived credentials, not long-lived access keys. Azure uses Service Principal sessions. GCP uses OAuth-derived tokens.
For agents that need to reach customer Kubernetes clusters Aurora can't access directly, a separate kubectl-agent binary deploys via Helm into the customer's cluster and connects outbound over WebSocket. No inbound network access required, no kubeconfig sharing, no static credentials at rest.
How to evaluate an AI SRE's kubectl safety model
Eight questions to ask any AI SRE vendor or open-source project before enabling production access:
- Where does the command actually execute? Same process as the agent? Same host? Separate container? Sandboxed runtime (gVisor/Kata)?
- What credentials does the command inherit from the host environment? Specifically: can the executed command read your agent's vault token, database URL, or other host secrets?
- Are credentials short-lived or static? STS / Service Principal sessions, or long-lived access keys?
- Is the default read-only? What flag, configuration, or RBAC role enables write access?
- What happens between "agent decides to run X" and "X runs"? Is there a deterministic policy check? An LLM judge? A human approval prompt? All three?
- Are destructive actions specifically gated? What's the definition of "destructive" — vendor-defined or operator-configurable?
- What does the audit trail capture? Just the commands, or the agent's reasoning + the commands together?
- What's the blast radius of a single successful prompt injection? Walk through the worst case explicitly with the vendor.
If a vendor can't answer these clearly, the architecture isn't ready for production write access.
Open questions in 2026
This is a young problem space. Several questions are not yet resolved:
- Standardization. k8s-sigs/agent-sandbox is the leading candidate for a standard, but Knative Sandbox, container-level approaches, and microVM-based runtimes (Firecracker) are all in play.
- Multi-cloud isolation. Sandboxing a Kubernetes pod is a solved problem. Sandboxing a process that calls aws, az, and gcloud across cloud APIs from a single agent is harder — the credentials and trust boundaries change per provider.
- Approval UX at scale. Engineers can't approve 200 actions per week. The right UI for batch approval, policy-based pre-approval, and rollback-only autonomy is still being figured out.
Expect significant movement on all three through 2026 and into 2027.
Aurora's approach in summary
If you operate an AI SRE in production, the safety questions are non-negotiable. Aurora's answer is: pod-isolated execution by default, deterministic + LLM-judge guardrails before any non-trusted command, environment-variable allowlist that strips secrets, per-invocation cloud credentials via STS/Service Principal/short-lived tokens, and human approval for destructive write operations. The full architecture is open source under Apache 2.0 — auditable in the Aurora repository.
For background on the agent and tool model, see our complete guide to AI SRE, our open-source AI SRE comparison, or our explainer on agentic incident management.