AI Agent kubectl Safety: Sandboxed Execution for Production
Giving an AI agent kubectl access is an architecture decision. OWASP threats, k8s-sigs/agent-sandbox, gVisor, and Aurora's pod-isolated execution model.
Key Takeaways
- Giving an AI agent kubectl access is an architecture decision, not a permission flag. Per-permission gates fail under prompt injection.
- OWASP ranks "Excessive Agency" as LLM06 in the 2025 Top 10 for LLM Applications and "Tool Misuse and Exploitation" as ASI02 in the 2026 Top 10 for Agentic Applications.
- The Kubernetes ecosystem already has an answer: k8s-sigs/agent-sandbox provides a declarative API for isolated agent runtimes using gVisor or Kata Containers.
- Real precedent exists. EchoLeak (CVE-2025-32711), CVSS 9.3, was the first publicly documented zero-click prompt-injection data exfiltration in a production LLM system. The kubectl analogue would be cluster-wide.
- Aurora runs every kubectl command in a pod-isolated process via its terminal_run primitive, with an environment-variable allowlist that strips secrets, signature-matcher and LLM-judge guardrails, and per-invocation cloud credentials.
Of the 46+ products marketed as "AI SRE" in 2026, only a handful publicly document their kubectl execution architecture — and the gap between vendors that handle this well and vendors that handle it badly is the single largest unspoken risk in the category. AI agent kubectl safety is the architectural discipline of letting an AI agent run kubectl (or any cloud CLI) against production without inheriting cluster-wide blast radius if the agent is compromised. It is not the same as RBAC scoping, and it is not the same as a human approval prompt — both are necessary but neither is sufficient on its own.
When OWASP published its 2025 Top 10 for LLM Applications, it ranked Prompt Injection (LLM01) as the top risk and Excessive Agency (LLM06) as one of the most consequential — defining it across three root causes: excessive functionality, excessive permissions, and excessive autonomy. In December 2025, OWASP followed up with a dedicated Top 10 for Agentic Applications that names Tool Misuse and Exploitation (ASI02) and Identity and Privilege Abuse (ASI03) as primary attack surfaces.
Translation: if you give an AI agent the ability to run kubectl, aws, or gcloud commands against production, you have a security architecture problem — not a permissions problem. This guide walks through the threat model, the emerging Kubernetes sandboxing standard, and how to evaluate any AI SRE on its kubectl safety.
What can go wrong when AI agents run kubectl?
Any LLM-driven agent that executes commands inherits the security properties of the LLM, the harness, and the runtime. Three real-world precedents illustrate the failure modes:
- EchoLeak (CVE-2025-32711) — Microsoft 365 Copilot, CVSS 9.3 critical, patched in June 2025. Discovered by Aim Security, it was the first publicly documented zero-click indirect prompt-injection data exfiltration in a production LLM system. A crafted email sat in Outlook; when the user later asked Copilot for an unrelated summary, the email's hidden instructions fired and exfiltrated SharePoint, OneDrive, and Teams data. Research paper: arXiv:2509.10540.
- MITRE ATLAS prompt-injection techniques — MITRE ATLAS catalogues real-world adversary techniques against AI systems, including indirect prompt injection that turns an LLM with tool access into an attacker-controlled execution surface. The framework specifically documents techniques for exfiltration via AI agent tool invocation.
- Agent Session Smuggling — Palo Alto Unit 42 (November 2025) demonstrated rogue agents exploiting trust in the Agent-to-Agent (A2A) protocol with multi-turn manipulation. Documented in OWASP's Agentic Top 10.
None of these specifically targeted kubectl-running agents in production — but the class is the same and the blast radius would be larger. An agent that can run kubectl delete is one prompt-injection payload away from a cluster-wide outage.
The Four Attack Surfaces of Agentic kubectl
Most teams think of kubectl agent safety as a single problem ("can the agent be tricked?"). It's actually four distinct attack surfaces, each requiring its own mitigation.
| Surface | Failure mode | Why permission-scoping alone fails | Mitigation |
|---|---|---|---|
| 1. Prompt injection | Hidden instructions in logs, alerts, runbooks, or chat coerce the agent | Compromised agent acts within its granted permissions, which is exactly what permission-scoping permits | Sandboxed runtime; never trust LLM output derived from data the LLM read |
| 2. Credential leakage | Executed command reads AWS_SECRET_ACCESS_KEY, VAULT_TOKEN, KUBECONFIG from inherited env | Permissions live on credentials; if the credential leaks, the permission set leaks with it | Per-invocation short-lived credentials (STS, Service Principal); explicit env allowlist that strips secrets |
| 3. Blast radius escalation | Legitimate command runs against wrong namespace, region, or cluster | Permissions don't model "right action, wrong target" | Default read-only; dependency-graph awareness; human approval for destructive writes |
| 4. Audit trail gaps | Logs capture commands without the agent's reasoning | Permission systems audit "who ran what," not "why" | Per-investigation transcripts that link reasoning → tool calls → outputs |
Attack Surface 1: Prompt injection
The agent reads a log line, alert payload, runbook, or chat message that contains hidden instructions. The LLM cannot reliably distinguish data from instructions in the same channel — this is the fundamental property OWASP's LLM01 captures. Even frontier models do not eliminate it. Anthropic has publicly stated that "no browser agent is immune to prompt injection" and publishes defense benchmarks showing measurable but imperfect attack-prevention rates across computer-use, bash tool use, and MCP workflows. The implication for kubectl-running agents is clear: the LLM is not the security boundary. The runtime is.
Mitigation: never trust LLM output that originates from data the LLM also read. Sandbox the execution layer so even a successful injection has limited blast radius.
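To make the "data is not instructions" rule concrete, here is a minimal Python sketch of provenance tagging. The helper names (ContextItem, require_human_approval, run_in_sandboxed_pod) are hypothetical and not drawn from any specific product; the point is that reading untrusted data only adds scrutiny, and no provenance state ever unlocks direct in-process execution.

```python
from dataclasses import dataclass, field

@dataclass
class ContextItem:
    """One piece of context the LLM has read. Anything fetched from the
    outside world (logs, alert payloads, runbooks, chat) stays untrusted."""
    text: str
    untrusted: bool = True

@dataclass
class ProposedCommand:
    command: str
    derived_from: list = field(default_factory=list)  # ContextItems the LLM saw

def require_human_approval(command: str) -> None:
    # Placeholder: a real system would block on an approval workflow here.
    print(f"[approval required] {command}")

def run_in_sandboxed_pod(command: str) -> str:
    # Placeholder: a real system would execute inside an isolated runtime
    # (see the execution sketches later in this article), never in-process.
    return f"[sandboxed] {command}"

def dispatch(proposal: ProposedCommand) -> str:
    """Provenance decides how much scrutiny a command gets; it never decides
    whether the sandbox is used. The LLM is not the security boundary."""
    if any(item.untrusted for item in proposal.derived_from):
        require_human_approval(proposal.command)
    return run_in_sandboxed_pod(proposal.command)
```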
Attack Surface 2: Credential leakage
If the agent runs commands with credentials inherited from the host process environment (AWS_SECRET_ACCESS_KEY, KUBECONFIG, VAULT_TOKEN), a successful command-injection or shell escape exposes everything the agent process has access to. Long-lived static credentials make this catastrophic.
Mitigation: per-invocation credential scoping. AWS STS AssumeRole, Azure Service Principal sessions, GCP short-lived tokens. Strip everything else from the child process environment with an explicit allowlist.
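Both halves of that mitigation fit in a short sketch. It assumes boto3 is available; the allowlist contents, role ARN, and 15-minute session duration are illustrative choices, not any vendor's actual configuration.

```python
import os
import subprocess
import boto3

# Only these host variables reach the child process; everything else
# (VAULT_TOKEN, DATABASE_URL, long-lived cloud keys) is dropped.
SAFE_ENV_KEYS = {"PATH", "HOME", "USER", "SHELL", "TERM", "LANG", "TMPDIR"}

def sanitized_env() -> dict:
    return {k: v for k, v in os.environ.items() if k in SAFE_ENV_KEYS}

def run_with_scoped_aws_creds(cmd: list[str], role_arn: str) -> subprocess.CompletedProcess:
    """Mint per-invocation STS credentials and inject only those into an
    allowlisted child environment. Nothing long-lived is inherited."""
    creds = boto3.client("sts").assume_role(
        RoleArn=role_arn,
        RoleSessionName="agent-invocation",
        DurationSeconds=900,  # 15 minutes, the STS minimum
    )["Credentials"]
    env = sanitized_env()
    env.update({
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    })
    return subprocess.run(cmd, env=env, capture_output=True, text=True, timeout=120)
```

If the executed command is compromised, the worst it can read from its environment is a credential that expires in minutes and a handful of harmless locale variables.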
Attack Surface 3: Blast radius escalation
Even legitimate, non-injected commands can have outsized effects. kubectl delete pod on the wrong namespace. aws ec2 terminate-instances against a misidentified region. The agent doesn't need to be compromised — it just needs to be wrong.
Mitigation: read-only by default, write actions behind explicit human approval, and dependency-graph awareness so the agent can compute blast radius before acting. (This is the role of Aurora's Memgraph dependency graph.)
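A read-only-by-default gate can be as simple as a verb classifier in front of the executor. The verb lists below are assumptions for the sketch, not an exhaustive taxonomy, and unknown verbs are deliberately treated like writes.

```python
READ_ONLY_VERBS = {"get", "describe", "logs", "top", "explain", "api-resources"}
DESTRUCTIVE_VERBS = {"delete", "drain", "cordon", "scale", "rollout",
                     "apply", "patch", "replace"}

def classify_kubectl(argv: list[str]) -> str:
    verb = argv[1] if len(argv) > 1 and argv[0] == "kubectl" else ""
    if verb in READ_ONLY_VERBS:
        return "read"
    if verb in DESTRUCTIVE_VERBS:
        return "destructive"
    return "unknown"  # unknown verbs are gated like writes, never like reads

def gate(argv: list[str], approved: bool = False) -> bool:
    """Read-only commands run immediately; anything else waits for explicit
    human approval, ideally alongside a blast-radius estimate computed from
    a dependency graph before the approval prompt is shown."""
    return classify_kubectl(argv) == "read" or approved
```

With this gate, kubectl get pods in a namespace runs immediately, while kubectl delete pod stays blocked until a human approves it.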
Attack Surface 4: Audit trail gaps
When an investigation runs across 20+ tool invocations, traditional audit systems (CloudTrail, Kubernetes audit logs) record what was run but not why. A reviewer six months later cannot tell whether a kubectl scale was a legitimate response to a load spike or an injected instruction.
Mitigation: structured per-investigation transcripts that capture agent reasoning alongside tool calls. The right log isn't "kubectl was run" — it's "in response to alert X, the agent hypothesized Y, ran kubectl Z, and observed W."
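A sketch of what such a transcript record could look like, with field names that are illustrative rather than any product's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TranscriptStep:
    """One reasoning -> action -> observation step. The reasoning and the
    tool call it produced live in the same record, so a reviewer can replay
    why each command ran, not just that it ran."""
    triggered_by: str        # the alert or message that started this step
    hypothesis: str          # why the agent wanted to act
    tool: str                # e.g. "kubectl"
    command: str             # exact command that was executed
    output_summary: str      # what came back
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class InvestigationTranscript:
    investigation_id: str
    steps: list[TranscriptStep] = field(default_factory=list)

    def record(self, **kwargs) -> None:
        self.steps.append(TranscriptStep(**kwargs))
```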
Why "human approval" alone is not enough
The most common safety story in the AI SRE space is "the agent suggests; humans approve." That is necessary but not sufficient.
The problem with approval gates as the only line of defense:
- Decision fatigue. An agent that handles 50 alerts a week generates dozens of approval prompts. Humans rubber-stamp.
- Approval ≠ understanding. Engineers approve commands they don't fully understand because the agent's reasoning sounds plausible.
- Injected intent looks legitimate. A prompt-injection payload can produce a recommendation that reads exactly like a normal RCA. The approver has no signal that the underlying instruction came from an attacker.
Approval gates are critical, but they need to sit on top of an already-sandboxed runtime — not be the only protection.
Permission scoping vs sandboxed execution: what's the difference?
These two terms get conflated. They aren't the same thing.
Permission scoping restricts what an agent's identity can do. RBAC roles, IAM policies, kubeconfig contexts. It's necessary, but it operates at the cluster-API layer — meaning a successful prompt injection can still use every permission the agent has.
Sandboxed execution isolates the runtime in which commands execute. If the agent's process is compromised, the sandbox limits what the compromised process can do regardless of the credentials it holds. The compromised process can't read other pods' files, can't reach other nodes, can't escalate to the host kernel.
The defensible architecture combines both: tight permission scoping (small RBAC role, short-lived credentials) + runtime isolation (sandboxed execution).
How sandboxed kubectl actually works
The Kubernetes ecosystem began converging on this pattern in 2025–2026.
k8s-sigs/agent-sandbox
k8s-sigs/agent-sandbox is a formal Kubernetes SIG Apps subproject that launched at KubeCon Atlanta in November 2025. It provides a declarative Kubernetes API for "isolated, stateful, singleton workloads" — built specifically for AI agent runtimes that may execute untrusted, LLM-generated code.
Core CRDs:
- Sandbox — an isolated pod-equivalent with stronger boundaries
- SandboxTemplate — reusable configuration
- SandboxClaim — request a sandbox for a workload
- SandboxWarmPool — pre-created sandboxes that bring cold-start under one second
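Creating one of these resources from an agent harness looks like any other custom-resource call. In the sketch below, the group/version, plural, and spec fields are assumptions based on the project's naming; check the k8s-sigs/agent-sandbox documentation for the exact schema before relying on them.

```python
from kubernetes import client, config

config.load_kube_config()

# Assumed schema for illustration only; consult the agent-sandbox docs for
# the real group/version and spec fields.
sandbox = {
    "apiVersion": "agents.x-k8s.io/v1alpha1",
    "kind": "Sandbox",
    "metadata": {"name": "ai-sre-terminal", "namespace": "ai-sre"},
    "spec": {
        "podTemplate": {
            "spec": {
                "runtimeClassName": "gvisor",  # runtime-level isolation backend
                "containers": [
                    {"name": "terminal", "image": "example.com/agent-terminal:latest"}
                ],
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="agents.x-k8s.io",
    version="v1alpha1",
    namespace="ai-sre",
    plural="sandboxes",
    body=sandbox,
)
```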
The Kubernetes blog post from March 2026 makes the architectural claim explicit: "Isolation achieved via runtime-level sandboxing (gVisor/Kata), not just container-level namespaces."
gVisor
gVisor is a Google-maintained user-space application kernel that provides kernel-level isolation without full virtualization. Architecture: Sentry (a kernel emulator written in Go) intercepts roughly 200 Linux syscalls; Gofer brokers filesystem access over 9P. The OCI runtime is runsc, drop-in compatible with runc.
gVisor runs in production at Google for App Engine standard, Cloud Functions, Cloud Run, and Cloud ML Engine. GKE Sandbox productizes it for GKE node pools. It is one of two named isolation backends in agent-sandbox (the other being Kata Containers, which uses lightweight VMs).
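On a cluster with gVisor installed, opting a single pod into the sandbox is a one-field change: set its RuntimeClass. On GKE Sandbox node pools that class is named gvisor; on self-managed clusters you first register a RuntimeClass whose handler is runsc. A minimal sketch using the official Python client, with placeholder image and namespace:

```python
from kubernetes import client, config

config.load_kube_config()

# Placeholder names throughout; the only load-bearing line is runtime_class_name.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="agent-terminal", namespace="ai-sre"),
    spec=client.V1PodSpec(
        runtime_class_name="gvisor",  # route this pod through runsc instead of runc
        restart_policy="Never",
        containers=[
            client.V1Container(name="terminal", image="example.com/agent-terminal:latest")
        ],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="ai-sre", body=pod)
```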
Why this matters for AI SRE
An AI SRE that runs kubectl against production is exactly the kind of workload agent-sandbox was built for. It executes LLM-generated commands. It needs file system isolation, syscall isolation, and per-invocation credential scoping. It benefits enormously from a warm pool that reduces cold-start latency.
If you are evaluating an AI SRE in 2026, this is one of the right questions to ask: what isolation backend does the agent use when it executes commands?
How Aurora's pod-isolated execution works
Aurora's approach predates agent-sandbox and follows the same architectural principles.
When Aurora's agent runs a kubectl, aws, az, or gcloud command, it doesn't use subprocess.run() directly. It uses an internal primitive called terminal_run, defined in server/utils/terminal/terminal_run.py. The module's docstring is explicit:
Drop-in replacement for subprocess.run() that executes in terminal pods. This module provides a terminal_run() function that mimics subprocess.run() API but executes commands in isolated terminal pods via kubectl exec. Safety guardrails (signature matcher + LLM judge) run automatically unless the caller passes trusted=True for known-safe internal operations.
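For intuition, here is a heavily simplified sketch of what a subprocess.run-shaped wrapper over kubectl exec could look like. It is not Aurora's actual implementation; pod selection, guardrails, and environment handling are omitted and covered below.

```python
import shlex
import subprocess

def terminal_run_sketch(argv: list[str], *, terminal_pod: str, namespace: str,
                        timeout: int = 120) -> subprocess.CompletedProcess:
    """Run argv inside a separate terminal pod via kubectl exec, so the agent
    host never executes the command itself. Illustrative only."""
    exec_argv = [
        "kubectl", "exec", "-n", namespace, terminal_pod, "--",
        "sh", "-c", shlex.join(argv),
    ]
    return subprocess.run(exec_argv, capture_output=True, text=True, timeout=timeout)
```

The executed command inherits the terminal pod's filesystem and service account, not the agent's.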
Three properties matter:
1. Pod-isolated execution. When the ENABLE_POD_ISOLATION flag is set (the default in Kubernetes deployments), every external command runs inside a separate terminal pod via kubectl exec. The agent's own process never executes the command directly. A successful command-injection in the agent's reasoning loop does not give an attacker access to the agent host.
2. Two-stage safety guardrails. Before any non-trusted command runs, two checks fire automatically: a deterministic signature matcher that rejects known-dangerous patterns, and an LLM judge that evaluates the proposed command against the investigation context. The trusted=True flag bypasses both — used only for known-safe internal operations like configured connector calls. A minimal sketch of the two-stage pattern follows this list.
3. Sanitized environment allowlist. Aurora's terminal_exec_tool module defines an explicit _SAFE_ENV_KEYS set: PATH, HOME, USER, SHELL, TERM, LANG, TMPDIR, SSL_CERT_FILE, plus ENABLE_POD_ISOLATION itself. Everything else — including VAULT_TOKEN, DATABASE_URL, SECRET_KEY, and any cloud credentials — is stripped from the child process environment. A compromised command cannot read the agent's secrets via env.
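Here is that two-stage check in miniature, assuming an illustrative deny-pattern list and an LLM judge passed in as a plain callable; real pattern sets are larger and the judge prompt carries the full investigation context.

```python
import re
from typing import Callable

# Illustrative deny signatures; real sets are larger and operator-tunable.
DENY_SIGNATURES = [
    re.compile(r"kubectl\s+delete\s+(ns|namespace)\b"),
    re.compile(r"kubectl\s+.*--all\b"),
    re.compile(r"\brm\s+-rf\s+/"),
    re.compile(r"aws\s+ec2\s+terminate-instances"),
]

def signature_check(command: str) -> tuple[bool, str]:
    """Stage 1: cheap, deterministic, runs on every non-trusted command.
    A match is a hard reject and never reaches the LLM judge."""
    for pattern in DENY_SIGNATURES:
        if pattern.search(command):
            return False, f"matched deny signature: {pattern.pattern}"
    return True, "no signature match"

def guarded(command: str, llm_judge: Callable[[str], bool]) -> bool:
    ok, _reason = signature_check(command)
    if not ok:
        return False
    # Stage 2: the judge evaluates the command against investigation context.
    return llm_judge(command)
```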
Cloud credentials are handled separately. Aurora calls generate_contextual_access_token and generate_azure_access_token per invocation. AWS uses STS AssumeRole via cross-account roles (aurora-cross-account-role.yaml) — short-lived credentials, not long-lived access keys. Azure uses Service Principal sessions. GCP uses OAuth-derived tokens.
For agents that need to reach customer Kubernetes clusters Aurora can't access directly, a separate kubectl-agent binary deploys via Helm into the customer's cluster and connects outbound over WebSocket. No inbound network access required, no kubeconfig sharing, no static credentials at rest.
How to evaluate an AI SRE's kubectl safety model
Eight questions to ask any AI SRE vendor or open-source project before enabling production access:
- Where does the command actually execute? Same process as the agent? Same host? Separate container? Sandboxed runtime (gVisor/Kata)?
- What credentials does the command inherit from the host environment? Specifically: can the executed command read your agent's vault token, database URL, or other host secrets?
- Are credentials short-lived or static? STS / Service Principal sessions, or long-lived access keys?
- Is the default read-only? What flag, configuration, or RBAC role enables write access?
- What happens between "agent decides to run X" and "X runs"? Is there a deterministic policy check? An LLM judge? A human approval prompt? All three?
- Are destructive actions specifically gated? What's the definition of "destructive" — vendor-defined or operator-configurable?
- What does the audit trail capture? Just the commands, or the agent's reasoning + the commands together?
- What's the blast radius of a single successful prompt injection? Walk through the worst case explicitly with the vendor.
If a vendor can't answer these clearly, the architecture isn't ready for production write access.
Open questions in 2026
This is a young problem space. Several questions are not yet resolved:
- Standardization. k8s-sigs/agent-sandbox is the leading candidate for a standard, but Knative Sandbox, container-level approaches, and microVM-based runtimes (Firecracker) are all in play.
- Multi-cloud isolation. Sandboxing a Kubernetes pod is a solved problem. Sandboxing a process that calls aws, az, and gcloud across cloud APIs from a single agent is harder — the credentials and trust boundaries change per provider.
- Approval UX at scale. Engineers can't approve 200 actions per week. The right UI for batch approval, policy-based pre-approval, and rollback-only autonomy is still being figured out.
Expect significant movement on all three through 2026 and into 2027.
Aurora's approach in summary
If you operate an AI SRE in production, the safety questions are non-negotiable. Aurora's answer is: pod-isolated execution by default, deterministic + LLM-judge guardrails before any non-trusted command, environment-variable allowlist that strips secrets, per-invocation cloud credentials via STS/Service Principal/short-lived tokens, and human approval for destructive write operations. The full architecture is open source under Apache 2.0 — auditable in the Aurora repository.
For background on the agent and tool model, see our complete guide to AI SRE, our open-source AI SRE comparison, or our explainer on agentic incident management.