← Back to Blog
news
11 min read

Aurora Actions: User-Defined Background Automations for Incident Response

Aurora Actions let SRE teams define reusable background automations in natural language — triggered manually, on incident completion, or on a schedule.

By Noah Casarotto-Dinning, CEO at Arvo AI|

Key Takeaways

  • Aurora Actions are reusable, natural-language automations that Aurora's agent executes in the background using all 22+ connected integrations. Available today on the main branch of Aurora.
  • Three trigger types out of the box: manual ("run now"), on incident completion (chain follow-up work after every RCA), and recurring schedule (Celery Beat–driven intervals).
  • Same agent, same tools, different prompt scaffolding. Actions reuse Aurora's existing LangGraph agent and 30+ tools (kubectl, aws, gcloud, az, Terraform, Confluence, Slack, GitHub) — they just run as background chat sessions with eager-loaded skills and no RCA mandate.
  • /action <name> is a first-class chat primitive. Slash-command autocomplete in the chat input, "Run Action" dropdown on completed incidents, and full RBAC-gated CRUD UI in Settings.
  • Aurora Actions turn the agent into a programmable platform. This is the building block for CI/CD auto-remediation, scheduled audits, and post-incident health checks — covered in our CI/CD Auto-Remediation guide.

We shipped one of the most-requested features in Aurora's history: Aurora Actions — user-defined background automations that run on Aurora's agent. An Aurora Action is a named, natural-language instruction the user writes once and then triggers manually, on incident completion, or on a recurring schedule; Aurora's agent executes it as a background task with full access to every connected integration. Where traditional incident management tools force you to pick from a fixed catalog of "automations" (close incident, post to Slack, run runbook), Actions are written in plain English and inherit the full reasoning capability of the agent.

This post is for SRE and platform teams already running Aurora — or evaluating it — who want to understand what Actions actually do, where they fit on the agentic spectrum, and how to use them safely.

What is an Aurora Action?

An Aurora Action has four parts:

  1. A name — used as the slash-command handle (/action <name>) and as the dropdown label on incident cards.
  2. A natural-language instruction — the prompt the agent will execute. The same instruction the user would type into chat, except it can reference incident context placeholders when triggered post-incident.
  3. A trigger type — manual, on-incident-completion, or on-schedule (interval-based via Celery Beat).
  4. An on/off toggle — actions can be disabled without deletion, with full RBAC for who can create, edit, or trigger them.

The implementation is a thin layer over Aurora's existing chat agent. When an Action triggers, the executor service creates a background chat session with the action's instruction as the user message, runs it through the same LangGraph workflow that powers interactive chat, and persists the run history. The agent has full tool access (kubectl, cloud CLIs, Terraform, Slack, GitHub, Confluence, Memgraph, Weaviate) and eager-loaded skills — the only differences from interactive chat are scaffolded prompts and the absence of any RCA mandate.

Why this matters

Most incident management automation today is workflow automation: PagerDuty fires, Slack channel is created, status page is updated, runbook link is posted. The "automation" is a directed graph of static actions. There is no reasoning, no investigation, no judgment. Tools like Rootly, FireHydrant, and incident.io are excellent at this — but they don't do anything an SRE wouldn't have to manually verify after the fact.

Aurora's bet has always been the opposite: automate the investigation itself. Aurora Actions extend that bet from one-shot incident investigations to recurring or post-incident workflows. A few concrete examples:

  • Noisy alert tuning — "Every Friday at 5pm, review which Datadog alerts fired more than 20 times this week with mean time-to-acknowledge over 10 minutes. Open a Terraform PR to widen the thresholds or move them to a warning channel."
  • Post-incident health check — "After every completed RCA, run a 15-minute observation on the affected service: check error rate, p99 latency, and pod restart count. Post results to #incident-followup."
  • Scheduled infrastructure audit — "Every Monday at 9am, audit IAM roles in the production AWS account that have not been used in 90 days. List candidates for removal in a Confluence page."

None of these are runbook automation. Each requires the agent to query infrastructure, reason about results, and produce a structured output. Each one was previously the job of an on-call engineer doing follow-up between pages.

Where Actions sit on the agentic capability spectrum

In our Open-Source AI SRE comparison, we proposed a four-level spectrum for AI SRE capability. Actions don't change the level — they change when the agent runs.

When the agent runsTriggerPre-Actions exampleWith Actions
On alertWebhook from PagerDuty / Datadog / GrafanaAurora investigates the alert and produces an RCASame — investigation flow is unchanged
On user requestEngineer asks a question in chatAurora answers using toolsSame — plus /action <name> shortcuts
After every incidentIncident state transitions to "resolved"Postmortem generated; engineer manually does follow-up checksAction runs automatically with incident context in scope
On a scheduleCelery Beat cronNo equivalent — required external scheduler + custom codeSingle source of truth: agent runs the prompt on cadence

The post-incident and scheduled triggers are the genuinely new capability. Before Actions, anything recurring or post-incident required gluing Aurora to an external scheduler, an external prompt store, and bespoke trigger code. Actions collapse all three into the product surface.

How Actions work under the hood

This is for the technically curious. A few architecturally interesting things from the implementation:

1. Background chat sessions, not a separate runtime. When an Action triggers, the executor service creates a regular chat session with the action's instruction as the seed message and dispatches it as a background Celery task. The agent doesn't know it's running an Action — it just runs the workflow. This means every capability the interactive agent has (tool calls, RAG, graph traversal, sub-agent orchestration) is available inside Actions for free.

2. Eager-loaded skills, no RCA mandate. Interactive chat lazy-loads skills based on the user message. Background actions eager-load all skills because there is no human to clarify ambiguity. The system prompt also strips the "your job is to find root cause" framing — Actions can do anything the agent can do, not just investigate.

3. RLS context is preserved. Aurora uses PostgreSQL row-level security for multi-tenancy. The executor explicitly sets RLS context (org_id, user_id) before running so background tasks see only their own org's data — even though they run under a service identity.

4. Stale run cleanup is integrated. Aurora's existing background-chat janitor already handles orphaned chat sessions from crashed pods. Action runs go through the same path, so a worker pod dying mid-action doesn't leave the run state inconsistent.

5. RBAC is enforced at the route layer. Action CRUD is gated by Aurora's Casbin-based RBAC. Org admins can restrict which roles can create or trigger actions — important because an Action with cloud-CLI access has real blast radius.

Trigger types in detail

Manual triggers

The simplest case. An admin creates the action, an engineer triggers it from the Actions page or via /action <name> in chat. Useful for codifying common operational tasks ("rotate ECS task definitions for service X", "scan Confluence for stale runbooks") into named, repeatable commands.

The chat integration is worth calling out: /action is implemented as an LLM tool call using the same pattern as Aurora's /rca slash command. The agent processes the action dispatch and then continues responding to the rest of the user's message — so you can write "kick off the IAM audit and tell me what changed since last week" and the agent will dispatch the audit action and answer your question in the same turn.

On-incident-completion triggers

When an incident transitions to "resolved", any action with this trigger type runs against the incident context. The incident's metadata, RCA, and timeline are available to the action's agent without the user having to paste anything in. This is the trigger that turns Aurora from a reactive tool ("investigate this page") into a continuous one ("investigate, then run health checks, then file the postmortem").

Scheduled triggers

Interval-based, driven by Celery Beat. Choose a cadence (every N minutes / hours / days), and the action runs without user involvement. This is the building block for the CI/CD auto-remediation and scheduled audit use cases — and it's why we're calling this post and the CI/CD Auto-Remediation guide sister posts.

What Actions don't do (and why)

A few capability decisions worth being explicit about:

  • No external webhook triggers in this release. We could have added "trigger on arbitrary webhook" but it overlaps with the existing alert-triggered investigation flow. We may add it if we see demand for triggers from systems that don't go through PagerDuty / Datadog / Grafana.
  • No agent-authored Actions yet. The agent can't create or modify Actions on its own. Self-modification is a serious security boundary; we'd want approval gating and audit logging before opening that door. (See our AI Agent kubectl Safety guide for the threat model.)
  • No conditional / DAG composition in this release. Actions are single-prompt for now. If you need a multi-step workflow, write a single prompt that describes the steps — the agent is good at sequencing. We'll add explicit composition if the natural-language form proves limiting.

Safety: what to think about before enabling

Every Action is a small program with access to your cloud environment. A few rules we use ourselves:

  1. Start read-only. Actions inherit Aurora's tool permissions. If your tool config restricts write actions (no kubectl apply, no aws ec2 terminate-instances), Actions inherit that posture. Keep it that way for the first few weeks.
  2. Use scheduled triggers conservatively. A daily audit is cheap. A 5-minute polling loop with cloud CLI calls is not. Watch the LLM bill.
  3. Audit who can create Actions. RBAC defaults to org-admin-only creation. Leave it there unless you have a clear reason to widen.
  4. Pin the model. Action prompts can be sensitive to model behavior. Pin a known-good model per action (gpt-5.5, claude-sonnet-4.6, opus-4.7, etc.) using Aurora's per-org model dropdown until you have confidence in cross-model stability.
  5. Review action runs weekly. Every action has a run-history view. Spend 10 minutes a week reading the agent's traces for your scheduled actions — anomalous reasoning is the leading indicator of prompt drift or tool drift.

How to ship your first Action

A six-step recipe.

1. Pick a recurring task you currently do manually

Anything you do every week or after every incident. Examples: stale-PR review, alert-noise audit, on-call handover summary. The smaller and more deterministic, the better for v1.

2. Write the prompt as if you were typing it into chat

Don't translate to "automation language." Write it the way you would write a chat message to a smart junior SRE. "Look at..." "Check whether..." "Open a PR that..."

3. Create the Action with a manual trigger

Settings → Actions → New Action. Paste the prompt, set trigger = manual, leave it disabled if you want to review before enabling. Trigger it once and watch the run.

4. Inspect the run trace

Click the run in the history view. Read every tool call. Look for: tool misuse (wrong cloud account), excessive tool calls (3 attempts at the same thing), hallucinated paths or resource IDs. Iterate on the prompt until the trace is clean for three consecutive runs.

5. Promote to the right trigger type

If the action makes sense after every incident → on-incident-completion. If it's a routine sweep → on-schedule with the longest cadence that still meets your need. Only use short cadences when you have a clear cost and blast-radius understanding.

6. Add it to your team's incident review

Treat agent runs the same way you treat human runs: include them in your weekly incident review. Look for actions that produced wrong output, actions that nobody read the output of, and actions that produced output nobody acted on. Delete or downgrade as needed.

Aurora Actions vs traditional incident-management automation

The category most people compare us to is "workflow automation in incident-management SaaS" — Rootly, FireHydrant, incident.io. The comparison is informative but ultimately category-different:

CapabilityAurora ActionsRootly / FireHydrant / incident.io workflows
AuthoringNatural languageDSL or visual builder
ReasoningYes — LLM agentNo — fixed conditional graph
Tool reachCloud CLIs, kubectl, Terraform, Slack, Confluence, GitHub, RAG, infra graphSlack, status pages, Zoom, runbook links, ticket creation
Scheduled executionYes (Celery Beat)Limited (some support timed reminders)
Post-incident chainingYes — full incident context availableYes — but limited to workflow actions
Open sourceYes (Apache 2.0, self-hosted)No
PricingFree (self-hosted; LLM tokens only)Per-user SaaS

The honest framing: traditional incident-management tools automate the process around the incident. Aurora Actions automate what happens inside the agent. Both have value; they cover non-overlapping work. If you live in PagerDuty and use Rootly for incident channels, Aurora Actions sit alongside that — they don't replace it.

What's next

Aurora Actions is the foundation for several capabilities on our roadmap:

  • DAG composition — explicit multi-step Action chains where each step is itself an Action.
  • Approval gates — Actions that pause for human approval before destructive tool calls (already supported in chat; explicit Action-level gating coming).
  • CI/CD auto-remediation hooks — first-class integration with GitHub Actions, Jenkins, and ArgoCD so a failing pipeline becomes a triggered Aurora investigation. (Background and detailed write-up in our CI/CD Auto-Remediation guide.)
  • Action marketplace — community-contributed Actions you can install with one click. Bring-your-own prompt store.

We'll publish each of these as they ship.

Get Aurora

Aurora is fully open source under Apache 2.0. Self-host with Docker Compose or Helm. Actions ship in the next tagged release after aurora-oss-1.2.15 (April 15, 2026); the feature is available on main today.

Aurora
Product
Aurora Actions
Automation
Incident Response
AI SRE
LangGraph

Frequently Asked Questions

Try Aurora for Free

Open source, AI-powered incident management. Deploy in minutes.