Keep vs Aurora: Open Source Alert Management vs AI Investigation

Keep correlates and routes alerts; Aurora investigates them. Compare these two open-source AIOps tools and learn which layer your team actually needs.

By Noah Casarotto-Dinning, CEO at Arvo AI

Key Takeaways

  • Keep is an open-source AIOps and alert-management platform (MIT-licensed, around 11.9k GitHub stars) that deduplicates, correlates, and routes alerts and runs workflow-as-code remediation, per the Elastic acquisition announcement.
  • Keep's AI correlation uses a trained model to cluster alerts into incidents. Per Keep's own docs, it is available only on Keep Cloud and Keep Enterprise On-Premises, not the open-source tier.
  • Keep does not perform autonomous root-cause investigation. Its AI correlator assigns alerts to incidents; it does not query your infrastructure or run commands, per Keep's AI correlation docs.
  • Aurora is an open-source (Apache-2.0) AI SRE that autonomously investigates: it runs kubectl, aws, az, and gcloud in sandboxed Kubernetes pods across AWS, Azure, GCP, OVH, Scaleway, and Kubernetes, then builds an evidence chain and a root-cause analysis.
  • Keep and Aurora are complementary, not competitors: Keep is the alerting and workflow plane, and Aurora is the investigation plane. Aurora ingests alerts via webhook from eleven monitoring connectors, plus a Slack bot.
  • Keep was acquired by Elastic in May 2025; the core stays MIT-licensed, but the roadmap now follows Elastic's Search AI platform.

Does Keep do root-cause analysis, or just correlation? The short answer: Keep correlates. It groups related alerts into incidents and routes them, and on its paid tiers it uses an AI model to cluster alerts more intelligently. It does not open a terminal, query your cloud accounts, or build an evidence chain. That investigation step is exactly what Aurora is built for. This guide compares the two honestly, because in most real stacks you want both.

What is Keep?

Keep is the open-source alert-management and workflow layer for your monitoring stack. It unifies alerts behind a single pane of glass, deduplicates and correlates events to cut noise, and automates remediation with workflow-as-code plus a no-code visual builder, as described in the Elastic and Keep announcement.

Keep started as a Y Combinator W23 company and raised a $2.5 million pre-seed round, as announced by Y Combinator. The project is MIT-licensed (an enterprise edition lives in a separate ee/ directory) and has grown to roughly 11.9k stars on GitHub. It ships bidirectional integrations with more than 110 tools, listed on the Keep homepage, including Datadog, Grafana, Prometheus, PagerDuty, Jira, and CloudWatch.

The headline feature is workflows. A Keep workflow is a declarative YAML file, very much like GitHub Actions for your monitoring tools, triggered by alerts, incidents, schedules, or manual runs. That makes Keep a genuinely strong choice for the routing and automation plane: deduplicate the noise, group what is related, and fire a remediation workflow or a ticket.

In May 2025, Keep was acquired by Elastic, announced on May 7 and completed later that month. The open-source core remains MIT-licensed, but the product direction is now tied to Elastic's Search AI platform, which is worth weighing if you care about long-term independence.

Does Keep do root-cause analysis or just correlation?

Keep does correlation, not autonomous root-cause investigation. This is the literal question most teams are asking, so it is worth being precise.

Keep's AI correlation engine, per Keep's own documentation, trains a per-tenant model on your historical alerts and incidents, then runs in cycles (each iteration completing in roughly 5 to 15 minutes) that cluster unassigned alerts and assign them to incidents once a confidence score clears a threshold. In Keep's words, it 'intelligently classifies new alerts and assigns them to appropriate incidents.'
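To make the threshold step concrete, here is a minimal Python sketch of assigning alerts to incidents only when a confidence score clears a cutoff. It is an illustration under stated assumptions, not Keep's implementation: the `Incident` class, the `assign_alerts` function, the shape of the `scores` mapping, and the 0.8 threshold are all hypothetical, and the scores stand in for whatever a trained model would produce.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    name: str
    alert_ids: list = field(default_factory=list)


def assign_alerts(scores, incidents, threshold=0.8):
    """Attach each alert to its best-scoring incident if the score clears the threshold.

    scores maps alert_id -> {incident_name: confidence} (hypothetical model output).
    Returns the alert ids that stay unassigned for a later cycle.
    """
    unassigned = []
    for alert_id, candidates in scores.items():
        # Pick the incident with the highest confidence, if there is any candidate.
        best_name, best_score = max(
            candidates.items(), key=lambda kv: kv[1], default=(None, 0.0)
        )
        if best_name is not None and best_score >= threshold:
            incidents[best_name].alert_ids.append(alert_id)
        else:
            unassigned.append(alert_id)
    return unassigned


incidents = {"db-latency": Incident("db-latency"), "node-pool": Incident("node-pool")}
scores = {
    "a1": {"db-latency": 0.93, "node-pool": 0.10},  # clear match
    "a2": {"db-latency": 0.55, "node-pool": 0.60},  # ambiguous, stays unassigned
    "a3": {},                                       # no candidates at all
}
leftover = assign_alerts(scores, incidents)
print(leftover)  # ['a2', 'a3']
```

Note what the sketch never does: it groups alert ids, but it has no access to the systems that produced them. That is the boundary between correlation and investigation.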

That is correlation done well. It is not investigation. The AI correlator does not open a shell, run kubectl or aws commands, read your Kubernetes events, or query your cloud control plane, and it does not produce a why. It tells you these twelve alerts are the same incident. It does not tell you the incident was caused by a failed node pool autoscale that throttled a downstream service.

There is a second important detail: per Keep's AI correlation docs, AI correlation is available on Keep Cloud and Keep Enterprise On-Premises only. It is not in the open-source self-hosted tier. So if you self-host the free MIT build, you get rule-based correlation, deduplication, and workflows, but the AI clustering sits behind the paid plans. The open-source rules engine still works well; it is just not the AI feature.
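For contrast with the AI clustering, rule-based deduplication and correlation can be pictured with a short Python sketch. This is a generic illustration, not Keep's rules engine: the `fingerprint` and `dedupe_and_group` functions, the alert fields, and the choice to group by `service` are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def fingerprint(alert, fields=("source", "service", "name")):
    """Hash the chosen fields so repeats of the same alert collapse to one key."""
    key = "|".join(str(alert.get(f, "")) for f in fields)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe_and_group(alerts, group_by="service"):
    """Drop duplicate alerts, then group the survivors by one shared attribute."""
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        fp = fingerprint(alert)
        if fp in seen:
            continue  # same fingerprint already recorded: treat as a duplicate
        seen.add(fp)
        groups[alert.get(group_by, "unknown")].append(alert)
    return dict(groups)


alerts = [
    {"source": "prometheus", "service": "checkout", "name": "HighLatency"},
    {"source": "prometheus", "service": "checkout", "name": "HighLatency"},  # duplicate
    {"source": "datadog", "service": "checkout", "name": "ErrorRate"},
    {"source": "grafana", "service": "search", "name": "DiskFull"},
]
grouped = dedupe_and_group(alerts)
```

Here the grouping rule is fixed and explicit, which is the trade-off against a trained model: predictable behaviour, but someone has to write and maintain the rules.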

What is Aurora?

Aurora is the autonomous investigation layer. It is an open-source (Apache-2.0) AI SRE that, when an incident fires, runs the diagnostic commands a human on-call would run, gathers the evidence, and writes up the root cause.

Built by Arvo AI, Aurora uses LangGraph-orchestrated agents that execute kubectl, aws, az, and gcloud inside sandboxed Kubernetes pods, across AWS, Azure, GCP, OVH, Scaleway, and Kubernetes. It builds a Memgraph-backed infrastructure knowledge graph to reason about blast radius, generates a structured root-cause analysis with remediation recommendations, exports postmortems to Confluence, Notion, or SharePoint, and can suggest code fixes or open a pull request. Destructive actions are human-gated, so the agent investigates freely but waits for approval before it changes anything that matters. For more on how the write side works, see our guide to Aurora Actions and background automations.
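The human-gating idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not Aurora's actual policy or code: the `READ_ONLY_VERBS` allowlist, `needs_approval`, and `run_gated` are hypothetical names, the check covers only kubectl, and nothing is executed.

```python
import shlex

# Hypothetical allowlist of read-only kubectl verbs; not Aurora's real policy.
READ_ONLY_VERBS = {"get", "describe", "logs", "top"}


def needs_approval(command):
    """Return True unless the command is kubectl followed directly by a read-only verb."""
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "kubectl":
        return True  # unknown tool or empty command: fail closed
    return tokens[1] not in READ_ONLY_VERBS


def run_gated(command, approved_by=None):
    """Report what would happen to a command; this sketch never executes anything."""
    if needs_approval(command) and approved_by is None:
        return "pending-approval"
    return "would-run"


print(run_gated("kubectl get pods -n prod"))               # would-run
print(run_gated("kubectl delete pod api-0"))               # pending-approval
print(run_gated("kubectl delete pod api-0", "oncall-sre")) # would-run
```

The design choice worth copying is failing closed: anything the allowlist does not recognise, including a read-only command written in an unexpected form such as `kubectl -n prod get pods`, waits for a human instead of running.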

Aurora is self-hosted and air-gapped capable, with bring-your-own-LLM support through Ollama, so your incident data and your model can stay inside your network. It ingests alerts via webhook from eleven monitoring connectors (PagerDuty, Datadog, Grafana, New Relic, OpsGenie, Netdata, Dynatrace, Coroot, ThousandEyes, BigPanda, and incident.io) plus a Slack bot. Notice that Datadog, Grafana, and PagerDuty appear on both Keep's integration list and Aurora's connector list. That overlap is the point: Keep can be the thing that routes the alert, and Aurora can be the thing that investigates it.

Keep vs Aurora head to head

The cleanest way to think about it: Keep manages the alert; Aurora explains the incident. They sit at different layers of the same pipeline.

Keep is the better tool when your problem is alert volume and routing. If you have alerts pouring in from a dozen sources, on-call engineers drowning in duplicates, and a need for declarative remediation workflows, Keep is purpose-built for that and is a mature, well-starred project. For a broader look at this layer, see our open-source incident-management guide.

Aurora is the better tool when your problem is the toil of the investigation itself. The pages got routed fine, but now a human has to SSH around three clouds at 3 a.m. to figure out what broke. Aurora automates that diagnostic loop and produces the evidence chain and the postmortem. Where Keep's AI clusters alerts, Aurora's agents run commands and read the actual system state.

| Dimension | Keep | Aurora |
| --- | --- | --- |
| License | MIT, with an enterprise edition in a separate ee/ directory (source) | Apache 2.0, fully open source (source) |
| Primary job | Alert management: dedupe, correlate, route, workflow-as-code | Autonomous investigation: run commands, build evidence chain, write RCA |
| Multi-cloud reach | Integrates with cloud monitoring tools, but does not execute against cloud control planes | Runs kubectl, aws, az, gcloud across AWS, Azure, GCP, OVH, Scaleway, Kubernetes |
| Investigation vs correlation | Correlates alerts into incidents; no autonomous investigation (docs) | Investigates: queries infra, traces blast radius, generates root-cause analysis |
| Write and execute actions | Workflow-as-code remediation and no-code workflows (source) | Sandboxed command execution, suggests code fixes, opens PRs, destructive actions human-gated |
| AI feature pricing model | AI correlation is Cloud and Enterprise only, not in open source (docs); no public per-incident rate | Free and open source; BYO-LLM, so you pay only your own model and infra costs |
| Self-host and air-gap | Self-hostable open-source core; AI features need Cloud or Enterprise On-Premises | Self-hosted and air-gapped capable with Ollama |

Can Keep and Aurora run together?

Yes, and that is the recommended pattern. They occupy different planes, so there is no real conflict. Keep is the alerting and workflow layer that catches the noise, deduplicates it, correlates it into incidents, and routes it. Aurora is the investigation layer that takes a routed incident and figures out why it happened.

A practical wiring looks like this: your monitoring tools feed Keep, Keep correlates and routes, and a Keep workflow (or the monitoring tool directly) fires a webhook into Aurora to kick off the investigation. Aurora then runs its diagnostics, builds the evidence chain, posts the root cause to your incident channel, and stages a fix. If you want the remediation step automated end to end, our CI/CD auto-remediation guide covers how the write side fits into a delivery pipeline.
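The hand-off in that wiring is an ordinary HTTP webhook, so it can be sketched with the Python standard library. Everything specific here is a placeholder: the URL, the `build_investigation_request` function, and the payload fields are hypothetical, so check Aurora's connector documentation for the real endpoint and schema before relying on any of it.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the webhook URL from your own Aurora deployment.
AURORA_WEBHOOK_URL = "https://aurora.example.internal/webhooks/alerts"


def build_investigation_request(incident, url=AURORA_WEBHOOK_URL):
    """Build (but do not send) a JSON POST describing a routed incident."""
    payload = {
        # Field names are assumptions for illustration, not Aurora's schema.
        "title": incident["title"],
        "severity": incident.get("severity", "unknown"),
        "alert_ids": incident.get("alert_ids", []),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_investigation_request(
    {"title": "Checkout latency spike", "severity": "critical", "alert_ids": ["a1", "a7"]}
)
```

Sending it is a single call to `urllib.request.urlopen(request, timeout=10)`. In practice the same POST would more likely come from a Keep workflow step or from the monitoring tool's own webhook integration than from a standalone script.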

This is the same complementary story that applies to alert routing tools in general. A routing layer decides who gets paged and when; it does not investigate. Aurora sits on top of whatever routing layer you run.

Which should you choose?

Choose based on which half of the problem hurts more, and in most mature stacks the answer is both.

Choose Keep if your pain is alert noise, routing, and remediation workflows, and you want an open-source, self-hostable hub for a many-source monitoring stack. Just budget for the fact that the AI correlation feature is a paid tier and that the roadmap now runs through Elastic.

Choose Aurora if your pain is the investigation toil, you operate across more than just Kubernetes, and you want an open-source agent that actually executes diagnostics and produces a root cause rather than a cluster of alerts. Aurora is genuinely vendor-neutral and free, with BYO-LLM, which matters for cost control and data residency.

Run both if you want the full pipeline: Keep for correlation and routing, Aurora for the why. For how Aurora compares to other open-source investigation tools specifically, see our Aurora vs HolmesGPT vs K8sGPT comparison.

The honest framing is that Keep and Aurora are not really substitutes. Keep is a strong open-source alert-management platform with a real workflow engine, and Aurora is an open-source investigation agent. Picking one over the other usually means you only had one half of the problem. If you have both, run both.
