Frequently Asked Questions

What is the best open-source alternative to HolmesGPT for multi-cloud teams?

Aurora is the strongest open-source alternative when your infrastructure spans more than Kubernetes. Per the Aurora repository, it natively investigates AWS, Azure, GCP, OVH, Scaleway, and Kubernetes, while HolmesGPT is Kubernetes-first and reaches clouds through toolsets and MCP servers. Both are Apache 2.0 and self-hosted, so the decision is about reach, not licensing.

Is HolmesGPT read-only, and can it run commands like Aurora?

HolmesGPT is read-only by design and respects Kubernetes RBAC, as stated in its GitHub README. That is a deliberate safety choice that keeps its blast radius small. It runs read-only commands and queries and can open suggested-fix pull requests in Operator mode, but it does not perform write or mutating actions against your cloud or cluster. Aurora runs kubectl, aws, az, and gcloud inside sandboxed Kubernetes pods, with destructive actions human-gated.

Is HolmesGPT a CNCF project?

Yes. HolmesGPT was accepted into the CNCF Sandbox in October 2025, according to the CNCF blog, and is maintained by Robusta with major contributions from Microsoft. If vendor-neutral governance is a priority for your organization, that is a genuine point in HolmesGPT's favor. Aurora is an independent Apache 2.0 project built by Arvo AI.

Can Aurora replace Grafana OnCall?

No, and it is not meant to. Grafana OnCall handles alert routing, scheduling, and escalation, while Aurora is the AI investigation layer, so the two are complementary. The grafana/oncall OSS repository was archived on 24 March 2026, so if you are migrating off it, point your new router's webhook at Aurora rather than expecting Aurora to handle routing.

How does Aurora compare to Dell APEX AIOps, formerly Moogsoft?

Dell APEX AIOps Incident Management, formerly Moogsoft, is actively maintained and strong at event correlation and noise reduction, but it is proprietary and carries Dell ownership and ProSupport-contract gating. Its Incident Management pricing is enterprise and opaque, with no public per-event or per-seat rate published by Dell. Aurora is open source, self-hosted, free, and focused on agentic investigation rather than ML correlation.

Can I run an open-source AI SRE without sending data to a cloud LLM?

Yes. Both HolmesGPT and Aurora support bringing your own model, including local inference via Ollama for air-gapped deployments, as documented in their respective GitHub repositories. This lets investigations run with no external API calls, which is why regulated and security-conscious teams favor open-source AI SRE over SaaS tools.

HolmesGPT Alternative: Multi-Cloud Open Source AI SRE (2026)

Key Takeaways

HolmesGPT is an excellent, CNCF-blessed, Kubernetes-native investigator. It was accepted into the CNCF Sandbox in October 2025, is Apache 2.0, and is maintained by Robusta with major contributions from Microsoft.

The honest trade-off is reach and action. HolmesGPT is read-only by design and respects RBAC. It diagnoses and can open suggested-fix PRs in Operator mode, and it runs read-only commands and queries, but it does not perform write or mutating cloud or cluster actions.

Aurora is the alternative when you need multi-cloud and execution. It investigates across AWS, Azure, GCP, OVH, Scaleway, and Kubernetes, and runs 'kubectl', 'aws', 'az', and 'gcloud' in sandboxed Kubernetes pods.

Aurora is a fuller incident platform. It builds a Memgraph blast-radius graph, generates postmortems that export to Confluence and Notion, and drafts remediation pull requests.

Both are free, self-hosted, and BYO-LLM. Both support local inference via Ollama for air-gapped environments, so incident telemetry never has to leave your perimeter.

If you are pure-Kubernetes and want CNCF governance, HolmesGPT may be the better pick. This is not a hit piece. It is a fit guide.

If your incidents live entirely inside Kubernetes, HolmesGPT is one of the strongest open-source AI SRE agents you can deploy in 2026. It is genuinely good, and it carries CNCF governance. The reason teams look for a HolmesGPT alternative is usually one of three things: their infrastructure spans more than Kubernetes, they want the agent to safely run commands rather than only read state, or they need a fuller platform with postmortems and a dependency graph. This post is an honest fit guide, not a takedown. We name HolmesGPT's real strengths, then show where Aurora is the better tool.

What is HolmesGPT?

HolmesGPT is an open-source, Apache 2.0 AI agent for investigating cloud-native incidents, built by Robusta with major contributions from Microsoft, and accepted into the CNCF Sandbox in October 2025. It connects large language models to live observability data through an agentic loop: when you feed it an alert or a question, it iteratively calls tools, gathers data from multiple sources, and builds a root-cause analysis, as described on its GitHub project and the CNCF announcement.

Its design strengths are real and worth naming clearly:

CNCF Sandbox governance. Vendor-neutral governance and a roadmap shared between Robusta and Microsoft are meaningful for risk-averse buyers.
Read-only and RBAC-aware. Per the README, it 'has read-only access and respects RBAC permissions' and is described as safe to run in production. That small blast radius is a legitimate selling point.
Deep observability integrations. It ships with built-in toolsets for Prometheus, Loki, Tempo, Grafana, Datadog, ArgoCD, and many more, and it can consume MCP servers for additional sources.
Operator mode. It can run in the background, message you in Slack, and, with the GitHub integration, open pull requests with suggested fixes.
BYO-LLM. It supports OpenAI, Anthropic, Azure OpenAI, Bedrock, Gemini, and self-hosted models, so your data does not have to train anyone's model.

For a clear sense of where HolmesGPT sits versus the other CNCF Sandbox option, see our HolmesGPT vs K8sGPT comparison.
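The agentic loop described above (take an alert, call tools iteratively, accumulate evidence, then conclude) can be sketched in a few lines. This is a minimal illustration and not HolmesGPT's actual code: the tool functions, the `stub_planner` that stands in for the LLM, and the pod name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical read-only tools. A real agent would query Prometheus, Loki, etc.
def get_pod_status(target: str) -> str:
    return f"{target}: CrashLoopBackOff, 14 restarts"


def get_recent_logs(target: str) -> str:
    return f"{target}: OOMKilled, memory limit 256Mi exceeded"


TOOLS: Dict[str, Callable[[str], str]] = {
    "pod_status": get_pod_status,
    "recent_logs": get_recent_logs,
}


@dataclass
class Investigation:
    alert: str
    evidence: List[str] = field(default_factory=list)
    conclusion: str = ""


def stub_planner(inv: Investigation) -> str:
    """Stand-in for the LLM: pick the next tool, or 'done' to stop."""
    order = ["pod_status", "recent_logs"]
    if len(inv.evidence) < len(order):
        return order[len(inv.evidence)]
    return "done"


def investigate(alert: str, target: str, max_steps: int = 5) -> Investigation:
    """Loop: ask the planner for a tool, run it, record the result."""
    inv = Investigation(alert=alert)
    for _ in range(max_steps):  # hard cap so the loop always terminates
        action = stub_planner(inv)
        if action == "done":
            break
        inv.evidence.append(TOOLS[action](target))
    inv.conclusion = "Evidence gathered: " + "; ".join(inv.evidence)
    return inv


result = investigate("KubePodCrashLooping", "checkout-7d9f")
print(result.conclusion)
```

The step cap matters in practice: without it, a planner that never says "done" would keep calling tools indefinitely.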

What is Aurora?

Aurora is an open-source, Apache 2.0 AI SRE and incident-management platform from Arvo AI that autonomously investigates incidents across multiple clouds and Kubernetes, executes commands in sandboxed pods, and generates root-cause analyses, postmortems, and remediation PRs. Its agents are LangGraph-orchestrated and, per the Aurora repository, they query AWS, Azure, GCP, OVH, Scaleway, and Kubernetes, then run 'kubectl', 'aws', 'az', and 'gcloud' commands inside sandboxed Kubernetes pods.

Where HolmesGPT stops at read-only diagnosis, Aurora is built to act. It builds a Memgraph infrastructure knowledge graph to model blast radius, suggests code fixes, and can open pull requests with the remediation. It ingests alerts via webhook from eleven monitoring connectors (PagerDuty, Datadog, Grafana, New Relic, OpsGenie, Netdata, Dynatrace, Coroot, ThousandEyes, BigPanda, and incident.io), plus a Slack bot. It is self-hosted, air-gapped capable, and supports local inference via Ollama. Destructive actions are human-gated.
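To make "blast radius" concrete: given a dependency graph, it is the set of components reachable downstream of the one that failed. The sketch below computes that with a breadth-first traversal over a plain Python dictionary. It does not use Memgraph and is not Aurora's implementation; the component names and edges are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical edges: each key is a component, each value lists what depends on it.
DEPENDENTS: Dict[str, List[str]] = {
    "rds-orders": ["orders-api"],
    "orders-api": ["checkout-web", "billing-worker"],
    "checkout-web": [],
    "billing-worker": ["invoice-mailer"],
    "invoice-mailer": [],
}


def blast_radius(failed: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Return every component reachable downstream of the failed one."""
    seen: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            # Skip already-visited nodes so cycles cannot loop forever.
            if child not in seen and child != failed:
                seen.add(child)
                queue.append(child)
    return seen


impacted = blast_radius("rds-orders", DEPENDENTS)
print(sorted(impacted))
```

A graph database earns its keep when the graph is large and edges span clouds, but the underlying question it answers is this same reachability query.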

HolmesGPT vs Aurora: the head-to-head

The core difference is scope and action: HolmesGPT is a read-only Kubernetes-first investigator with CNCF governance, while Aurora is a multi-cloud agent that executes commands and ships a fuller incident platform. Both are Apache 2.0, both self-host, and both let you bring your own model. The decision comes down to how much of your world lives outside Kubernetes and whether you want the agent to act, not only advise.

Dimension	HolmesGPT	Aurora
License	Apache 2.0	Apache 2.0
Governance	CNCF Sandbox since Oct 2025, Robusta plus Microsoft	Independent, built by Arvo AI
Multi-cloud reach	Kubernetes-first; AWS, Azure, GCP, and database access through toolsets, several via MCP	Native AWS, Azure, GCP, OVH, Scaleway, plus Kubernetes
Investigation vs execution	Read-only diagnosis, RBAC-aware	Runs kubectl, aws, az, gcloud in sandboxed pods
Write and remediation actions	Suggested-fix PRs in Operator mode	Human-gated execution plus remediation PRs
Dependency graph	Not a built-in feature	Memgraph blast-radius graph
Postmortems	Investigation reports	RCAs and postmortems exported to Confluence and Notion
Self-host and air-gap	Self-hosted, BYO-LLM via Ollama	Self-hosted, air-gapped bundles, BYO-LLM via Ollama

A few numbers to set scale. As of mid-2026, HolmesGPT has around 2,600 GitHub stars and ships frequent releases (0.31.1 came out on 28 May 2026), reflecting a mature, fast-moving ecosystem. Aurora is the younger project, at roughly 263 stars. If raw ecosystem maturity and CNCF backing are your top priorities, that gap is a fair point in HolmesGPT's favor.

Where HolmesGPT wins

Let's be honest about this. If your stack is heavily Kubernetes plus Prometheus plus Grafana, HolmesGPT's 30-plus built-in toolsets and read-only-by-default posture make it a low-risk, high-coverage choice. CNCF Sandbox status gives you neutral governance and a roadmap shared between Robusta and Microsoft. The read-only model means a smaller blast radius and an easier security review. For a pure-Kubernetes team that wants a CNCF-blessed investigator, those are strong reasons to pick HolmesGPT and stop reading here.

Where Aurora wins

Three places. First, reach: when an incident crosses from a pod into an AWS IAM policy, an Azure load balancer, or a GCP quota, an agent that natively queries those clouds correlates the failure in one investigation instead of leaving you to stitch it together by hand. See our guide to multi-cloud incident management for why this matters. Second, action: Aurora runs cloud and cluster commands inside sandboxed Kubernetes pods with destructive actions gated behind human approval, so the agent can do the read-heavy investigative legwork itself. The architecture behind that is covered in our companion piece, AI agent kubectl safety. Third, platform depth: a Memgraph blast-radius graph, postmortems that export to Confluence and Notion, and remediation PRs make Aurora a fuller incident workflow rather than a single investigation step.
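The idea of gating destructive actions behind human approval can be illustrated with a default-deny check: a command runs unattended only if its verb is on a read-only allow-list, and everything else waits for a person. The sketch below is an assumption-laden illustration, not Aurora's policy engine. The `ProposedCommand` shape, the verb lists, and the `approve` callback are hypothetical, and nothing is actually executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

# Illustrative and deliberately short allow-lists; a real policy would be broader.
READ_ONLY_VERBS: Dict[str, Set[str]] = {
    "kubectl": {"get", "describe", "logs", "top"},
    "aws": {"describe", "list", "get"},
    "az": {"show", "list"},
    "gcloud": {"describe", "list"},
}


@dataclass(frozen=True)
class ProposedCommand:
    tool: str                   # e.g. "kubectl"
    group: Tuple[str, ...]      # e.g. ("ec2",) or ("compute", "instances")
    verb: str                   # e.g. "get" or "terminate-instances"
    args: Tuple[str, ...] = ()

    def render(self) -> str:
        return " ".join((self.tool, *self.group, self.verb, *self.args))


def requires_human_approval(cmd: ProposedCommand) -> bool:
    """Default deny: unknown tools and unknown verbs both need approval."""
    allowed = READ_ONLY_VERBS.get(cmd.tool, set())
    verb_root = cmd.verb.split("-", 1)[0]  # "describe-instances" -> "describe"
    return verb_root not in allowed


def run_gated(cmd: ProposedCommand, approve: Callable[[ProposedCommand], bool]) -> str:
    """Return what would happen; this sketch never executes anything."""
    if requires_human_approval(cmd) and not approve(cmd):
        return f"blocked: {cmd.render()}"
    return f"would run: {cmd.render()}"


def deny_all(cmd: ProposedCommand) -> bool:
    return False  # stands in for a human who has not approved


read_cmd = ProposedCommand("kubectl", (), "get", ("pods", "-n", "prod"))
kill_cmd = ProposedCommand("aws", ("ec2",), "terminate-instances", ("--instance-ids", "i-123"))

print(run_gated(read_cmd, deny_all))
print(run_gated(kill_cmd, deny_all))
```

Using an allow-list rather than a deny-list is the design choice worth copying: a verb nobody thought to classify is held for review instead of slipping through.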

Where do AIOps incumbents fit?

A fair question when you are evaluating open-source AI SRE: should you just buy a mature AIOps platform instead? The honest answer is that those tools solve a different, older problem and come with real lock-in.

Dell APEX AIOps Incident Management, the product formerly known as Moogsoft, is actively maintained and not discontinued. Per Dell's own writeup, Dell acquired Moogsoft, the original AIOps pioneer, and the product is a strong event-correlation and noise-reduction engine built on its 50-plus patented machine-learning inventions. The trade-offs are Dell ownership and contract gating: the companion infrastructure-observability product, formerly CloudIQ, is included only with Dell ProSupport or ProSupport Plus service agreements. Pricing for Incident Management is enterprise and opaque, with no public per-event or per-seat rate published by Dell. Regardless of price, that is ML correlation, not agentic investigation, and it is neither open source nor self-hostable on your terms.

Grafana OnCall is a different category again, and an important one to get right. OnCall is alert routing, scheduling, and escalation, not investigation, and the grafana/oncall OSS repository was archived on 24 March 2026, pushing users toward Grafana Cloud IRM. OnCall and Aurora are complementary, not substitutes. Aurora does not replace your routing layer. It is the AI investigation layer that sits on top of whatever routing you migrate to, whether that is a self-hosted option like Keep or notifications via ntfy or Twilio. If you are mid-migration off OnCall, point your new router's webhook at Aurora and keep the two concerns separate.
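Keeping routing and investigation separate usually comes down to the router posting each alert as JSON to the investigator's webhook. The sketch below builds such a request with Python's standard library. The URL and the payload fields are hypothetical placeholders, not Aurora's documented webhook contract, so check the Aurora repository for the real path and schema.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical endpoint; replace with the webhook URL from your own deployment.
AURORA_WEBHOOK_URL = "http://aurora.internal.example/webhooks/alerts"


def build_forward_request(
    alert: Dict[str, Any], url: str = AURORA_WEBHOOK_URL
) -> urllib.request.Request:
    """Wrap an alert dict in a JSON POST request without sending it."""
    body = json.dumps(alert).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical alert shape; real routers each have their own payload format.
alert = {
    "title": "High 5xx rate on checkout",
    "severity": "critical",
    "source": "my-router",
}

req = build_forward_request(alert)
# urllib.request.urlopen(req, timeout=5) would send it; omitted so this runs offline.
print(req.get_method(), req.full_url)
```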

Which should you choose?

Choose HolmesGPT if your incidents live inside Kubernetes and CNCF governance matters to you. Choose Aurora if your infrastructure spans multiple clouds, you want the agent to execute and not only diagnose, or you need postmortems, a dependency graph, and remediation PRs in one platform.

Pick HolmesGPT when:

Your stack is heavily Kubernetes, Prometheus, and Grafana, and your incidents stay there.
You value CNCF Sandbox governance and a fast-moving ecosystem with 30-plus observability toolsets.
You want strict read-only behavior for the simplest possible security review.
You do not need cross-cloud reasoning out of the box.

Pick Aurora when:

You operate across AWS, Azure, GCP, OVH, or Scaleway and need cross-cloud correlation in a single investigation.
You want the agent to run 'kubectl', 'aws', 'az', and 'gcloud' itself in sandboxed pods, with destructive actions human-gated.
You want a Memgraph blast-radius graph, auto-generated postmortems exported to Confluence and Notion, and remediation pull requests.
You want a free, self-hosted platform that is not tied to a single cloud or observability vendor, rather than opaque enterprise licensing.

In practice, some teams run both: HolmesGPT for fast in-cluster Kubernetes triage, Aurora for cross-cloud investigation, execution, and postmortem generation. For a three-way technical breakdown including K8sGPT, see our deeper guide on open-source AI SRE: Aurora vs HolmesGPT vs K8sGPT.

Getting started with Aurora

Aurora is the multi-cloud, execution-capable option among open-source AI SREs. It deploys via Docker Compose or Helm, supports any LLM provider including local models via Ollama for air-gapped deployments, and ingests alerts from eleven monitoring connectors plus a Slack bot. Point your alert source's webhook at Aurora, connect read-only cloud credentials first, and let it investigate alongside your on-call rotation for two weeks before you enable any write actions. For the safety architecture behind sandboxed execution, read AI agent kubectl safety.
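For air-gapped setups, it helps to verify that the model endpoint really is inside your perimeter before any incident data is sent to it. The sketch below checks that a base URL points at localhost or a private IP address, then builds a request for Ollama's `/api/generate` endpoint on its default port 11434. The `is_inside_perimeter` helper and the `llama3` model name are illustrative assumptions, and this is not how HolmesGPT or Aurora configure their model providers.

```python
import ipaddress
import json
import urllib.request
from urllib.parse import urlparse


def is_inside_perimeter(base_url: str) -> bool:
    """True only for localhost or a literal loopback/private IP address."""
    host = urlparse(base_url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name cannot be classified without a lookup; treat as unverified.
        return False
    return ip.is_loopback or ip.is_private


def build_generate_request(
    prompt: str,
    model: str = "llama3",  # assumed model name; use whatever you have pulled
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build (but do not send) a non-streaming Ollama generate request."""
    if not is_inside_perimeter(base_url):
        raise ValueError(f"refusing to send incident data to {base_url}")
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


llm_req = build_generate_request("Summarize: pod checkout-7d9f was OOMKilled")
print(llm_req.full_url)
```

Rejecting DNS names outright is intentionally strict. If your internal model server sits behind a hostname, resolve it first and run the same check on the resulting address.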
