Which AI SRE tools are open source?

Three open-source projects dominate the agentic-investigation lane in 2026. Aurora is Apache 2.0 and supports AWS, Azure, GCP, OVH, Scaleway, and Kubernetes in a single self-hosted deployment. HolmesGPT is Apache 2.0, Kubernetes-first, co-maintained by Robusta and Microsoft, and entered the CNCF Sandbox in October 2025. K8sGPT is Apache 2.0 and has been in the CNCF Sandbox since 19 December 2023; it is a Kubernetes-only diagnostic analyser. Rootly AI Labs publishes open-source prototypes, but the Rootly product itself is closed core.

Which AI SRE tools support multi-cloud?

Among open-source tools, only Aurora offers first-class multi-cloud support (AWS, Azure, GCP, OVH, Scaleway, and Kubernetes) in a single deployment. HolmesGPT reaches AWS, Azure, GCP, Oracle Cloud, and OpenShift, but the cloud SDKs are exposed as read-only MCP wrappers rather than first-class integrations. Among commercial tools, Resolve.ai, Traversal, NeuBird Hawkeye, and incident.io can investigate across multiple clouds, but they do so by sending telemetry to vendor infrastructure. K8sGPT and Causely are Kubernetes-only by design.

What is the difference between AI SRE and AIOps?

AIOps is statistical or machine-learning analysis of telemetry to cluster alerts, correlate events, or detect anomalies. It operates on streams that already exist. AI SRE in the agentic-investigation sense is an LLM agent that runs new tool calls during the incident to gather evidence the system does not already have. Many production deployments run both: AIOps reduces alert volume, and the agent does the deeper evidence-gathering work. PagerDuty's SRE Agent and Splunk's ITSI Episode Summarization are explicit attempts to layer LLM-driven investigation or summaries on top of mature AIOps.
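To make the contrast concrete, here is a minimal sketch of the AIOps side only: grouping alerts that already exist in a stream by service and time window. The `Alert` shape, its field names, and the five-minute window are illustrative assumptions, not any vendor's algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    service: str      # hypothetical field: the owning service
    timestamp: float  # seconds since epoch


def group_alerts(alerts, window_seconds=300.0):
    """Cluster alerts per service; a gap longer than the window starts a new group."""
    groups = []
    last_group_for = {}  # service -> most recent group (a list) for that service
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = last_group_for.get(alert.service)
        if group is not None and alert.timestamp - group[-1].timestamp <= window_seconds:
            group.append(alert)
        else:
            group = [alert]
            groups.append(group)
            last_group_for[alert.service] = group
    return groups


alerts = [
    Alert("checkout", 0.0),
    Alert("checkout", 120.0),
    Alert("search", 130.0),
    Alert("checkout", 900.0),
]
grouped = group_alerts(alerts)
print(len(alerts), "alerts ->", len(grouped), "groups")
```

An agentic investigator starts where this ends: it takes one group and issues new queries (logs, cluster state, cloud APIs) that were never part of the alert stream.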

Which AI SRE tool has the lowest total cost of ownership?

The Apache 2.0 open-source tools (Aurora, HolmesGPT, K8sGPT) have zero license cost. Total cost of ownership is infrastructure plus, optionally, hosted LLM API usage. With a local LLM via Ollama, recurring cost can be effectively zero. Commercial tools span per-user platform tiers (Rootly, incident.io), per-investigation billing (Datadog Bits AI, NeuBird), and custom-quoted enterprise contracts (Resolve.ai, ServiceNow). Per-seat pricing scales with headcount and per-investigation pricing scales with incident volume, while the open-source license cost scales with neither.
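The three pricing shapes can be compared with simple arithmetic. The sketch below is only a template: the function names are hypothetical and every number in the usage lines is a placeholder, not a quoted price from any vendor.

```python
def per_seat_cost(seats, price_per_seat_month):
    """Annual cost when billing follows headcount."""
    return seats * price_per_seat_month * 12


def per_investigation_cost(investigations_per_month, price_per_investigation):
    """Annual cost when billing follows incident volume."""
    return investigations_per_month * price_per_investigation * 12


def self_hosted_cost(infra_per_month, llm_api_per_month=0.0):
    """Annual cost for an open-source tool: infrastructure plus optional hosted-LLM usage."""
    return (infra_per_month + llm_api_per_month) * 12


# Placeholder inputs only; replace them with your own quotes and usage data.
print(per_seat_cost(seats=20, price_per_seat_month=50))
print(per_investigation_cost(investigations_per_month=100, price_per_investigation=10))
print(self_hosted_cost(infra_per_month=400, llm_api_per_month=0))
```

Running the same inputs at two or three projected team sizes and incident volumes shows which model grows fastest for a given organisation.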

Can AI SRE tools be self-hosted?

Only the three open-source projects (Aurora, HolmesGPT, and K8sGPT) are fully self-hostable. Every commercial AI SRE in the shortlist requires data to leave the customer perimeter for inference, even where a VPC peering option exists for telemetry collection. NeuBird Hawkeye offers a VPC deployment, but the inference path is still vendor-managed. For the full deployment-flexibility analysis, see our self-hosted AI SRE guide.

Which AI SRE tools support air-gapped deployment?

Aurora is the only tool in the 2026 shortlist that documents fully air-gapped deployment, using local LLMs via Ollama for inference, HashiCorp Vault for secrets, and customer-owned Memgraph and Weaviate instances for graph and RAG. HolmesGPT can run with a self-hosted LLM endpoint, which approximates air-gap, but the documentation assumes a hosted model provider. K8sGPT operates against the Kubernetes API only and can be configured to point at a local LLM. No commercial AI SRE in the shortlist offers an air-gapped configuration as of May 2026.

How do AI SRE tools handle production safety when running commands?

Most tools sidestep the question by being read-only. HolmesGPT, K8sGPT, NeuBird Hawkeye, Datadog Bits AI, incident.io AI SRE, and PagerDuty SRE Agent do not execute write commands in clusters by default. Aurora executes commands in sandboxed Kubernetes pods inside an isolated 'untrusted' namespace, wrapped in a four-layer safety pipeline: prompt-injection input rail, SigmaHQ signature match against a 37-rule starter set, per-organization command policy, and an LLM safety judge. ServiceNow Now Assist and LogicMonitor Edwin AI execute through their workflow automation engines with platform-native RBAC.
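A layered command gate of the kind described for Aurora can be pictured as an ordered chain of checks in which the first rejection wins. The sketch below is a generic illustration under stated assumptions, not Aurora's code: the function names, the string heuristic, the two regex patterns (which are not SigmaHQ rules), the allowed-verb policy, and the always-approving judge stub are all hypothetical.

```python
import re
from typing import Callable, List, Optional

# Each check returns a rejection reason, or None to let the command through.
Check = Callable[[str], Optional[str]]


def input_rail(command: str) -> Optional[str]:
    # Hypothetical heuristic standing in for a prompt-injection detector.
    if "ignore previous instructions" in command.lower():
        return "input rail: possible prompt injection"
    return None


# Illustrative patterns only; these are not SigmaHQ rules.
SIGNATURES = [re.compile(r"\brm\s+-rf\s+/"), re.compile(r"\bdelete\s+namespace\b")]


def signature_match(command: str) -> Optional[str]:
    for pattern in SIGNATURES:
        if pattern.search(command):
            return f"signature match: {pattern.pattern}"
    return None


def make_policy_check(allowed_verbs: set) -> Check:
    def policy_check(command: str) -> Optional[str]:
        parts = command.split()
        # Expect "kubectl <verb> ..."; anything else is rejected by policy.
        if len(parts) < 2 or parts[0] != "kubectl" or parts[1] not in allowed_verbs:
            return "org policy: command not allowed"
        return None
    return policy_check


def llm_judge_stub(command: str) -> Optional[str]:
    # Placeholder for a model call; it always approves in this sketch.
    return None


def evaluate(command: str, checks: List[Check]) -> Optional[str]:
    """Run the checks in order and return the first rejection reason, if any."""
    for check in checks:
        reason = check(command)
        if reason is not None:
            return reason
    return None


pipeline = [input_rail, signature_match, make_policy_check({"get", "describe", "logs"}), llm_judge_stub]
print(evaluate("kubectl get pods -n untrusted", pipeline))
print(evaluate("kubectl delete namespace prod", pipeline))
```

Ordering the cheap deterministic checks before the model-based judge means most unsafe commands are rejected without an LLM call.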

Which AI SRE tools integrate with PagerDuty, Datadog, and Slack?

Aurora ships webhook integrations for PagerDuty, Datadog, Grafana, Netdata, Dynatrace, Coroot, ThousandEyes, and BigPanda, plus a Slack bot. HolmesGPT integrates with PagerDuty, Slack, and Jira out of the box. NeuBird Hawkeye, Cleric.ai, and incident.io are Slack-native or Slack-first. PagerDuty's SRE Agent is Slack-aware via PagerDuty's existing Slack integration. Datadog Bits AI runs inside the Datadog product surface.

How do I evaluate an AI SRE tool in a 30-day pilot?

Pick one alert source and one cluster or service group. Run the tool read-only for the first three weeks, comparing its RCA to the human RCA on every incident. Track agreement rate, time-to-RCA, and findings the human missed. Ingest your past postmortems and runbooks into the agent's knowledge base; that is the single biggest accuracy lever and most teams underinvest in it. Review traces weekly for tool misuse and hallucinated resources. Only after four weeks of clean traces should you graduate to alert-triggered investigation. Remediation is a separate trust escalation that most teams stage over months, not weeks.
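The pilot metrics above are easy to track in a spreadsheet or a few lines of code. This is a minimal sketch assuming each incident is recorded with a root-cause label and a time-to-RCA for both the tool and the human; the `PilotIncident` fields and the sample records are made up for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PilotIncident:
    agent_rca: str        # root-cause label from the tool
    human_rca: str        # root-cause label from the on-call engineer
    agent_minutes: float  # time for the tool to reach its RCA
    human_minutes: float  # time for the human to reach theirs


def agreement_rate(incidents):
    """Share of incidents where the tool and the human named the same root cause."""
    if not incidents:
        return 0.0
    matches = sum(1 for i in incidents if i.agent_rca == i.human_rca)
    return matches / len(incidents)


def median_time_saved(incidents):
    """Median of (human time - agent time) in minutes; negative means the tool was slower."""
    return median(i.human_minutes - i.agent_minutes for i in incidents)


# Made-up records for illustration.
pilot = [
    PilotIncident("oom-kill", "oom-kill", 6, 30),
    PilotIncident("bad-deploy", "bad-deploy", 10, 25),
    PilotIncident("dns", "cert-expiry", 8, 40),
    PilotIncident("disk-full", "disk-full", 5, 15),
]
print(f"agreement: {agreement_rate(pilot):.0%}")
print(f"median minutes saved: {median_time_saved(pilot)}")
```

Exact string matching on labels is the simplest possible comparison; in practice a reviewer usually has to judge whether two differently worded RCAs describe the same cause.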

Top 15 AI SRE Tools in 2026: Open-Source, Commercial, and Hybrid Compared

Key Takeaways

An AI SRE tool applies large-language-model reasoning to incident response, usually as a multi-step agent that runs infrastructure tools, summarizes events, or drafts postmortems. The label spans five archetypes that vendors blur in marketing: agentic investigation, AIOps correlation, postmortem generation, ITSM-integrated copilots, and workflow-automation suites with AI add-ons.

We score every tool on the AI SRE Capability Matrix. Five axes (Investigation, Remediation, Postmortem, Deployment Flexibility, Source Availability), each 0 to 3, total 15. The matrix tracks publicly documented capability as of May 2026.

Three open-source projects span the agentic-investigation lane. Aurora (Apache 2.0, multi-cloud), HolmesGPT (Apache 2.0, CNCF Sandbox since October 2025, co-maintained by Robusta and Microsoft), and K8sGPT (Apache 2.0, CNCF Sandbox since 19 December 2023, Kubernetes diagnostics).

Cited funding rounds, 2024 to 2026. Resolve.ai raised $125M at a $1B valuation in February 2026 and extended at a $1.5B valuation in April 2026. Traversal raised $48M in June 2025. incident.io closed a $62M Series B in September 2024.

Incumbents shipped or announced AI SRE features by Q2 2026. PagerDuty SRE Agent, Datadog Bits AI SRE, Splunk ITSI Episode Summarization announced at .conf25 (September 2025), ServiceNow Now Assist SRE Specialist (GA targeted June 2026), and LogicMonitor Edwin AI. The procurement question moves from "is there an AI option" to "which archetype, at what deployment tier."

Site reliability teams in 2026 are evaluating tools in a market that has reorganised faster than most procurement processes can keep up with. Five archetypes share the "AI SRE" label, and buyers regularly compare a postmortem generator to an agentic investigator as if they did the same job. This guide compares the fifteen most-cited tools across both open-source and commercial categories, scored on a single capability matrix so the decision becomes one of fit.

A note on bias. Arvo builds Aurora, an open-source agentic AI SRE tool listed below. We applied the same scoring rubric to every product on the list, including our own, and cited every numeric or capability claim that is not common knowledge.

What is an AI SRE tool?

An AI SRE tool applies large-language-model reasoning to incident response. The term covers five distinct archetypes, and only two of them actually investigate incidents.

Agentic investigation. A multi-step LLM agent that calls infrastructure tools (kubectl, cloud APIs, log queries, dependency graphs) during an incident to gather new evidence and produce a root-cause analysis. Aurora, HolmesGPT, K8sGPT, Resolve.ai, Traversal, NeuBird, Cleric, Causely, and Ciroos all market themselves with this framing.
AIOps correlation. Statistical or ML clustering of alerts to reduce noise. Examples: PagerDuty Intelligent Alert Grouping, BigPanda, Dell APEX (Moogsoft), and Dynatrace Davis. The category predates LLMs.
Postmortem generation. An LLM that drafts the retrospective from artefacts the team already has (Slack transcripts, monitor data, the investigation trace). Examples: Rootly, incident.io Scribe, FireHydrant, Datadog Bits AI, and PagerDuty Scribe. Covered in our Automated Post-Mortem Generation guide.
ITSM-integrated copilot. AI inside an existing service-management workflow. Examples: ServiceNow Now Assist SRE Specialist, LogicMonitor Edwin AI, and Splunk ITSI Episode Summarization.
Workflow-automation suite plus AI add-on. Incident platforms that bolted AI onto existing on-call, runbook, and status-page features. Examples: incident.io AI SRE, Rootly AI, and FireHydrant AI.

Conflating archetypes is the most common evaluation mistake. A team buying a postmortem generator will not get root-cause analysis. A team buying an AIOps correlator will not get a tool that runs kubectl. For the foundational definitions, see our AI SRE Complete Guide and AI-Powered Incident Investigation.

The AI SRE Capability Matrix

Five axes, each scored 0 to 3. We apply the same rubric to every tool in the shortlist.

Axis	0	1	2	3
Investigation	None	Single-shot LLM summary	Multi-step agent, single cloud or platform	Multi-step agent, multi-cloud, with RAG over historical evidence
Remediation	None	Suggested commands	PR-based fixes with approval	Sandboxed in-cluster execution with policy guardrails
Postmortem	None	Manual export of a transcript	LLM-drafted from artefacts	LLM-drafted from the agent's own investigation trace, exported to Confluence or Jira
Deployment flexibility	SaaS-only, public cloud	SaaS with private VPC peering	Self-hosted in customer VPC	Air-gapped with local LLM (Ollama or vLLM)
Source availability	Closed source	Source-available, paid	Open core	Apache 2.0 or MIT, fully open

A higher score is not always "better." A team without LLM-ops capacity, for example, gains little from a deployment-flexibility score of 3 if it cannot operate a local LLM. The matrix is for like-for-like comparison, not a leaderboard.
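The scoring arithmetic is simple enough to check mechanically. The sketch below assumes a plain dictionary per tool with one integer per axis; the helper name `total_score` is hypothetical, and the two example rows use the axis scores published in this guide.

```python
AXES = ("investigation", "remediation", "postmortem", "deployment", "source")


def total_score(scores):
    """Sum the five axis scores after checking each is an integer from 0 to 3."""
    for axis in AXES:
        value = scores[axis]
        if not isinstance(value, int) or not 0 <= value <= 3:
            raise ValueError(f"{axis} must be an integer from 0 to 3, got {value!r}")
    return sum(scores[axis] for axis in AXES)


# Axis scores as published in this guide.
holmesgpt = {"investigation": 2, "remediation": 1, "postmortem": 1, "deployment": 2, "source": 3}
k8sgpt = {"investigation": 1, "remediation": 1, "postmortem": 0, "deployment": 2, "source": 3}

print(total_score(holmesgpt), total_score(k8sgpt))
```

Keeping the per-axis values rather than only the total matters, because two tools with the same total can differ completely in where the points come from.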

For a deeper treatment of the deployment-flexibility axis, see our companion piece, Self-Hosted AI SRE: The 2026 Guide to Air-Gapped, Multi-Cloud, and BYO-LLM Deployment.

Which AI SRE tools are most-cited in 2026?

Ordered alphabetically inside each archetype. Scoring reflects the publicly documented capability of each product as of May 2026, not roadmap claims. For category foundations, see our open-source incident management overview and the root cause analysis complete guide for SREs.

Agentic-investigation tools

1. Aurora (Arvo AI), Apache 2.0, multi-cloud

Best for: SRE teams that need self-hosted, multi-cloud, BYO-LLM agentic investigation with the option to graduate into PR-based remediation.
Deployment: Docker Compose, Helm chart, or air-gapped with Ollama. Customer-owned infrastructure.
License: Apache 2.0. Code at github.com/Arvo-AI/aurora.
Investigation: LangGraph-orchestrated ReAct agent, 30+ integrations across AWS, Azure, GCP, OVH, Scaleway, and Kubernetes. Memgraph dependency graph feeds an alert-correlation pre-step. Weaviate hybrid (BM25 plus vector) RAG over runbooks and past postmortems.
Remediation: Sandboxed kubectl execution in an isolated "untrusted" namespace, wrapped in a four-layer command-safety pipeline (input rail, SigmaHQ signature match, per-org policy, LLM safety judge). Aurora Actions add scheduled and event-triggered automations.
Postmortem: Postmortem agent fed by the investigation trace, exported to Confluence Cloud (OAuth) or Server / Data Center (PAT).
Pricing: Free (Apache 2.0). Infrastructure cost only. Optionally, LLM API usage. With local Ollama the recurring software cost is zero.
Watch out for: Self-host means the team operates the agent. Teams without basic Kubernetes ops capacity should pilot in an existing managed cluster first.
Capability score: Investigation 3, Remediation 3, Postmortem 3, Deployment 3, Source 3, total 15/15. The score reflects the breadth of the open-source feature set against the matrix, not a quality verdict relative to commercial competitors.

2. Causely, closed source, Kubernetes-only

Best for: Kubernetes-only teams that want causal-graph reasoning rather than LLM-first investigation.
Deployment: SaaS with in-cluster collector. Causely is listed as a CNCF member (a member organisation, not a CNCF project).
License: Closed source.
Investigation: Topology graph plus causality graph plus a "codebook" of failure patterns; the authors describe a deterministic abductive-inference layer that precedes any LLM call. See How Causely Works and the InfoQ piece on causal reasoning in observability. Gartner Cool Vendor for AIOps, December 2025.
Remediation: Suggestion-based via MCP server.
Postmortem: Not a first-class artefact.
Pricing: Not publicly disclosed.
Watch out for: Kubernetes-only by design. If the platform spans cloud SDKs and managed services, the model is incomplete.
Capability score: Investigation 2, Remediation 1, Postmortem 0, Deployment 0, Source 0, total 3/15.

3. Cleric.ai, closed source, Slack-first

Best for: SRE teams that triage primarily in Slack and use Datadog or Grafana for telemetry.
Deployment: SaaS.
License: Closed.
Investigation: Slack-native AI SRE per cleric.ai. Integrations with Datadog and Grafana are documented on the product site.
Remediation: Suggestion-based.
Postmortem: Investigation transcript only.
Pricing: Not publicly disclosed.
Watch out for: Slack-first is a strong constraint. Teams on Microsoft Teams or under strict ChatOps governance may find the surface rigid.
Capability score: Investigation 2, Remediation 1, Postmortem 1, Deployment 0, Source 0, total 4/15.

4. HolmesGPT, Apache 2.0, Kubernetes-first

Best for: Kubernetes-heavy teams that want a CNCF-aligned, RBAC-respecting investigation agent.
Deployment: Helm via Robusta, or standalone CLI. LLM provider is the customer's choice.
License: Apache 2.0. Code at github.com/HolmesGPT/holmesgpt. CNCF Sandbox since October 2025, co-maintained by Robusta and Microsoft.
Investigation: Iterative ReAct agent. Built-in toolsets span Prometheus, Grafana, AWS / Azure / GCP via MCP read-only, Datadog, and Confluence. Releases v0.20 through v0.25 shipped between February and April 2026 (Releases page).
Remediation: Read-only by default. Operator mode can open GitHub PRs. No in-cluster execution.
Postmortem: Not first-class. Investigations route to Slack, PagerDuty, or Jira.
Pricing: Free. Robusta sells a managed SaaS that wraps HolmesGPT.
Watch out for: AWS, Azure, and GCP support is exposed through MCP wrappers rather than first-class cloud SDK integration. The customer IAM model must fit MCP's read-only assumptions.
Capability score: Investigation 2, Remediation 1, Postmortem 1, Deployment 2, Source 3, total 9/15.

5. K8sGPT, Apache 2.0, Kubernetes-only diagnostics

Best for: Quick diagnostic sanity checks on a single cluster.
Deployment: CLI, in-cluster operator, or Helm.
License: Apache 2.0. CNCF Sandbox since 19 December 2023.
Investigation: Rule-based analyser set (Pod, Deployment, Ingress, Service, NetworkPolicy, etc.) with an LLM translating findings into natural language. Closer to L3 (single-shot diagnosis) than L4 (agentic multi-step) on the AICL.
Remediation: Suggestion-based per k8sgpt docs.
Postmortem: Not a feature.
Pricing: Free.
Watch out for: Scope is limited to the cluster API; the tool cannot reach out to cloud APIs or external systems. On the positive side, K8sGPT has a strong privacy feature: resource names and labels are anonymised before LLM calls per the docs.
Capability score: Investigation 1, Remediation 1, Postmortem 0, Deployment 2, Source 3, total 7/15.
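Anonymising resource names before an LLM call generally means swapping each name for a placeholder and restoring it in the response. The sketch below illustrates that general idea only; it is not K8sGPT's implementation, and the function names, the `RESOURCE_n` placeholder format, and the sample finding are assumptions.

```python
def anonymise(text, sensitive_names):
    """Replace each sensitive name with a placeholder; return masked text and the mapping."""
    mapping = {}
    masked = text
    # Longest names first so a short name never clobbers part of a longer one.
    ordered = sorted(set(sensitive_names), key=lambda n: (-len(n), n))
    for index, name in enumerate(ordered):
        placeholder = f"RESOURCE_{index}"
        mapping[placeholder] = name
        masked = masked.replace(name, placeholder)
    return masked, mapping


def deanonymise(text, mapping):
    """Restore the original names in text that came back from the model."""
    # Longer placeholders first so RESOURCE_1 does not match inside RESOURCE_10.
    for placeholder in sorted(mapping, key=len, reverse=True):
        text = text.replace(placeholder, mapping[placeholder])
    return text


finding = "Pod payments-api-7d9f in namespace payments is in CrashLoopBackOff"
masked, mapping = anonymise(finding, ["payments-api-7d9f", "payments"])
print(masked)
print(deanonymise(masked, mapping))
```

Plain substring replacement is fragile when a name also occurs as an ordinary word, so a production version would mask structured fields rather than free text.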

6. NeuBird Hawkeye, closed source, multi-platform

Best for: Datadog-heavy AWS shops that want a managed AI SRE.
Deployment: SaaS or VPC. Mayfield, M12, and AWS GenAI Accelerator backing per neubird.ai.
License: Closed.
Investigation: Ephemeral processing (telemetry not stored). Integrations with Datadog, Splunk, CloudWatch, PagerDuty, and ServiceNow per the Hawkeye deep-dive.
Remediation: Read-only by default. Integrations forward to ITSM.
Postmortem: Investigation transcript export.
Pricing: Per-investigation pricing listed on AWS Marketplace; enterprise contracts also available. See NeuBird's product page for the latest.
Watch out for: "Self-learning" implies a vector store that customers cannot directly inspect. Perform due diligence on the data path for regulated workloads.
Capability score: Investigation 2, Remediation 1, Postmortem 1, Deployment 1, Source 0, total 5/15.

7. Resolve.ai, closed source

Best for: Enterprise teams that want a managed "AI Production Engineer" with named-customer case studies.
Deployment: SaaS with in-VPC satellite agent for telemetry. No on-prem option. SOC 2, GDPR, HIPAA per the Resolve trust page.
License: Closed.
Investigation: Knowledge-graph plus LLM agent per the Resolve knowledge-graph post. Founders include Spiros Xanthos, an OpenTelemetry co-creator. Resolve's Series A press release reports vendor-claimed customer results that Arvo has not independently verified: 72% investigation-time reduction at Coinbase, 87% faster investigations at DoorDash, and 30% fewer engineers per incident at Zscaler.
Remediation: Generates suggested commands. Public architecture detail is limited.
Postmortem: Investigation transcript.
Pricing: Enterprise. Public pricing is not disclosed.
Watch out for: Cloud-only and closed-source. The two public LLM benchmark posts (Sonnet 4.6) use a private dataset with no public methodology, so the numbers cannot be independently reproduced.
Capability score: Investigation 3, Remediation 1, Postmortem 1, Deployment 1, Source 0, total 6/15.

8. Traversal, closed source

Best for: Log-heavy enterprise environments where causal search across telemetry is the bottleneck.
Deployment: SaaS with flexible deployment options. $48M from Sequoia and Kleiner Perkins, June 2025.
License: Closed.
Investigation: "Production World Model" and "Causal Search Engine" per Traversal's product blog. Vendor-reported production results at American Express, summarised in the Fortune launch coverage and Traversal's Amex announcement: 32% MTTR reduction and 82% RCA accuracy across roughly 250 billion log lines per day. Customer stories at Eventbrite, PepsiCo, and DigitalOcean.
Remediation: Read-only.
Postmortem: Investigation transcript.
Pricing: Enterprise.
Watch out for: Heavy reliance on trademarked frameworks. Confirm during evaluation how much is novel architecture versus packaging.
Capability score: Investigation 3, Remediation 1, Postmortem 1, Deployment 1, Source 0, total 6/15.

Incumbent and incident-workflow tools

9. Datadog Bits AI SRE, closed source

Best for: Teams standardised on Datadog observability who want investigation where the data already lives.
Deployment: SaaS, multi-tenant.
License: Closed.
Investigation: Multi-agent architecture with planner and worker agents. Datadog's engineering posts Building Bits AI SRE and the evaluation platform describe the design without releasing source. HIPAA-compliant per the product page. Seven triage actions including Slack, Teams, and Jira.
Remediation: Triage actions only.
Postmortem: Bits AI drafts post-incident reports per the product page.
Pricing: Per-conclusive-investigation billing on top of host, APM, logs, and RUM licensing per Datadog pricing.
Watch out for: Bits is tightly bound to Datadog's data plane. Using it without the full Datadog stack is not a supported pattern.
Capability score: Investigation 2, Remediation 1, Postmortem 2, Deployment 0, Source 0, total 5/15.

10. Edwin AI (LogicMonitor), closed source

Best for: Existing LogicMonitor Envision customers expanding into agentic AIOps.
Deployment: SaaS layered on LogicMonitor.
License: Closed.
Investigation: Ten-plus specialised sub-agents (investigation, correlation, remediation, orchestrator) per the agent-taxonomy post. MCP ecosystem support (Dynatrace, Splunk, ServiceNow, Elastic, GitHub, Confluence). A Forrester Total Economic Impact study commissioned by LogicMonitor reports 313% ROI on a composite organisation with sub-six-month payback.
Remediation: Closed-loop with policy guardrails per LogicMonitor's product description.
Postmortem: Investigation transcript.
Pricing: Bundled with LogicMonitor; quoted.
Watch out for: Customers must purchase LogicMonitor to use Edwin. Not a standalone option.
Capability score: Investigation 2, Remediation 2, Postmortem 1, Deployment 1, Source 0, total 6/15.

11. incident.io AI SRE, closed source

Best for: Teams already using incident.io for on-call and incident workflow who want the AI add-on.
Deployment: SaaS.
License: Closed.
Investigation: Multi-agent system searching GitHub PRs, Slack, historical incidents, logs, metrics, and traces per incident.io's AI SRE introduction. An "ambient agent" continuously monitors. The ZenML LLMOps case study documents the retrieval evolution from embeddings-only to deterministic tagging plus re-ranking.
Remediation: Recommendations only.
Postmortem: Scribe drafts post-incident reports.
Pricing: Platform tiers on incident.io's pricing page. AI SRE access is gated to design partners as of the launch announcement.
Watch out for: Verify AI SRE availability for your tier before assuming you can use it on day one.
Capability score: Investigation 2, Remediation 1, Postmortem 2, Deployment 0, Source 0, total 5/15.

12. PagerDuty SRE Agent, closed source

Best for: PagerDuty Operations Cloud customers who want a memory-equipped agent inside the existing on-call surface.
Deployment: SaaS, inside PagerDuty Operations Cloud per the product page.
License: Closed.
Investigation: Per-tenant memory: service-scoped observations, incident recollections, human-promoted playbooks. See PagerDuty's engineering post We Built an SRE Agent With Memory. MCP server. Connectors to Grafana, New Relic, and Honeycomb. Three-tier engagement model (agent-led, collaborative, human-led).
Remediation: Suggestions and automation hooks through existing PagerDuty workflows.
Postmortem: PagerDuty Scribe.
Pricing: Per-seat tiers and AIOps add-ons listed on PagerDuty pricing.
Watch out for: AI pricing across the incident-management category is moving from per-seat to usage-based. Model the long-term cost against incident volume rather than seat count.
Capability score: Investigation 2, Remediation 1, Postmortem 2, Deployment 0, Source 0, total 5/15.

13. Rootly AI, closed source

Best for: Teams that want AI-first ChatOps incident response with an open MCP server and an actively published agent roadmap.
Deployment: SaaS.
License: Closed core. Rootly AI Labs publishes open-source prototypes.
Investigation: Analyses code changes, telemetry, and past incidents per the Rootly AI SRE page. An AI Meeting Bot joins incident bridges and transcribes. The Rootly API agent-first announcement describes the MCP-based agentic surface used by Cursor, Windsurf, and Claude.
Remediation: Suggestions plus workflow automation.
Postmortem: AI-drafted from incident artefacts.
Pricing: Tiers listed on Rootly pricing.
Watch out for: "AI-first" branding outpaces the published architecture detail; in evaluation, ask for the agent loop description and the rule-based-automation boundary.
Capability score: Investigation 2, Remediation 1, Postmortem 2, Deployment 0, Source 1, total 6/15.

14. ServiceNow Now Assist SRE Specialist, closed source

Best for: Enterprises on ServiceNow ITSM that want triage and post-mortems inside the same platform.
Deployment: SaaS, ServiceNow cloud.
License: Closed.
Investigation: The "SRE Specialist" performs triage (what, impact, priority, who) and autonomous post-mortem authoring, announced as part of the Autonomous Workforce in ServiceNow's Knowledge 2026 release. GA targeted June 2026.
Remediation: Workflow automation.
Postmortem: Autonomous authoring claimed.
Pricing: Custom-quoted. Public pricing is not disclosed.
Watch out for: As of May 2026 the product is pre-GA and most coverage is press-release or keynote material. Treat capabilities as preliminary until verified during the design-partner phase.
Capability score: Investigation 2, Remediation 2, Postmortem 2, Deployment 0, Source 0, total 6/15.

15. Splunk ITSI Episode Summarization, closed source (Alpha)

Best for: Splunk-heavy enterprises that want LLM summaries layered on existing KPI engines.
Deployment: Splunk Cloud.
License: Closed.
Investigation: ITSI Episode Summarization, announced at .conf25 (September 2025), is in Alpha. The feature layers an LLM-generated summary (what happened, when, key events, suspected cause) onto Splunk ITSI's KPI-based episodes. Splunk also ships Event iQ for AI-driven alert correlation, listed on the ITSI product page.
Remediation: Recommendation-based.
Postmortem: Not yet a published feature.
Pricing: Splunk ITSI is data-volume or entity-count licensed. The AI features are in Alpha.
Watch out for: Alpha contract and capability terms can shift. Plan a re-evaluation after GA.
Capability score: Investigation 1, Remediation 1, Postmortem 1, Deployment 0, Source 0, total 3/15.

Scoring summary

#	Tool	License	Score
1	Aurora	Apache 2.0	15
4	HolmesGPT	Apache 2.0	9
5	K8sGPT	Apache 2.0	7
7	Resolve.ai	Closed	6
8	Traversal	Closed	6
10	Edwin AI	Closed	6
13	Rootly AI	Closed (Labs OSS)	6
14	ServiceNow Now Assist SRE	Closed	6
6	NeuBird Hawkeye	Closed	5
9	Datadog Bits AI SRE	Closed	5
11	incident.io AI SRE	Closed	5
12	PagerDuty SRE Agent	Closed	5
3	Cleric.ai	Closed	4
2	Causely	Closed	3
15	Splunk ITSI Episode Summarization	Closed	3

The open-source projects lead the deployment-flexibility and source-availability axes by definition. Aurora is the only entry that scores 3 on every axis. Commercial leaders cluster around 5 to 6 because they are uniformly strong on investigation but weak on deployment flexibility and source availability. Causely (Kubernetes-only) and Splunk ITSI (Alpha) score lowest at 3 because their scope or maturity caps multiple axes. K8sGPT's Kubernetes-only scope caps its investigation and postmortem scores in the same way, and its total of 7 comes mostly from the deployment and source axes.

The score does not pick a winner. It picks a fit. A bank under FedRAMP High obligations evaluates this list differently from a 50-engineer Series B startup. The deployment axis answers the fitness question; investigation answers the depth question; source availability answers the trust question.

How do I choose an AI SRE tool?

Most procurement processes stall because the team compares across all five axes at once. Asking these three questions in order eliminates twelve of the fifteen tools before vendor demos.

Does the data have to stay in our perimeter? If yes, the answer is Aurora, HolmesGPT, or K8sGPT. Every commercial product on this list requires data to leave the customer perimeter for inference. See Self-Hosted AI SRE for the architecture you will need.
Is the scope multi-cloud or Kubernetes-only? If multi-cloud, the open-source shortlist narrows to Aurora; in the commercial set, Resolve.ai, Traversal, NeuBird, and incident.io are the credible candidates. If Kubernetes-only, every tool remains valid, and Aurora's non-Kubernetes integrations simply go unused.
Do you need to take action, or only investigate? Read-only covers most of the open-source category and most incumbent AI features. Actioning agents narrow the list to Aurora (PR-based, sandboxed kubectl, plus Aurora Actions), ServiceNow Now Assist (workflow automation), and Edwin AI (closed-loop within LogicMonitor).
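The three questions amount to three successive filters over the candidate list. The sketch below encodes a subset of the tools with flags transcribed from the answers above; a tool not named under a question is marked False, which is a simplifying assumption, and the `Tool` type and `shortlist` helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    self_hosted: bool   # named under question 1
    multi_cloud: bool   # named under question 2
    takes_action: bool  # named under question 3


TOOLS = [
    Tool("Aurora", True, True, True),
    Tool("HolmesGPT", True, False, False),
    Tool("K8sGPT", True, False, False),
    Tool("Resolve.ai", False, True, False),
    Tool("Traversal", False, True, False),
    Tool("NeuBird Hawkeye", False, True, False),
    Tool("incident.io AI SRE", False, True, False),
    Tool("ServiceNow Now Assist", False, False, True),
    Tool("Edwin AI", False, False, True),
]


def shortlist(tools, need_perimeter, need_multi_cloud, need_action):
    """Apply the three questions in order; a False requirement filters nothing."""
    result = list(tools)
    if need_perimeter:
        result = [t for t in result if t.self_hosted]
    if need_multi_cloud:
        result = [t for t in result if t.multi_cloud]
    if need_action:
        result = [t for t in result if t.takes_action]
    return [t.name for t in result]


print(shortlist(TOOLS, need_perimeter=True, need_multi_cloud=False, need_action=False))
print(shortlist(TOOLS, need_perimeter=False, need_multi_cloud=True, need_action=False))
```

Because each question only removes candidates, answering them in order keeps the later, more subjective comparisons confined to a short list.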

For depth on the action-safety question, see our AI Agent kubectl Safety guide and CI/CD Auto-Remediation Complete Guide.

What to watch next

Arvo expects the category to converge along three axes through the rest of 2026.

Model Context Protocol convergence. PagerDuty, Rootly, Aurora, HolmesGPT, Causely, and Edwin AI have all shipped MCP servers. MCP is on track to become table stakes by year-end, which means differentiation will shift to prompt graphs, RAG quality, and policy guardrails.
Open benchmarking. Resolve.ai and Rootly have published proprietary LLM benchmark posts, neither with a reproducible dataset. The first open, named benchmark with a public incident corpus is likely to set the citation surface the category orbits.
Pricing model fragmentation. Per-seat (PagerDuty, Rootly, incident.io), per-investigation (Datadog Bits AI, NeuBird), per-credit (ServiceNow), per-cloud-host (Edwin AI), and free open source (Aurora, HolmesGPT, K8sGPT) coexist today. Expect convergence on a published reference cost per investigation as buyers compare more rigorously.

Differentiation in this market is structural rather than a matter of feature lists. Buyers who score against the capability matrix and apply the deployment, scope, and action questions usually land on a credible shortlist of two or three tools within a week. Buyers running feature-list comparisons evaluate for a quarter.