Grafana OnCall Alternative: Open Source On-Call and AI SRE After the 2026 Archival
Grafana OnCall OSS was archived in March 2026. Compare open-source routing replacements like Keep and ntfy, then layer Aurora on for AI SRE investigation.
Key Takeaways
- The grafana/oncall OSS repository was archived on March 24, 2026, and is now read-only, with security patches limited to CVEs scoring CVSS 7.0 or higher, per the Grafana OnCall OSS archival notice.
- The Cloud Connection that powered mobile push, SMS, and phone notifications for OSS users was disabled on the same date, so self-hosted users must move push to ntfy, Pushover, or Gotify and bring their own Twilio account for calls and SMS.
- Grafana OnCall handled alert routing, scheduling, and escalation, not investigation, so any replacement only restores the routing layer. The paid successor is Grafana Cloud IRM, free for 3 active users and then roughly 20 USD per active IRM user.
- Keep (MIT, around 11.9k GitHub stars) is the strongest open-source routing and correlation layer, though its maintainers confirm that on-call scheduling is not a native feature and that Keep is typically paired with a separate calling tool.
- Aurora (Apache 2.0) is a different layer: an open-source AI SRE that autonomously investigates incidents across AWS, Azure, GCP, OVH, Scaleway, and Kubernetes, and it sits on top of whatever routing you choose rather than replacing it.
If you self-hosted Grafana OnCall OSS, the March 2026 archival forced a decision. This guide confirms exactly what broke, lays out the open-source options for rebuilding the routing layer, and then explains where an AI investigation layer like Aurora fits. The honest framing up front: OnCall did routing and Aurora does investigation, so this is not a one-for-one swap.
What happened to Grafana OnCall OSS?
Grafana OnCall OSS was archived on March 24, 2026, and the repository is now read-only with no further feature development. According to the Grafana OnCall OSS archival notice, the project entered maintenance mode on March 11, 2025, and was fully archived just over a year later. Grafana Labs explained the timeline and reasoning in its maintenance-mode announcement, which coincided with the launch of Grafana Cloud IRM, the unified cloud product that merged OnCall and Incident.
Two concrete things broke for self-hosted users on the archival date:
- Cloud-dependent notifications stopped. Mobile app push notifications, plus SMS and phone calls that relied on the Grafana Cloud Connection, are no longer supported for OSS users, per the archival notice. Your schedules and escalation chains keep running, but the paging channels that routed through Grafana Cloud went dark.
- Security coverage narrowed. The codebase now receives fixes only for critical bugs and for CVEs with a CVSS score of 7.0 or higher. Everything else is frozen.
The repository itself remains open source under AGPLv3 with roughly 3.9k stars, so you can keep running or even fork the archived code. But you would own all maintenance, and the cloud-backed notification path is gone for good.
What did Grafana OnCall actually do?
Grafana OnCall was an on-call management tool: scheduling, alert routing, escalation chains, and notification delivery. The Grafana OnCall OSS page describes it as calendar-based on-call schedules, automatic escalation chains with flexible routing to reach the right person during an outage, alert grouping to cut noise, and notifications over Slack, Telegram, voice, and SMS.
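To make "calendar-based schedules plus escalation chains" concrete, here is a minimal sketch in Python. It is not OnCall's data model or code; the `Shift` and `EscalationStep` types, the `"on_call"` target keyword, and the function names are all hypothetical, chosen only to show how a schedule lookup and a timed chain combine into a list of who gets notified and when.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Shift:
    user: str
    start: datetime
    end: datetime  # exclusive


@dataclass
class EscalationStep:
    target: str      # a fixed user name, or "on_call" to resolve from the schedule
    wait: timedelta  # time allowed for an acknowledgement before the next step


def on_call_user(schedule: list[Shift], at: datetime) -> str | None:
    """Return whoever holds the shift covering `at`, or None if there is a gap."""
    for shift in schedule:
        if shift.start <= at < shift.end:
            return shift.user
    return None


def plan_notifications(
    steps: list[EscalationStep], schedule: list[Shift], fired_at: datetime
) -> list[tuple[datetime, str]]:
    """Expand a chain into (notify_time, user) pairs, assuming nobody acknowledges."""
    plan = []
    t = fired_at
    for step in steps:
        user = on_call_user(schedule, t) if step.target == "on_call" else step.target
        if user is not None:
            plan.append((t, user))
        t += step.wait
    return plan
```

Because the schedule is resolved at each step's notify time, an unacknowledged alert that straddles a shift handover escalates to the incoming engineer rather than paging the outgoing one twice.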
What it did not do was investigate. OnCall answered 'who gets paged and how do we escalate,' not 'why did this break and what do we fix.' That distinction matters when you pick a replacement, because the routing layer and the investigation layer are separate jobs. If you are rethinking the whole stack, our guide to open-source incident management covers how the pieces fit together.
What are the open-source replacements for the routing layer?
The honest answer is that no single open-source project is a drop-in clone of OnCall plus its cloud notifications, so most teams assemble two or three pieces: a routing and correlation engine, a notification transport, and optionally a managed path for phone and SMS. Here is how the realistic options compare.
| Option | What it covers | License and cost | On-call scheduling | Notes |
|---|---|---|---|---|
| Keep | Alert management, correlation, de-noising, workflow automation | MIT, self-hosted free | Not native; workflows can route and escalate, calling needs a third party | Around 11.9k stars; strongest open-source single-pane-of-glass for alerts |
| ntfy | Push notification transport via HTTP pub-sub | Dual Apache 2.0 and GPLv2, self-hosted free | None; it is a delivery channel | Around 29.7k stars; replaces lost OSS push, self-hostable |
| Twilio (BYO account) | Voice calls and SMS for escalation | Commercial, usage-based; no flat per-seat figure published for this use | None; it is a delivery channel | The path the archival notice itself recommends for OSS calls and SMS |
| Grafana Cloud IRM | Scheduling, routing, escalation, incident response, built-in paging | Free for 3 active users, then about 20 USD per active IRM user plus a 19 USD monthly platform fee on Pro | Native and managed | The official paid successor; gives back cloud push, SMS, and voice |
Keep: the open-source alert routing and correlation layer
Keep is the closest open-source project to a routing-layer replacement. It is MIT licensed, self-hostable for free, and carries roughly 11.9k GitHub stars. Keep ingests alerts from many sources, deduplicates and correlates them, and runs declarative YAML workflows that feel like GitHub Actions for your monitoring tools, including conditional routing by team, environment, or business hours.
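Keep expresses this kind of logic in its own YAML workflow format, which is not reproduced here. As a language-neutral illustration of what "conditional routing by team, environment, or business hours" means, the sketch below uses plain Python; the alert field names (`team`, `environment`), the destination strings, and the 09:00 to 17:00 weekday window are all assumptions for the example.

```python
from datetime import datetime, time


def in_business_hours(at: datetime, start: time = time(9, 0), end: time = time(17, 0)) -> bool:
    """True on Monday to Friday between `start` (inclusive) and `end` (exclusive)."""
    return at.weekday() < 5 and start <= at.time() < end


def route(alert: dict, at: datetime) -> str:
    """Pick a destination from the alert's team, environment, and the time of day."""
    team = alert.get("team", "platform")  # hypothetical default team
    if alert.get("environment") != "production":
        return f"chat:{team}-alerts"      # non-production never pages
    if in_business_hours(at):
        return f"chat:{team}-oncall"      # people are at their desks
    return f"page:{team}"                 # production, out of hours: page someone
```

For example, `route({"team": "payments", "environment": "production"}, datetime(2026, 3, 24, 2))` returns `"page:payments"`, while the same alert at 10:00 on that Tuesday goes to the team's chat channel.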
Be precise about one thing: Keep does not ship native on-call scheduling and escalation the way OnCall did. In a public discussion of Keep as a Grafana OnCall alternative, a Keep team member who also created the original OnCall states that Keep is a toolbox for alerts focused on de-noising, correlation, and enrichment, and that for phone and SMS escalation you still need a third-party service. A common pattern they suggest is Keep for the single pane of glass plus a minimal calling tool for the actual paging.
ntfy and Twilio: restoring the notification channels
ntfy is a self-hosted, open-source push notification service, dual licensed under Apache 2.0 and GPLv2 with around 29.7k stars. It directly replaces the mobile push you lost when the Cloud Connection was disabled, with no third-party dependency. For voice and SMS, the Grafana archival notice itself points OSS users to bring their own Twilio credentials. Twilio is commercial and billed by usage, and there is no flat per-seat rate published for this specific routing use, so budget by message and call volume.
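Publishing to ntfy is a plain HTTP POST to a topic URL, with the message as the request body and optional `Title` and `Priority` headers. The sketch below builds that request with the Python standard library; the server URL and topic name are placeholders, and a real deployment would usually add an access token, which is omitted here.

```python
import urllib.request


def build_ntfy_request(
    base_url: str, topic: str, message: str, title: str, priority: str = "high"
) -> urllib.request.Request:
    """Build (but do not send) a publish request for a self-hosted ntfy server."""
    url = f"{base_url.rstrip('/')}/{topic}"
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        # Header values are kept ASCII here; non-ASCII titles need extra encoding.
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

A caller would run something like `send(build_ntfy_request("https://ntfy.example.internal", "oncall-payments", "Disk 95% on db-1", "High disk usage"))`, and any phone subscribed to that topic receives the push.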
Grafana Cloud IRM: the paid official path
If you would rather not assemble pieces, Grafana Cloud IRM is the maintained successor that unifies on-call scheduling, alert routing, escalation, and incident response with built-in multi-channel paging. It is free for 3 active IRM users, then roughly 20 USD per active IRM user with a 19 USD monthly platform fee on Pro, and an enterprise tier with a custom annual commitment. An active IRM user is one included in schedules or escalation chains, so you pay for engineers who actually go on call. Migration from OSS uses Terraform or the OnCall API, and the migration guide recommends moving resources in order: integrations, then escalation chains, then routes, then schedules.
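For budgeting, the figures above can be turned into a rough estimator. This is a sketch built on the approximate numbers quoted in this article, not a statement of Grafana's billing rules. In particular, it assumes the 3 free users still count as free once you are on Pro; set `free_users=0` to model the opposite, and confirm against current pricing before committing.

```python
def estimate_irm_monthly_usd(
    active_users: int,
    free_users: int = 3,         # assumption: free allowance carries over to Pro
    per_user_usd: float = 20.0,  # approximate per-active-user figure
    platform_fee_usd: float = 19.0,
) -> float:
    """Rough monthly cost for a given number of active IRM users."""
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    if active_users <= free_users:
        return 0.0
    billable = active_users - free_users
    return platform_fee_usd + billable * per_user_usd
```

Under these assumptions a 10-person rotation comes to 19 + 7 × 20 = 159 USD per month, or 219 USD if no users are free on Pro.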
Where does Aurora fit, and does it replace OnCall?
No, Aurora does not replace Grafana OnCall, and it does not pretend to. Aurora is an open-source, Apache 2.0 AI SRE that handles the investigation layer, the part OnCall never touched. You still need a routing layer underneath it, whether that is Keep, ntfy and Twilio, Grafana Cloud IRM, PagerDuty, or anything else that can fire a webhook.
Here is the division of labor. Your routing layer decides who gets paged and escalates if they miss it. Aurora, triggered by the same alert, autonomously investigates why the incident is happening while the engineer is still reading the page.
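One way to picture "triggered by the same alert" is a small fan-out step that posts one alert payload to both layers. The sketch below only builds the requests; the two URLs are hypothetical placeholders, not documented Keep or Aurora endpoints, and in practice most monitoring tools can be configured with two webhook destinations so no extra code is needed.

```python
import json
import urllib.request


def fan_out(alert: dict, targets: dict[str, str]) -> dict[str, urllib.request.Request]:
    """Build one JSON POST per target, all carrying the identical alert body."""
    body = json.dumps(alert, sort_keys=True).encode("utf-8")
    return {
        name: urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        for name, url in targets.items()
    }


# Hypothetical endpoints for the two layers.
TARGETS = {
    "routing": "https://routing.example.internal/webhook",
    "investigation": "https://aurora.example.internal/webhook",
}
```

Each request would then be sent with `urllib.request.urlopen`. The point is that the two layers are independent consumers of the same event: paging does not wait for investigation, and investigation does not depend on who acknowledged.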
| Dimension | Grafana OnCall OSS (archived) | Aurora |
|---|---|---|
| Primary job | Alert routing, scheduling, escalation | Autonomous incident investigation and root cause analysis |
| License | AGPLv3, archived March 2026 | Apache 2.0, actively developed |
| Deployment | Self-hosted, cloud notifications now disabled | Self-hosted, air-gapped capable, BYO-LLM |
| Multi-cloud reach | Routes alerts, does not query clouds | Queries AWS, Azure, GCP, OVH, Scaleway, and Kubernetes directly |
| Investigation vs correlation | Neither; it is routing | Multi-step agentic investigation with a Memgraph blast-radius graph |
| Write or execute actions | Sends notifications only | Runs kubectl, aws, az, and gcloud in sandboxed Kubernetes pods, human-gated for destructive steps |
| Pricing model | Free but unmaintained | Free, self-hosted; LLM cost only, and zero with local models |
| Self-host and air-gap | Self-host, no air-gap story for cloud paging | Self-host and air-gapped, with local inference via Ollama |
What Aurora does after an alert arrives: its LangGraph-orchestrated agents query your cloud and Kubernetes APIs, execute read commands in sandboxed pods, build a Memgraph knowledge graph to estimate blast radius, generate a root-cause analysis and a postmortem you can export to Confluence, Notion, or SharePoint, and suggest code fixes or open a pull request. Destructive actions are always gated on human approval. Aurora ingests alerts via webhook from eleven monitoring connectors (PagerDuty, Datadog, Grafana, New Relic, OpsGenie, Netdata, Dynatrace, Coroot, ThousandEyes, BigPanda, and incident.io) and also offers a Slack bot, so your migrated routing layer can hand off cleanly. For teams that need to keep incident data on their own infrastructure, our self-hosted AI SRE guide covers the air-gapped deployment in detail.
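The idea of letting read commands run freely while gating anything destructive can be sketched as a fail-closed allowlist. This is an illustration of the pattern only, not Aurora's implementation: the function names and the allowlist are hypothetical, and a real gate would also need to handle the `aws`, `az`, and `gcloud` CLIs and flags placed before the verb.

```python
from __future__ import annotations

import shlex

# kubectl verbs that only read cluster state.
READ_ONLY_KUBECTL_VERBS = {"get", "describe", "logs", "top"}


def needs_approval(command: str) -> bool:
    """Return True unless the command is kubectl with a known read-only verb."""
    parts = shlex.split(command)
    if len(parts) < 2 or parts[0] != "kubectl":
        return True  # fail closed: unrecognised tools are always gated
    return parts[1] not in READ_ONLY_KUBECTL_VERBS


def run_step(command: str, approved_by: str | None = None) -> str:
    """Describe what would happen; nothing is executed in this sketch."""
    if needs_approval(command) and approved_by is None:
        return "blocked: awaiting human approval"
    return f"would execute: {command}"
```

With this gate, `kubectl get pods -n prod` passes straight through, while `kubectl delete pod api-0` is blocked until a named person approves it. Anything the gate does not recognise, including a command like `kubectl -n prod get pods` where a flag precedes the verb, is treated as needing approval.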
Aurora's differentiator against Kubernetes-only assistants is that it spans multiple clouds and actually executes investigation commands rather than only diagnosing. Against closed SaaS investigation tools, its differentiator is being open source, self-hosted, free, and vendor-neutral. If you are weighing the broader category, see how we frame adding AI investigation to a paging tool and how investigation works across clouds in multi-cloud incident management.
Which should you choose?
Choose based on which layer you are rebuilding. If you only need to restore routing, Keep plus ntfy plus a Twilio account is the most complete open-source assembly, and Grafana Cloud IRM is the lowest-effort paid path that gives back managed paging. None of those investigate incidents.
If your real pain after the archival is that investigation was always manual, add Aurora on top of whichever routing layer you land on. A practical stack is Keep for correlation and routing, ntfy or Twilio for delivery, and Aurora subscribed to the same alerts for autonomous root-cause analysis. The routing tools answer who and when. Aurora answers why and how to fix it, and it stays free and self-hosted while doing so.