PagerDuty Alternative for Root Cause Analysis: Why SRE Teams Are Adding AI Investigation
PagerDuty handles alerting and on-call. But who investigates the root cause? Aurora is an open source AI agent that autonomously investigates incidents across AWS, Azure, GCP, and Kubernetes.
Key Takeaway: PagerDuty is the industry standard for alerting and on-call management, but it doesn't investigate why incidents happen. Aurora is an open source AI agent that plugs into PagerDuty via webhooks and autonomously investigates root causes across AWS, Azure, GCP, and Kubernetes. The two tools are complementary: for teams spending hours on manual RCA, Aurora fills the gap PagerDuty doesn't cover.
PagerDuty has over 30,000 customers and dominates on-call management. It's excellent at what it does: detecting alerts, routing them to the right person, coordinating incident response, and tracking SLAs.
But here's the problem: PagerDuty pages you. Then you're on your own.
The actual investigation — SSHing into servers, querying CloudWatch, checking Kubernetes pod logs, correlating deployments with error spikes — is still manual. According to the VOID (Verica Open Incident Database), the median incident involves 3.5 contributing factors, and the investigation phase consumes the majority of mean time to resolve (MTTR).
This is the gap Aurora fills.
PagerDuty vs Aurora: Different Tools, Different Jobs
This isn't a "which is better" comparison. PagerDuty and Aurora solve different problems:
| | PagerDuty | Aurora |
|---|---|---|
| Primary job | Alert routing, on-call, coordination | Root cause investigation |
| Answers the question | "Who needs to know and how do we coordinate?" | "Why did this happen and what should we fix?" |
| Trigger | Monitoring tool fires alert | PagerDuty webhook (or Datadog, Grafana, etc.) |
| Output | Engineer gets paged, war room opens | Structured RCA with timeline, root cause, remediation |
They work together. Aurora ingests PagerDuty incident.triggered webhooks. When PagerDuty pages your SRE, Aurora is already investigating in the background.
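To make the hand-off concrete, here is a minimal sketch of how a receiver might pick out incident.triggered events from a PagerDuty webhook body. It assumes the PagerDuty V3 webhook envelope, where the payload sits under a top-level `event` object with an `event_type` field. The function name `extract_triggered_incident` and the sample values are hypothetical, and this is not Aurora's actual handler.

```python
import json
from typing import Optional


def extract_triggered_incident(raw_body: str) -> Optional[dict]:
    """Return a short summary for incident.triggered events, else None.

    Assumes the PagerDuty V3 webhook envelope: {"event": {...}}.
    """
    payload = json.loads(raw_body)
    event = payload.get("event", {})
    if event.get("event_type") != "incident.triggered":
        return None  # ignore acknowledged, resolved, and other event types
    data = event.get("data", {})
    return {
        "incident_id": data.get("id"),
        "title": data.get("title"),
        "service": (data.get("service") or {}).get("summary"),
        "occurred_at": event.get("occurred_at"),
    }


# Illustrative payload only; every value here is made up.
sample = json.dumps({
    "event": {
        "event_type": "incident.triggered",
        "occurred_at": "2026-03-01T12:00:00Z",
        "data": {
            "id": "PABC123",
            "title": "High 5xx rate on checkout",
            "service": {"id": "PSVC001", "summary": "checkout-api"},
        },
    }
})

summary = extract_triggered_incident(sample)
print(summary)
```

Filtering on the event type first keeps an investigation from starting on acknowledgements or resolutions of the same incident.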
What PagerDuty Does Well
PagerDuty's strengths are real and well-established:
- On-call scheduling — Flexible rotations, escalation policies, shift overrides
- Alert routing — 700+ integrations for ingesting alerts from every monitoring tool
- Multi-channel paging — SMS, phone, push notifications, email
- Incident coordination — War rooms, stakeholder communications, status pages
- SLA tracking — Urgency-based alerting and escalation
- AI noise reduction — AIOps add-on claims 91% alert noise reduction via intelligent correlation and deduplication
PagerDuty has also added AI features through PagerDuty Advance, including:
- AI incident summaries ("catch me up" in Slack)
- AI-generated status updates
- AI postmortem drafts (Beta)
- SRE Agent for triage and approved remediation actions
- Probable Origin for pattern-based root cause suggestions
Where PagerDuty Stops
Despite the AI additions, PagerDuty's investigation capabilities have limits:
No autonomous multi-step investigation. PagerDuty's SRE Agent surfaces past incidents and patterns, but it doesn't autonomously query your AWS accounts, check Kubernetes pod status, correlate Terraform changes, or trace dependency graphs. The investigation itself is still on the engineer.
No native cloud infrastructure querying. PagerDuty receives alerts from CloudWatch, Azure Monitor, etc. — it doesn't query them directly. It can't run kubectl get pods or aws cloudwatch get-metric-data on your behalf during an investigation.
No knowledge base with vector search. PagerDuty's RAG capability is partial — it requires configuring Amazon Q Business as an external integration. There's no native vector search over your runbooks and past postmortems.
No code fix suggestions. PagerDuty can surface recent code changes that may be related to an incident, but it doesn't generate remediation code or create pull requests.
AI features are paid add-ons. AIOps starts at $699/month. PagerDuty Advance starts at $415/month. These are on top of per-user pricing ($21-$41+/user/month depending on tier).
What Aurora Does Differently
Aurora is an open source (Apache 2.0) AI agent that automates the investigation phase — the part that happens after you get paged.
Autonomous Investigation
When Aurora receives an alert webhook, its LangGraph-orchestrated AI agents:
- Analyze the alert context (severity, service, timing)
- Dynamically select from 30+ tools to investigate
- Execute kubectl, aws, az, and gcloud commands in sandboxed Kubernetes pods
- Query logs, metrics, and recent deployments across cloud providers
- Search the knowledge base for relevant runbooks and past incidents
- Traverse the infrastructure dependency graph for blast radius
- Synthesize everything into a structured root cause analysis
No human in the loop during investigation. The SRE gets paged by PagerDuty and finds a completed RCA waiting in Aurora.
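The blast radius step in the list above can be illustrated with a plain breadth-first traversal. This is a sketch under stated assumptions, not Aurora's implementation: the dependency graph is a hand-written dictionary mapping each component to the services that depend on it, and the service names and the `blast_radius` function are hypothetical.

```python
from collections import deque


def blast_radius(dependents: dict[str, list[str]], failed: str) -> list[str]:
    """Breadth-first walk from the failed component to everything downstream."""
    seen = {failed}
    queue = deque([failed])
    impacted = []
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                impacted.append(dependent)
                queue.append(dependent)
    return impacted


# Hypothetical graph: key = component, value = services that depend on it.
DEPENDENTS = {
    "postgres-primary": ["orders-api", "billing-api"],
    "orders-api": ["checkout-web"],
    "billing-api": ["checkout-web"],
    "checkout-web": [],
}

impacted = blast_radius(DEPENDENTS, "postgres-primary")
print(impacted)
```

The `seen` set ensures a service reachable through two paths (here, checkout-web) is reported once.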
Multi-Cloud Native
Aurora connects directly to your cloud infrastructure:
| Provider | Authentication |
|---|---|
| AWS | STS AssumeRole (temporary credentials) |
| Azure | Service Principal |
| GCP | OAuth |
| OVH | API key |
| Scaleway | API token |
| Kubernetes | Kubeconfig via outbound WebSocket agent |
25+ Verified Integrations
| Category | Tools |
|---|---|
| Monitoring | PagerDuty, Datadog, Grafana, New Relic, Netdata, Dynatrace, Coroot, ThousandEyes, BigPanda, Splunk |
| Cloud | AWS, Azure, GCP, OVH, Scaleway |
| Infrastructure | Kubernetes, Terraform, Docker |
| CI/CD | GitHub, Bitbucket, Jenkins, CloudBees, Spinnaker |
| Docs & Knowledge | Confluence, Jira, SharePoint |
| Network | Cloudflare, Tailscale |
| Communication | Slack |
Knowledge Base with RAG
Aurora includes a built-in Weaviate-powered vector store. Upload your runbooks, past postmortems, and documentation — the AI agent searches them during every investigation using semantic similarity, not just keyword matching.
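The difference between semantic similarity and keyword matching can be shown with a toy example. The sketch below does not call Weaviate. It uses hand-made three-dimensional vectors in place of real embeddings (which have hundreds of dimensions and come from an embedding model), so the numbers are illustrative assumptions only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy "embeddings": made-up vectors standing in for real model output.
DOCS = {
    "runbook: pod stuck in CrashLoopBackOff": [0.9, 0.1, 0.0],
    "runbook: rotating database credentials": [0.1, 0.9, 0.2],
    "postmortem: expired TLS certificate": [0.0, 0.2, 0.9],
}

query_text = "container keeps restarting"
query_vec = [0.8, 0.2, 0.1]  # assumed embedding of query_text

# Keyword matching: no word of the query appears in any document title.
keyword_hits = [
    title for title in DOCS
    if set(query_text.lower().split()) & set(title.lower().split())
]

# Semantic ranking: order documents by vector similarity to the query.
ranked = sorted(DOCS, key=lambda title: cosine(query_vec, DOCS[title]), reverse=True)

print("keyword hits:", keyword_hits)
print("best semantic match:", ranked[0])
```

A query phrased differently from the runbook title finds nothing by keyword, while the vector comparison still surfaces the restart-related runbook first.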
AI Code Fix Suggestions
Aurora can generate pull requests with remediation code via its GitHub and Bitbucket integrations. It doesn't just tell you what's wrong — it suggests how to fix it with actual code.
Automated Postmortems
Aurora automatically generates structured postmortem documents with:
- Incident timeline with timestamps
- Root cause identification with evidence and citations
- Impact assessment
- Remediation steps (taken and recommended)
- One-click export to Confluence or Jira
Feature Comparison
| Feature | PagerDuty | Aurora |
|---|---|---|
| On-call scheduling | Yes (core) | No |
| Alert routing & escalation | Yes (core) | No |
| SMS/phone/push paging | Yes (core) | No |
| Status pages | Yes (add-on, from $89/mo) | No |
| SLA/SLO tracking | Yes | No |
| Autonomous AI investigation | Partial (SRE Agent for triage) | Yes (full multi-step) |
| Native cloud querying | No (receives alerts) | Yes (AWS, Azure, GCP, OVH, Scaleway) |
| CLI execution on infra | Via Runbook Automation add-on | Yes (sandboxed K8s pods) |
| Knowledge base (RAG) | Via Amazon Q Business integration | Yes (native Weaviate) |
| Infrastructure graph | No | Yes (Memgraph) |
| AI postmortems | Beta (via Jeli) | Yes (with Confluence export) |
| AI code fix PRs | No | Yes (GitHub, Bitbucket) |
| Open source | No (Rundeck only) | Yes (Apache 2.0) |
| Self-hosted | No (SaaS only) | Yes (Docker, Helm) |
| LLM provider choice | No (undisclosed, fixed) | Yes (OpenAI, Anthropic, Google, Ollama) |
| Integrations | 700+ | 25+ |
| Pricing | From $21/user/mo + AI add-ons ($415-$699/mo) | Free (self-hosted) |
Cost Comparison
For a team of 20 SREs on PagerDuty Business with AI features:
| Line Item | PagerDuty | Aurora |
|---|---|---|
| Base platform | $41/user/mo x 20 = $820/mo | $0 |
| AIOps | $699/mo | Included |
| PagerDuty Advance (GenAI) | $415/mo | Included |
| Status pages | $89/mo | Not included |
| Total | ~$2,023/mo | $0 + infra + LLM API |
Aurora's costs are infrastructure (a VM or K8s cluster) and LLM API usage. With Ollama running local models, the LLM API cost is also $0, though the models still need hardware to run on.
Note: PagerDuty pricing verified from pagerduty.com/pricing as of March 2026. Aurora is free under Apache 2.0.
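The arithmetic behind the PagerDuty column can be reproduced in a few lines, which makes it easy to plug in your own seat count. The prices are the ones quoted in the table above. The annual figure is simply the monthly total multiplied by 12 and assumes no annual-billing discount.

```python
SEATS = 20
PER_USER_MONTHLY = 41  # PagerDuty Business, USD per user per month (from the table)

# Flat monthly add-ons quoted in the table, in USD.
ADDONS = {
    "AIOps": 699,
    "PagerDuty Advance (GenAI)": 415,
    "Status pages": 89,
}

base_monthly = SEATS * PER_USER_MONTHLY
total_monthly = base_monthly + sum(ADDONS.values())
total_annual = total_monthly * 12  # assumes no annual-billing discount

print(f"Base platform: ${base_monthly}/mo")
print(f"Total:         ${total_monthly}/mo (${total_annual}/yr)")
```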
When to Use PagerDuty + Aurora Together
The strongest setup is running both:
- PagerDuty receives alerts from your monitoring tools (Datadog, CloudWatch, Grafana)
- PagerDuty pages the right on-call engineer via SMS/phone
- Aurora receives the same alert via PagerDuty webhook (incident.triggered)
- Aurora's AI agents investigate autonomously in the background
- The on-call SRE opens Aurora and finds a completed RCA with root cause, timeline, and remediation
- Aurora generates the postmortem and exports it to Confluence
PagerDuty handles the who and when. Aurora handles the why and how to fix it.
When Aurora Alone Might Be Enough
For smaller teams or budget-conscious organizations:
- You don't need enterprise on-call — Your team is small enough that a simple rotation works
- You already have alerting — Datadog, Grafana, or CloudWatch can send webhooks directly to Aurora
- Investigation is your bottleneck — You're spending more time diagnosing than coordinating
- You need self-hosted — Compliance or security requires keeping incident data on-premises
- Budget is limited — PagerDuty + AI add-ons at $2,000+/mo isn't feasible
Aurora can ingest webhooks directly from any monitoring tool — PagerDuty is not required.
Getting Started
git clone https://github.com/Arvo-AI/aurora.git
cd aurora
make init
make prod-prebuilt
Configure your PagerDuty webhook to point at Aurora, add your cloud provider credentials, and investigations start automatically.
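If you prefer to script the webhook setup rather than use the PagerDuty UI, the sketch below builds (but does not send) a request to PagerDuty's webhook subscriptions endpoint. The body shape reflects my understanding of the PagerDuty REST API for V3 webhook subscriptions, so check it against the current API reference before relying on it. The Aurora URL is a placeholder: the actual receiving path is not stated here and must come from your Aurora deployment. The function name `build_subscription_request` is hypothetical.

```python
import json
import urllib.request

PAGERDUTY_API = "https://api.pagerduty.com/webhook_subscriptions"


def build_subscription_request(aurora_url: str, api_token: str) -> urllib.request.Request:
    """Build a POST request that subscribes aurora_url to incident.triggered events."""
    body = {
        "webhook_subscription": {
            "type": "webhook_subscription",
            "description": "Forward triggered incidents to Aurora",
            "delivery_method": {
                "type": "http_delivery_method",
                "url": aurora_url,
            },
            "events": ["incident.triggered"],
            # Account-wide scope; a service-level filter is also possible.
            "filter": {"type": "account_reference"},
        }
    }
    return urllib.request.Request(
        PAGERDUTY_API,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Token token={api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace both before sending.
req = build_subscription_request(
    "https://aurora.example.com/REPLACE-WITH-AURORA-WEBHOOK-PATH",
    "REPLACE_WITH_PAGERDUTY_API_TOKEN",
)
print(req.get_method(), req.full_url)
# To actually create the subscription: urllib.request.urlopen(req)
```

Subscribing only to incident.triggered keeps Aurora from receiving acknowledgement and resolution events it has no use for at investigation start.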
Learn more at arvoai.ca or read the full documentation. For a comparison with other tools, see Aurora vs Traditional Incident Management Tools. To understand how AI investigation works, read What is Agentic Incident Management?.