Escalation Policies
Configure multi-step escalation policies to route incidents to the right people at the right time through any combination of channels, teams, and on-call schedules.
When an incident occurs, the first few minutes matter. Escalation policies give you layered, time-based notification routing — automatically escalating incidents through progressively more urgent channels and people until someone responds.
What Are Escalation Policies
An escalation policy is a workspace-scoped set of steps that define who gets notified, when, and through what channel. Each step has a delay (in minutes from incident creation) and one or more targets — the people or channels that should be notified.
Policies are evaluated in priority order (lower number = higher priority). The first policy that matches the incident's project is used. A policy with no project filter acts as a catch-all for any project without a more specific policy.
Policy Settings
| Setting | Required | Description |
|---|---|---|
| Name | Yes | Human-readable label for this policy |
| Priority | Yes | Evaluation order within the workspace (0 = highest priority). Must be unique per workspace. |
| Project filter | No | Specific projects this policy applies to. Leave empty for a catch-all policy. |
| Ack timeout (minutes) | No | If an incident is acknowledged but not resolved within this time, the acknowledgment is cleared and escalation restarts. null means no timeout. |
| Repeat count | No | How many times to repeat the full escalation cycle if the incident remains unresolved. Default: 0 (no repeat). |
| Enabled | No | Whether the policy is active. Default: true. |
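The settings above can be sketched as a typed configuration object. This is an illustrative shape only — the field names (`projectIds`, `ackTimeoutMinutes`, etc.) are assumptions, not Ionhour's actual API:

```typescript
// Hypothetical shape for an escalation policy; field names are
// illustrative and may differ from Ionhour's real schema.
interface EscalationPolicy {
  name: string;                      // human-readable label
  priority: number;                  // 0 = highest; unique per workspace
  projectIds?: string[];             // omitted or empty = catch-all
  ackTimeoutMinutes?: number | null; // null = no ack timeout
  repeatCount: number;               // 0 = no repeat cycles
  enabled: boolean;                  // defaults to true
}

const productionPolicy: EscalationPolicy = {
  name: "Production Escalation",
  priority: 0,
  projectIds: ["payments"],
  ackTimeoutMinutes: 30,
  repeatCount: 1,
  enabled: true,
};
```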
Escalation Steps
Each policy contains one or more steps, ordered by position. Each step has:
| Setting | Description |
|---|---|
| Position | Order of this step within the policy (0-indexed) |
| Delay (minutes) | How many minutes after incident creation to fire this step. 0 means immediate. |
| Targets | One or more notification targets (see below) |
| Conditions | Optional conditions that must be met for this step to fire (e.g., severity filter) |
Step Targets
Each step can notify one or more targets. A target is one of four types:
| Target Type | Description |
|---|---|
| Alert Channel | Send a notification through a configured alert channel (email, Slack, PagerDuty, etc.) |
| User | Notify a specific user via their configured contact methods |
| Team | Notify all members of a team |
| On-Call Schedule | Notify whoever is currently on-call in a schedule |
You can combine multiple target types in a single step. For example, step 1 might notify both the on-call engineer and post to the #ops Slack channel simultaneously.
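The four target types and the step shape can be sketched as a tagged union — again an illustrative model (the type tags and ID fields are assumptions), showing how one step can mix an on-call schedule with an alert channel:

```typescript
// Hypothetical types for steps and targets; names are illustrative.
type StepTarget =
  | { type: "alertChannel"; channelId: string }   // email, Slack, PagerDuty, ...
  | { type: "user"; userId: string }              // a specific user's contact methods
  | { type: "team"; teamId: string }              // all members of a team
  | { type: "onCallSchedule"; scheduleId: string }; // whoever is on-call now

interface EscalationStep {
  position: number;      // 0-indexed order within the policy
  delayMinutes: number;  // minutes after incident creation; 0 = immediate
  targets: StepTarget[]; // one or more targets, notified together
  conditions?: { severities?: string[] }; // optional, e.g. severity filter
}

// Step 1 notifies the on-call engineer and the #ops Slack channel at once.
const firstStep: EscalationStep = {
  position: 0,
  delayMinutes: 0,
  targets: [
    { type: "onCallSchedule", scheduleId: "primary-on-call" },
    { type: "alertChannel", channelId: "ops-slack" },
  ],
};
```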
How Escalation Works
When a check goes down and an incident is created:
- Policy matching. Ionhour finds the highest-priority enabled policy that matches the incident's project. Policies with specific project filters are checked first; catch-all policies (no project filter) are used as fallback.
- Step scheduling. For each step in the matched policy, a delayed job is queued based on the step's delayMinutes.
- Dispatch check. When the delay elapses, Ionhour checks whether the incident is still active and unacknowledged. If resolved or acknowledged, the notification is skipped.
- Target resolution. For each target in the step:
- Alert Channel — sends via the configured channel
- User — sends via the user's contact methods
- Team — sends to all team members
- On-Call Schedule — resolves who is currently on-call and sends to them
- Dispatch recording. A dispatch record ensures the same step never fires twice for the same incident.
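The dispatch check and dispatch recording described above can be sketched as follows. This is an assumed model of the semantics, not Ionhour's implementation — the `Incident` shape and status values are illustrative:

```typescript
// Sketch of the dispatch-time check: skip resolved/acked incidents,
// and never fire the same step twice for the same incident.
interface Incident {
  id: string;
  status: "open" | "acknowledged" | "resolved";
}

// Stand-in for the dispatch record: which (incident, step) pairs fired.
const dispatched = new Set<string>();

function shouldDispatch(incident: Incident, stepPosition: number): boolean {
  if (incident.status !== "open") return false; // resolved or acked: skip
  const key = `${incident.id}:${stepPosition}`;
  if (dispatched.has(key)) return false;        // already fired once
  dispatched.add(key);                          // record the dispatch
  return true;
}
```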
Cancellation Triggers
Pending escalation jobs are automatically cancelled when:
- The incident is resolved. If the check recovers before a step fires, pending notifications are removed.
- The incident is acknowledged. Acknowledgment stops the current escalation cycle.
If ack timeout is configured on the policy and the incident is acknowledged but not resolved within the timeout, the acknowledgment is cleared and escalation restarts from the beginning.
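The ack-timeout rule reduces to a simple time comparison. A minimal sketch, assuming timestamps in milliseconds and the `null`-means-no-timeout convention from the settings table:

```typescript
// Returns true when an acknowledged-but-unresolved incident should
// re-enter escalation. A null timeout means acks never expire.
function ackExpired(
  ackedAtMs: number,
  nowMs: number,
  ackTimeoutMinutes: number | null,
): boolean {
  if (ackTimeoutMinutes === null) return false; // no timeout configured
  return nowMs - ackedAtMs >= ackTimeoutMinutes * 60_000;
}
```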
Repeat Behavior
If repeatCount is greater than 0, the entire escalation cycle repeats after the last step has fired — up to the configured number of times. This ensures that a long-running unacknowledged incident continues to generate notifications rather than going silent after the final step.
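To see what a repeat cycle does to the notification timeline, here is a sketch that assumes each repeat restarts its delays relative to when the previous cycle's last step fired (verify this assumption against your Ionhour version):

```typescript
// Compute all firing times (in minutes from incident creation) for a
// policy whose steps fire at the given delays, repeated repeatCount
// extra times after the last step of each cycle.
function timeline(stepDelays: number[], repeatCount: number): number[] {
  const out: number[] = [];
  let offset = 0;
  for (let cycle = 0; cycle <= repeatCount; cycle++) {
    for (const d of stepDelays) out.push(offset + d);
    // Assumption: the next cycle starts when the last step fired.
    offset += stepDelays[stepDelays.length - 1];
  }
  return out;
}

// Steps at 0/5/15 min with repeatCount 1:
// timeline([0, 5, 15], 1) → [0, 5, 15, 15, 20, 30]
```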
Building an Escalation Policy
Example: Three-Tier Production Escalation
Suppose you have these alert channels and teams configured:
| Resource | Type | Purpose |
|---|---|---|
| ops-email | Email | On-call engineer's inbox |
| Platform Team | Team | The platform engineering team |
| Primary On-Call | On-Call Schedule | Weekly rotation of on-call engineers |
| engineering-slack | Slack | Team Slack channel |
| urgent-pagerduty | PagerDuty | Engineering manager escalation |
You would create a policy with three steps:
| Step | Delay | Targets | What happens |
|---|---|---|---|
| 1 | 0 min | Primary On-Call + ops-email | Immediately notify whoever is on-call and send an email |
| 2 | 5 min | Platform Team + engineering-slack | If unacknowledged after 5 min, notify the whole team via Slack |
| 3 | 15 min | urgent-pagerduty | If still unacknowledged after 15 min, page the engineering manager |
The flow:
Incident Created (T=0)
├── T+0: On-call engineer notified + ops email sent
├── T+5: Platform team + #engineering Slack (if not acknowledged)
└── T+15: PagerDuty page (if not acknowledged)

If the on-call engineer acknowledges at T+3, steps 2 and 3 are cancelled.
Example: Catch-All Policy
Create a policy with no project filter and a low priority (a high number such as 100) to act as a default for any project that doesn't have a specific policy:
| Setting | Value |
|---|---|
| Name | Default Escalation |
| Priority | 100 |
| Project filter | (empty — catch-all) |
| Steps | Step 1: Delay 0, Target: ops-email |
Policy Priority and Matching
Policies are evaluated in ascending priority order (0 first). The first match wins:
- Policies with a specific project filter that includes the incident's project
- Catch-all policies (no project filter)
This means you can have a strict P1-only PagerDuty escalation for your payment service (priority 0) while using a gentler email-only policy for internal tools (priority 10), with a catch-all for everything else (priority 100).
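The matching rule above can be sketched as a pure function: filter to enabled policies, sort by ascending priority, prefer project-specific matches, and fall back to catch-alls. The `Policy` shape and field names are illustrative assumptions:

```typescript
// Hypothetical policy shape; projectIds omitted/empty = catch-all.
interface Policy {
  name: string;
  priority: number; // 0 = highest priority
  enabled: boolean;
  projectIds?: string[];
}

// Returns the policy that would handle an incident in the given project,
// or undefined if no policy matches.
function matchPolicy(policies: Policy[], projectId: string): Policy | undefined {
  const candidates = policies
    .filter((p) => p.enabled)
    .sort((a, b) => a.priority - b.priority); // ascending: 0 first
  // Project-specific policies win; catch-alls are the fallback.
  return (
    candidates.find((p) => p.projectIds?.includes(projectId)) ??
    candidates.find((p) => !p.projectIds || p.projectIds.length === 0)
  );
}
```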
Interaction with Other Features
Deployments
If a check is covered by an active deployment window and auto-pause is enabled, the check is paused and won't create incidents. No incidents means no escalation.
Maintenance Windows
During an active maintenance window, alerts and/or incidents can be suppressed depending on the window's configuration. Suppressed incidents don't trigger escalation.
Muted Checks
Checks with notifications muted still create incidents, but alert dispatches are suppressed. Escalation steps schedule normally, but the dispatcher skips muted checks.
Retry Behavior
If an alert dispatch fails (e.g., Slack webhook is temporarily unreachable), Ionhour retries up to 3 times with exponential backoff (starting at 5 seconds). After all retries are exhausted, the failure is logged but the next step in the policy still fires at its scheduled delay.
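The retry schedule follows from the parameters above. A sketch assuming the backoff doubles on each attempt (the multiplier is an assumption; the source states only the 5-second starting point and the 3-retry cap):

```typescript
// Delays before each retry of a failed dispatch: 3 attempts starting
// at 5 seconds, doubling each time (assumed multiplier of 2).
function retryDelaysMs(attempts = 3, baseMs = 5_000): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// retryDelaysMs() → [5000, 10000, 20000]
```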
Startup Recovery
Ionhour is resilient to restarts. When the escalation engine starts, it scans all open, unacknowledged incidents and re-schedules any steps that haven't been dispatched yet. Remaining delays are recalculated from the incident's startedAt timestamp, so a restart doesn't reset the escalation clock.
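The delay recalculation on restart is a small piece of arithmetic: a step's absolute fire time is fixed by startedAt, so only the remainder is rescheduled. A minimal sketch, assuming millisecond timestamps:

```typescript
// Remaining delay for a step after a restart, measured from the
// incident's startedAt rather than from "now" — so the escalation
// clock is not reset. Overdue steps fire immediately.
function remainingDelayMs(
  startedAtMs: number,
  stepDelayMinutes: number,
  nowMs: number,
): number {
  const fireAtMs = startedAtMs + stepDelayMinutes * 60_000;
  return Math.max(0, fireAtMs - nowMs);
}
```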
Migrating from Escalation Rules
If you previously used project-scoped escalation rules, those continue to work. However, we recommend migrating to escalation policies for:
- Multi-target steps — notify a team, on-call schedule, and Slack channel in one step
- Workspace-scoped management — manage all escalation in one place instead of per-project
- Ack timeout — automatically re-escalate if acknowledged incidents aren't resolved
- Repeat cycles — keep escalating for long-running incidents
- Priority ordering — fine-grained control over which policy matches which project
Best Practices
- Always include an immediate step. Every policy should have at least one step with delayMinutes: 0. Someone should know about an incident the moment it's created.
- Use on-call schedules as primary targets. Instead of hardcoding a specific user, point step 1 at an on-call schedule. This ensures the right person is always notified, even during handoffs.
- Use 3–5 minute gaps between steps. Spacing steps too closely creates noise. Spacing them too far risks slow response.
- Set an ack timeout for critical services. If your payment service goes down and someone acknowledges but gets pulled away, the ack timeout ensures the incident re-escalates rather than going silent.
- Keep policies short. Two or three steps is usually enough. If you need six levels of escalation, the problem is likely your incident response process, not your notification config.
- Use severity filtering on channels. Route P1 incidents to PagerDuty but keep P3/P4 incidents in email only.
- Create a catch-all policy. Ensure every project has at least one matching policy by creating a low-priority catch-all with basic email notification.
- Test your policies. Create a test project, configure your policy, and let a check go down intentionally. Verify each step fires at the expected time and that acknowledgment cancels remaining steps.