Escalation Policies
Configure multi-step escalation policies to route incidents to the right people at the right time through any combination of channels, teams, and on-call schedules.
When an incident occurs, the first few minutes matter. Escalation policies give you layered, time-based notification routing — automatically escalating incidents through progressively more urgent channels and people until someone responds.
What Are Escalation Policies
An escalation policy is a workspace-scoped set of steps that define who gets notified, when, and through what channel. Each step has a delay (in minutes from incident creation) and one or more targets — the people or channels that should be notified.
Policies are evaluated in priority order (lower number = higher priority). The first policy that matches the incident's project is used. A policy with no project filter acts as a catch-all for any project without a more specific policy.
Policy Settings
| Setting | Required | Description |
|---|---|---|
| Name | Yes | Human-readable label for this policy |
| Priority | Yes | Evaluation order within the workspace (0 = highest priority). Must be unique per workspace. |
| Project filter | No | Specific projects this policy applies to. Leave empty for a catch-all policy. |
| Ack timeout (minutes) | No | If an incident is acknowledged but not resolved within this time, the acknowledgment is cleared and escalation restarts. null means no timeout. |
| Repeat count | No | How many times to repeat the full escalation cycle if the incident remains unresolved. Default: 0 (no repeat). |
| Enabled | No | Whether the policy is active. Default: true. |
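The settings above can be sketched as a typed configuration object. This is an illustrative shape only — the field names (`projectIds`, `ackTimeoutMinutes`, etc.) are assumptions, not Ionhour's actual API:

```typescript
// Hypothetical shape for an escalation policy; field names are
// illustrative and may differ from Ionhour's real schema.
interface EscalationPolicy {
  name: string;                      // human-readable label
  priority: number;                  // 0 = highest; unique per workspace
  projectIds?: string[];             // omitted or empty = catch-all
  ackTimeoutMinutes?: number | null; // null = no ack timeout
  repeatCount: number;               // 0 = no repeat cycles
  enabled: boolean;                  // defaults to true
}

const productionPolicy: EscalationPolicy = {
  name: "Production Escalation",
  priority: 0,
  projectIds: ["payments"],
  ackTimeoutMinutes: 30,
  repeatCount: 1,
  enabled: true,
};
```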
Escalation Steps
Each policy contains one or more steps, ordered by position. Each step has:
| Setting | Description |
|---|---|
| Position | Order of this step within the policy (0-indexed) |
| Delay (minutes) | How many minutes after incident creation to fire this step. 0 means immediate. |
| Targets | One or more notification targets (see below) |
| Conditions | Optional conditions that must be met for this step to fire (e.g., severity filter) |
Step Targets
Each step can notify one or more targets. A target is one of four types:
| Target Type | Description |
|---|---|
| Alert Channel | Send a notification through a configured alert channel (email, Slack, PagerDuty, etc.) |
| User | Notify a specific user via their configured contact methods |
| Team | Notify all members of a team |
| On-Call Schedule | Notify whoever is currently on-call in a schedule |
You can combine multiple target types in a single step. For example, step 1 might notify both the on-call engineer and post to the #ops Slack channel simultaneously.
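The four target types and the step shape can be sketched as a tagged union — again an illustrative model (the type tags and ID fields are assumptions), showing how one step can mix an on-call schedule with an alert channel:

```typescript
// Hypothetical types for steps and targets; names are illustrative.
type StepTarget =
  | { type: "alertChannel"; channelId: string }   // email, Slack, PagerDuty, ...
  | { type: "user"; userId: string }              // a specific user's contact methods
  | { type: "team"; teamId: string }              // all members of a team
  | { type: "onCallSchedule"; scheduleId: string }; // whoever is on-call now

interface EscalationStep {
  position: number;      // 0-indexed order within the policy
  delayMinutes: number;  // minutes after incident creation; 0 = immediate
  targets: StepTarget[]; // one or more targets, notified together
  conditions?: { severities?: string[] }; // optional, e.g. severity filter
}

// Step 1 notifies the on-call engineer and the #ops Slack channel at once.
const firstStep: EscalationStep = {
  position: 0,
  delayMinutes: 0,
  targets: [
    { type: "onCallSchedule", scheduleId: "primary-on-call" },
    { type: "alertChannel", channelId: "ops-slack" },
  ],
};
```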
How Escalation Works
When a check goes down and an incident is created:
- Policy matching. Ionhour finds the highest-priority enabled policy that matches the incident's project. Policies with specific project filters are checked first; catch-all policies (no project filter) are used as fallback.
- Step scheduling. For each step in the matched policy, a delayed job is queued based on the step's delayMinutes.
- Dispatch check. When the delay elapses, Ionhour checks whether the incident is still active and unacknowledged. If resolved or acknowledged, the notification is skipped.
- Target resolution. For each target in the step:
- Alert Channel — sends via the configured channel
- User — sends via the user's contact methods
- Team — sends to all team members
- On-Call Schedule — resolves who is currently on-call and sends to them
- Dispatch recording. A dispatch record ensures the same step never fires twice for the same incident.
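The dispatch check and dispatch recording described above can be sketched as follows. This is an assumed model of the semantics, not Ionhour's implementation — the `Incident` shape and status values are illustrative:

```typescript
// Sketch of the dispatch-time check: skip resolved/acked incidents,
// and never fire the same step twice for the same incident.
interface Incident {
  id: string;
  status: "open" | "acknowledged" | "resolved";
}

// Stand-in for the dispatch record: which (incident, step) pairs fired.
const dispatched = new Set<string>();

function shouldDispatch(incident: Incident, stepPosition: number): boolean {
  if (incident.status !== "open") return false; // resolved or acked: skip
  const key = `${incident.id}:${stepPosition}`;
  if (dispatched.has(key)) return false;        // already fired once
  dispatched.add(key);                          // record the dispatch
  return true;
}
```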
Cancellation Triggers
Pending escalation jobs are automatically cancelled when:
- The incident is resolved. If the check recovers before a step fires, pending notifications are removed.
- The incident is acknowledged. Acknowledgment stops the current escalation cycle.
If ack timeout is configured on the policy and the incident is acknowledged but not resolved within the timeout, the acknowledgment is cleared and escalation restarts from the beginning.
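The ack-timeout rule reduces to a simple time comparison. A minimal sketch, assuming timestamps in milliseconds and the `null`-means-no-timeout convention from the settings table:

```typescript
// Returns true when an acknowledged-but-unresolved incident should
// re-enter escalation. A null timeout means acks never expire.
function ackExpired(
  ackedAtMs: number,
  nowMs: number,
  ackTimeoutMinutes: number | null,
): boolean {
  if (ackTimeoutMinutes === null) return false; // no timeout configured
  return nowMs - ackedAtMs >= ackTimeoutMinutes * 60_000;
}
```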
Repeat Behavior
If repeatCount is greater than 0, the entire escalation cycle repeats after the last step has fired — up to the configured number of times. This ensures that a long-running unacknowledged incident continues to generate notifications rather than going silent after the final step.
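To see what a repeat cycle does to the notification timeline, here is a sketch that assumes each repeat restarts its delays relative to when the previous cycle's last step fired (verify this assumption against your Ionhour version):

```typescript
// Compute all firing times (in minutes from incident creation) for a
// policy whose steps fire at the given delays, repeated repeatCount
// extra times after the last step of each cycle.
function timeline(stepDelays: number[], repeatCount: number): number[] {
  const out: number[] = [];
  let offset = 0;
  for (let cycle = 0; cycle <= repeatCount; cycle++) {
    for (const d of stepDelays) out.push(offset + d);
    // Assumption: the next cycle starts when the last step fired.
    offset += stepDelays[stepDelays.length - 1];
  }
  return out;
}

// Steps at 0/5/15 min with repeatCount 1:
// timeline([0, 5, 15], 1) → [0, 5, 15, 15, 20, 30]
```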
Building an Escalation Policy
Example: Three-Tier Production Escalation
Suppose you have these alert channels and teams configured:
| Resource | Type | Purpose |
|---|---|---|
| ops-email | Email | On-call engineer's inbox |
| Platform Team | Team | The platform engineering team |
| Primary On-Call | On-Call Schedule | Weekly rotation of on-call engineers |
| engineering-slack | Slack | Team Slack channel |
| urgent-pagerduty | PagerDuty | Engineering manager escalation |
You would create a policy with three steps:
| Step | Delay | Targets | What happens |
|---|---|---|---|
| 1 | 0 min | Primary On-Call + ops-email | Immediately notify whoever is on-call and send an email |
| 2 | 5 min | Platform Team + engineering-slack | If unacknowledged after 5 min, notify the whole team via Slack |
| 3 | 15 min | urgent-pagerduty | If still unacknowledged after 15 min, page the engineering manager |
The flow:
Incident Created (T=0)
├── T+0: On-call engineer notified + ops email sent
├── T+5: Platform team + #engineering Slack (if not acknowledged)
└── T+15: PagerDuty page (if not acknowledged)

If the on-call engineer acknowledges at T+3, steps 2 and 3 are cancelled.
Example: Catch-All Policy
Create a policy with no project filter and a low priority (a high number such as 100) to act as a default for any project that doesn't have a specific policy:
| Setting | Value |
|---|---|
| Name | Default Escalation |
| Priority | 100 |
| Project filter | (empty — catch-all) |
| Steps | Step 1: Delay 0, Target: ops-email |
Policy Priority and Matching
Policies are evaluated in ascending priority order (0 first). The first match wins:
- Policies with a specific project filter that includes the incident's project
- Catch-all policies (no project filter)
This means you can have a strict P1-only PagerDuty escalation for your payment service (priority 0) while using a gentler email-only policy for internal tools (priority 10), with a catch-all for everything else (priority 100).
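The matching rule above can be sketched as a pure function: filter to enabled policies, sort by ascending priority, prefer project-specific matches, and fall back to catch-alls. The `Policy` shape and field names are illustrative assumptions:

```typescript
// Hypothetical policy shape; projectIds omitted/empty = catch-all.
interface Policy {
  name: string;
  priority: number; // 0 = highest priority
  enabled: boolean;
  projectIds?: string[];
}

// Returns the policy that would handle an incident in the given project,
// or undefined if no policy matches.
function matchPolicy(policies: Policy[], projectId: string): Policy | undefined {
  const candidates = policies
    .filter((p) => p.enabled)
    .sort((a, b) => a.priority - b.priority); // ascending: 0 first
  // Project-specific policies win; catch-alls are the fallback.
  return (
    candidates.find((p) => p.projectIds?.includes(projectId)) ??
    candidates.find((p) => !p.projectIds || p.projectIds.length === 0)
  );
}
```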
Interaction with Other Features
Deployments
If a check is covered by an active deployment window and auto-pause is enabled, the check is paused and won't create incidents. No incidents means no escalation.
Maintenance Windows
During an active maintenance window, alerts and/or incidents can be suppressed depending on the window's configuration. Suppressed incidents don't trigger escalation.
Muted Checks
Checks with notifications muted still create incidents, but alert dispatches are suppressed. Escalation steps schedule normally, but the dispatcher skips muted checks.
Retry Behavior
If an alert dispatch fails (e.g., Slack webhook is temporarily unreachable), Ionhour retries up to 3 times with exponential backoff (starting at 5 seconds). After all retries are exhausted, the failure is logged but the next step in the policy still fires at its scheduled delay.
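The retry schedule follows from the parameters above. A sketch assuming the backoff doubles on each attempt (the multiplier is an assumption; the source states only the 5-second starting point and the 3-retry cap):

```typescript
// Delays before each retry of a failed dispatch: 3 attempts starting
// at 5 seconds, doubling each time (assumed multiplier of 2).
function retryDelaysMs(attempts = 3, baseMs = 5_000): number[] {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i);
}

// retryDelaysMs() → [5000, 10000, 20000]
```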
Startup Recovery
Ionhour is resilient to restarts. When the escalation engine starts, it scans all open, unacknowledged incidents and re-schedules any steps that haven't been dispatched yet. Remaining delays are recalculated from the incident's startedAt timestamp, so a restart doesn't reset the escalation clock.
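The delay recalculation on restart is a small piece of arithmetic: a step's absolute fire time is fixed by startedAt, so only the remainder is rescheduled. A minimal sketch, assuming millisecond timestamps:

```typescript
// Remaining delay for a step after a restart, measured from the
// incident's startedAt rather than from "now" — so the escalation
// clock is not reset. Overdue steps fire immediately.
function remainingDelayMs(
  startedAtMs: number,
  stepDelayMinutes: number,
  nowMs: number,
): number {
  const fireAtMs = startedAtMs + stepDelayMinutes * 60_000;
  return Math.max(0, fireAtMs - nowMs);
}
```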
Migrating from Escalation Rules
If you previously used project-scoped escalation rules, those continue to work. However, we recommend migrating to escalation policies for:
- Multi-target steps — notify a team, on-call schedule, and Slack channel in one step
- Workspace-scoped management — manage all escalation in one place instead of per-project
- Ack timeout — automatically re-escalate if acknowledged incidents aren't resolved
- Repeat cycles — keep escalating for long-running incidents
- Priority ordering — fine-grained control over which policy matches which project
Best Practices
- Always include an immediate step. Every policy should have at least one step with delayMinutes: 0. Someone should know about an incident the moment it's created.
- Use on-call schedules as primary targets. Instead of hardcoding a specific user, point step 1 at an on-call schedule. This ensures the right person is always notified, even during handoffs.
- Use 3–5 minute gaps between steps. Spacing steps too closely creates noise. Spacing them too far risks slow response.
- Set an ack timeout for critical services. If your payment service goes down and someone acknowledges but gets pulled away, the ack timeout ensures the incident re-escalates rather than going silent.
- Keep policies short. Two or three steps is usually enough. If you need six levels of escalation, the problem is likely your incident response process, not your notification config.
- Use severity filtering on channels. Route P1 incidents to PagerDuty but keep P3/P4 incidents in email only.
- Create a catch-all policy. Ensure every project has at least one matching policy by creating a low-priority catch-all with basic email notification.
- Test your policies. Create a test project, configure your policy, and let a check go down intentionally. Verify each step fires at the expected time and that acknowledgment cancels remaining steps.