
Escalation Policies

Configure multi-step escalation policies to route incidents to the right people at the right time through any combination of channels, teams, and on-call schedules.

When an incident occurs, the first few minutes matter. Escalation policies give you layered, time-based notification routing — automatically escalating incidents through progressively more urgent channels and people until someone responds.

What Are Escalation Policies

An escalation policy is a workspace-scoped set of steps that define who gets notified, when, and through what channel. Each step has a delay (in minutes from incident creation) and one or more targets — the people or channels that should be notified.

Policies are evaluated in priority order (lower number = higher priority). The first policy that matches the incident's project is used. A policy with no project filter acts as a catch-all for any project without a more specific policy.

Policy Settings

| Setting | Required | Description |
| --- | --- | --- |
| Name | Yes | Human-readable label for this policy |
| Priority | Yes | Evaluation order within the workspace (0 = highest priority). Must be unique per workspace. |
| Project filter | No | Specific projects this policy applies to. Leave empty for a catch-all policy. |
| Ack timeout (minutes) | No | If an incident is acknowledged but not resolved within this time, the acknowledgment is cleared and escalation restarts. `null` means no timeout. |
| Repeat count | No | How many times to repeat the full escalation cycle if the incident remains unresolved. Default: 0 (no repeat). |
| Enabled | No | Whether the policy is active. Default: `true`. |
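As a concrete sketch, a policy with these settings might look like the following. The field names here are illustrative, not Ionhour's actual API shape:

```python
# Hypothetical escalation policy record. Field names are illustrative;
# consult the Ionhour API reference for the real shape.
policy = {
    "name": "Production Escalation",
    "priority": 0,                   # 0 = highest; unique per workspace
    "project_filter": ["payments"],  # empty list = catch-all
    "ack_timeout_minutes": 30,       # None = no ack timeout
    "repeat_count": 1,               # repeat the full cycle once
    "enabled": True,
}
```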

Escalation Steps

Each policy contains one or more steps, ordered by position. Each step has:

| Setting | Description |
| --- | --- |
| Position | Order of this step within the policy (0-indexed) |
| Delay (minutes) | How many minutes after incident creation to fire this step. 0 means immediate. |
| Targets | One or more notification targets (see below) |
| Conditions | Optional conditions that must be met for this step to fire (e.g., severity filter) |

Step Targets

Each step can notify one or more targets. A target is one of four types:

| Target Type | Description |
| --- | --- |
| Alert Channel | Send a notification through a configured alert channel (email, Slack, PagerDuty, etc.) |
| User | Notify a specific user via their configured contact methods |
| Team | Notify all members of a team |
| On-Call Schedule | Notify whoever is currently on-call in a schedule |

You can combine multiple target types in a single step. For example, step 1 might simultaneously notify the on-call engineer and post to the #ops Slack channel.
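Such a multi-target step could be expressed like this (hypothetical field and type names, shown only to make the structure concrete):

```python
# Hypothetical step definition: one step, two targets fired at once.
step = {
    "position": 0,
    "delay_minutes": 0,  # fire immediately on incident creation
    "targets": [
        {"type": "on_call_schedule", "id": "primary-on-call"},
        {"type": "alert_channel", "id": "ops-slack"},
    ],
}
```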

How Escalation Works

When a check goes down and an incident is created:

  1. Policy matching. Ionhour finds the highest-priority enabled policy that matches the incident's project. Policies with specific project filters are checked first; catch-all policies (no project filter) are used as fallback.
  2. Step scheduling. For each step in the matched policy, a delayed job is queued based on the step's delayMinutes.
  3. Dispatch check. When the delay elapses, Ionhour checks whether the incident is still active and unacknowledged. If resolved or acknowledged, the notification is skipped.
  4. Target resolution. For each target in the step:
    • Alert Channel — sends via the configured channel
    • User — sends via the user's contact methods
    • Team — sends to all team members
    • On-Call Schedule — resolves who is currently on-call and sends to them
  5. Dispatch recording. A dispatch record ensures the same step never fires twice for the same incident.
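The dispatch check in step 3 amounts to a simple predicate, sketched below. This is a minimal illustration of the rule, not Ionhour's internal code:

```python
def should_dispatch(incident: dict) -> bool:
    """Fire the step only if the incident is still open and has not
    been acknowledged by the time the delay elapses."""
    return incident["status"] == "open" and not incident["acknowledged"]

# A resolved or acknowledged incident suppresses the notification.
should_dispatch({"status": "open", "acknowledged": False})      # True
should_dispatch({"status": "resolved", "acknowledged": False})  # False
should_dispatch({"status": "open", "acknowledged": True})       # False
```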

Cancellation Triggers

Pending escalation jobs are automatically cancelled when:

  • The incident is resolved. If the check recovers before a step fires, pending notifications are removed.
  • The incident is acknowledged. Acknowledgment stops the current escalation cycle.

If ack timeout is configured on the policy and the incident is acknowledged but not resolved within the timeout, the acknowledgment is cleared and escalation restarts from the beginning.

Repeat Behavior

If repeatCount is greater than 0, the entire escalation cycle repeats after the last step has fired — up to the configured number of times. This ensures that a long-running unacknowledged incident continues to generate notifications rather than going silent after the final step.
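One way to picture this is to compute the absolute fire times across cycles. The sketch below assumes each repeat's delays are measured from the previous cycle's last step, which is one plausible reading rather than a documented formula:

```python
def fire_times(step_delays, repeat_count):
    """Absolute fire times in minutes from incident creation, for the
    initial cycle plus `repeat_count` repeats. Assumes each repeat's
    delays are offset from the previous cycle's last step (an
    assumption, not Ionhour's documented scheduling formula)."""
    times, offset = [], 0
    for _ in range(repeat_count + 1):
        times.extend(offset + d for d in step_delays)
        offset += step_delays[-1]
    return times

fire_times([0, 5, 15], repeat_count=1)  # [0, 5, 15, 15, 20, 30]
```

With the three-tier example later on this page (delays 0, 5, and 15 minutes) and a repeat count of 1, the cycle would run once more after the final step instead of going silent.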

Building an Escalation Policy

Example: Three-Tier Production Escalation

Suppose you have these alert channels and teams configured:

| Resource | Type | Purpose |
| --- | --- | --- |
| ops-email | Email | On-call engineer's inbox |
| Platform Team | Team | The platform engineering team |
| Primary On-Call | On-Call Schedule | Weekly rotation of on-call engineers |
| engineering-slack | Slack | Team Slack channel |
| urgent-pagerduty | PagerDuty | Engineering manager escalation |

You would create a policy with three steps:

| Step | Delay | Targets | What happens |
| --- | --- | --- | --- |
| 1 | 0 min | Primary On-Call + ops-email | Immediately notify whoever is on-call and send an email |
| 2 | 5 min | Platform Team + engineering-slack | If unacknowledged after 5 min, notify the whole team via Slack |
| 3 | 15 min | urgent-pagerduty | If still unacknowledged after 15 min, page the engineering manager |

The flow:

Incident Created (T=0)
  ├── T+0:  On-call engineer notified + ops email sent
  ├── T+5:  Platform team + #engineering Slack (if not acknowledged)
  └── T+15: PagerDuty page (if not acknowledged)

If the on-call engineer acknowledges at T+3, steps 2 and 3 are cancelled.

Example: Catch-All Policy

Create a policy with no project filter and a lower priority (higher number) to act as a default for any project that doesn't have a specific policy:

| Setting | Value |
| --- | --- |
| Name | Default Escalation |
| Priority | 100 |
| Project filter | (empty — catch-all) |
| Steps | Step 1: Delay 0, Target: ops-email |

Policy Priority and Matching

Policies are evaluated in ascending priority order (0 first). The first match wins:

  1. Policies with a specific project filter that includes the incident's project
  2. Catch-all policies (no project filter)

This means you can have a strict P1-only PagerDuty escalation for your payment service (priority 0) while using a gentler email-only policy for internal tools (priority 10), with a catch-all for everything else (priority 100).
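The matching rule above can be sketched as a small two-pass lookup. This is an illustrative model of the documented behavior, not Ionhour's actual implementation:

```python
def match_policy(policies, project_id):
    """First match wins: ascending priority order, with specific
    project filters checked before catch-alls. Illustrative sketch."""
    candidates = sorted(
        (p for p in policies if p["enabled"]),
        key=lambda p: p["priority"],
    )
    # Pass 1: policies whose filter explicitly includes the project.
    for p in candidates:
        if p["project_filter"] and project_id in p["project_filter"]:
            return p
    # Pass 2: catch-all policies (empty project filter).
    for p in candidates:
        if not p["project_filter"]:
            return p
    return None
```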

Interaction with Other Features

Deployments

If a check is covered by an active deployment window and auto-pause is enabled, the check is paused and won't create incidents. No incidents means no escalation.

Maintenance Windows

During an active maintenance window, alerts and/or incidents can be suppressed depending on the window's configuration. Suppressed incidents don't trigger escalation.

Muted Checks

Checks with notifications muted still create incidents, but alert dispatches are suppressed. Escalation steps schedule normally, but the dispatcher skips muted checks.

Retry Behavior

If an alert dispatch fails (e.g., Slack webhook is temporarily unreachable), Ionhour retries up to 3 times with exponential backoff (starting at 5 seconds). After all retries are exhausted, the failure is logged but the next step in the policy still fires at its scheduled delay.
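The backoff schedule can be modeled as below. The doubling multiplier is an assumption for illustration; the docs specify only the 5-second starting delay and the 3-retry cap:

```python
def retry_delay(attempt: int, base_seconds: int = 5) -> int:
    """Delay in seconds before retry `attempt` (1-based).
    Doubling per attempt is an assumed backoff factor; only the
    5-second start and 3-retry limit are documented."""
    return base_seconds * 2 ** (attempt - 1)

[retry_delay(n) for n in (1, 2, 3)]  # [5, 10, 20]
```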

Startup Recovery

Ionhour is resilient to restarts. When the escalation engine starts, it scans all open, unacknowledged incidents and re-schedules any steps that haven't been dispatched yet. Remaining delays are recalculated from the incident's startedAt timestamp, so a restart doesn't reset the escalation clock.
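The recovery calculation amounts to anchoring each step's due time to startedAt and clamping overdue steps to fire immediately. A minimal sketch of that arithmetic:

```python
from datetime import datetime, timedelta, timezone

def remaining_delay_minutes(started_at, delay_minutes, now):
    """Minutes until a not-yet-dispatched step should fire, measured
    from the incident's startedAt rather than from process restart.
    Overdue steps are clamped to fire immediately. Illustrative only."""
    due = started_at + timedelta(minutes=delay_minutes)
    return max((due - now).total_seconds() / 60.0, 0.0)

# A 15-minute step on an incident that started 10 minutes before the
# restart fires 5 minutes after startup, not 15.
started = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
now = datetime(2025, 1, 1, 12, 10, tzinfo=timezone.utc)
remaining_delay_minutes(started, 15, now)  # 5.0
remaining_delay_minutes(started, 5, now)   # 0.0 (already overdue)
```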

Migrating from Escalation Rules

If you previously used project-scoped escalation rules, those continue to work. However, we recommend migrating to escalation policies for:

  • Multi-target steps — notify a team, on-call schedule, and Slack channel in one step
  • Workspace-scoped management — manage all escalation in one place instead of per-project
  • Ack timeout — automatically re-escalate if acknowledged incidents aren't resolved
  • Repeat cycles — keep escalating for long-running incidents
  • Priority ordering — fine-grained control over which policy matches which project

Best Practices

  • Always include an immediate step. Every policy should have at least one step with delayMinutes: 0. Someone should know about an incident the moment it's created.
  • Use on-call schedules as primary targets. Instead of hardcoding a specific user, point step 1 at an on-call schedule. This ensures the right person is always notified, even during handoffs.
  • Use 3–5 minute gaps between steps. Spacing steps too closely creates noise. Spacing them too far risks slow response.
  • Set an ack timeout for critical services. If your payment service goes down and someone acknowledges but gets pulled away, the ack timeout ensures the incident re-escalates rather than going silent.
  • Keep policies short. Two or three steps is usually enough. If you need six levels of escalation, the problem is likely your incident response process, not your notification config.
  • Use severity filtering on channels. Route P1 incidents to PagerDuty but keep P3/P4 incidents in email only.
  • Create a catch-all policy. Ensure every project has at least one matching policy by creating a low-priority catch-all with basic email notification.
  • Test your policies. Create a test project, configure your policy, and let a check go down intentionally. Verify each step fires at the expected time and that acknowledgment cancels remaining steps.