
Analytics & Uptime Reporting

Track uptime, latency, and reliability metrics across your workspace to measure SLA compliance and identify trends.

Ionhour collects signals from every check in your workspace and computes analytics that help you understand your infrastructure's health over time. These metrics range from workspace-wide reliability scores down to per-check latency percentiles, giving you the data you need for SLA reporting, capacity planning, and incident retrospectives.

Analytics Overview

Ionhour provides analytics at three levels:

| Level | What it covers | Key metrics |
|-------|----------------|-------------|
| Workspace | All checks across all projects | Reliability score, total checks, incident count, MTTR |
| Project | All checks within a project | Uptime percentage, signal volume, check health overview |
| Check | Individual check performance | Uptime %, latency (avg, P50, P95), drift stats, downtime duration |

Each level builds on the one below it. Workspace reliability is derived from individual check uptimes. Project health is an aggregate of its checks. And check-level metrics are computed from raw signal data.

Workspace-Level Metrics

Reliability Score

The workspace reliability score is an aggregate uptime percentage across all checks that have received signals in the selected time period. It answers the question: "What percentage of the time were my services available?"

The score is calculated by:

  1. Computing the uptime ratio for each check (successful signals / total signals).
  2. Averaging those ratios across all checks with signal data.
  3. Rounding to two decimal places.

A workspace with 10 checks, where 9 have 100% uptime and 1 has 90% uptime, would show a reliability score of 99.00%.

Checks with no signals in the selected period are excluded from the reliability calculation. A newly created check that hasn't received its first ping won't drag your reliability score down.
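The calculation above, including the no-signal exclusion, can be sketched in a few lines of Python (check names and signal counts are illustrative, not real data):

```python
# Sketch of the workspace reliability calculation described above.
checks = {
    "api": {"success": 1000, "total": 1000},
    "worker": {"success": 900, "total": 1000},
    "new-check": {"success": 0, "total": 0},  # no signals yet -> excluded
}

ratios = [
    c["success"] / c["total"] * 100
    for c in checks.values()
    if c["total"] > 0  # checks without signals don't affect the score
]
reliability = round(sum(ratios) / len(ratios), 2)
print(reliability)  # 95.0
```

Note that the excluded check neither helps nor hurts the score: only the two checks with signal data are averaged.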

You can query the workspace reliability score via the API or through the Ionhour MCP server:

# Via API — using the checks-stats endpoint with workspace scope
curl "https://api.ionhour.com/api/checks-stats?workspaceId=1" \
  -H "Authorization: Bearer YOUR_TOKEN"

The response includes:

| Field | Description |
|-------|-------------|
| checksWidget.activeChecks | Total number of active checks in the workspace |
| checksWidget.activeChecksDiffPercentage | Month-over-month change in check count |
| signalsWidget.totalSignals | Total signals received across all checks |
| signalsWidget.uptimePercentage | Workspace-wide uptime for the current month |
| signalsWidget.signalsDiffPercentage | Month-over-month change in signal volume |
| newSignals | Signals received in the last 24 hours |
| alertsTriggered | Total incidents created for workspace checks |
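Reading these fields from a decoded response might look like the sketch below, assuming the nested shape implied by the dotted field names (all values are illustrative, not real API data):

```python
# Hypothetical response shape inferred from the documented field names;
# the values are illustrative placeholders.
response = {
    "checksWidget": {"activeChecks": 12, "activeChecksDiffPercentage": 8.3},
    "signalsWidget": {
        "totalSignals": 48210,
        "uptimePercentage": 99.12,
        "signalsDiffPercentage": -2.1,
    },
    "newSignals": 1440,
    "alertsTriggered": 3,
}

uptime = response["signalsWidget"]["uptimePercentage"]
incidents = response["alertsTriggered"]
print(f"{uptime}% uptime, {incidents} incidents this month")
```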

Workspace Summary

The checks overview endpoint provides a health breakdown of all checks in the workspace:

curl "https://api.ionhour.com/api/checks-stats/overview?workspaceId=1" \
  -H "Authorization: Bearer YOUR_TOKEN"

| Field | Description |
|-------|-------------|
| totalChecks | Total checks in scope |
| downChecks | Checks currently in DOWN status |
| dependencyImpactedChecks | Checks impacted by a dependency outage |
| unstableChecks | Checks with a high SUSPECT/LATE signal ratio (>20% over 5+ samples) |
| impactedChecks | Union of down, dependency-impacted, and unstable checks |

This overview is the data behind the dashboard health widgets. Use it to get a quick pulse on your infrastructure without drilling into individual checks.
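The unstable-check heuristic from the table above (more than 20% SUSPECT/LATE signals over at least 5 samples) can be sketched as:

```python
# Sketch of the "unstable check" heuristic: >20% SUSPECT/LATE signals
# over at least 5 samples. The function name is illustrative.
def is_unstable(suspect_or_late: int, total: int) -> bool:
    if total < 5:  # too few samples to judge
        return False
    return suspect_or_late / total > 0.20

print(is_unstable(2, 8))   # True  (25% over 8 samples)
print(is_unstable(1, 10))  # False (only 10%)
print(is_unstable(2, 4))   # False (fewer than 5 samples)
```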

MTTR (Mean Time to Resolution)

The workspace reliability report includes MTTR, calculated from resolved incidents in the selected period:

MTTR = sum(resolvedAt - startedAt) / count(resolved incidents)

MTTR is one of the most important incident response metrics. A decreasing MTTR over time indicates improving incident response processes. An increasing MTTR may signal alert fatigue, staffing gaps, or growing system complexity.

MTTR only includes resolved incidents. Active incidents are excluded because their final duration is unknown. If you have long-running active incidents, your MTTR may appear artificially low.
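The formula and the active-incident exclusion can be sketched together (incident timestamps are illustrative):

```python
from datetime import datetime, timedelta

# MTTR = sum(resolvedAt - startedAt) / count(resolved incidents).
# Active incidents (resolvedAt=None) are excluded because their final
# duration is unknown.
incidents = [
    {"startedAt": datetime(2024, 5, 1, 10, 0), "resolvedAt": datetime(2024, 5, 1, 10, 30)},
    {"startedAt": datetime(2024, 5, 3, 9, 0), "resolvedAt": datetime(2024, 5, 3, 10, 30)},
    {"startedAt": datetime(2024, 5, 7, 14, 0), "resolvedAt": None},  # still active
]

resolved = [i for i in incidents if i["resolvedAt"] is not None]
mttr = sum((i["resolvedAt"] - i["startedAt"] for i in resolved), timedelta()) / len(resolved)
print(mttr)  # 1:00:00
```

Here a 30-minute and a 90-minute incident average to an MTTR of one hour; the still-active incident is ignored.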

Project-Level Metrics

Signal Stats

Project-level signal stats aggregate across all checks in a project:

curl "https://api.ionhour.com/api/checks-stats?projectId=1" \
  -H "Authorization: Bearer YOUR_TOKEN"

The response includes the same fields as the workspace-level query, but scoped to a single project. This is useful for per-team or per-service reporting when your projects map to organizational boundaries.

Response Time Overview

For projects with inbound (heartbeat) checks, Ionhour tracks signal drift — the difference between when a signal was expected and when it actually arrived.

curl "https://api.ionhour.com/api/checks-stats/response-time/overview?projectId=1" \
  -H "Authorization: Bearer YOUR_TOKEN"

| Field | Description |
|-------|-------------|
| avgDriftMs | Average drift across all checks in the project |
| checksCount | Number of checks included |
| onTimeRate | Percentage of signals arriving within tolerance |
| degradedChecks | Checks where average drift exceeds 50% of their schedule interval |

This helps identify checks that are consistently running late but haven't yet triggered incidents — a leading indicator of capacity issues.

Check-Level Uptime

Inbound Check Uptime

For inbound (heartbeat) checks, uptime is calculated from the ratio of SUCCESS signals to total signals (SUCCESS + FAIL) for the current calendar month:

uptime = successSignals / (successSignals + failSignals) * 100
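As a small helper (signal counts are illustrative):

```python
# The inbound uptime formula above: SUCCESS / (SUCCESS + FAIL) * 100.
def inbound_uptime(success_signals: int, fail_signals: int) -> float:
    total = success_signals + fail_signals
    return success_signals / total * 100

print(round(inbound_uptime(2870, 10), 2))  # 99.65
```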

Access per-check stats through the check detail view in the dashboard, or via the API:

# Dashboard stats endpoint returns uptime for a specific check
curl "https://api.ionhour.com/api/checks-stats?projectId=1" \
  -H "Authorization: Bearer YOUR_TOKEN"

Per-check stats include:

| Field | Description |
|-------|-------------|
| uptime | Uptime percentage for the current month |
| avgDrift | Average signal drift in milliseconds |
| totalPings | Total signals received since check creation |
| currentMonthDowntime | Total downtime in milliseconds for the current month |
| lastMonthDowntime | Total downtime in milliseconds for the previous month |

Downtime is calculated by summing the durations of all incidents that fall within the time period, not from gaps between signals. This means a check that goes down for 10 minutes, recovers, and goes down again for 5 minutes would show 15 minutes of downtime.
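That accounting can be sketched as follows (incident timestamps are illustrative):

```python
from datetime import datetime, timedelta

# Sketch of downtime accounting as described above: sum the durations
# of every incident in the period.
incidents = [
    (datetime(2024, 5, 10, 3, 0), datetime(2024, 5, 10, 3, 10)),  # 10 min outage
    (datetime(2024, 5, 10, 6, 0), datetime(2024, 5, 10, 6, 5)),   # 5 min outage
]

downtime = sum((end - start for start, end in incidents), timedelta())
downtime_ms = int(downtime.total_seconds() * 1000)
print(downtime_ms // 60_000, "minutes")  # 15 minutes
```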

Outbound Check Uptime

For outbound checks, uptime is calculated from probe results:

uptimePercent = successfulProbes / totalProbes * 100

Query outbound stats with a configurable time range:

curl "https://api.ionhour.com/api/checks/42/outbound-stats?rangeHours=24" \
  -H "Authorization: Bearer YOUR_TOKEN"

The response provides:

| Field | Description |
|-------|-------------|
| uptimePercent | Percentage of successful probes |
| totalProbes | Total probe attempts in the range |
| successCount | Probes that returned an OK result |
| failureCount | Probes that failed for any reason |
| avgLatencyMs | Mean response time across all probes |
| p50LatencyMs | Median (50th percentile) response time |
| p95LatencyMs | 95th percentile response time |
| maxLatencyMs | Maximum observed response time |

You can filter by region using the probeId parameter to see per-region performance:

# Stats from the EU West probe only
curl "https://api.ionhour.com/api/checks/42/outbound-stats?rangeHours=24&probeId=eu-west-1" \
  -H "Authorization: Bearer YOUR_TOKEN"

Outbound Performance Metrics

Latency Percentiles

Ionhour computes P50 (median) and P95 latency for outbound checks. These percentiles are more useful than averages for understanding user experience:

  • P50 tells you the typical response time: half of all probes complete at or below it.
  • P95 tells you the threshold that the slowest 5% of probes exceed. This is the number you should use for SLA reporting.
  • Averages can be misleading: a few slow outliers skew the mean upward.

For example, if your P50 is 120ms and your P95 is 800ms, that means half your probes complete in under 120ms, but 5% of probes take over 800ms. If your SLA promises sub-500ms responses, you have a problem that the average (which might be 180ms) would hide.
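The effect is easy to see numerically. A minimal sketch using the nearest-rank percentile definition (the latency samples are illustrative):

```python
import math

# A few slow outliers inflate the mean while leaving the median untouched.
latencies = [100, 110, 115, 120, 120, 125, 130, 140, 700, 900]  # ms

def percentile(values, pct):
    # Nearest-rank percentile: the smallest value such that at least
    # pct% of samples are at or below it.
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

print(percentile(latencies, 50))        # 120
print(percentile(latencies, 95))        # 900
print(sum(latencies) / len(latencies))  # 256.0
```

The median says a typical probe takes 120 ms, the P95 exposes the 900 ms tail, and the 256 ms average describes neither group well.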

Outbound Timeline

The timeline endpoint provides bucketed probe results over time, enabling trend visualization:

curl "https://api.ionhour.com/api/checks/42/outbound-timeline?rangeHours=24&bucketMinutes=5" \
  -H "Authorization: Bearer YOUR_TOKEN"

| Parameter | Default | Description |
|-----------|---------|-------------|
| rangeHours | 24 | How many hours of data to return |
| bucketMinutes | 5 | Size of each time bucket in minutes |
| probeId | all | Filter to a specific probe region |

Each bucket in the response contains:

| Field | Description |
|-------|-------------|
| timestamp | Start of the time bucket |
| avgLatencyMs | Average response time in this bucket |
| maxLatencyMs | Maximum response time in this bucket |
| successCount | Number of successful probes |
| failureCount | Number of failed probes |

Use the timeline to identify patterns like:

  • Periodic latency spikes that correlate with batch job schedules.
  • Gradual latency increases indicating capacity degradation.
  • Failure clusters that suggest network or DNS issues at specific times.
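The bucketing itself is straightforward; a sketch assuming 5-minute buckets (the probe tuples of timestamp, latency, and success flag are illustrative):

```python
from datetime import datetime, timedelta

# Sketch of time-bucketing probe results, as the timeline endpoint does.
probes = [
    (datetime(2024, 5, 10, 12, 1), 120, True),
    (datetime(2024, 5, 10, 12, 3), 900, True),
    (datetime(2024, 5, 10, 12, 7), 130, False),
]

bucket_minutes = 5
buckets = {}
for ts, latency, ok in probes:
    # Truncate each timestamp down to the start of its bucket.
    start = ts.replace(second=0, microsecond=0)
    start -= timedelta(minutes=start.minute % bucket_minutes)
    buckets.setdefault(start, []).append((latency, ok))

for start, entries in sorted(buckets.items()):
    lats = [lat for lat, _ in entries]
    print(
        start.time(),
        f"avg={sum(lats) / len(lats):.0f}",
        f"max={max(lats)}",
        f"ok={sum(1 for _, ok in entries if ok)}",
        f"fail={sum(1 for _, ok in entries if not ok)}",
    )
```

The three probes land in two buckets: 12:00 (two successes, avg 510 ms) and 12:05 (one failure).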

[Image: Outbound check timeline chart showing latency and success/failure over 24 hours]

Inbound Check Response Time

Drift Statistics

For heartbeat checks, Ionhour measures drift — how late (or early) each signal arrives relative to the expected schedule. This is distinct from latency, which measures HTTP response time.

curl "https://api.ionhour.com/api/checks-stats/42/response-time?range=24h" \
  -H "Authorization: Bearer YOUR_TOKEN"

| Field | Description |
|-------|-------------|
| avgDriftMs | Average drift across all signals in the range |
| p50DriftMs | Median drift |
| p95DriftMs | 95th percentile drift |
| p99DriftMs | 99th percentile drift |
| minDriftMs | Smallest observed drift |
| maxDriftMs | Largest observed drift |
| onTimeRate | Percentage of signals arriving within 5% of the schedule |
| avgDurationMs | Average execution time reported by the signal payload |
| sampleCount | Number of signals included in the calculation |

Available time ranges: 1h, 24h, 7d, 30d.
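Drift and the on-time rate can be sketched like this, using the 5%-of-schedule tolerance described above (the interval and per-signal drifts are illustrative):

```python
# Drift is how late (positive) or early (negative) a signal arrived
# versus its schedule. A signal is on time if |drift| is within 5%
# of the schedule interval.
interval_ms = 60_000  # check scheduled every 60 seconds (illustrative)
drifts_ms = [500, 1200, -300, 4000, 800]  # illustrative per-signal drift

tolerance_ms = interval_ms * 0.05  # 3000 ms
on_time = [d for d in drifts_ms if abs(d) <= tolerance_ms]

avg_drift = sum(drifts_ms) / len(drifts_ms)
on_time_rate = len(on_time) / len(drifts_ms) * 100
print(avg_drift)     # 1240.0
print(on_time_rate)  # 80.0
```

Only the 4000 ms signal exceeds the 3000 ms tolerance, so 4 of 5 signals count as on time.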

Drift Timeline

The drift timeline provides bucketed drift statistics for trend visualization:

curl "https://api.ionhour.com/api/checks-stats/42/response-time/timeline?range=24h" \
  -H "Authorization: Bearer YOUR_TOKEN"

Bucket sizes vary by range:

| Range | Bucket size |
|-------|-------------|
| 1h | 5 minutes |
| 24h | 1 hour |
| 7d | 6 hours (uses pre-aggregated hourly data) |
| 30d | 1 day (uses pre-aggregated daily data) |

For the 7d and 30d ranges, Ionhour uses pre-aggregated statistics rather than querying raw signals. This keeps queries fast even for checks that generate thousands of signals per day.

Uptime Timeline

The uptime timeline provides a time-bucketed view of signal health across all checks in a project or workspace:

# Workspace-wide, last 7 days
curl "https://api.ionhour.com/api/checks-stats/uptime?workspaceId=1&range=7d" \
  -H "Authorization: Bearer YOUR_TOKEN"

# Single check, last 24 hours
curl "https://api.ionhour.com/api/checks-stats/uptime?workspaceId=1&range=24h&checkId=42" \
  -H "Authorization: Bearer YOUR_TOKEN"

Each bucket contains counts of SUCCESS, FAIL, SUSPECT, and DEPLOYMENT signals. This is the data behind the uptime visualization bars in the dashboard.

| Signal Type | Meaning |
|-------------|---------|
| SUCCESS | Check reported healthy |
| FAIL | Check missed its schedule or returned an error |
| SUSPECT | Signal arrived late but within the grace period |
| DEPLOYMENT | A deployment event was recorded |

Status Page Uptime Bars

If you use status pages, each component can display an uptime bar showing availability history over a configurable period (up to 365 days). These bars are powered by the same signal data used in the analytics endpoints.

When a status page component is linked to a check, the uptime bar reflects that check's actual signal history. Each day segment is colored based on the ratio of successful to failed signals that day:

  • Green — 100% successful signals
  • Yellow/Orange — Partial failures
  • Red — Majority or all signals failed
  • Gray — No signals received
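The color mapping can be sketched as below. Note the exact cutoffs between yellow and red are an assumption for illustration; the docs only specify the 100%-success, partial, majority-failed, and no-signal cases:

```python
# Sketch of day-segment coloring. The yellow/red boundary (success >= fail)
# is an assumed threshold, not a documented value.
def day_color(success: int, fail: int) -> str:
    total = success + fail
    if total == 0:
        return "gray"    # no signals received
    if fail == 0:
        return "green"   # 100% successful signals
    if success >= fail:
        return "yellow"  # partial failures
    return "red"         # majority (or all) signals failed

print(day_color(100, 0))  # green
print(day_color(95, 5))   # yellow
print(day_color(10, 90))  # red
print(day_color(0, 0))    # gray
```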

Configure the uptime bar length per-component in the status page settings. See the Status Pages guide for details.

Using Analytics for SLA Reporting

Ionhour's analytics map directly to common SLA metrics:

| SLA Metric | Ionhour Data Source |
|------------|---------------------|
| Availability (uptime %) | Workspace reliability score or per-check uptime percentage |
| Response time (P95) | Outbound check P95 latency |
| Incident count | Workspace or project incident count |
| MTTR | Workspace reliability MTTR calculation |
| Downtime duration | Per-check currentMonthDowntime |

Generating an SLA Report

To produce a monthly SLA report, query these endpoints:

  1. Get workspace reliability for the reporting period. Use the checks-stats endpoint with your workspace ID to get the overall uptime percentage, incident count, and signal volume.
  2. Get per-check downtime for each critical service. The check stats endpoint returns currentMonthDowntime and lastMonthDowntime in milliseconds, which you can convert to minutes of downtime.
  3. Get outbound latency percentiles for any HTTP-monitored endpoints. The outbound-stats endpoint returns P50 and P95 latency, which map directly to response time SLAs.
  4. Calculate SLA compliance. Compare the measured uptime and latency against your SLA thresholds. For example, if your SLA promises 99.9% uptime (roughly 43 minutes of downtime per month), compare the total downtime against that threshold.
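The compliance check is simple arithmetic; a sketch for a 99.9% SLA over a 30-day month (the measured downtime is illustrative):

```python
# A 30-day month has 43,200 minutes, so a 99.9% SLA allows an error
# budget of 0.1% of that: 43.2 minutes of downtime.
sla_uptime = 99.9
minutes_in_month = 30 * 24 * 60  # 43200
budget_min = minutes_in_month * (100 - sla_uptime) / 100

measured_downtime_ms = 1_500_000  # illustrative: 25 minutes, e.g. currentMonthDowntime
measured_downtime_min = measured_downtime_ms / 60_000

print(round(budget_min, 1))                 # 43.2
print(measured_downtime_min <= budget_min)  # True
```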

Using Analytics for Capacity Planning

Analytics trends over time reveal capacity issues before they become incidents:

  • Rising P95 latency on outbound checks suggests your service is approaching capacity limits. If P95 is climbing while P50 stays flat, a subset of requests is hitting a bottleneck.
  • Increasing drift on inbound checks means your cron jobs are taking longer to complete. This often indicates growing data volumes or resource contention.
  • Declining on-time rate below 95% signals that your jobs are routinely running late. Consider increasing the schedule interval or allocating more resources.
  • Growing unstable check count in the workspace overview means more checks are showing intermittent issues. Investigate before they become persistent failures.

Use the 30d time range for capacity planning analysis. Shorter ranges (1h, 24h) show too much noise from transient spikes. The 30-day view reveals sustained trends.

Dashboard Widgets

The Ionhour dashboard surfaces analytics through several widgets:

| Widget | What it shows |
|--------|---------------|
| Stats Segment | Active checks, total signals, uptime %, and month-over-month trends |
| Uptime Statistics | Time-bucketed signal chart (success/fail/suspect) with range selector |
| Live Status Badge | Current health status based on checks overview |
| Incidents Widget | Active incidents with severity and duration |

[Image: Dashboard analytics widgets]

These widgets update in real time via SSE (server-sent events). When a new signal arrives or a check status changes, the dashboard reflects the change without requiring a page refresh.

Best Practices

  • Monitor trends, not snapshots. A single 99.5% uptime reading doesn't tell you much. Track uptime weekly to spot declining trends before they breach your SLA.
  • Use P95, not averages, for latency SLAs. Averages hide tail latency issues. If your SLA says "95th percentile response time under 500ms," measure P95 directly.
  • Compare month-over-month. Ionhour provides month-over-month diffs for check count and signal volume. A sudden drop in signal volume might mean a check stopped running, not that everything is healthy.
  • Set up outbound checks for SLA-critical endpoints. Inbound checks measure whether your cron jobs run. Outbound checks measure whether your users can reach your service. For SLA reporting, you usually need outbound data.
  • Review MTTR trends quarterly. MTTR reflects your team's incident response effectiveness. If it's increasing, investigate whether it's due to more complex incidents, slower acknowledgment, or insufficient escalation rules.
  • Export data for external reporting. Use the API endpoints documented above to pull analytics data into your reporting tools, internal dashboards, or customer-facing SLA reports.
  • Filter by region for global services. If you run outbound checks from multiple regions, always break down latency and uptime by region. A global average can mask regional problems.