Status template — monitoring

Use when a fix has been applied and the system is recovering. Customer impact is no longer growing.

<short noun phrase> — monitoring

Examples:

  • “Elevated payout latency — monitoring”
  • “Webhook delivery delays — monitoring”

A fix has been applied for <symptom>. Plaza is monitoring recovery.
Status: <what the dashboards show. "Payout latency has returned to within
SLO." "Webhook delivery success rate is back above 99.9%." Be concrete;
this is the first signal customers see that the worst is over.>
Residual impact: <if any. "Backlogged webhook deliveries are still being
drained; expect them to clear within the next hour." If none, write
"None.">
Next update: <when the entry transitions to "resolved", or in 30 minutes,
whichever comes first.>

  • Do not skip “monitoring” on incidents lasting more than 60 minutes. Customers want to know the inflection point.
  • “Monitoring” is not “resolved”. Stay in monitoring until the dashboards have shown a sustained recovery for at least 30 minutes (60 minutes for SEV-0).
  • If recovery does not hold, move back to “investigating” with a clear note. Do not pretend the incident is over.
  • Tone per AESTHETIC.md §8.
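
The timing rules above can be sketched as a small decision helper. This is only an illustration, not existing Plaza tooling: the function names, the `recovery_holding` flag (assumed to be derived from the dashboards by whoever calls it), and the exact severity label string are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Minimum sustained-recovery windows taken from the rules above.
DEFAULT_DWELL = timedelta(minutes=30)
SEV0_DWELL = timedelta(minutes=60)
# Incidents lasting longer than this should not skip "monitoring".
MONITORING_REQUIRED_AFTER = timedelta(minutes=60)


def required_dwell(severity: str) -> timedelta:
    """Return how long recovery must hold before resolving (hypothetical helper)."""
    return SEV0_DWELL if severity.upper() == "SEV-0" else DEFAULT_DWELL


def next_status(severity: str, recovery_started_at: datetime,
                now: datetime, recovery_holding: bool) -> str:
    """Suggest the next status for an entry currently in "monitoring"."""
    if not recovery_holding:
        # Recovery did not hold: go back rather than pretend it is over.
        return "investigating"
    if now - recovery_started_at >= required_dwell(severity):
        return "resolved"
    return "monitoring"


def monitoring_required(incident_duration: timedelta) -> bool:
    """True when the incident is long enough that "monitoring" must not be skipped."""
    return incident_duration > MONITORING_REQUIRED_AFTER
```

For example, a SEV-0 entry whose recovery has held for 45 minutes would stay in "monitoring" under this sketch, while a non-SEV-0 entry with the same history would be eligible for "resolved".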