How we communicate about incidents
What is an incident?
An incident is an operational event requiring a timely response because the Platform is unavailable, or because its performance is degraded in a way that affects key functionality for a significant number of customers.
An incident will typically have the following stages:
- Investigating: our teams are actively investigating but the cause has not yet been identified.
- Identified: the cause and remediation action have been identified.
- Monitoring: remediation work has been completed and we are monitoring to determine if the Platform is responding as expected.
- Resolved: we are satisfied the Platform is responding as expected. Some secondary issues may still be occurring, but if they do not meet the definition of an incident, we will close the incident and move those issues to a priority queue.
After the incident is resolved, the teams involved meet to conduct a postmortem to determine the cause, assign corrective actions and ensure learnings are carried forward.
What notification timeframe do we target?
- We communicate after we have determined that an operational event meets the definition of an incident.
- We aim to strike a balance between keeping customers informed and avoiding unnecessary notifications.
What does each of the statuses mean on Statuspage?
- Operational: system performance is as expected.
- Major Outage: the Platform is unavailable.
- Partial Outage: a component is unavailable (but other components are still available).
- Degraded Performance: the Platform or a component is noticeably degraded (e.g. slow response).
- Under Maintenance: we give advance notice of maintenance required over and above the usual scheduled maintenance time slots. Scheduled maintenance time slots are not advised via Statuspage notifications. Please see the weekly maintenance schedules.