How we define an incident and when we communicate
What is an incident?
An incident is an operational event that requires a timely response because the platform is unavailable, or because a performance degradation is affecting key functionality for a significant number of customers.
An incident typically moves through the following stages:
- Investigating: our teams are actively investigating, but the cause has not yet been identified.
- Identified: the cause and remediation action have been identified.
- Monitoring: remediation work has been completed, and we are monitoring to determine whether the platform is responding as expected.
- Resolved: we are satisfied the platform is responding as expected. Some secondary issues may still be occurring; if they do not meet the definition of an incident, we close the incident and place them in a priority queue.
After the incident is resolved, the teams involved hold a postmortem to determine the cause, assign corrective actions and ensure lessons learned are carried forward.
What notification timeframe do we target?
- We communicate after we have determined that an operational event meets the definition of an incident.
- We aim to strike a balance between keeping customers informed and avoiding unnecessary notifications.