Mean Time to Acknowledge
Note: this metric is only shown at a team level, not an individual level.
What it is: This measures how long it takes an organization to acknowledge a new incident in production. You will need to integrate with Opsgenie or PagerDuty to get this metric.
Why it matters: This metric indicates the responsiveness of your systems. A higher Mean Time to Acknowledgement increases the risk of app downtime, as it means your teams and systems are taking longer to detect a failure in production. This results in a less reliable service or product for your customers. It can also impact the flow of work elsewhere, since more time is taken up fixing outages, ultimately impacting your organization's ability to deliver value to customers. It is a subset of Mean Time to Recovery.
How we calculate it: An incident's Time to Acknowledgement is calculated as the time from when an incident first fires to the first acknowledgement by a team member. See below for more details on how this is calculated for each IMS platform.
For MTTA, the times are averaged over the selected date range, for each cadence (e.g. weekly, monthly). The line chart series are grouped by Multitudes team for Opsgenie, and by Service or Escalation policy for PagerDuty.
Incidents that have not been acknowledged are not included in the data. This means that if many of your incidents are resolved without getting acknowledged, then your data may look sparse.
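As a rough illustration of the averaging described above, here is a minimal Python sketch for a weekly cadence. The Incident record, its field names, and the choice to bucket each incident by the week of its trigger date are assumptions made for the example, not a description of how Multitudes implements the metric.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record shape, for illustration only.
    triggered_at: datetime
    acknowledged_at: Optional[datetime]  # None if never acknowledged


def weekly_mtta(incidents: list[Incident]) -> dict[date, timedelta]:
    """Mean time to acknowledge per week, keyed by the Monday of that week."""
    buckets: dict[date, list[timedelta]] = defaultdict(list)
    for inc in incidents:
        if inc.acknowledged_at is None:
            continue  # unacknowledged incidents are excluded
        day = inc.triggered_at.date()
        # Assumption: an incident belongs to the week in which it was triggered.
        week_start = day - timedelta(days=day.weekday())
        buckets[week_start].append(inc.acknowledged_at - inc.triggered_at)
    return {week: sum(times, timedelta()) / len(times) for week, times in buckets.items()}
```

A week in which every incident was resolved without being acknowledged has no entry in the result, which is why such data can look sparse.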
Time to Acknowledgement = the time from the first incident.triggered event* to the first incident.acknowledged event. We attribute the incident to the team(s) of the acknowledger. This is how we determine whether to show an incident based on the team filters at the top of the page**.
*If a trigger event cannot be found, we default to the incident's created date. This is the case for historical data (the data shown when you first onboard).
Also, in historical data, the acknowledger is assumed to be the user who last changed the incident status.
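The per-incident calculation and its fallback can be sketched as follows. This is only an illustration: the function name, the (event_type, timestamp) tuple format, and the created_at parameter are hypothetical, and only the two event type strings come from the description above.

```python
from datetime import datetime, timedelta
from typing import Optional


def time_to_acknowledge(
    events: list[tuple[str, datetime]], created_at: datetime
) -> Optional[timedelta]:
    """Time from the first trigger event to the first acknowledgement.

    `events` is a hypothetical list of (event_type, timestamp) pairs.
    Returns None when the incident was never acknowledged.
    """
    triggers = [ts for kind, ts in events if kind == "incident.triggered"]
    acks = [ts for kind, ts in events if kind == "incident.acknowledged"]
    if not acks:
        return None  # unacknowledged incidents are not included in the data
    # Fall back to the created date when no trigger event is found.
    start = min(triggers) if triggers else created_at
    return min(acks) - start
```

Using the earliest event of each type means that repeated acknowledgements, for example after a reassignment, do not change the result.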
**How an incident is shown in the data depends on who acknowledged it, including when the acknowledger is a bot:
Incidents acknowledged by bot, with no assignee in its history: only shown when the Teams filter at the top of the page is set to show the whole organization.
Incidents acknowledged by bot, with an assignee who is a Multitudes contributor: shown & attributed to the team(s) of that assignee. If there are multiple assignees, or there were multiple assignees throughout the history of the incident (e.g. it was reassigned), we take the last assignee(s)' team(s).
Incidents acknowledged by a Multitudes contributor: shown & attributed to the team(s) of the acknowledger.
Incidents acknowledged by a user who’s not a contributor: not shown.
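The attribution rules above can be summarized as a small decision function. This is a sketch under stated assumptions: the User record, the attribute function, and the ORG_ONLY and HIDDEN markers are hypothetical names. The rules also do not say what happens when a bot acknowledges an incident whose assignees are all non-contributors; the sketch treats that like the no-assignee case, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical shape: `teams` holds the user's Multitudes teams.
    is_bot: bool = False
    is_contributor: bool = False
    teams: set[str] = field(default_factory=set)


ORG_ONLY = "org_only"  # visible only with the whole organization selected
HIDDEN = "hidden"      # not shown


def attribute(acknowledger: User, last_assignees: list[User]):
    """Return a set of team names, ORG_ONLY, or HIDDEN."""
    if acknowledger.is_bot:
        contributors = [u for u in last_assignees if u.is_contributor]
        if contributors:
            # Attribute to the team(s) of the last assignee(s).
            return set().union(*(u.teams for u in contributors))
        # No assignee in the history. Assumption: assignees who are all
        # non-contributors are handled the same way.
        return ORG_ONLY
    if acknowledger.is_contributor:
        return set(acknowledger.teams)
    return HIDDEN  # acknowledged by a user who is not a contributor
```

The caller is expected to pass only the last assignee(s) of the incident, so earlier assignees from before a reassignment do not affect the result.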
What good looks like
From looking into the SLAs for P1 incidents of various organizations, and our own research on typical acknowledgement times within our own data, we've found that acknowledgement within 15 minutes of an incident being raised is a good target to aim for.