
© Multitudes 2025


Mean Time to Recovery


Last updated 4 months ago


⭐️ This metric is one of the 4 Key Metrics published by Google's DevOps Research and Assessment (DORA) team.

You can see all 4 key DORA metrics on the DORA Metrics page of the Multitudes app.

Note: this metric is only shown at a team level, not an individual level.

Mean Time to Recovery graph

How we calculate it: We take the mean of the recovery times for the incidents that occurred in the selected date range, for the selected cadence (e.g. weekly, monthly). The line chart series are grouped by Multitudes team for Opsgenie, and by Service or Escalation Policy for PagerDuty.
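As a rough illustration of the averaging step, here is a minimal Python sketch for a weekly cadence. It is not Multitudes' implementation: the tuple layout, the `week_start` and `weekly_mttr_hours` names, bucketing by the incident's opened time, and weeks starting on Monday are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def week_start(ts):
    """Return midnight on the Monday of the week containing ts (assumed week boundary)."""
    monday = ts - timedelta(days=ts.weekday())
    return monday.replace(hour=0, minute=0, second=0, microsecond=0)


def weekly_mttr_hours(incidents):
    """Map (group, week start) to the mean recovery time in hours.

    incidents: iterable of (group, opened_at, recovery_time) tuples, where
    group is a hypothetical series key such as a team or service name.
    """
    buckets = defaultdict(list)
    for group, opened_at, recovery in incidents:
        hours = recovery.total_seconds() / 3600
        buckets[(group, week_start(opened_at))].append(hours)
    return {key: mean(hours) for key, hours in buckets.items()}


# Hypothetical sample data: two incidents in one week, one in the next.
incidents = [
    ("Team A", datetime(2025, 3, 3, 9, 0), timedelta(minutes=30)),
    ("Team A", datetime(2025, 3, 5, 14, 0), timedelta(minutes=90)),
    ("Team A", datetime(2025, 3, 11, 8, 0), timedelta(hours=2)),
]
result = weekly_mttr_hours(incidents)
```

With this sample data, the week of 3 March averages 30 and 90 minutes to 1 hour, and the week of 10 March contains a single 2-hour incident.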

The recovery time is calculated as follows. On Opsgenie: the time from when an incident was opened to when it was closed. On PagerDuty: the time from the first incident.triggered event* to the first incident.resolved event. We attribute the incident to the team(s) of the resolver, that is, the user who triggered the first incident.resolved event. This is how we determine whether to show an incident based on the team filters at the top of the page**.

*If a trigger event cannot be found, we default to the incident's created date. This is the case for historical data (the data shown when you first onboard).

Also, in historical data, the resolver is assumed to be the user who last changed the incident status; you can't un-resolve an incident, so for resolved incidents this can be assumed to be the responder.
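The PagerDuty rule and its fallback can be sketched as below. This is an illustrative sketch only: the event type strings come from the description above, but the `(event_type, timestamp)` tuple layout, the `pagerduty_recovery_time` name, and returning `None` for unresolved incidents are assumptions, not the actual Multitudes data model.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Event = Tuple[str, datetime]  # hypothetical shape: (event_type, timestamp)


def pagerduty_recovery_time(events: List[Event], created_at: datetime) -> Optional[timedelta]:
    """Time from the first trigger (or created date) to the first resolve."""
    triggered = [ts for kind, ts in events if kind == "incident.triggered"]
    resolved = [ts for kind, ts in events if kind == "incident.resolved"]
    if not resolved:
        # Assumption: an incident that was never resolved has no recovery time.
        return None
    # Fall back to the created date when no trigger event exists (historical data).
    start = min(triggered) if triggered else created_at
    return min(resolved) - start


# Hypothetical incident: triggered once, resolved twice.
events = [
    ("incident.triggered", datetime(2025, 3, 3, 10, 0)),
    ("incident.resolved", datetime(2025, 3, 3, 10, 45)),
    ("incident.resolved", datetime(2025, 3, 3, 11, 30)),
]
created = datetime(2025, 3, 3, 9, 55)

live = pagerduty_recovery_time(events, created)
historical = pagerduty_recovery_time(events[1:], created)
```

Here `live` is 45 minutes, measured from the trigger event to the first resolve. `historical` drops the trigger event, so the start falls back to the created date and the result is 50 minutes.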

**Here's how incidents are shown in the data, depending on who resolved them:

  • Incidents resolved by bot, with no assignee in its history: only shown when the Teams filter at the top of the page is set to show the whole organization.

  • Incidents resolved by bot, with an assignee who is a Multitudes contributor: shown & attributed to the team(s) of that assignee. If there are multiple assignees, or there were multiple assignees throughout the history of the incident (e.g. it was reassigned), we take the last assignee(s)' team(s).

  • Incidents resolved by a Multitudes contributor: shown & attributed to the team(s) of the resolver.

  • Incidents resolved by a user who’s not a contributor: not shown.
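The four cases above can be read as a small decision function. The sketch below is one hedged way to write it down in Python; the `Incident` fields and the `visibility` function are hypothetical names, and the case of a bot-resolved incident whose assignee is not a Multitudes contributor is not covered by the rules above, so it is not modelled here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Incident:
    resolved_by_bot: bool
    # Teams of the human resolver; None if the resolver is not a contributor.
    resolver_teams: Optional[List[str]] = None
    # Teams of the last contributor assignee(s); None if there was never an assignee.
    last_assignee_teams: Optional[List[str]] = None


def visibility(incident: Incident) -> Tuple[bool, List[str]]:
    """Return (shown, teams). An empty team list means whole-organization view only."""
    if incident.resolved_by_bot:
        if incident.last_assignee_teams is None:
            return True, []  # bot, no assignee: organization-wide view only
        return True, incident.last_assignee_teams  # bot: last assignee(s)' teams
    if incident.resolver_teams is None:
        return False, []  # resolved by a non-contributor: not shown
    return True, incident.resolver_teams  # resolved by a contributor


bot_unassigned = visibility(Incident(resolved_by_bot=True))
bot_assigned = visibility(Incident(resolved_by_bot=True, last_assignee_teams=["Platform"]))
contributor = visibility(Incident(resolved_by_bot=False, resolver_teams=["Payments"]))
outsider = visibility(Incident(resolved_by_bot=False))
```

Returning an empty team list for the bot-with-no-assignee case is a design choice of this sketch: such an incident matches no team filter but still counts when the filter covers the whole organization.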

What good looks like

DORA research shows that elite performing teams have a Mean Time to Recovery of less than 1 hour.

What it is: This is our take on DORA's Mean Time to Recovery metric. It's a measure of how long it takes an organization to recover from an incident or failure in production. You will need to integrate with Opsgenie or PagerDuty to get this metric.

Why it matters: This metric indicates the stability of your teams’ software. A higher Mean Time to Recovery increases the risk of app downtime. This can in turn lead to a higher Change Lead Time, because more time is spent fixing outages, and can ultimately affect your organization's ability to deliver value to customers. In this study by Nicole Forsgren (author of DORA and SPACE), high-performing teams had the lowest times for Mean Time to Recovery. The study also highlights the importance of organizational culture in maintaining a low Mean Time to Recovery.
