Feedback Quality

Learn how we use AI and machine learning to surface insights about the quality of feedback in your team's code reviews.

What it is

This shows the overall quality of feedback given in code reviews. We analyze all code review comments (excluding the PR author's own comments) and classify them into quality categories based on how constructive and actionable the feedback is.

Why it matters

Understanding your team's feedback quality helps you create more inclusive review processes and identify opportunities to improve collaboration.

How we calculate it

We analyze feedback given in code reviews using Multitudes' AI models, which have been specifically designed to mitigate algorithmic biases and are grounded in research. First, we find each set of feedback: we group all the comments a commenter made on someone's PR into one set. We do this because the first review on a PR is typically more detailed than follow-up reviews, so bringing in the full set of comments gives our model more context.

We then classify feedback into five quality categories:

  • Highly Specific: Detailed, actionable feedback that clearly explains what needs to change and provides clear reasoning. This also includes thought-provoking comments that share knowledge or offer alternative approaches.

  • Neutral: Moderately detailed feedback that provides some guidance but could be more comprehensive.

  • Unspecific: Vague comments that don't provide clear direction for improvement.

  • Minimal: Short reviews like "LGTM 👍" or "Shippit 🔥!!" that provide minimal guidance. These reviews are commonly referred to as "rubber-stamp" code reviews. Note: code reviews submitted with no comments are not included in this category.

  • Negative: Feedback that may come across as harsh, dismissive, or potentially harmful.

We look at patterns across your team's code review conversations to surface insights about collaboration dynamics and feedback culture. If you notice any incorrect model classifications, please provide feedback using the 🚩 button in the drill-down table.
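To make the grouping and classification steps above concrete, here is a minimal Python sketch. It is illustrative only: the comment fields, the grouping keys, and the classify_feedback_set stand-in are assumptions made for this example, not Multitudes' production pipeline or model.

```python
from collections import defaultdict

# Hypothetical input shape: each review comment records the PR, the commenter,
# the PR author, and the comment text. These field names are assumptions for
# illustration, not Multitudes' internal schema.
comments = [
    {"pr": 101, "commenter": "ana", "pr_author": "ben",
     "body": "Consider extracting this into a helper; it's duplicated in three places."},
    {"pr": 101, "commenter": "ana", "pr_author": "ben",
     "body": "Nit: typo in the docstring."},
    {"pr": 101, "commenter": "ben", "pr_author": "ben",
     "body": "Good catch, fixed!"},  # the PR author's own comment is excluded
]

def classify_feedback_set(bodies):
    """Stand-in for Multitudes' AI model. A real classifier would return one of
    Highly Specific, Neutral, Unspecific, Minimal, or Negative; this toy
    heuristic exists only so the sketch runs end to end."""
    text = " ".join(bodies)
    return "Minimal" if len(text) < 20 else "Neutral"

# Step 1: group every comment a reviewer left on a given PR into one set,
# skipping comments written by the PR author themselves.
feedback_sets = defaultdict(list)
for comment in comments:
    if comment["commenter"] != comment["pr_author"]:
        feedback_sets[(comment["pr"], comment["commenter"])].append(comment["body"])

# Step 2: classify each set as a whole, so follow-up reviews are read in the
# context of the first, usually more detailed, review on the same PR.
labels = {key: classify_feedback_set(bodies) for key, bodies in feedback_sets.items()}
print(labels)  # e.g. {(101, 'ana'): 'Neutral'}
```

Grouping by (PR, commenter) before classifying mirrors the idea described above: a reviewer's follow-up comments are evaluated together with their first, more detailed review.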

Understanding Negative Feedback

When feedback is classified as "negative", we also identify the specific reasons based on established research on what makes code review feedback destructive:

Negativity Reasons:

  • Personal Attack: Feedback that targets the person rather than the code

  • Vague Criticism: Critical feedback without clear suggestions for improvement

  • Judgmental: Feedback with a judgmental or condescending tone

  • Harsh Language: Use of inconsiderate or unnecessarily harsh language

  • Excessive Nitpicking: Repeated focus on minor issues without addressing bigger picture concerns

  • Negative Emojis: Overuse of negative emojis that create a hostile tone

  • Terseness: Overly brief feedback that comes across as dismissive

This granular analysis helps teams understand not just that negative feedback occurred, but what specific patterns to address in their code review culture.
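As a purely hypothetical illustration of how these reasons could be attached to a classification, here is a small Python sketch. The NegativityReason enum and FeedbackSetResult record are assumptions made for the example; they are not Multitudes' internal data model.

```python
from dataclasses import dataclass, field
from enum import Enum

class NegativityReason(Enum):
    """The reasons listed above; member names are assumptions for illustration."""
    PERSONAL_ATTACK = "Personal Attack"
    VAGUE_CRITICISM = "Vague Criticism"
    JUDGMENTAL = "Judgmental"
    HARSH_LANGUAGE = "Harsh Language"
    EXCESSIVE_NITPICKING = "Excessive Nitpicking"
    NEGATIVE_EMOJIS = "Negative Emojis"
    TERSENESS = "Terseness"

@dataclass
class FeedbackSetResult:
    """Hypothetical result record: the quality label for a set of review comments,
    plus one or more reasons whenever that label is Negative."""
    quality: str
    negativity_reasons: list[NegativityReason] = field(default_factory=list)

# Example: a terse, harsh review would surface both contributing reasons.
result = FeedbackSetResult(
    quality="Negative",
    negativity_reasons=[NegativityReason.TERSENESS, NegativityReason.HARSH_LANGUAGE],
)
print([reason.value for reason in result.negativity_reasons])
```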

What good looks like

Feedback quality patterns vary significantly based on team dynamics, code review practices, and organizational culture. That said, based on our initial analysis across Multitudes customers, teams typically have 15% highly specific feedback, 20% unspecific feedback, 25% minimal feedback, and less than 2% negative feedback.

We recommend teams aim for:

  • 20%+ highly specific feedback, because this provides clear, actionable guidance.

  • Zero negative feedback, because negative feedback can be destructive to team inclusion and performance.

  • <30% minimal feedback, because some quick approvals are normal, but excessive rates suggest insufficient review depth.

  • Equitable distribution across the team, with all team members receiving a similar quality of feedback and no one getting significantly less specific feedback.

Multitudes is actively conducting research to identify what percentages of feedback quality correlate with high-performing teams. As we gather more data and insights, these benchmarks will be updated to provide more precise targets for healthy feedback patterns. Learn more about our original research here.

Use these insights in 1:1s, retros, and team discussions to foster a more supportive and effective code review culture.
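As an illustration of how a team might compare its own distribution against these suggested targets, here is a minimal Python sketch. The counts are made-up numbers, and the threshold checks simply restate the recommendations above; this is not a Multitudes API.

```python
# Hypothetical counts of classified feedback sets for one team over one period.
counts = {"Highly Specific": 34, "Neutral": 71, "Unspecific": 38, "Minimal": 52, "Negative": 1}

total = sum(counts.values())
shares = {label: count / total for label, count in counts.items()}

# The suggested targets from this page: 20%+ highly specific, zero negative,
# and less than 30% minimal feedback.
checks = {
    "highly specific >= 20%": shares["Highly Specific"] >= 0.20,
    "zero negative": counts["Negative"] == 0,
    "minimal < 30%": shares["Minimal"] < 0.30,
}

for name, ok in checks.items():
    print(f"{name}: {'on target' if ok else 'needs attention'}")
```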

Provide Feedback on Model Predictions

We're continuously improving our AI models to ensure accurate classification. If you've noticed a comment is misclassified, such as constructive feedback labeled as Negative or vague comments marked as Highly Specific, please use the flag button in the drill-down table to report it.

Your feedback helps us refine our models and reduce algorithmic bias, ensuring better insights for your teams. We review all flagged predictions and use them to make the feedback quality analysis more reliable for everyone using Multitudes.

Note that our language model does better with domain-specific knowledge in English than in other languages, so we’d especially love to hear from Multitudes users who write reviews regularly in other languages.

Research on Feedback Quality and Code Review

The quality of feedback in code reviews directly impacts team psychological safety (see Belschak et al.), learning outcomes, and of course, code quality. Research consistently shows significant feedback gaps across different groups in the workplace (see Gunawardena et al.). According to Lean In and McKinsey & Co, women are more than 20% less likely than men to say their manager gave them critical feedback that contributed to their growth. Additionally, a recent survey by Textio found that Black and LatinX employees are more likely to receive feedback about their "personality" than the actual quality of their job performance.

The design of this feature is informed by extensive research on feedback quality and code review dynamics, alongside collaboration with our academic partners (see Multitudes Research). Some key studies and articles we recommend users read include:

  • Destructive Criticism in Software Code Review Impacts

  • Predicting developers' negative feelings about code review

  • Unlearning toxic behaviors in a code review culture

  • Expectations, outcomes, and challenges of modern code review

  • Negative Effects of Destructive Criticism: Impact on Conflict, Self-Efficacy, and Task Performance

  • Process Aspects and Social Dynamics of Contemporary Code Review: Insights from Open Source Development and Industrial Practice at Microsoft

  • Detecting interpersonal conflict in issues and code review: cross pollinating open- and closed-source approaches

  • Towards Unmasking LGTM Smells in Code Reviews: A Comparative Study of Comment-Free and Commented Reviews

  • Research and Insights into Negative and Destructive Criticism
Figure: Stacked bar chart showing the percentage of feedback given on PRs, grouped by specific, neutral, unspecific, and minimal reviews.