Feedback Quality
Learn how we use AI and machine learning to surface insights about the quality of feedback in your team's code reviews.
This shows the overall quality of feedback given in code reviews. We analyze all code review comments (excluding the PR author's own comments) and classify them into quality categories based on how constructive and actionable the feedback is.
Understanding your team's feedback quality helps you create more inclusive review processes and identify opportunities to improve collaboration.
We analyze feedback given in code reviews using Multitudes' AI models, which have been specifically designed to mitigate algorithmic bias and are grounded in research. First, we find each set of feedback: we group all the comments that a reviewer left on someone's PR into one set. We do this because the first review on a PR is typically more detailed than follow-up reviews, so bringing in the full set of comments gives our model more context.
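As a rough illustration of that grouping step, here is a minimal sketch in Python. It assumes review comments have already been fetched into simple dictionaries with hypothetical "author" and "body" fields; the field names, function name, and example data are illustrative assumptions, not Multitudes' actual pipeline.

```python
from collections import defaultdict

def group_feedback_sets(comments, pr_author):
    """Group a PR's review comments into one feedback set per commenter.

    `comments` is assumed to be a list of dicts with "author" and "body"
    keys. Comments written by the PR author are excluded, matching the
    description above (we only classify feedback given *to* the author).
    """
    feedback_sets = defaultdict(list)
    for comment in comments:
        if comment["author"] == pr_author:
            continue  # skip the PR author's own replies
        feedback_sets[comment["author"]].append(comment["body"])
    return dict(feedback_sets)

# Hypothetical comments on a single PR authored by "casey"
comments = [
    {"author": "ana", "body": "Consider extracting this into a helper."},
    {"author": "ana", "body": "Same pattern here, could reuse that helper."},
    {"author": "bo", "body": "LGTM"},
    {"author": "casey", "body": "Good point, will do!"},  # author's reply, excluded
]

print(group_feedback_sets(comments, pr_author="casey"))
# -> {'ana': ['Consider extracting this into a helper.',
#             'Same pattern here, could reuse that helper.'],
#     'bo': ['LGTM']}
```

Grouping by reviewer like this means a short follow-up comment is judged in the context of that reviewer's fuller first pass, rather than on its own.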
We then classify feedback into the following quality categories:
Highly Specific: Detailed, actionable feedback that clearly explains what needs to change and provides clear reasoning. This also includes thought-provoking comments that share knowledge or offer alternative approaches.
Neutral: Moderately detailed feedback that provides some guidance but could be more comprehensive.
Unspecific: Vague comments that don't provide clear direction for improvement.
Minimal: Short reviews like "LGTM" or "Shippit!!" that provide minimal guidance. These reviews are commonly referred to as "rubber-stamp" code reviews. Note: code reviews submitted with no comments at all are not included in this category.
Negative: Feedback that may come across as harsh, dismissive, or potentially harmful.
We look at patterns across your team's code review conversations to surface insights about collaboration dynamics and feedback culture. If you notice any incorrect model classifications, please provide feedback using the 🚩 button in the drill-down table.
When feedback is classified as "negative", we also identify the specific reasons based on established research on what makes code review feedback destructive:
Personal Attack: Feedback that targets the person rather than the code
Vague Criticism: Critical feedback without clear suggestions for improvement
Judgmental: Feedback with a judgmental or condescending tone
Harsh Language: Use of inconsiderate or unnecessarily harsh language
Excessive Nitpicking: Repeated focus on minor issues without addressing bigger picture concerns
Negative Emojis: Overuse of negative emojis that create a hostile tone
Terseness: Overly brief feedback that comes across as dismissive
This granular analysis helps teams understand not just that negative feedback occurred, but what specific patterns to address in their code review culture.
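To make the taxonomy concrete, here is one possible way to represent a classification result in code. The category and reason labels are the ones described above, but the enums, dataclass, field names, and example values are purely illustrative and not Multitudes' internal data model.

```python
from dataclasses import dataclass, field
from enum import Enum

class FeedbackQuality(Enum):
    HIGHLY_SPECIFIC = "Highly Specific"
    NEUTRAL = "Neutral"
    UNSPECIFIC = "Unspecific"
    MINIMAL = "Minimal"
    NEGATIVE = "Negative"

class NegativeReason(Enum):
    PERSONAL_ATTACK = "Personal Attack"
    VAGUE_CRITICISM = "Vague Criticism"
    JUDGMENTAL = "Judgmental"
    HARSH_LANGUAGE = "Harsh Language"
    EXCESSIVE_NITPICKING = "Excessive Nitpicking"
    NEGATIVE_EMOJIS = "Negative Emojis"
    TERSENESS = "Terseness"

@dataclass
class FeedbackClassification:
    reviewer: str
    pr_number: int
    quality: FeedbackQuality
    # Only populated when quality is NEGATIVE
    negative_reasons: list[NegativeReason] = field(default_factory=list)

# Example: a terse, harsh review set flagged with two specific reasons
result = FeedbackClassification(
    reviewer="ana",
    pr_number=42,  # hypothetical PR number
    quality=FeedbackQuality.NEGATIVE,
    negative_reasons=[NegativeReason.TERSENESS, NegativeReason.HARSH_LANGUAGE],
)
```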
Feedback quality patterns vary significantly based on team dynamics, code review practices, and organizational culture. That said, based on our initial analysis across Multitudes customers, teams typically have 15% highly specific feedback, 20% unspecific feedback, 25% minimal feedback and less than 2% negative feedback.
We recommend teams aim for:
20%+ highly specific feedback, because this provides clear, actionable guidance.
Zero negative feedback, because it can be destructive to team inclusion and performance.
<30% minimal feedback, because some quick approvals are normal, but excessive rates suggest insufficient review depth.
Equitable distribution across the team, with all team members receiving a similar quality of feedback and no one getting significantly less specific feedback.
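As a quick way to reason about these targets, the sketch below computes a team's feedback quality distribution from hypothetical counts and checks it against the recommendations above; the counts are made up, and the thresholds are simply the ones listed in this section.

```python
from collections import Counter

# Hypothetical counts of classified feedback sets for one team over a month
counts = Counter({
    "Highly Specific": 18,
    "Neutral": 45,
    "Unspecific": 22,
    "Minimal": 33,
    "Negative": 2,
})

total = sum(counts.values())
share = {label: n / total for label, n in counts.items()}

# Thresholds taken from the recommendations above
checks = {
    "Highly Specific >= 20%": share["Highly Specific"] >= 0.20,
    "Negative == 0": counts["Negative"] == 0,
    "Minimal < 30%": share["Minimal"] < 0.30,
}

for label, pct in share.items():
    print(f"{label}: {pct:.0%}")
for name, ok in checks.items():
    print(f"{name}: {'OK' if ok else 'needs attention'}")
```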
Use these insights in 1:1s, retros, and team discussions to foster a more supportive and effective code review culture.
Your feedback helps us refine our models and reduce algorithmic bias, ensuring better insights for your teams. We review all flagged predictions and use them to make the feedback quality analysis more reliable for everyone using Multitudes.
Note that our language model does better with domain-specific knowledge in English than in other languages, so we’d especially love to hear from Multitudes users who write reviews regularly in other languages.
The quality of feedback in code reviews directly impacts team psychological safety. Research consistently shows significant feedback gaps across different groups in the workplace. According to one large workplace study, women are more than 20% less likely than men to say their manager gave them critical feedback that contributed to their growth. Additionally, another study found that Black and LatinX employees are more likely to receive feedback about their "personality" than the actual quality of their job performance.
Multitudes is actively conducting research to identify which feedback quality distributions correlate with high-performing teams. As we gather more data and insights, these benchmarks will be updated to provide more precise targets for healthy feedback patterns.
We're continuously improving our AI models to ensure accurate classification. If you notice a comment that has been misclassified, such as constructive feedback labeled as Negative or vague comments marked as Highly Specific, please use the 🚩 flag button in the drill-down table to report it.
The design of this feature is informed by extensive research on feedback quality and code review dynamics, alongside collaboration with our academic partners.