Feedback Themes
Learn how we use AI and machine learning to surface insights about common topics that appear in your team's code reviews.

What it is
This metric shows the feedback themes that appear in code reviews. We analyze all code review comments (excluding the PR author's own comments) and classify them into themes. Within each theme, we also classify each piece of feedback into an underlying code, which provides more specificity about what the feedback covered. The themes and codes are grouped together in an underlying thematic map – read on below for more about that.
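To make the scope concrete, here's a minimal sketch of that comment-selection step. The field names and the reviewer_comments helper are illustrative assumptions, not Multitudes' actual schema:

```python
# Minimal sketch of the comment-selection step (field names are
# illustrative, not Multitudes' actual schema).
def reviewer_comments(comments, pr_author):
    """Keep only comments written by someone other than the PR author."""
    return [c for c in comments if c["author"] != pr_author]

comments = [
    {"author": "alice", "body": "Could this loop be a list comprehension?"},
    {"author": "bob",   "body": "Good catch - fixed in the next commit."},
]
print(reviewer_comments(comments, pr_author="bob"))
# -> only alice's comment is kept for analysis
```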
Why it matters
Looking for common themes in our code reviews can help us identify systemic areas of improvement across our engineering team, and it can help us identify coaching opportunities for specific team members.
We all know how important code reviews are for good outcomes – when they're done well, they're linked to:
Better codebases: Higher code quality, fewer defects, and more maintainable code
Happier people: More knowledge-sharing, more learning, more psychological safety, and less turnover
Developers consider code review an integral part of maintaining the quality of a codebase. Improved knowledge-sharing during code review also reduces waste in the development process, allowing team members to work more collaboratively and to maintain overall codebase quality. However, poor-quality code reviews can actually introduce more bugs – contrary to one of the main motivations for conducting reviews in the first place.
Essentially, investing time into carrying out good code reviews means your people will work better and have a codebase that lets them move faster – the ultimate win-win.
Going back to themes: Seeing which topics come up in code reviews (or don't) can help us see where knowledge-sharing and learning are concentrated on our teams, and where we might want to spread more knowledge.
We also know that there's bias in the types of feedback that people in marginalized groups receive. Textio research showed that women and people of color get worse-quality feedback and more negative stereotyping in feedback. Research on code reviews in particular showed that destructive criticism is common (half of respondents had received some in the last year) and has a larger negative impact on women than on men (see Gunawardena et al.). These are even more reasons why it's important to look at the patterns in code reviews.
Standardizing the feedback that team members receive is an important step in closing those gaps and working towards impactful code reviews – making sure that everyone gets feedback to write great code and grow as a developer.
How we calculate it
We analyze feedback given in code reviews using Multitudes' AI models, which have been specifically designed to mitigate algorithmic biases and are grounded in research. First, our model identifies high-level themes in code review feedback. Note that a single piece of feedback can have multiple themes; the themes are not mutually exclusive. Then we use the identified themes as context to predict research codes – the specific topics that come up across your team's code reviews.
The research codes are visible in the drill-down tables, which also show examples of each theme and code in your team's code reviews. We experimented with a variety of other classification methods but settled on this two-step approach because it mirrors human cognitive processes – we first understand the general topic, then progress to specifics. Finally, our insights provide reflection questions and conversation-starters for use in 1:1s, retros, and team discussions to foster a more effective code review culture.
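Here's a toy sketch of that two-step flow. The classify function and keyword rules are purely illustrative stand-ins for the real models; the theme and code names are taken from the examples later on this page:

```python
# Toy sketch of the two-step flow: themes first, then codes predicted with
# the themes as context. The keyword rules are illustrative stand-ins for
# Multitudes' actual models.
THEME_RULES = {
    "Code Quality": ["duplicate", "simplify", "optimize"],
    "Risk/Error Management": ["error", "edge case", "bug"],
}
CODE_RULES = {
    "Code Quality": {"Code Redundancy": ["duplicate"],
                     "Code Simplification": ["simplify"]},
    "Risk/Error Management": {"Edge Case Handling": ["edge case"],
                              "Bug Prevention": ["bug"]},
}

def classify(comment):
    text = comment.lower()
    # Step 1: a comment can match several themes (labels are not exclusive).
    themes = [t for t, kws in THEME_RULES.items()
              if any(k in text for k in kws)]
    # Step 2: the predicted themes narrow which codes are even considered.
    codes = [c for t in themes
             for c, kws in CODE_RULES[t].items()
             if any(k in text for k in kws)]
    return {"themes": themes, "codes": codes}

print(classify("This duplicates the helper above - can we simplify it?"))
# -> {'themes': ['Code Quality'],
#     'codes': ['Code Redundancy', 'Code Simplification']}
```

The design point this illustrates: step 2 only considers codes under the themes predicted in step 1, which is what makes the output hierarchical rather than a flat label set.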

Altogether, our model identifies the 15 themes and 45 codes that appear most commonly in code reviews. The thematic map below shows how they link up. We developed this map based on our analysis, using established academic methods like thematic analysis (see, for example, Braun et al.).

Because this model produces hierarchical, multi-label output, evaluation isn't straightforward. For that reason, we implemented several complementary evaluation metrics, including basic classification metrics, edge distance measurements, Wu-Palmer similarity metrics, and match metrics. We compared the model's outputs to our ground-truth (human-labeled) dataset to confirm the model was sufficiently accurate.
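As an illustration of one of those metrics, here's a small sketch of Wu-Palmer similarity computed over a toy slice of a theme/code hierarchy. The tree, node names, and helpers are assumptions for the example, not our production evaluation code:

```python
# Wu-Palmer similarity over a toy theme/code tree: the tiny hierarchy here
# is illustrative, not the full 15-theme / 45-code map.
PARENT = {
    "Code Redundancy": "Code Quality",
    "Code Simplification": "Code Quality",
    "Bug Prevention": "Risk/Error Management",
    "Code Quality": "root",
    "Risk/Error Management": "root",
    "root": None,
}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path  # [node, ..., root]

def wu_palmer(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    lcs = next(n for n in pa if n in pb)      # lowest common subsumer
    depth = lambda n: len(path_to_root(n))    # root has depth 1
    return 2 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("Code Redundancy", "Code Simplification"))  # 0.666...
print(wu_palmer("Code Redundancy", "Bug Prevention"))       # 0.333...
```

Under this kind of metric, predicting a sibling code under the correct theme scores higher than predicting a code from an unrelated theme – the partial credit that flat classification metrics can't express.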
Some examples of the themes and codes our model can identify in your code reviews (written out as a simple mapping in the sketch after this list):
Code Structure: Code Organization, Code Reusability
Risk/Error Management: Error Management, Edge Case Handling, Technical Debt Management, Bug Prevention
Knowledge Sharing and Growth: Knowledge Sharing, Implementation Suggestions
Deployment Ops: Deployment Process, Environment Configuration, Release Management, Dependencies
Code Quality: Code Redundancy, Code Simplification, Code Fix, Code Optimization
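For readers who think in data structures, the same theme-to-code mapping can be written out as plain data. This sketch covers only the examples listed above, not the full 15-theme, 45-code map:

```python
# Theme -> codes mapping for the examples on this page (illustrative; the
# full thematic map covers 15 themes and 45 codes).
THEMATIC_MAP = {
    "Code Structure": ["Code Organization", "Code Reusability"],
    "Risk/Error Management": ["Error Management", "Edge Case Handling",
                              "Technical Debt Management", "Bug Prevention"],
    "Knowledge Sharing and Growth": ["Knowledge Sharing",
                                     "Implementation Suggestions"],
    "Deployment Ops": ["Deployment Process", "Environment Configuration",
                       "Release Management", "Dependencies"],
    "Code Quality": ["Code Redundancy", "Code Simplification",
                     "Code Fix", "Code Optimization"],
}
```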
Provide Feedback on Model Predictions
We're continuously improving our Multitudes AI models to ensure accurate classification. If you believe that your feedback has been misclassified – for example, a review labeled with an incorrect theme or code – please use the 🚩 (red flag) button in the drill-down table to report it.
Your feedback helps us refine our models and reduce algorithmic bias, ensuring better insights for your teams. We review all flagged predictions and use them to make the Feedback Themes analysis more reliable for everyone using Multitudes.
Research on Feedback Themes and Code Review
The design of this feature is informed by extensive research on feedback themes and code review dynamics, alongside collaboration with our academic partners (see Multitudes Research). Key studies include the Textio research on feedback quality, Gunawardena et al.'s work on destructive criticism in code review, and Braun et al.'s work on thematic analysis, all referenced above.
Our Feedback Quality feature was also informed by complementary research on code review processes and standards – you can read more about those in that feature's documentation.