Feedback Themes

Learn how we use AI and machine learning to surface insights about common topics that appear in your team's code reviews.

[Image: Stacked bar chart showing the percentage of feedback given on PRs, grouped by themes such as Code Quality, Code Structure, Documentation, and Risk/Error Mitigation.]

What it is

This shows the feedback themes that appear in code reviews. We analyze all code review comments (excluding the PR author's own comments) and classify them into themes. Within each theme, we also classify each piece of feedback into an underlying code, which adds specificity about what the feedback covered. The themes and codes are grouped together in an underlying thematic map – see below for more on that.

Why it matters

Identifying common themes in our code reviews can surface systemic areas for improvement across our engineering team and processes, and it can highlight coaching opportunities for specific team members.

We all know how important code reviews are for good outcomes – when they're done well, they're linked to better results for both the code and the team.

However, poor-quality code reviews can actually introduce more bugs – contrary to one of the main motivations for conducting reviews in the first place.

How we calculate themes

We analyze feedback given in code reviews using Multitudes' AI models, which have been specifically designed to mitigate algorithmic biases and are grounded in research. Our approach builds on the methodology shared by Mathias et al. of applying thematic analysis (see Braun et al.) with language models.

Our two-step classification process:

  1. First, our model identifies high-level themes in code review feedback (note: one unit of feedback can consist of multiple themes)

  2. Then, we use these themes as context to predict specific research codes – the detailed topics discussed in your team's reviews. The predicted codes are visible in the drill-down tables, which also show examples of each theme and code from your team's code reviews (see below). A minimal sketch of this two-step flow follows this list.
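
To make this concrete, here is a minimal sketch of the two-step flow in Python. The keyword heuristics are a toy stand-in for the language models Multitudes actually uses, and the function names and the map slice are illustrative assumptions, not the real implementation.

```python
# Hypothetical slice of the thematic map: each theme links to its codes.
THEMATIC_MAP = {
    "Code Structure": ["Code Organization", "Code Reusability"],
    "Risk/Error Management": ["Error Management", "Edge Case Handling",
                              "Technical Debt Management", "Bug Prevention"],
}

def predict_themes(comment: str) -> list[str]:
    """Step 1: multi-label theme prediction (toy keyword stand-in)."""
    keywords = {
        "Code Structure": ["organize", "module", "reuse"],
        "Risk/Error Management": ["edge case", "error", "bug", "crash"],
    }
    text = comment.lower()
    return [theme for theme, words in keywords.items()
            if any(w in text for w in words)]

def predict_codes(comment: str, themes: list[str]) -> list[str]:
    """Step 2: predict codes, constrained to the predicted themes."""
    candidates = [code for t in themes for code in THEMATIC_MAP[t]]
    text = comment.lower()
    # A real model would score each candidate; we just match code names.
    return [c for c in candidates
            if any(tok in text for tok in c.lower().split())]

comment = "Please handle the edge case where the list is empty to avoid a crash."
themes = predict_themes(comment)        # ['Risk/Error Management']
codes = predict_codes(comment, themes)  # ['Edge Case Handling']
print(themes, codes)
```

Note how step 2 only considers codes that belong to the themes found in step 1 – that is the hierarchical constraint the thematic map provides.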

Within the drill-down, the codes are ordered by how much of the feedback relates to each predicted code (sketched below). We also exclude all Minimal Reviews from our analysis. If you want to see how much Minimal Review is happening, refer to our Feedback Quality feature.
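
Here is a minimal sketch of how that ordering could be computed, assuming each unit of feedback has already been classified; the record shape and the `is_minimal_review` flag are illustrative assumptions, not Multitudes' actual data model.

```python
from collections import Counter

# Hypothetical pre-classified feedback records.
feedback = [
    {"codes": ["Edge Case Handling", "Bug Prevention"], "is_minimal_review": False},
    {"codes": ["Edge Case Handling"], "is_minimal_review": False},
    {"codes": ["Code Simplification"], "is_minimal_review": True},  # excluded
]

# Exclude Minimal Reviews, then count how often each code appears.
counts = Counter(
    code
    for item in feedback if not item["is_minimal_review"]
    for code in item["codes"]
)
total = sum(counts.values())

# Order codes by their share of all remaining feedback, descending.
for code, n in counts.most_common():
    print(f"{code}: {n / total:.0%}")
# Edge Case Handling: 67%
# Bug Prevention: 33%
```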

[Image: Stacked bar chart showing the percentage of feedback given on PRs, where this person has received the most feedback on “Risk/Error Mitigation”.]

Altogether, our model classifies the set of themes and codes that appear in each code review, selecting from 15 themes and 45 research codes that are linked together via a thematic map.

What is a thematic map?

In qualitative research, a thematic map is a visual representation of the relationships between different themes and codes derived from qualitative data. It's a way to organize and display the key ideas and their connections, helping researchers to understand and interpret the data more effectively. To learn more, read the original paper by Braun et al.

How do we measure model performance?

Because this model's outputs are both hierarchical and multi-label, evaluation and accuracy metrics aren't straightforward. We implemented several complementary evaluation metrics:

  • Basic classification metrics (F-score, precision, and recall at each hierarchy level)

  • Wu-Palmer similarity and edge-distance metrics (accounting for hierarchical relationships between themes)

We prioritize minimizing cross-theme errors (e.g., labeling "Security" as "Documentation") over within-theme errors (e.g., "Code Optimization" vs "Code Readability"). All metrics are validated against a human-labeled ground-truth dataset annotated by domain experts.
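
As a rough illustration of why hierarchy-aware metrics matter, here is a minimal Wu-Palmer sketch over an assumed slice of the theme/code tree (root → theme → code). The formula is 2 × depth(LCS) / (depth(a) + depth(b)), so within-theme confusions score higher (i.e., are penalized less) than cross-theme ones; the tree below is illustrative, not the real thematic map.

```python
# Parent links for a tiny assumed slice of the hierarchy.
PARENT = {
    "Code Quality": "root",
    "Code Optimization": "Code Quality",
    "Code Readability": "Code Quality",
    "Documentation": "root",
    "Security": "root",
}

def path_to_root(node: str) -> list[str]:
    """Walk parent links up to the root, e.g. code -> theme -> root."""
    path = [node]
    while node != "root":
        node = PARENT[node]
        path.append(node)
    return path

def depth(node: str) -> int:
    return len(path_to_root(node))  # the root itself has depth 1

def wu_palmer(a: str, b: str) -> float:
    """2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first node on b's path that also lies on a's path is the
    # deepest common ancestor (LCS).
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))

# Within-theme confusion scores higher (less severe) than cross-theme:
print(f"{wu_palmer('Code Optimization', 'Code Readability'):.2f}")  # 0.67
print(f"{wu_palmer('Security', 'Documentation'):.2f}")              # 0.50
```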

Some examples of the themes and codes that our model can classify your code reviews as containing (a sketch that uses these examples follows the list):

  • Code Structure: Code Organization, Code Reusability

  • Risk/Error Management: Error Management, Edge Case Handling, Technical Debt Management, Bug Prevention

  • Knowledge Sharing and Growth: Knowledge Sharing, Implementation Suggestions

  • Deployment Ops: Deployment Process, Environment Configuration, Release Management, Dependencies

  • Code Quality: Code Redundancy, Code Simplification, Code Fix, Code Optimization
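
To show how the thematic map constrains predictions, here is a minimal consistency check built from the five example themes above. The real map has 15 themes and 45 codes, so this is only a slice, and `is_consistent` is a hypothetical illustration rather than a Multitudes API.

```python
# Slice of the thematic map using the example themes/codes on this page.
THEMATIC_MAP = {
    "Code Structure": ["Code Organization", "Code Reusability"],
    "Risk/Error Management": ["Error Management", "Edge Case Handling",
                              "Technical Debt Management", "Bug Prevention"],
    "Knowledge Sharing and Growth": ["Knowledge Sharing",
                                     "Implementation Suggestions"],
    "Deployment Ops": ["Deployment Process", "Environment Configuration",
                       "Release Management", "Dependencies"],
    "Code Quality": ["Code Redundancy", "Code Simplification",
                     "Code Fix", "Code Optimization"],
}

def is_consistent(themes: set[str], codes: set[str]) -> bool:
    """Every predicted code must belong to one of the predicted themes."""
    allowed = {c for t in themes for c in THEMATIC_MAP.get(t, [])}
    return codes <= allowed

print(is_consistent({"Code Quality"}, {"Code Fix"}))            # True
print(is_consistent({"Code Quality"}, {"Edge Case Handling"}))  # False
```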

Multitudes is actively conducting research to identify how the breadth of topics present in feedback themes correlates with high-performing teams. As we gather more data and insights, we'll update these benchmarks to provide more precise targets for healthy feedback patterns. Learn more about our original research here.

Provide Feedback on Model Predictions

We're continuously improving our Multitudes AI models to ensure accurate classification. If you believe your feedback has been misclassified – for example, a review labeled with an incorrect theme or code – please use the 🚩 (red flag) button in the drill-down table to report it.

Your feedback helps us refine our models and reduce algorithmic bias, ensuring better insights for your teams. We review all flagged predictions and use them to make the feedback themes analysis more reliable for everyone using Multitudes.

Research on Feedback Themes and Code Review

The design of this feature is informed by extensive research on feedback themes and code review dynamics, alongside collaboration with our academic partners (see Multitudes Research), where you can find the key studies and articles we recommend.

Our Feedback Quality feature was also informed by complementary research on code review processes and standards – you can read more in that feature's documentation, linked here.
