Cognitive Security Blog

🧇 Our New Floor Supervisor: Building an AI Code Review Agent with Claude Code (and a Hint of Lumon)

Written by Jen Andre | Apr 25, 2025 6:32:43 PM

Behind the Build: Maro Engineers at Work

Building can be messy. That’s half the fun! As we develop new software and explore what generative AI can do, we’re here to share the lessons, surprises, and code along the way. Explore hands-on guides, architecture deep dives, and practical tools from the Maro engineering team.

The Code Refinement Protocol

At Maro, we believe in the four principles of good code:

  1. Clarity - Code should be readable without consulting your outie's notes (which you can't access anyway)
  2. Efficiency - Work smarter, not harder (you'll still get your Waffle Party)
  3. Compliance - Follow the standards, or risk a trip to the Break Room
  4. Documentation - Remember, future you is like an outie—they don't remember what you were thinking

Milcheck ensures adherence to these principles, gently guiding developers toward the light of Kier's wisdom in every PR review.

Introduction

Every day, developers shuffle into their workstations, stare at pull requests, and methodically sort through lines of code like numbers waiting to be binned.

Sound familiar? Code reviews: necessary, but often as spiritually fulfilling as endless Macrodata Refinement on a Monday morning.

Here at Maro, we're big fans of leveraging AI tools to accelerate feature development, improve quality, squash bugs, generate docs, and generally make our severed work lives more bearable. Tools like GitHub Copilot, Roo Code, Cursor, and more have already made our lives much easier. AI Agents truly are game-changers!

There are several emerging agent-based tools specifically aimed at assisting with code reviews, integrated directly into GitHub (think CodeRabbit, Qodo.ai, Sourcegraph Cody, and so on). These solutions are great, but they often come with tradeoffs: they're typically subscription-based, pricey, and can lock you into specific platforms or workflows – almost like a Lumon contract.

Recently, Anthropic introduced Claude Code—an exciting, developer-focused tool that brings Claude's powerful reasoning capabilities directly to the command line.

After experimenting with Claude Code, we began to wonder: could this become an additional voice in our code review process? Perhaps a floor supervisor? Traditionally, teams rely on automated GitHub checks – running test suites, verifying builds, linting – to ensure code quality and security. But these tools often miss the bigger picture: the why behind the numbers.

What if we could introduce an AI-powered Code Review agent capable of deeper analysis—one that could actually understand whether code changes meet the functional requirements laid out in tickets, ensuring compliance without needing a trip to the Break Room?

Claude Code seemed ideally suited for this task. Thanks to its agentic architecture and headless CLI interface, it can not only read and interpret ticket specifications but also thoroughly analyze code changes to verify they meet those requirements. Plus, it can handle other crucial tasks typically covered by human reviewers, such as identifying potential security issues.

So we set out to integrate Claude Code directly into GitHub as an automated reviewer – our own take on Milchick, dubbed 'Milcheck' – kicking into action whenever a developer submits a pull request. Here's how we built our proof of concept.

How We Built the Integration

We built a GitHub Action workflow with two primary components:

  1. Central workflow (claude-code-review.yml): Located in our organization’s global .github repository, this reusable workflow handles the heavy lifting.
  2. Trigger workflow (milcheck-reviewer.yml): Included in each project's repository, it triggers on PR openings or specific comment phrases, like a well-timed announcement from management.
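
To make the division of labor concrete, here's a minimal sketch of what the trigger workflow might look like. The reusable-workflow path, input names, and secret names below are illustrative stand-ins, not the exact contents of our repos:

```yaml
# milcheck-reviewer.yml (sketch) - lives in each project repository.
name: Milcheck Reviewer

on:
  pull_request:
    types: [opened]
  issue_comment:
    types: [created]

jobs:
  review:
    # Run on newly opened PRs, or when a PR comment contains the trigger phrase.
    if: >-
      github.event_name == 'pull_request' ||
      (github.event.issue.pull_request &&
       contains(github.event.comment.body, '@milcheck review'))
    # Call the central reusable workflow in the org-wide .github repository.
    uses: your-org/.github/.github/workflows/claude-code-review.yml@main
    with:
      pr_number: ${{ github.event.pull_request.number || github.event.issue.number }}
    secrets:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```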

Here's the basic workflow:

  • A PR is opened or someone comments @milcheck review, activating the trigger workflow.
  • The trigger workflow calls the central Claude Code Review workflow, passing along PR details.
  • The central workflow then:
    • Collects PR information (title, description, files changed)
    • Generates a diff for all code changes
    • Crafts a prompt instructing Claude to check our ticketing system (Shortcut) for related ticket specs. We integrate with Shortcut using Anthropic's Model Context Protocol (MCP) to let Claude Code fetch ticket details, but you could plug in any tool that has an MCP server.
    • Executes Claude Code with our prompt instructions
    • Posts Claude's review as a structured comment on the PR
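
Here's a condensed sketch of the central workflow's job, using the gh CLI for the GitHub plumbing. Step names, file names, and the exact claude invocation are illustrative; the prompt and MCP wiring are expanded further below:

```yaml
# claude-code-review.yml (sketch) - the org-wide reusable workflow.
on:
  workflow_call:
    inputs:
      pr_number:
        required: true
        type: number
    secrets:
      ANTHROPIC_API_KEY:
        required: true

jobs:
  review:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ github.token }}
      PR_NUMBER: ${{ inputs.pr_number }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Collect PR info and generate diff
        run: |
          gh pr view "$PR_NUMBER" --json title,body,files > pr.json
          gh pr diff "$PR_NUMBER" > pr.diff

      - name: Run Claude Code review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          # Assemble the prompt from our instructions plus the PR context,
          # then run Claude Code non-interactively and capture its review.
          claude -p "$(cat review-prompt.md pr.json pr.diff)" > review.md

      - name: Post review comment
        run: gh pr comment "$PR_NUMBER" --body-file review.md
```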

The key is in the prompt. We instruct Claude to:

  • Review PR title and description for context
  • Fetch any linked Shortcut ticket information, including requirements and acceptance criteria
  • Examine the code changes for bugs, security issues, and spec adherence
  • Present the findings in a clear, structured format
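
As a sketch, the instruction portion of that prompt looks something like this (the exact wording and section headings in our workflow differ):

```
You are reviewing a pull request. The PR title, description, and diff follow.

1. If the PR description or commits reference a Shortcut ticket, use the
   Shortcut MCP tools to fetch it, including requirements and acceptance
   criteria.
2. Review the diff for bugs, security issues, and deviations from the ticket
   spec.
3. Reply with a structured review: Summary, Spec Adherence, Issues (grouped
   by severity), and Questions for the author.
```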

Much of this is possible thanks to Claude Code's 'non-interactive' mode (-p) combined with the flexibility of MCP. By configuring your MCP servers (and granting only the permissions you want via --allowedTools), you can enable Claude to execute a series of tasks to gather context for a richer analysis.
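
Concretely, the headless invocation looks something like this. Treat it as a sketch: the mcp__shortcut__* tool names depend on the Shortcut MCP server you run, the @shortcut/mcp package name is illustrative, and you should check claude --help for the current flag set:

```bash
# Sketch of the non-interactive invocation (file and tool names are illustrative).
claude -p "$(cat review-prompt.md pr.json pr.diff)" \
  --mcp-config mcp-servers.json \
  --allowedTools "mcp__shortcut__get-story,mcp__shortcut__search-stories" \
  > review.md
```

with an MCP config along these lines:

```json
{
  "mcpServers": {
    "shortcut": {
      "command": "npx",
      "args": ["-y", "@shortcut/mcp"],
      "env": { "SHORTCUT_API_TOKEN": "<your token>" }
    }
  }
}
```

The allow-list matters here: in non-interactive mode Claude can't stop to ask for permission, so any tool it needs has to be granted up front.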

In our POC, this process is fully automated via GitHub Actions, requiring no manual intervention. And best of all, it’s completely configurable and customizable to your needs – no mysterious Eagan doctrines involved!

THE KIER EAGAN CODE MANIFESTO

"Let not your users be the first to find thy errors, for they lurk like goats in darkness."

- Kier Eagan, Founder of Clean Code

CODE IS TEMPERED THROUGH:

  1. THE FOUR PRINCIPLES OF REFINEMENT
    • Detect errors like scared numbers
    • Optimize like happy numbers
    • Refactor like sad numbers
    • Document like wormy numbers
  2. THE NINE VIRTUES OF PRODUCTION
    • Vigilance in testing
    • Diligence in reviewing
    • Humility in receiving feedback
    • [and six others too sacred to list]

Each pull request is a chance to serve Kier's vision. Each merge is a step toward the Revolving.

Real-World Example

Let's look at an actual Milcheck-generated review. A developer submitted a PR changing permissions on an API endpoint, referencing the relevant tickets in the PR description and commits. Milcheck:

  • Fetched the diff and the relevant files that were changed
  • Confirmed the code aligned with the Shortcut ticket specs
  • Provided detailed feedback, highlighting clarity improvements, potential security considerations, and documentation updates
  • Raised thoughtful questions about permission consistency across related endpoints, even in code the developer hadn't touched in the diff

The magic here is really how Claude grabbed the Shortcut ticket mentioned in the PR description, read its requirements, and assessed whether the code changes fulfilled them. This moves the process from superficial checks to semantic analysis, something that historically required a human. That's the power of generative AI!

🧇 Bonus: Milchick Voice Mode! For added fun (or perhaps, added compliance), you can even trigger Milcheck to deliver its review in the distinctive voice of our favorite floor supervisor by commenting on your PR with: @milcheck review, praise Kier!

Enhancement Ideas

  • Expanding beyond tickets: pulling in ADRs and PRD documentation referenced in other tools (like Notion or Google Docs) to get the full context of the requirements behind a change.
  • Altering the workflow to comment on specific areas of code in the GitHub diff (just like a human reviewer would!).
  • Refining the prompt to address our patterns and needs, e.g. ensuring commits adhere to the specific conventions we use (aligning with the Kier Eagan principles of code, of course).
  • Turning this into a full-fledged GitHub app to make it even easier to integrate.

THE SEVERED DEVELOPMENT METHODOLOGY

At Maro, we understand that your code has two lives - the writing (innie) and the maintenance (outie). Our code standards ensure neither suffers:

WRITING STANDARDS (INNIE)

  • Write as if you'll never remember writing this code (because you won't)
  • Comment everything that future-you would question
  • Assume every variable could trigger a Break Room visit if misnamed
  • Test thoroughly - your outie doesn't want weekend emergency calls (or Overtime Contingency Protocols)

MAINTENANCE STANDARDS (OUTIE)

  • Approach each file as if seeing it for the first time
  • Trust, but verify, all documentation left by your innie
  • Remember: refactoring is just another form of refinement
  • Celebrate successful builds with your own personal Waffle Party

Milcheck bridges the gap between your innie and outie developer selves, maintaining continuity where severance creates division.

Benefits of Rolling Your Own AI Code Reviewer with Claude Code

Why adopt something like this?

  • Instant feedback: Developers get immediate, thoughtful reviews without waiting for a human reviewer to find time. Less waiting, more refining!
  • Context-aware reviews: When you let Claude access your ticketing system and other documentation, it understands not just the code but the requirements behind it.
  • Consistent quality: Every PR gets the same level of attention, regardless of size or timing. Praise Kier!
  • Cost-effective: No expensive SaaS subscriptions – you pay only for the Claude API usage you need.
  • Fully customizable: Tailor the review focus to your team's specific needs by modifying the prompt.
  • Works with your tools: Built on GitHub Actions and integrates with your ticketing systems (and you can plug in any other tools with the power of MCP!)

Of course, there’s also nothing wrong with adopting a commercial SaaS tool to assist with code reviews. Choose the path that feels right for your 'innie'.

Humans Still Gotta Be In The Loop

This won't replace human code reviews (and we wouldn't want it to). Generative AI isn't perfect, and the human element of knowledge sharing, mentoring, and collaboration is still incredibly valuable. But Claude provides another set of eyes that can catch issues early and ensure that code changes align with requirements. It's like having a junior reviewer (or a well-programmed MDR employee) who's always available and never tires of checking for edge cases.

Want to build your own AI-powered code review agent? We've shared the core components in a GitHub repo. Take this proof of concept and adapt it to your own workflow and ticketing systems.

Remember—the numbers may be mysterious, but your code doesn't have to be. Your inner developer will thank you, even if your outie never knows about the improved productivity.

Happy coding! And remember to enjoy each line equally.
