How Ramp Engineers Use Codex to Cut Code Review Time

Code review is one of those things every engineering team says they take seriously — and one of the first things that gets deprioritized when sprints get tight. At Ramp, the corporate finance platform that’s grown into one of the fastest-scaling fintech companies in the US, engineers were hitting a familiar wall: waiting hours for a colleague to look at a pull request, or shipping changes with less scrutiny than anyone was comfortable with. Their fix? Hand that first pass to OpenAI Codex, running on GPT-5.5. According to OpenAI’s writeup on the Ramp deployment, the team is now getting substantive code review feedback in minutes instead of hours — and that gap is changing how they ship software.

Why Code Review Became a Bottleneck Worth Solving With AI

To understand why this matters, you need to appreciate what code review actually costs at scale. It’s not just latency. When a senior engineer spends 45 minutes reviewing a PR, that’s 45 minutes not spent on architecture decisions, debugging, or mentoring. Multiply that across a fast-growing engineering org and you’re looking at a meaningful drag on output — even before you account for the context-switching costs of being pulled out of deep work to give feedback.

Ramp has been scaling aggressively. The company processes billions in transactions annually and has been expanding its product surface area — expense management, bill pay, accounting integrations, corporate cards — at a pace that demands a lot from its engineering team. The traditional model of “wait for a human to review your code” simply doesn’t keep up with that velocity.

This isn’t a new problem. GitHub Copilot has offered inline suggestions for a while, and tools like CodiumAI have attacked the testing side of this equation. But what Ramp is describing is different: using Codex as a genuine reviewer that understands context, catches logic issues, flags security concerns, and gives actionable feedback — not just autocomplete on steroids.

What Ramp Is Actually Doing With Codex

The specifics here are what make this interesting. Ramp’s engineers aren’t just running Codex as a linter or a style checker. They’re using it to do the kind of review work that previously required a senior engineer’s judgment — catching edge cases, questioning architectural decisions, identifying potential performance problems.

Here’s what the Codex-powered review workflow looks like in practice at Ramp:

  • Automated first-pass review: When a PR is opened, Codex analyzes the diff in context of the broader codebase and surfaces issues before a human ever looks at it.
  • Substantive feedback, not just style: The system flags logic errors, potential null pointer exceptions, missing error handling, and security concerns — not just whether a variable is named correctly.
  • Faster iteration cycles: Engineers can address Codex’s feedback and resubmit within the same session, tightening the feedback loop dramatically.
  • Knowledge transfer at scale: Junior engineers get senior-level feedback without waiting for a senior engineer to have bandwidth — effectively democratizing code quality guidance across the team.
  • GPT-5.5 as the reasoning engine: The jump from earlier models is significant here. GPT-5.5 can hold more context, reason more reliably about complex logic flows, and produce feedback that engineers actually trust.

That last point deserves emphasis. Trust is everything in this context. If engineers feel like they’re getting generic, low-confidence suggestions they have to second-guess, they’ll stop using the tool. The fact that Ramp has made this a core part of their workflow suggests the quality bar is high enough to be genuinely useful.
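The first-pass workflow in the list above can be sketched roughly as follows. This is a minimal illustration, not Ramp's integration: the `Finding` type, the severity labels, the prompt wording, and the `reviewer` callable are all assumptions made for the example. The `reviewer` is a stand-in for whatever model client a team actually uses, so no real API is called here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical severity labels; a real team would pick its own.
SEVERITIES = {"blocker", "warning", "nit"}

@dataclass
class Finding:
    severity: str
    message: str

# Hypothetical reviewer signature: takes a prompt, returns raw text.
# In a real pipeline this would wrap the team's model client.
Reviewer = Callable[[str], str]

def build_prompt(diff: str, conventions: str) -> str:
    """Combine the PR diff with team conventions into one review prompt."""
    return (
        "Review this diff for logic errors, missing error handling, "
        "and security concerns. Reply with one finding per line as "
        "'severity: message'.\n\n"
        f"Team conventions:\n{conventions}\n\nDiff:\n{diff}"
    )

def parse_findings(raw: str) -> List[Finding]:
    """Keep only lines shaped like 'severity: message'; skip anything else."""
    findings = []
    for line in raw.splitlines():
        severity, sep, message = line.partition(":")
        label = severity.strip().lower()
        if sep and label in SEVERITIES:
            findings.append(Finding(label, message.strip()))
    return findings

def first_pass_review(diff: str, conventions: str, reviewer: Reviewer) -> List[Finding]:
    """Run the automated first pass before any human looks at the PR."""
    return parse_findings(reviewer(build_prompt(diff, conventions)))

def needs_changes(findings: List[Finding]) -> bool:
    """Send the PR back to the author if any finding is a blocker."""
    return any(f.severity == "blocker" for f in findings)

# Stub reviewer so the sketch runs without any network access.
def fake_reviewer(prompt: str) -> str:
    return "blocker: refund() does not handle a missing account\nnit: rename tmp"
```

Passing the reviewer in as a plain callable keeps the pipeline testable with a stub like `fake_reviewer`, and it means the parsing and gating logic does not depend on any particular model or vendor.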

GPT-5.5: The Model That Makes This Work

It’s worth being specific about why GPT-5.5 matters here. Earlier versions of Codex were capable, but they struggled with long inputs: reviewing a complex PR that touches multiple files across a codebase is a hard problem for a model that loses the thread after a few thousand tokens. GPT-5.5 handles this significantly better, which is what allows Codex to do real structural analysis rather than just line-by-line syntax checking.

OpenAI hasn’t published a detailed technical spec sheet for GPT-5.5 in the same way some competitors release model cards, but the practical evidence from deployments like Ramp’s suggests meaningful improvements in code comprehension and multi-step reasoning — the two capabilities that matter most for review tasks.

How This Compares to Other AI Coding Tools

The competitive context here is worth laying out. GitHub Copilot (which runs on OpenAI models but is packaged and distributed by Microsoft) has a code review feature that’s been in development, but it’s primarily focused on inline suggestions rather than asynchronous PR review. Anthropic’s Claude has been used by developers informally for code review, and its large context window makes it genuinely capable — but it’s not integrated into CI/CD workflows the way Codex can be. Google’s Gemini, as we covered in our piece on Google I/O 2026 and the agentic Gemini era, is building toward similar developer workflow integration, but enterprise adoption at the depth Ramp is describing is still emerging there.

What Ramp has done is essentially build Codex into the connective tissue of their engineering process — not as an optional helper developers can consult, but as a required step in the review pipeline. That’s a different bet than most teams are making.

What This Means for Engineering Teams Watching From the Sidelines

If you’re an engineering leader looking at what Ramp is doing and wondering whether it translates to your team, the honest answer is: probably yes, but the implementation details matter enormously.

The Ramp case study is notable partly because Ramp is a well-resourced company with engineers who are comfortable experimenting with new tooling. Smaller teams or more risk-averse engineering cultures might face adoption friction that Ramp didn’t have to deal with. There’s also the question of codebase specificity — Codex needs enough context about your particular conventions, frameworks, and standards to give feedback that’s actually aligned with how your team works, not just generically correct.

That said, the economics are hard to argue with. If a single senior engineer spends even three hours a week doing first-pass code review that an AI system could handle with reasonable accuracy, you’re looking at meaningful reclaimed capacity. Across a team of 50 engineers, that math becomes significant quickly.
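To make that arithmetic concrete, here is a back-of-the-envelope sketch. The inputs are assumptions for illustration only: it supposes every engineer on the 50-person team spends the same three hours a week on first-pass review, and `share_automated` is a hypothetical fraction of that work the AI pass actually absorbs. None of these figures come from Ramp.

```python
def reclaimed_hours_per_week(reviewers: int, hours_each: float, share_automated: float) -> float:
    """Estimate weekly review hours handed off to an automated first pass.

    share_automated is the assumed fraction (0.0 to 1.0) of first-pass
    review time that the AI reviewer takes over.
    """
    if not 0.0 <= share_automated <= 1.0:
        raise ValueError("share_automated must be between 0 and 1")
    return reviewers * hours_each * share_automated

# Upper bound: all 50 engineers, 3 hours each, fully automated first pass.
upper_bound = reclaimed_hours_per_week(50, 3, 1.0)   # 150.0 hours per week

# More cautious case: the AI pass absorbs only half of that time.
cautious = reclaimed_hours_per_week(50, 3, 0.5)      # 75.0 hours per week
```

Even the cautious case is on the order of two full-time engineers' worth of weekly hours, which is why the estimate is sensitive mainly to how much of the first pass a team is willing to trust to automation.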

We’ve seen similar patterns play out in other functions. Our earlier coverage of how data science teams are using Codex for real work and of Sea Limited’s AI-native engineering bet shows the same thread: the teams getting the most value aren’t using AI as a toy or a curiosity; they’re rebuilding workflows around it.

The Risks Nobody Talks About

There’s a real concern worth naming here. If engineers start treating Codex’s approval as sufficient and human review becomes a rubber stamp, you’ve potentially introduced a new failure mode. AI systems can confidently miss things that an experienced human would catch — particularly around business logic that requires understanding of domain context that isn’t expressed in the code itself. Ramp is presumably aware of this, but teams adopting this approach should be deliberate about where human review remains mandatory.
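One way to be deliberate about this is to write the rule down as code rather than leave it to habit. The sketch below is an assumption-laden example, not a description of Ramp's setup: the path patterns in `MANDATORY_HUMAN_REVIEW` and the idea of counting AI-reported blockers are hypothetical choices a team would replace with its own.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence

# Hypothetical patterns for areas where a human must always sign off,
# for example code that encodes business rules the model cannot see.
MANDATORY_HUMAN_REVIEW = ("payments/*", "auth/*", "migrations/*")

def requires_human_review(
    changed_files: Iterable[str],
    ai_blockers: int,
    patterns: Sequence[str] = MANDATORY_HUMAN_REVIEW,
) -> bool:
    """Return True when a PR must not be merged on AI review alone."""
    # Any blocker raised by the automated pass escalates to a human.
    if ai_blockers > 0:
        return True
    # Any change under a sensitive path escalates regardless of AI output.
    # fnmatchcase's '*' also matches '/', so nested paths are covered.
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in patterns
    )
```

For example, `requires_human_review(["payments/ledger/refund.py"], 0)` escalates even when the AI pass found nothing, which is the case the rubber-stamp failure mode is most likely to miss.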

There’s also a skill development angle. Junior engineers learn to code partly by getting feedback from seniors, and that is a mentorship function, not just a quality function. If most feedback now comes from Codex, does that change how junior engineers develop judgment? I think it probably does, and not necessarily in ways we can predict yet.

Key Takeaways

  • Ramp is using OpenAI Codex with GPT-5.5 to automate first-pass code review, cutting feedback time from hours to minutes.
  • The workflow goes beyond syntax checking — Codex is catching logic errors, edge cases, and security issues that require genuine reasoning.
  • GPT-5.5’s improved context handling is what makes large, multi-file PR reviews tractable in a way earlier models couldn’t manage.
  • This is part of a broader pattern: engineering teams at scale (Sea Limited, Ramp, others) are embedding Codex into mandatory workflow steps, not treating it as optional tooling.
  • Adoption requires deliberate implementation — teams need to calibrate where AI review is sufficient and where human judgment remains essential.

Frequently Asked Questions

What exactly is OpenAI Codex doing in Ramp’s code review process?

Codex is acting as an automated first reviewer on pull requests, analyzing code changes in context and producing feedback on logic errors, security issues, edge cases, and architectural concerns. It’s not replacing human review entirely — it’s handling the initial pass so human reviewers can focus on higher-level judgment calls.

Is this available to other companies, or is it a custom Ramp integration?

Codex is available through OpenAI’s API and can be integrated into CI/CD pipelines by engineering teams. Ramp has built their own integration on top of that foundation, but the underlying capability is accessible to any team willing to do the integration work.

How does Codex with GPT-5.5 compare to GitHub Copilot for code review?

GitHub Copilot is primarily optimized for inline code completion and generation, with code review as a secondary capability. Ramp’s use of Codex represents a more deliberate, asynchronous review workflow that leverages GPT-5.5’s larger context window for whole-PR analysis — a meaningfully different use case than what Copilot is primarily designed for.

What are the main risks of using AI for code review?

The primary risks are over-reliance (treating AI approval as sufficient without adequate human oversight), blind spots around domain-specific business logic, and potential impact on junior engineer development if human mentorship in the review process is reduced. Teams adopting this approach should define clear escalation rules for when human review is mandatory.

As more engineering orgs at Ramp’s scale publish their AI tooling playbooks, the pressure on teams still running purely manual review processes will intensify. The question isn’t really whether AI-assisted code review becomes standard practice — it’s how quickly, and whether teams figure out the human-AI balance before they learn from a production incident that they got it wrong.