How AI Is Helping Scientists Simulate Black Holes

Here’s something that would have sounded absurd five years ago: a physicist is using an AI coding assistant to help simulate one of the most extreme environments in the universe, the swirling, light-bending chaos around a black hole. Not as a gimmick. Not as a demo. As part of the working software infrastructure behind real research. OpenAI Codex is now part of the toolkit that astrophysicist Chi-kwan Chan at the University of Arizona uses to model black hole physics, and the implications stretch well beyond one researcher’s workflow.

Why Black Hole Simulations Are So Hard to Write

To understand why this matters, you need to appreciate what goes into simulating a black hole. We’re not talking about a pretty render for a documentary. Researchers like Chan are building numerical simulations that solve Einstein’s field equations of general relativity in extreme spacetime conditions — where gravity warps space itself, where magnetic fields behave in ways that have no everyday analogy, and where the computational cost of getting things wrong can mean weeks of wasted supercomputer time.

The code underlying these simulations is typically written in low-level languages like C or Fortran, sometimes Python for orchestration, and it has to interface with high-performance computing clusters. It’s deeply specialized work. A mistake in a numerical integrator or a mishandled boundary condition doesn’t throw a clean error — it produces subtly wrong physics that might take days to notice.

Chan’s simulation work for the Event Horizon Telescope (EHT) sits at the intersection of observational astronomy and theoretical physics. The EHT collaboration famously released the first image of a black hole in 2019, the picture of M87* that went everywhere. The simulations Chan works on help interpret what the telescope sees, testing whether the observed shadows and photon rings match what general relativity predicts.

That’s the science. Now here’s where Codex comes in.

What Chan Is Actually Using Codex For

According to OpenAI’s writeup on Chan’s work, he’s using Codex primarily as a coding accelerator — not a replacement for scientific judgment, but a way to handle the repetitive, boilerplate-heavy parts of scientific software development that eat into research time.

A few specific use cases stand out:

  • Generating boilerplate infrastructure code — the scaffolding around simulations (file I/O, parameter parsing, logging) that every researcher has to write but that adds almost zero scientific value
  • Translating between languages — porting routines from older Fortran or C code into more modern Python wrappers that can interface with visualization tools
  • Writing and debugging parallel computing code — MPI and GPU-accelerated routines that require very specific syntax and are notoriously painful to debug by hand
  • Generating test cases — Codex can write unit tests for physics routines, helping catch numerical errors before they propagate through a simulation
  • Documentation — auto-generating docstrings and comments for legacy code that was written without them, making it easier for collaborators to pick up
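
To make the test-generation item concrete, here is a minimal sketch of the kind of unit test an assistant might be asked to draft. It is not taken from Chan’s code: the integrator (`rk4_step`), the test problem (a harmonic oscillator), and the tolerance are all illustrative assumptions. The idea is to check that the error shrinks at the expected rate when the step size is halved, since a subtly wrong integrator usually still runs but converges at the wrong order.

```python
import math


def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def oscillator(t, y):
    """Harmonic oscillator x'' = -x written as a first-order system (x, v)."""
    x, v = y
    return [v, -x]


def error_at(n_steps, t_end=1.0):
    """Integrate from t=0 to t_end; return the error in x against cos(t_end)."""
    h = t_end / n_steps
    y = [1.0, 0.0]  # x(0) = 1, v(0) = 0, so the exact solution is x = cos(t)
    for i in range(n_steps):
        y = rk4_step(oscillator, i * h, y, h)
    return abs(y[0] - math.cos(t_end))


def observed_order(n_coarse=20):
    """Estimate the convergence order from the error ratio when h is halved."""
    return math.log2(error_at(n_coarse) / error_at(2 * n_coarse))


def test_rk4_converges_at_fourth_order():
    # Hypothetical tolerance; a real project would choose its own.
    assert abs(observed_order() - 4.0) < 0.2
```

A test like this can be run with pytest or called directly. The physicist still has to decide which analytic solution is a meaningful check, while the assistant fills in the harness around it.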

What Chan describes is less about Codex writing the core physics algorithms, which still require deep domain expertise, and more about compressing the time between having an idea and being able to run it. That compression turns out to be significant. Scientific software development has historically been a bottleneck, not because researchers can’t code, but because they have to split their attention between physics and software engineering.

The Specific Tools in Play

Chan uses GRRMHD (general relativistic radiation magnetohydrodynamics) codes and related frameworks in the lineage of HARM, a widely used GRMHD code, to simulate plasma behavior near black holes. These are not off-the-shelf packages. They’re bespoke scientific codes with decades of accumulated complexity. Getting Codex to be useful here requires careful prompting: you can’t just ask it to “simulate a black hole” and expect anything useful. But ask it to “write an MPI scatter routine for a 3D array with ghost zones for a structured grid in C” and suddenly you’re saving a few hours of tedious work.
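
For readers who have not met ghost zones, the sketch below shows the data layout that such a scatter routine has to produce. It is a serial NumPy stand-in, not MPI and not code from HARM or from Chan’s project: it splits a 3D array into slabs along one axis and pads each slab with copies of its neighbors’ edge cells. The function names, the periodic boundary, and the even-division requirement are assumptions made for illustration.

```python
import numpy as np


def split_with_ghosts(grid, n_ranks, n_ghost=1):
    """Split a 3D array into slabs along axis 0, each padded with ghost zones.

    Serial illustration of the layout an MPI scatter would distribute.
    Assumes periodic boundaries and an axis-0 length divisible by n_ranks.
    """
    nx = grid.shape[0]
    if n_ghost < 1:
        raise ValueError("n_ghost must be at least 1")
    if nx % n_ranks != 0:
        raise ValueError("axis 0 must divide evenly among ranks")
    local_nx = nx // n_ranks
    slabs = []
    for rank in range(n_ranks):
        start = rank * local_nx - n_ghost
        stop = (rank + 1) * local_nx + n_ghost
        # The modulo wraps out-of-range indices around (periodic boundary).
        indices = np.arange(start, stop) % nx
        slabs.append(np.take(grid, indices, axis=0))
    return slabs


def interior(slab, n_ghost=1):
    """Strip the ghost layers, leaving only the cells this rank owns."""
    return slab[n_ghost:slab.shape[0] - n_ghost]


# Small demonstration grid: 8 cells along the split axis, shared by 4 ranks.
grid = np.arange(8 * 4 * 4, dtype=float).reshape(8, 4, 4)
slabs = split_with_ghosts(grid, n_ranks=4)
reassembled = np.concatenate([interior(s) for s in slabs], axis=0)
```

Each slab carries two owned layers plus one ghost layer on each side, and stripping the ghosts and concatenating recovers the original grid. A real MPI version also has to keep those ghost layers up to date between time steps, which is where most of the tedium lives.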

That’s the right mental model for what’s happening here. Codex as a very fast, knowledgeable junior programmer who never gets bored of boilerplate.

Why This Is More Significant Than It Looks

Science has a reproducibility problem, and scientific computing’s share of it doesn’t get nearly enough attention. A 2016 Nature survey found that more than 70% of researchers had tried and failed to reproduce another scientist’s results. In computational work, many such failures trace back to software: undocumented code, untested routines, environment dependencies nobody recorded.

If Codex makes it easier to write documented, tested, readable scientific code, that’s not just a productivity story. That’s a scientific integrity story.
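
The “environment dependencies nobody recorded” problem is a good example of scaffolding that is cheap to write and easy to skip. Below is a minimal sketch, using only the Python standard library, of a helper that records the interpreter, platform, package versions, and parameters for a run. The function names, the record layout, and the example parameters (`spin`, `resolution`) are hypothetical, not something described in the source.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def build_run_record(params, packages=("numpy", "scipy")):
    """Collect what is needed to describe the environment of a simulation run."""
    # sort_keys makes the hash independent of the order keys were written in.
    params_json = json.dumps(params, sort_keys=True)
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
        "params": params,
        "params_sha256": hashlib.sha256(params_json.encode("utf-8")).hexdigest(),
    }


# Illustrative parameters only; a real run would pass its own configuration.
record = build_run_record({"spin": 0.9, "resolution": [128, 64, 64]})
print(json.dumps(record, indent=2))
```

Writing a record like this next to every output file costs a few lines and gives collaborators a starting point when a result will not reproduce.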

There’s also a workforce angle. The pipeline of researchers who are both domain experts in astrophysics and strong software engineers is narrow. Most graduate students learn enough coding to get their research done, but they’re not software engineers. Tools like Codex could genuinely widen who can do computationally intensive research, lowering the barrier for researchers at smaller institutions or in countries with less technical infrastructure.

How This Compares to Other AI Coding Tools

Codex isn’t the only option here. GitHub Copilot (which is also built on OpenAI models) is the most widely used AI coding assistant in general software development. Anthropic’s Claude has been praised for handling long, complex codebases with better context retention. Google’s Gemini is increasingly embedded in development environments. And open-source alternatives like Code Llama from Meta give researchers who can’t share data with cloud providers a local option.

For scientific computing specifically, the jury is still out on which tool performs best. Scientific code often involves domain-specific libraries (NumPy, SciPy, HEALPix, FFTW), unusual mathematical conventions, and performance constraints that general-purpose coding assistants sometimes fumble. Chan’s experience suggests Codex handles the infrastructure layer well — but the core numerical physics still needs a human who understands what the equations actually mean.

This is a theme you see across serious Codex use cases. We’ve written about how Notion uses Codex to ship features faster and how Nextdoor engineers accelerated their release cycles — in both cases, the productivity gains are real, but they come from AI handling the mechanical parts of software work, not the judgment parts. Science is no different.

What It Means for the Broader AI-in-Research Conversation

There’s been a lot of hype around AI for scientific discovery — models predicting protein structures (AlphaFold), generating hypotheses, analyzing experimental data. But the unglamorous truth is that a huge chunk of what slows down science isn’t the ideas — it’s the software plumbing. Researchers spend enormous amounts of time on code that isn’t their core contribution, just a prerequisite to doing the work they actually care about.

Codex attacking that layer is, in some ways, a more tractable and immediately useful application than the moonshot stuff. You don’t need the AI to understand black holes. You need it to write the file parser so the physicist can focus on the black holes.
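
As an illustration of how mundane that file parser usually is, here is a minimal sketch of a `key = value` parameter reader. The parameter names (`spin`, `n_r`, `n_theta`, `output_dir`) and the file format are invented for this example and are not drawn from any real simulation code.

```python
from dataclasses import dataclass, fields


@dataclass
class RunParams:
    """Hypothetical simulation parameters with defaults."""
    spin: float = 0.0
    n_r: int = 64
    n_theta: int = 32
    output_dir: str = "out"


def parse_params(text):
    """Parse 'key = value' lines into RunParams; '#' starts a comment."""
    params = RunParams()
    known = {f.name for f in fields(RunParams)}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        if "=" not in line:
            raise ValueError(f"line {line_no}: expected 'key = value'")
        key, value = (part.strip() for part in line.split("=", 1))
        if key not in known:
            raise ValueError(f"line {line_no}: unknown parameter {key!r}")
        # Convert using the type of the default value (float, int, or str).
        caster = type(getattr(params, key))
        setattr(params, key, caster(value))
    return params


example = """
spin = 0.94      # dimensionless black hole spin
n_r = 128
output_dir = runs/a
"""
params = parse_params(example)
print(params)
```

Rejecting unknown keys is a deliberate choice in this sketch: a misspelled parameter that silently falls back to its default is exactly the kind of mistake that wastes a long run.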

I wouldn’t be surprised if we see more domain scientists — computational chemists, climate modelers, genomics researchers — quietly adopting these tools over the next two to three years, not because they read a press release, but because a colleague mentioned it saved them a week of debugging.

Key Takeaways

  • Chi-kwan Chan at the University of Arizona is using OpenAI Codex to accelerate development of black hole simulation software, not to replace the core physics work
  • Primary use cases include boilerplate generation, language translation between C/Fortran/Python, parallel computing code, unit tests, and documentation
  • The underlying science involves GRRMHD simulations that support Event Horizon Telescope research and testing of general relativity
  • Codex’s real value in scientific settings appears to be compressing the time between idea and execution by handling mechanical software tasks
  • This has implications beyond productivity — better-documented, tested scientific code could improve reproducibility across the field
  • Competing tools (Copilot, Claude, Gemini, Code Llama) are viable alternatives, but scientific computing has specific demands that don’t always align with general-purpose AI coding tools

Frequently Asked Questions

What exactly is OpenAI Codex?

OpenAI Codex is an AI system designed to understand and generate code across many programming languages. The original Codex model powered early versions of GitHub Copilot, and Codex is available via OpenAI’s API. It’s distinct from ChatGPT in that it’s optimized specifically for coding tasks rather than general conversation.

Is Codex actually writing the physics behind these simulations?

No — and that distinction matters. Chan uses Codex for infrastructure code, boilerplate, and tooling. The core numerical algorithms that encode general relativity and magnetohydrodynamics are still written by domain experts who understand the physics. Codex handles the surrounding scaffolding that makes that physics code actually run.

Could other researchers use this approach?

Yes, and many likely already are informally. Any researcher working with computational code — climate science, genomics, materials science, particle physics — could apply a similar workflow. The key is knowing which parts of your codebase genuinely require domain expertise and which are generic software engineering tasks that an AI assistant can handle reliably.

How does this connect to OpenAI’s broader research ambitions?

OpenAI has been publicly positioning AI as a tool for accelerating scientific discovery, and showcasing use cases like Chan’s fits that narrative. For a deeper look at how OpenAI is thinking about its role in society and research, our piece on OpenAI’s plan to make AGI work for everyone covers the broader strategic context. The OpenAI Economic Research Exchange is another angle on how they’re building credibility in the research community.

What Chan is doing with black hole simulations is a small but telling signal about where scientific computing is heading. The researchers who figure out how to integrate these tools thoughtfully — not naively, not skeptically, but practically — are going to move faster than those who don’t. And in fields where supercomputer time costs real money and grant cycles are finite, moving faster isn’t just convenient. It’s competitive. Watch for more stories like this one over the next 18 months as the practice becomes normalized and researchers start publishing not just their results, but their AI-assisted workflows.