OpenAI Launches GPT-5.3-Codex-Spark: Real-Time AI Coding at 1,000+ Tokens per Second, Powered by Cerebras

OpenAI has officially launched GPT-5.3-Codex-Spark, its first real-time coding model built for speed above all else. Powered by Cerebras’ Wafer-Scale Engine 3, the model delivers over 1,000 tokens per second — making it roughly 15 times faster than the full GPT-5.3-Codex model. Codex-Spark is now rolling out as a research preview for ChatGPT Pro users.

Why Speed Matters for Coding AI

Developers have long faced a tradeoff between AI capability and responsiveness. Powerful models like GPT-5.3-Codex can handle complex, multi-file engineering tasks (see our comparison of the best AI coding tools in 2026), but their slower output speeds break the creative flow that makes coding productive. Codex-Spark changes this dynamic entirely.

At over 1,000 tokens per second, Spark delivers near-instant responses — fast enough for real-time collaboration where a developer can write, edit, and iterate with the AI as a live coding partner rather than a batch-processing tool. OpenAI describes it as a “daily driver of productivity” optimized for the rapid back-and-forth that defines modern software development.
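The throughput figures translate directly into perceived latency. Here is a back-of-the-envelope sketch in Python; the 500-token response size is an illustrative assumption, and the full model's rate is derived from the article's "roughly 15 times faster" figure rather than an official number:

```python
# Back-of-the-envelope latency comparison.
# Assumptions: a 500-token response (illustrative), Spark at 1,000 tok/s,
# and the full model at 1,000 / 15 ≈ 67 tok/s (from "roughly 15x faster").

SPARK_TOKENS_PER_SEC = 1_000
FULL_MODEL_TOKENS_PER_SEC = SPARK_TOKENS_PER_SEC / 15
RESPONSE_TOKENS = 500  # illustrative size for a small code edit

spark_seconds = RESPONSE_TOKENS / SPARK_TOKENS_PER_SEC
full_seconds = RESPONSE_TOKENS / FULL_MODEL_TOKENS_PER_SEC

print(f"Spark:      {spark_seconds:.1f} s")  # 0.5 s (feels instant)
print(f"Full Codex: {full_seconds:.1f} s")   # 7.5 s (a noticeable pause)
```

Half a second versus seven and a half is roughly the difference between a conversation and a queue, which is the "live coding partner" experience the article describes.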

The Cerebras Partnership: Moving Beyond Nvidia

Codex-Spark marks the first milestone in OpenAI’s partnership with Cerebras, which was announced in January 2026 as a multi-year agreement reportedly worth over $10 billion. This is also OpenAI’s first major inference deployment outside its traditional Nvidia-dominated infrastructure.

The Cerebras Wafer-Scale Engine 3 (WSE-3) is a third-generation chip featuring approximately 4 trillion transistors spread across nearly 900,000 cores. It carries the largest on-chip memory of any AI processor, which eliminates the memory bottlenecks that typically slow down token generation. The architecture is purpose-built for ultra-low-latency inference, scaling to thousands of systems while supporting trillion-parameter models.

Sachin Katti, Head of Compute at OpenAI, commented on the collaboration: “Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability.”

What Codex-Spark Can Do

Despite being a smaller, speed-optimized model, Codex-Spark brings strong real-world coding capabilities to the table:

  • Precise code edits — Quickly modify specific functions, fix bugs, or refactor code blocks
  • Plan revision — Rapidly iterate on implementation plans and architectural decisions
  • Contextual Q&A — Answer questions about your codebase with full context awareness
  • Interface prototyping — Fast layout visualization, UI scaffolding, and styling refinement
  • Real-time collaboration — Interrupt, redirect, and iterate with near-instant responses

The model supports a 128K context window, allowing it to maintain awareness of large codebases during interactive sessions.
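To get an intuition for what 128K tokens means in practice, here is a rough estimate; the tokens-per-line figure is a coarse assumption (about 4 characters per token and typical code line lengths), and real ratios vary by language and tokenizer:

```python
# Rough capacity estimate for a 128K-token context window.
# Assumption: ~10 tokens per line of code, a coarse heuristic;
# actual ratios depend on the language and the tokenizer.

CONTEXT_TOKENS = 128_000
TOKENS_PER_LINE = 10  # coarse assumption

approx_lines = CONTEXT_TOKENS // TOKENS_PER_LINE
print(f"~{approx_lines:,} lines of code fit in context")
```

Under that assumption, on the order of ten thousand lines of code can sit in context at once, which is enough for a mid-sized module or several related files during an interactive session.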

Benchmark Performance: Speed vs. Capability

OpenAI is transparent about the tradeoff. On SWE-Bench Pro and Terminal-Bench 2.0 — two key benchmarks for evaluating agentic software engineering — Codex-Spark underperforms the full GPT-5.3-Codex model. For reference, GPT-5.3-Codex scores 56.8% on SWE-Bench Pro and 77.3% on Terminal-Bench 2.0.

However, Spark completes tasks in a fraction of the time and still outperforms GPT-5.1-Codex-mini on these same benchmarks. OpenAI frames this as a worthwhile trade: developers get responses fast enough to maintain their creative flow while retaining strong coding capability for the tasks that matter most in day-to-day work.

Dual-Mode Vision: The Future of Codex

Codex-Spark is not a replacement for the full GPT-5.3-Codex — it’s the beginning of a dual-mode architecture. This aligns with a broader industry trend we explored in our analysis of why the age of giant AI models is shifting toward specialized intelligence. OpenAI envisions Codex operating in two complementary modes:

  • Real-time collaboration mode (Spark) — For rapid iteration, quick edits, and interactive development
  • Long-horizon reasoning mode (Full Codex) — For complex, multi-step engineering tasks that require deep analysis

Over time, OpenAI plans to blend these two modes, creating a unified coding experience that automatically selects the right balance of speed and depth based on the task at hand.
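To make the dual-mode idea concrete, here is a purely illustrative routing sketch. This is not OpenAI's implementation; the function name, the hints, and the thresholds are all invented for illustration of how a request might be steered toward one mode or the other:

```python
# Illustrative dual-mode router sketch (NOT OpenAI's implementation).
# Routes a request to the fast model for interactive edits and to the
# full model for long-horizon work, based on simple, invented task hints.

def pick_mode(task: str, files_touched: int, interactive: bool) -> str:
    """Return which Codex mode this toy router would choose."""
    LONG_HORIZON_HINTS = ("migrate", "redesign", "debug across", "architecture")
    if interactive and files_touched <= 2:
        return "spark"       # real-time collaboration mode
    if any(hint in task.lower() for hint in LONG_HORIZON_HINTS):
        return "full-codex"  # long-horizon reasoning mode
    return "spark" if files_touched <= 2 else "full-codex"

print(pick_mode("rename this helper", files_touched=1, interactive=True))
print(pick_mode("redesign the auth architecture", 12, interactive=False))
```

A production router would presumably weigh far richer signals (estimated task length, repository scope, user intent), but the shape of the decision — cheap fast path by default, deep path when the task demands it — is the pattern the article describes.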

Availability

GPT-5.3-Codex-Spark is available today as a research preview for ChatGPT Pro subscribers through:

  • The Codex app (web)
  • The Codex CLI (command line)
  • The VS Code extension

API access is also rolling out to select design partners as of February 12, 2026.

What This Means for Developers

The launch of Codex-Spark signals an important shift in how AI coding tools are designed. Rather than pursuing maximum benchmark scores at the cost of usability, OpenAI is acknowledging that speed and interactivity are just as important as raw capability for real developer workflows.

The Cerebras partnership also demonstrates that the future of AI infrastructure is not a single-vendor story. By deploying on purpose-built silicon optimized for fast inference, OpenAI can offer specialized performance profiles that wouldn’t be possible on general-purpose GPU clusters alone — a trend we covered in depth in The State of Machine Learning in 2026.

For developers already using Codex, Spark adds a complementary tool for the moments when you need immediate feedback rather than deep analysis. For the broader industry, it sets a new bar for what “responsive AI” means in production developer tools. For a look at how competing models stack up, see our Claude AI models comparison guide.

Source: OpenAI — Introducing GPT-5.3-Codex-Spark