Gemini Omni and 3.5: What Google’s 9 Demos Actually Show

Nine demos. That’s what Google dropped alongside the Gemini Omni and Gemini 3.5 announcements at Google I/O 2026, and unlike the polished, suspiciously perfect demos we’ve grown used to from AI companies, these are worth dissecting. Not because they’re flawless, but because the specific things Google chose to show tell you a lot about where the company thinks the AI race is heading. If you missed the broader spectacle, our Google I/O 2026 roundup covers the full picture. But the model demos deserve their own treatment.

Why These Two Models, Why Now?

Google has been playing catch-up optics for the better part of two years. Even when Gemini Ultra matched or beat GPT-4 on benchmarks in late 2023, the narrative stuck: OpenAI moves fast, Google moves carefully. That perception cost Google real developer mindshare.

Gemini Omni is Google’s answer to the “native multimodal” gap. Where earlier Gemini versions were trained on multiple modalities but still felt stitched together in practice, Omni is designed from the ground up to treat text, audio, image, and video as genuinely equal inputs — not a text model with vision bolted on. Gemini 3.5, meanwhile, slots in as the smarter, faster successor to Gemini 2.5 Pro, targeting the developer and enterprise crowd who need something that can reason deeply without waiting ten seconds per response.

The timing is pointed. OpenAI shipped GPT-5.5 just weeks ago, and the coding benchmarks have been brutal for competitors. Google needed a moment. These demos are that moment — or at least Google’s attempt at one.

Breaking Down All 9 Demos

Google published all nine demo videos on the official Google Blog, and watching them back-to-back reveals a clear strategy: Omni handles the “wow” real-time interaction stuff, while 3.5 is the workhorse underneath.

The Real-Time Conversation Demos (Omni)

Three of the nine demos focus on live, low-latency conversation, and this is where Omni genuinely surprises. The model doesn’t just respond; it interjects naturally when you trail off, adjusts tone when you sound frustrated, and picks up on ambient context. In one demo, a user who is cooking starts a question and trails off mid-sentence. Omni catches the incomplete query, asks a clarifying question, and then adjusts the answer when the user says “no wait, I meant the other one.”

That sounds trivial. It’s not. Getting a model to handle mid-sentence corrections without losing conversational context has been genuinely hard. Google says Omni’s latency in these demos is under 300 milliseconds end-to-end. I’d want to test that claim at scale, but in the videos at least, it holds up.
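If you want to check a latency claim like that yourself, a minimal sketch of the measurement looks something like this. Everything here is an assumption for illustration: `fake_model_call` is a hypothetical stand-in for whatever client call you would actually make, the 300 ms budget is simply the figure Google quoted, and the p95 uses a simple nearest-rank rule. It measures wall-clock time for a single round trip, which is not the same thing as Google's end-to-end audio pipeline.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_latencies_ms(call: Callable[[], object], runs: int = 50) -> List[float]:
    """Time `call` repeatedly and return per-run wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples: List[float], budget_ms: float = 300.0) -> Dict[str, float]:
    """Report median, nearest-rank p95, and the share of runs under the budget."""
    ordered = sorted(samples)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "under_budget": sum(s < budget_ms for s in ordered) / len(ordered),
    }


def fake_model_call() -> str:
    # Hypothetical placeholder: swap in a real request to the model you are testing.
    time.sleep(0.01)
    return "ok"


if __name__ == "__main__":
    print(summarize(measure_latencies_ms(fake_model_call, runs=20)))
```

The tail matters more than the median here: a conversation that is snappy on most turns and stalls on a few will still feel broken, so p95 and the under-budget share are the numbers to watch.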

The Vision and Video Understanding Demos

Four demos cover visual understanding: two with images, two with a live video feed. The image demos are impressive but expected at this point. The video ones are more interesting.

In the most striking clip, a user points their phone camera at a cluttered desk and asks Omni to help them find a specific document. The model identifies objects in real time, narrows down likely locations based on what’s visible, and talks the user through a physical search. It’s less “AI reads an image” and more “AI acts as a second pair of eyes.” Whether this works reliably outside a demo setup is a separate question, but the concept lands.

The second video demo involves a live sports clip. Omni watches a few seconds of footage and provides tactical commentary — not just describing what happened, but offering an interpretation of player positioning. Sports analytics people will rightly point out that this is surface-level compared to dedicated tools, but for a general-purpose model doing it on the fly, it’s notable.

The Coding and Reasoning Demos (Gemini 3.5)

The final two demos belong to Gemini 3.5, and they’re more subdued — deliberately so, I think. Google is clearly positioning 3.5 as a serious tool for developers, not a party trick. One demo shows 3.5 taking a vague bug report, pulling relevant context from a large codebase, identifying the root cause, and writing a fix with an explanation. The other involves multi-step mathematical reasoning with a long-horizon problem.

On coding specifically: this is where the comparison to OpenAI Codex becomes unavoidable. Codex has had a significant head start in enterprise settings; you can read about that in our piece on Cisco’s enterprise Codex deployment. Gemini 3.5 looks genuinely competitive here, but “looks competitive in a demo” and “wins production deployments” are different things.

Key Capabilities Shown Across the 9 Demos

  • Sub-300ms conversational latency in live audio interaction (Omni)
  • Mid-sentence correction handling without context loss
  • Real-time video understanding with physical-world navigation assistance
  • Ambient tone detection — model adjusts response style based on user emotional cues
  • Large codebase reasoning with root-cause identification (Gemini 3.5)
  • Multi-step mathematical problem solving with step-by-step explanation
  • Sports and event video analysis with interpretive (not just descriptive) commentary
  • Object location assistance via live camera feed
  • Natural conversation interruption — model knows when to interject vs. wait

What This Actually Changes — and What It Doesn’t

Here’s the thing: demos are curated. Every company does this, and Google is no exception. The Omni latency numbers are impressive but were measured under controlled conditions. The video understanding demos don’t show failure modes. The coding demo doesn’t show Gemini 3.5 hallucinating a function that doesn’t exist, which it almost certainly does sometimes.

That said, I think the Omni real-time audio work is the most strategically significant part of this release. OpenAI’s GPT-4o has had a version of real-time voice since mid-2024, but it still feels like a text model doing an impression of a conversational partner. From the demos, Omni feels more… present. The interruption handling and tone adaptation are the kind of thing that makes a product feel alive rather than responsive.

For Google’s consumer products (Search, Assistant, Google Meet) this matters enormously. If Omni’s real-time conversation quality holds up outside demo conditions, Google has something that could genuinely shift how people interact with its products day-to-day. The Gemini-powered ads work we’ve covered shows Google is already threading AI deep into its revenue-critical products. Omni would accelerate that significantly.

For Gemini 3.5 specifically: developers should pay attention to the codebase reasoning demo more than the math one. Math benchmarks have become almost meaningless as a differentiator — every major model scores well. But navigating a real, messy codebase with partial information and producing actionable output? That’s a real developer problem, and if 3.5 does it reliably, it’s a legitimate competitor to what OpenAI is offering enterprise customers.

What This Means for Different Users

For Developers

Gemini 3.5 is the model to watch. If the codebase reasoning holds up in production, it’s worth evaluating seriously alongside Codex and Claude 3.7. Google typically offers competitive API pricing — historically undercutting OpenAI on per-token costs — so the economics could be attractive for high-volume use cases.
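To see how per-token pricing plays out at volume, here is a back-of-the-envelope sketch. The prices below are made-up placeholders, not real rates for Gemini 3.5 or any other model, and the names `model_a` and `model_b` are hypothetical; plug in the published per-million-token prices for whatever you are comparing.

```python
def monthly_cost_usd(
    requests: int,
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Estimate monthly spend from per-request token counts and per-million-token prices."""
    per_request = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return requests * per_request / 1_000_000


# PLACEHOLDER prices in USD per million tokens (input, output). Not real rates.
PLACEHOLDER_PRICES = {
    "model_a": (1.00, 4.00),
    "model_b": (1.25, 5.00),
}

if __name__ == "__main__":
    for name, (in_price, out_price) in PLACEHOLDER_PRICES.items():
        cost = monthly_cost_usd(1_000_000, 2_000, 500, in_price, out_price)
        print(f"{name}: ${cost:,.2f} per month")
```

Codebase-reasoning workloads tend to be input-heavy, since large chunks of source go in and a short fix comes out, so the input price usually matters more than the headline output price for this kind of use.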

For Enterprise Teams

The Omni voice capabilities are interesting for customer-facing applications — support bots, internal tools, anything where a more natural conversation flow reduces friction. The caveat is that enterprise deployment requires reliability, not just impressive demos, and Google’s track record on sustained product support has been mixed.

For Regular Users

The practical impact will land through Google’s existing products — Search, Assistant, Workspace — rather than direct model access for most people. Expect Omni’s capabilities to show up in Workspace features over the next two quarters if Google’s integration timeline holds.

FAQ

What is Gemini Omni exactly?

Gemini Omni is Google’s natively multimodal AI model, designed to process and respond across text, audio, image, and video simultaneously rather than treating each modality separately. It’s optimized especially for real-time, low-latency conversation with emotional and contextual awareness.

How does Gemini 3.5 compare to GPT-5.5 or Claude?

It’s genuinely competitive on reasoning and coding benchmarks, though independent evaluations outside Google-controlled settings are still limited. The codebase navigation demo suggests strong performance on real developer workflows, but enterprise customers should run their own tests rather than relying on demo results alone.

When can developers access these models?

Google announced availability through Google AI Studio and Vertex AI following Google I/O 2026, with Gemini 3.5 available for API access shortly after the announcement. Omni’s full real-time capabilities are rolling out in phases, with broader access expected through Q3 2026.

Do these demos prove Google has caught up to OpenAI?

“Caught up” is the wrong frame — both companies are moving simultaneously, and leadership shifts depending on which capability you measure. What the demos do show is that Google has closed meaningful ground on real-time multimodal interaction, which was a genuine weakness in earlier Gemini versions.

Google will need developer adoption numbers, not just demo reactions, to prove these models have staying power. The next few months of API usage data, third-party benchmarks, and actual production deployments will tell us far more than nine carefully choreographed videos — however impressive some of them genuinely are.