Gemini Diffusion: how this new generation of models writes everything at once!
Imagine an AI that doesn’t type word by word but instead starts with a messy blob and carves out fully coherent paragraphs—all at once. That’s Google’s Gemini Diffusion, and it’s turning heads with speeds topping 1,600 tokens per second.
In our latest video, we explore:
- How it mimics image diffusion to generate text
- Real-world demos, including instant app generation and multi-language translation
- Benchmarks where it holds its own against much bigger models
- Why it might be the key to unlocking massive context windows with blazing speed
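The core idea behind "mimicking image diffusion" is that generation starts from a fully masked (noisy) sequence and the model reveals many tokens in parallel over a handful of refinement steps, rather than emitting one token at a time. Here is a minimal toy sketch of that denoising loop; the `target` argument stands in for what a trained denoiser would actually predict, and all names are illustrative, not Gemini's API:

```python
import random

MASK = "<mask>"

def toy_denoise(target, steps=4, seed=0):
    """Toy sketch of diffusion-style text generation: begin with an
    all-masked sequence and reveal tokens in parallel over a few steps.
    `target` stands in for a trained denoiser's predictions."""
    rng = random.Random(seed)
    seq = [MASK] * len(target)
    for step in range(steps):
        # Find positions still masked; reveal a batch of them in parallel,
        # revealing everything that remains on the final step.
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        k = len(masked) if step == steps - 1 else max(1, len(masked) // 2)
        for i in rng.sample(masked, k):
            seq[i] = target[i]  # a real model would sample a token here
    return seq

print(toy_denoise("the cat sat on the mat".split()))
```

Because each step updates many positions at once, the number of model passes depends on the step count, not the sequence length, which is where the speed advantage comes from.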
Average Tokens per Second for LLMs (2024–2025)
These are typical average output speeds for LLMs under standard conditions (actual throughput varies with model size, backend hardware, and prompt complexity):
| Model | Avg. Tokens/sec (output) | Notes |
| --- | --- | --- |
| GPT-4 (API) | ~30–60 | Slower, optimized for quality; GPT-4 Turbo is faster |
| GPT-3.5 (API) | ~60–100 | Snappier than GPT-4 |
| Claude 2 | ~20–40 | Focuses on coherence and safety |
| Claude 3 (Opus/Sonnet) | ~50–100 | Sonnet is noticeably faster than Opus |
| Mistral 7B (local) | ~60–120 | Blazing fast on good hardware |
| Gemini 1.5 | ~50–100 | Comparable to Claude 3 in speed |
| Gemini Diffusion (preview) | ~1,000–1,600 | An order of magnitude faster, thanks to parallel generation |
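To put those numbers in perspective, a quick back-of-the-envelope calculation (using rough mid-range figures from the table, which are assumptions, not measurements) shows what the gap means for a long answer:

```python
def seconds_for(tokens, tokens_per_sec):
    """Rough latency estimate: time to emit `tokens` output tokens
    at a steady `tokens_per_sec` throughput."""
    return tokens / tokens_per_sec

# A 2,000-token answer at illustrative mid-range speeds:
gpt4_latency = seconds_for(2000, 45)         # roughly 44 seconds
diffusion_latency = seconds_for(2000, 1300)  # roughly 1.5 seconds

print(f"GPT-4: ~{gpt4_latency:.0f}s, Gemini Diffusion: ~{diffusion_latency:.1f}s")
```

At these rates, a response that takes most of a minute to stream token by token arrives in about a second and a half, which is why parallel generation is so interesting for large context windows.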