
Gemini Diffusion: How this model writes everything at once!

Imagine an AI that doesn’t type word by word but instead starts with a messy blob and carves out fully coherent paragraphs—all at once. That’s Google’s Gemini Diffusion, and it’s turning heads with speeds topping 1,600 tokens per second.

In our latest video, we explore:

  • How it mimics image diffusion to generate text
  • Real-world demos that include instant app generation and multi-language translations
  • Benchmarks where it holds its own against much bigger models
  • Why it might be the key to unlocking massive context windows with blazing speed
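The image-diffusion idea in the first bullet can be sketched in miniature: instead of appending one token at a time, start from a fully masked sequence and unmask several positions in parallel at each step. This is a toy, hypothetical illustration (random choices stand in for a learned denoiser), not Gemini Diffusion's actual implementation:

```python
import random

def toy_diffusion_decode(vocab, length=8, steps=4, seed=0):
    """Toy parallel 'denoising': start fully masked, then at each
    step fill in a batch of positions at once. A real model would
    predict tokens; here random.choice stands in for that."""
    rng = random.Random(seed)
    seq = ["[MASK]"] * length
    masked = list(range(length))
    per_step = max(1, length // steps)
    while masked:
        # Unmask several positions in parallel each step --
        # this is what makes diffusion decoding fast.
        batch, masked = masked[:per_step], masked[per_step:]
        for i in batch:
            seq[i] = rng.choice(vocab)
    return seq

print(toy_diffusion_decode(["the", "cat", "sat", "on", "mat"]))
```

Contrast with autoregressive decoding, which needs one full forward pass per token; here the sequence is finished in `steps` passes regardless of length.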

Average Tokens per Second for LLMs (2024–2025)

These are typical average output speeds for LLMs under standard conditions; actual throughput varies with model size, backend hardware, and prompt complexity:

| Model | Avg. tokens/sec (output) | Notes |
|---|---|---|
| GPT-4 (API) | ~30–60 | Slower, optimized for quality; GPT-4 Turbo is faster |
| GPT-3.5 (API) | ~60–100 | Snappier than GPT-4 |
| Claude 2 | ~20–40 | Focuses on coherence and safety |
| Claude 3 (Opus/Sonnet) | ~50–100 | Sonnet is noticeably faster than Opus |
| Mistral 7B (local) | ~60–120 | Blazing fast on good hardware |
| Gemini 1.5 | ~50–100 | Comparable to Claude 3 in speed |
| Gemini Diffusion (preview) | 1,000–1,600 | An order of magnitude faster, thanks to parallel generation |
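To make the gap concrete, a quick back-of-the-envelope calculation using midpoint figures from the table (the 10,000-token output length is an arbitrary example):

```python
def generation_time_seconds(num_tokens, tokens_per_second):
    """Wall-clock time to emit num_tokens at a given throughput."""
    return num_tokens / tokens_per_second

output_len = 10_000  # e.g. a long generated document

# Typical autoregressive model at ~50 tokens/sec:
print(generation_time_seconds(output_len, 50))     # 200.0 seconds
# Gemini Diffusion at ~1,500 tokens/sec:
print(generation_time_seconds(output_len, 1_500))  # ≈ 6.7 seconds
```

The same output drops from over three minutes to under seven seconds, which is why parallel generation matters for large context windows.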
