Google Project Astra, Veo, and 2M-token Gemini

The latest Google I/O conference was packed with exciting AI innovations! Dive into the details of Veo, Google DeepMind’s advanced video generation model that brings high-quality, minute-long videos to life with cinematic flair. Discover Project Astra, Google’s futuristic AI assistant prototype, designed to rival OpenAI’s GPT-4o. Explore the enhanced Gemini 1.5 Pro, including a lightweight version and a model with an impressive 2M token context length. Plus, learn about other significant updates, including the powerful Imagen 3 image generator and new open-source models Gemma 2 and PaliGemma.

Just one day after OpenAI’s demonstration of GPT-4o, it’s now Google’s turn to present what they had quietly been working on.

Among other announcements, the Google I/O conference introduced:

  • Veo: their most capable video generation model
  • Project Astra: their new project focused on building a future AI assistant
  • Updates to Gemini 1.5 Pro: two new versions of the flagship model, one that’s more lightweight, the other with a 2M token context length

Let’s go through each of these announcements one by one.

Veo

Veo is Google DeepMind’s most capable video generation model to date. It generates videos:

  • of high quality, at 1080p resolution
  • that can run longer than a minute
  • in a wide range of cinematic and visual styles

Veo accepts an image or a video as input, along with a text prompt. It can then animate the image or edit the video according to the prompt.

In addition, it supports masked editing, enabling changes to specific areas of the video when you add a mask area to your video and text prompt.

When it comes to technical details, Google shared that they added more details to the captions of each video in Veo’s training data. The model uses high-quality, compressed representations of video (also known as latents) to improve performance, generation speed and efficiency.

Project Astra

Astra is Google’s new project focused on building a future AI assistant, much like OpenAI’s GPT-4o, which was showcased live yesterday.

Google’s new assistant is powered by Gemini and supports audio, text, video and image inputs shared in real time. Google still presents the project as a prototype, and Astra’s capabilities were only shown through pre-recorded videos since it is not yet available to all users.

Early testers report longer latency and less emotional intelligence and tonal expressiveness for Astra compared to GPT-4o, but strong text-to-speech and potentially better support for ongoing video and long context.

Gemini 1.5 Pro

Google unveiled two iterations of their flagship model Gemini 1.5 Pro.

Gemini 1.5 Flash is the lightweight, fast and cost-efficient version of the model. It is still multimodal and keeps a 1M token context length. The performance cost is small, with an MMLU score of 78.9% compared to 81.9% for the original Gemini 1.5 Pro model.

Gemini 1.5 Pro had its context length doubled to 2M tokens. The new model is available via a waitlist for select developers building through the API.

Other announcements

Imagen 3 is their most capable image generation model. It will be available in multiple versions, each optimized for different types of tasks, from generating quick sketches to high-resolution images.

Gemma 2 and PaliGemma are two new open-source models added to the Gemma family. PaliGemma is Google’s first vision-language open-source model and it is available now. Gemma 2 is a 27B parameter model that outperforms the previous version and will be available starting in June.

The 2-hour session was dense with product updates and announcements across the Google stack, including improvements to Search, Workspace, Photos, Android and more.

Access

The Gemini API and Google AI Studio are now available in 200+ countries. Gemini 1.5 Flash costs $0.35 per 1M tokens, with context caching coming next month.
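
To put the quoted price in perspective, here is a minimal sketch that turns a token count into a rough cost estimate. It assumes the $0.35 per 1M tokens figure applies as a flat rate to every token; the announcement as reported here does not say whether input and output tokens are billed differently, so treat the result as a ballpark. The function name `estimate_cost_usd` is hypothetical and not part of any Google SDK.

```python
from decimal import Decimal

# Price quoted in this article for Gemini 1.5 Flash, in USD per 1M tokens.
# Assumption: a single flat rate for all tokens.
PRICE_PER_MILLION_TOKENS_USD = Decimal("0.35")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(
    token_count: int,
    price_per_million: Decimal = PRICE_PER_MILLION_TOKENS_USD,
) -> Decimal:
    """Hypothetical helper: flat-rate cost estimate for a given token count."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return Decimal(token_count) * price_per_million / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Print estimates for a few prompt sizes, up to a full 1M token context.
    for tokens in (10_000, 250_000, 1_000_000):
        print(f"{tokens:>9,} tokens -> ${estimate_cost_usd(tokens):.4f}")
```

Under that flat-rate assumption, filling the entire 1M token context window once would cost about $0.35.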

Veo, Astra and the 2M context version of Gemini 1.5 Pro are not available for now, but you can join the waitlist to get access.

However, Gemini 1.5 Flash is available now through the API, and PaliGemma is openly released on Kaggle.


Many of the product and project launches at the Google I/O conference closely matched an OpenAI product. Namely, Veo is now the competitor to OpenAI’s Sora, and Astra is the counterpart to GPT-4o. Google still seems to be in the AI race, although we need to gain access and test the models and products they introduced to get a better sense of their performance.
