Question 1

What is Google Veo 3.1 and how is it different from Veo 3?

Accepted Answer

Veo 3.1 is Google DeepMind's refined flagship video model. It builds on Veo 3 by adding three capabilities that matter most in practice: native audio generation in the same pass as video, Start & End Frame shot control, and video extension for building sequences beyond a single clip. Portrait 9:16 generation and support for reference images round out the upgrade.

Question 2

How does Veo 3.1's audio generation work?

Accepted Answer

Veo 3.1 natively generates dialogue, sound effects, and ambient noise alongside the video. Audio is always on, not optional. There is no separate audio prompt: the model infers what the scene should sound like from your visual description, so a more specific, scene-detailed prompt tends to produce more accurate sound.

Question 3

What is Start & End Frame and when should I use it?

Accepted Answer

You supply both the opening and closing image of a shot; Veo 3.1 generates the motion and audio between them. It's most useful when you already know your shot list — product reveals, transitions, or any sequence where the final frame matters as much as the first. The model handles camera movement and subject motion to connect the two frames naturally.

Question 4

How does video extension work and how long can a sequence get?

Accepted Answer

After generating a clip, you can extend it. Each extension adds 7 seconds, and Veo 3.1 analyzes the existing footage to produce a seamless continuation. You can chain up to 20 extensions, which adds up to 140 seconds (20 × 7) of footage on top of the initial clip. Note that video extension supports 720p resolution only.
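The extension arithmetic can be sketched in a few lines. This is only an illustration, not part of any Veo API: the function name `added_seconds` and the constant names are hypothetical, while the 7-second step and the 20-extension cap come from the answer above. The length of the initial clip is deliberately not counted here.

```python
EXTENSION_SECONDS = 7   # each extension adds 7 seconds (from the answer above)
MAX_EXTENSIONS = 20     # at most 20 chained extensions (from the answer above)


def added_seconds(extensions: int) -> int:
    """Return the footage, in seconds, added by a chain of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return extensions * EXTENSION_SECONDS


# Footage added by the maximum number of chained extensions.
print(added_seconds(MAX_EXTENSIONS))  # 140
```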

Question 5

What can I do with reference images?

Accepted Answer

You can supply up to 3 reference images per generation to anchor character identity, object appearance, or visual style. This is called Ingredients to Video — instead of describing what something looks like, you show the model directly. It's particularly effective for maintaining subject consistency across multiple shots.
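If you assemble reference images in a script before submitting a generation, a small guard can catch an oversized set early. The sketch below is an assumption-laden illustration: `select_reference_images` is a hypothetical helper, not a Veo API call, and the only fact it encodes is the 3-image limit stated above.

```python
from pathlib import Path

MAX_REFERENCE_IMAGES = 3  # limit stated in the answer above


def select_reference_images(paths: list[str]) -> list[Path]:
    """Check a list of reference image paths against the 3-image limit."""
    if len(paths) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"got {len(paths)} reference images, at most "
            f"{MAX_REFERENCE_IMAGES} are allowed per generation"
        )
    # Convert to Path objects so later code can inspect names and suffixes.
    return [Path(p) for p in paths]


# Hypothetical file names, used only for illustration.
refs = select_reference_images(["character.png", "jacket.png", "style.jpg"])
print([r.name for r in refs])
```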

Question 6

What resolutions and durations does Veo 3.1 support?

Accepted Answer

Veo 3.1 generates at 720p, 1080p, or 4K, but 1080p and 4K are available only for 8-second clips; 4-second and 6-second clips are limited to 720p. Video extension is also 720p only. Supported aspect ratios are 16:9 (landscape) and 9:16 (portrait), both rendered at 24 fps.
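The constraints above can be expressed as a small lookup table, which is handy for checking settings before you submit a job. This is a sketch under stated assumptions: `check_settings` and the table names are hypothetical, not part of any Veo API, and it assumes 4, 6, and 8 seconds are the only clip durations, which the answer implies but does not state outright.

```python
# Resolutions allowed per clip duration, as described in the answer above.
# Assumption: 4, 6, and 8 seconds are the only supported durations.
ALLOWED_RESOLUTIONS = {
    4: {"720p"},
    6: {"720p"},
    8: {"720p", "1080p", "4k"},
}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}


def check_settings(duration_s: int, resolution: str, aspect_ratio: str,
                   is_extension: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the settings are valid."""
    problems = []
    res = resolution.lower()
    if is_extension:
        # Extensions ignore duration_s here; only the 720p rule applies.
        if res != "720p":
            problems.append("video extension supports 720p only")
    elif duration_s not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported duration: {duration_s}s")
    elif res not in ALLOWED_RESOLUTIONS[duration_s]:
        problems.append(f"{resolution} is not available for {duration_s}s clips")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    return problems


print(check_settings(8, "1080p", "9:16"))  # []
print(check_settings(4, "4K", "16:9"))     # one problem reported
```

Returning a list of problems rather than raising on the first one lets a caller show every invalid setting at once.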

Question 7

Does Veo 3.1 support portrait (vertical) video?

Accepted Answer

Yes. Veo 3.1 generates native 9:16 vertical video — not a cropped version of a landscape output, but content composed for vertical-first viewing. This matters for TikTok, Instagram Reels, and YouTube Shorts, where vertical framing behaves differently from widescreen.

Question 8

How does Veo 3.1 perform on benchmarks?

Accepted Answer

Google evaluated Veo 3.1 against competitors using human raters. It ranked first on MovieGenBench (1,003 prompts) for overall preference, text alignment, and visual quality. It also ranked first on VBench I2V (355 image-text pairs). On a separate audio evaluation of 527 MovieGenBench prompts, Veo 3.1 placed first for audio-video synchronization and overall preference with audio.

Question 9

Are Veo 3.1 outputs watermarked?

Accepted Answer

Yes. Every video generated with Veo 3.1 is embedded with Google's SynthID — an invisible, machine-readable watermark that identifies the content as AI-generated. It doesn't affect the visual output and survives common post-processing like compression and re-encoding.

Veo 3.1: The AI Video Model That Hears What It Sees.

What Makes Veo 3.1 Different?

Four Capabilities That Change the Workflow

Scene-Inferred Audio

Start & End Frame

Video Extension

Reference Images

Veo 3.1 Technical Specifications

How to Generate Video with Veo 3.1

Pick your starting point

Write a scene-specific prompt

Add reference images if needed

Generate, extend & download

Frequently Asked Questions about Veo 3.1

What is Google Veo 3.1 and how is it different from Veo 3?

How does Veo 3.1's audio generation work?

What is Start & End Frame and when should I use it?

How does video extension work and how long can a sequence get?

What can I do with reference images?

What resolutions and durations does Veo 3.1 support?

Does Veo 3.1 support portrait (vertical) video?

How does Veo 3.1 perform on benchmarks?

Are Veo 3.1 outputs watermarked?

