Google AI Releases Veo 3.1 Lite, Offering Developers Low-Cost, Fast Video Generation via the Gemini API

Google announced the release of Veo 3.1 Lite, a new model tier within its video generation portfolio designed to address the main obstacle to production-scale deployments: pricing. While the video generation space has seen rapid progress in visual fidelity, the cost per second of generated content remains high, often putting high-volume applications out of reach for developers.
Veo 3.1 Lite is now available through the Gemini API and Google AI Studio for paid-tier users. By offering the same generation speed as the existing Veo 3.1 Fast model at almost half the cost, Google is positioning this model as a developer-focused tier for high-volume video generation and iterative prototyping.

Technical Architecture: Diffusion Transformer (DiT)
The defining feature of the Veo 3.1 family is its underlying Diffusion Transformer (DiT) architecture. Traditional video generation models often rely on U-Net-based diffusion, which can struggle with high-dimensional data and long-range temporal dependencies.
Veo 3.1 Lite uses a transformer-based backbone that operates on spatio-temporal patches. In this architecture, video frames are not processed as static 2D images but as a continuous sequence of tokens in latent space. By applying self-attention across all of these tokens, the model achieves better temporal consistency: objects, lighting, and textures stay consistent throughout the clip, reducing artifacts often seen in earlier models.
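To make the idea of spatio-temporal patches concrete, here is a minimal pure-Python sketch that splits a small latent video into non-overlapping patches and flattens each patch into one token. The `[T][H][W][C]` layout, the hypothetical `patchify` function, and the patch sizes are illustrative assumptions; Google has not published Veo's actual latent shapes or patch dimensions.

```python
from typing import List

# A latent video as a nested list indexed [t][h][w][c].
# Shapes and patch sizes are hypothetical; Veo's real values are not public.
Latent = List[List[List[List[float]]]]


def patchify(latent: Latent, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a [T][H][W][C] latent into non-overlapping pt x ph x pw patches.

    Each patch is flattened into one token of length pt*ph*pw*C. Tokens are
    emitted time-major (every patch of the first time slab, then the next),
    producing a single sequence that self-attention can operate on.
    """
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):
        for h0 in range(0, H, ph):
            for w0 in range(0, W, pw):
                token: List[float] = []
                # Gather every latent cell inside this 3D patch.
                for t in range(t0, t0 + pt):
                    for h in range(h0, h0 + ph):
                        for w in range(w0, w0 + pw):
                            token.extend(latent[t][h][w])
                tokens.append(token)
    return tokens


if __name__ == "__main__":
    # Toy latent: 4 latent frames, a 4x4 spatial grid, 2 channels.
    T, H, W, C = 4, 4, 4, 2
    latent = [[[[0.0] * C for _ in range(W)] for _ in range(H)] for _ in range(T)]
    tokens = patchify(latent, pt=2, ph=2, pw=2)
    print(len(tokens), len(tokens[0]))  # 8 16
```

With full self-attention over this sequence, each token can attend to tokens from other time steps, which is how a DiT can relate the same object across frames.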
The model performs its computation in a compressed latent space instead of pixel space. This lets it handle the heavy compute demands of video generation while keeping memory usage manageable. For developers, this translates to a model that can produce high-definition content without the steep growth in compute time that resolution scaling would otherwise bring.
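The savings from working in latent space can be estimated with simple arithmetic. The sketch below compares an 8-second clip in RGB pixel space with its size in a compressed latent and counts the resulting tokens. The 24 fps frame rate, 8x spatial and 4x temporal downsampling, 16 latent channels, and 1x2x2 patch size are all hypothetical values chosen for illustration, not Veo's published configuration.

```python
# Hypothetical settings for illustration only; Veo's real values are not published.
FPS = 24                  # assumed frame rate
SPATIAL_DOWNSAMPLE = 8    # each latent cell covers 8x8 pixels
TEMPORAL_DOWNSAMPLE = 4   # each latent frame covers 4 video frames
LATENT_CHANNELS = 16      # channels per latent cell
PATCH = (1, 2, 2)         # (time, height, width) patch size in latent cells


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (models padding partial patches up to full size)."""
    return -(-a // b)


def footprint(width: int, height: int, seconds: int) -> dict:
    """Compare a clip's size in RGB pixel space and in a compressed latent space."""
    frames = seconds * FPS
    pixel_values = width * height * 3 * frames
    lt = ceil_div(frames, TEMPORAL_DOWNSAMPLE)
    lh = ceil_div(height, SPATIAL_DOWNSAMPLE)
    lw = ceil_div(width, SPATIAL_DOWNSAMPLE)
    latent_values = lt * lh * lw * LATENT_CHANNELS
    tokens = ceil_div(lt, PATCH[0]) * ceil_div(lh, PATCH[1]) * ceil_div(lw, PATCH[2])
    return {
        "pixel_values": pixel_values,
        "latent_values": latent_values,
        "tokens": tokens,
        # Full self-attention scores every token against every other token.
        "attention_pairs": tokens * tokens,
    }


if __name__ == "__main__":
    hd = footprint(1280, 720, seconds=8)
    fhd = footprint(1920, 1080, seconds=8)
    print("720p compression:", hd["pixel_values"] / hd["latent_values"])  # 48.0
    print("720p tokens:", hd["tokens"], "1080p tokens:", fhd["tokens"])
    print("attention growth:", fhd["attention_pairs"] / hd["attention_pairs"])
```

Under these assumptions the latent holds 48 times fewer values than the RGB clip. Moving from 720p to 1080p multiplies the pixel count by 2.25 but the number of attention pairs by roughly 5.1, because full self-attention grows with the square of the token count: steep, but polynomial rather than exponential.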
Performance and Output Specifications
Veo 3.1 Lite provides fixed resolution and duration options, allowing AI developers to integrate it into structured workflows. Unlike the flagship Veo 3.1 model, which supports 4K resolution, the Lite version is designed for high-definition (HD) output.
- Resolutions: 720p and 1080p.
- Aspect ratios: Native support for both landscape (16:9) and portrait (9:16).
- Clip length: Developers can specify a generation length of 4, 6, or 8 seconds.
- Prompt adherence: The model is optimized for ‘cinematic control,’ recognizing camera directions such as ‘pan’ and ‘tilt’ as well as specific lighting instructions.
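Because the Lite tier exposes a small, fixed set of output options, requests can be validated on the client before spending money on a generation call. The `LiteVideoRequest` class below is a hypothetical helper, not part of any Google SDK; it only encodes the resolutions, aspect ratios, and clip lengths listed above.

```python
from dataclasses import dataclass

# Output options listed for Veo 3.1 Lite in this article.
RESOLUTIONS = {"720p", "1080p"}
ASPECT_RATIOS = {"16:9", "9:16"}
DURATIONS_S = {4, 6, 8}


@dataclass(frozen=True)
class LiteVideoRequest:
    """Hypothetical client-side request container; not part of any Google SDK."""
    prompt: str
    resolution: str = "720p"
    aspect_ratio: str = "16:9"
    duration_seconds: int = 8

    def __post_init__(self) -> None:
        # Reject settings outside the documented Lite options before any API call.
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"unsupported resolution: {self.resolution}")
        if self.aspect_ratio not in ASPECT_RATIOS:
            raise ValueError(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.duration_seconds not in DURATIONS_S:
            raise ValueError(f"unsupported duration: {self.duration_seconds}s")


if __name__ == "__main__":
    ok = LiteVideoRequest(
        prompt="Slow pan across a neon-lit street at night, soft rim lighting",
        resolution="1080p",
        aspect_ratio="9:16",
        duration_seconds=6,
    )
    print(ok)
    try:
        LiteVideoRequest(prompt="A drone shot over mountains", resolution="4k")
    except ValueError as err:
        print(err)  # unsupported resolution: 4k
```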
The ‘Lite’ label does not signal slower generation than the ‘Fast’ tier. Instead, it refers to a set of optimized parameters that let Google offer the model at a much lower price point while keeping the same low-latency performance characteristics as Veo 3.1 Fast.
Pricing: Democratizing Video Inference
The main value proposition of Veo 3.1 Lite is its cost structure. In today’s market, the per-second cost of high-quality video generation makes it difficult to justify applications such as dynamic ad generation or automated social media content.
The price of Veo 3.1 Lite is set as follows:
- 720p: $0.05 per second.
- 1080p: $0.08 per second.
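Since pricing is strictly per second of output, budgeting is plain multiplication. This minimal sketch uses the two rates quoted above and Python's `decimal` module to avoid floating-point rounding; `clip_cost` is a hypothetical helper name, and the sketch does not account for taxes or future price changes.

```python
from decimal import Decimal

# Per-second prices quoted in this article (USD).
PRICE_PER_SECOND = {"720p": Decimal("0.05"), "1080p": Decimal("0.08")}


def clip_cost(resolution: str, seconds: int, clips: int = 1) -> Decimal:
    """Cost of generating `clips` clips of `seconds` each at `resolution`."""
    if resolution not in PRICE_PER_SECOND:
        raise ValueError(f"unknown resolution: {resolution}")
    return PRICE_PER_SECOND[resolution] * seconds * clips


if __name__ == "__main__":
    print(clip_cost("720p", 8))                # 0.40
    print(clip_cost("1080p", 8))               # 0.64
    print(clip_cost("720p", 60))               # 3.00, one minute of 720p output
    print(clip_cost("1080p", 8, clips=1000))   # 640.00
```

At these rates a full minute of output costs $3.00 at 720p and $4.80 at 1080p, and a batch of 1,000 eight-second 1080p clips comes to $640.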
Deployment with the Gemini API and AI Studio
Access is provided through the Gemini API, which allows video generation to be integrated into existing Python or Node.js applications using standard REST or gRPC calls.
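As an illustration of the REST route, the sketch below builds (without sending) a video generation request using only the Python standard library. The `predictLongRunning` endpoint and the `instances`/`parameters` body follow the pattern Google documents for Veo models in the Gemini API, but the parameter names `resolution` and `durationSeconds` and the `MODEL_ID` placeholder are assumptions to verify against the current API reference. The official Google Gen AI SDKs for Python and Node.js wrap the same workflow.

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"
# Hypothetical placeholder: look up the actual Veo 3.1 Lite model ID in the API docs.
MODEL_ID = "YOUR_VEO_3_1_LITE_MODEL_ID"


def build_generate_request(prompt: str, api_key: str, resolution: str = "720p",
                           aspect_ratio: str = "16:9",
                           duration_seconds: int = 8) -> urllib.request.Request:
    """Build (but do not send) a long-running video generation request.

    Field names inside "parameters" are assumptions modeled on Google's Veo
    REST examples; confirm them before relying on this.
    """
    body = {
        "instances": [{"prompt": prompt}],
        "parameters": {
            "aspectRatio": aspect_ratio,
            "resolution": resolution,
            "durationSeconds": duration_seconds,
        },
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/models/{MODEL_ID}:predictLongRunning",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("A slow tilt up a lighthouse at dusk",
                                 api_key=os.environ.get("GEMINI_API_KEY", "demo-key"))
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Sending it (urllib.request.urlopen(req)) should return a long-running
    # operation that is polled until the video is ready; not done here.
```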
One important technical feature for business developers is the inclusion of SynthID. Developed by Google DeepMind, SynthID is a tool for watermarking and identifying AI-generated content. It embeds a digital watermark directly into the video pixels; the watermark is imperceptible to the human eye but can be detected by dedicated software. This makes it an important component for developers concerned with safety, compliance, and media provenance.
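SynthID's algorithm is not public, so the sketch below only illustrates the general idea of a pixel-level watermark that viewers cannot see but software can read. It swaps in a simple least-significant-bit (LSB) scheme on a row of 8-bit pixel values, a toy technique far weaker than SynthID: Google describes SynthID as robust to common edits such as compression, whereas an LSB mark is destroyed by almost any re-encoding. The `embed_bits` and `detect_bits` helpers are hypothetical.

```python
from typing import List

# Toy LSB watermark for illustration only. This is NOT SynthID.


def embed_bits(pixels: List[int], bits: List[int]) -> List[int]:
    """Write one watermark bit into the least significant bit of each pixel."""
    if len(bits) > len(pixels):
        raise ValueError("not enough pixels to hold the watermark")
    out = list(pixels)
    for i, bit in enumerate(bits):
        if bit not in (0, 1):
            raise ValueError("watermark bits must be 0 or 1")
        out[i] = (out[i] & ~1) | bit  # changes each pixel value by at most 1
    return out


def detect_bits(pixels: List[int], n: int) -> List[int]:
    """Read back the least significant bits of the first n pixels."""
    return [p & 1 for p in pixels[:n]]


if __name__ == "__main__":
    frame = [200, 201, 54, 55, 128, 17, 90, 91]  # 8 grayscale pixel values
    mark = [1, 0, 1, 1, 0, 0, 1, 0]
    stamped = embed_bits(frame, mark)
    print(stamped)  # [201, 200, 55, 55, 128, 16, 91, 90]
    print(detect_bits(stamped, len(mark)) == mark)  # True
```

A one-level change in an 8-bit value is invisible to a viewer, yet the mark is trivially recoverable by software, which is the property the paragraph above describes; production systems add robustness that this toy lacks.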
Key Takeaways
- Half the Cost, Same Speed: Provides the same low-latency performance as the ‘Fast’ tier at roughly half the price ($0.05/second at 720p).
- Scalable HD Output: Supports 720p and 1080p resolution in 4-, 6-, or 8-second clips with native 16:9 and 9:16 aspect ratios.
- Architecture: Built on a Diffusion Transformer (DiT) that operates on spatio-temporal patches for improved motion and temporal consistency.
- Developer Access: Available now through the Gemini API (paid tier) and Google AI Studio, with built-in SynthID digital watermarking.

Michal Sutter is a data science expert with a Master of Science in Data Science from the University of Padova. With a strong foundation in statistical analysis, machine learning, and data engineering, Michal excels at turning complex data sets into actionable insights.



