Sber Unveils Kandinsky 4.1 Video Model for Text-to-Video Generation at GigaConf

Sber has introduced Kandinsky 4.1 Video, a new model that generates videos from text. Andrey Belevtsev, Senior Vice President and Head of the Technology Development Division at Sber, announced the model at the GigaConf technology conference. Habr's news service attended the event.

GigaConf attendees and a select group of artists and designers were the first to get access to Kandinsky 4.1 Video. The model will soon be available to all users.

Kandinsky 4.1 Video can generate videos up to 10 seconds long in SD (720×576) and HD (1280×720) resolution. Users can create a video from a text description or from an arbitrary starting frame. The model supports any aspect ratio.

The model is built on a diffusion transformer architecture. Generation quality was improved through supervised fine-tuning (SFT) on specially curated data. More than 100 experts took part in training, including designers, photographers, and artists with relevant qualifications. Their involvement improved the artistic expressiveness, composition, and cinematic feel of the generated videos.

The new architecture is more computationally demanding, so the developers applied distillation and acceleration methods. As a result, generation is more than three times faster than in the previous version, and in many scenarios quality was preserved or even improved.