Alibaba Unveils Qwen VLo: An Innovative Multimodal AI Model for Image Creation and Editing

The Chinese tech giant Alibaba has unveiled Qwen VLo, a multimodal artificial intelligence model designed for image analysis, creation, and editing.

According to Alibaba, Qwen VLo generates images progressively, from left to right and top to bottom, continually refining its output as it goes. The company says this gives finer control over the result, particularly for images that contain long passages of text. Alibaba has not disclosed specific technical details, but the description suggests that Qwen VLo uses an autoregressive approach similar to that of GPT-4o rather than a diffusion-based method.
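To make the left-to-right, top-to-bottom idea concrete, here is a toy Python sketch of raster-order autoregressive generation. It is purely illustrative and is not Alibaba's implementation: the names generate_raster and toy_next_token are hypothetical, the "tokens" are small integers rather than real image patches, and the random rule stands in for a learned model.

```python
import random

def generate_raster(height, width, next_token, seed=0):
    """Fill a height x width grid one cell at a time, row by row.

    Each new cell is chosen from everything generated so far, which is
    the defining property of autoregressive generation.
    """
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    context = []  # tokens produced so far, in generation order
    order = []    # (row, col) positions in the order they were filled
    for row in range(height):        # top to bottom
        for col in range(width):     # left to right
            token = next_token(context, rng)
            grid[row][col] = token
            context.append(token)
            order.append((row, col))
    return grid, order

def toy_next_token(context, rng):
    """Hypothetical stand-in for a model: usually repeat the previous
    token, otherwise pick one of four token values at random."""
    if context and rng.random() < 0.7:
        return context[-1]
    return rng.randrange(4)

if __name__ == "__main__":
    grid, order = generate_raster(3, 6, toy_next_token)
    for row in grid:
        print(" ".join(str(t) for t in row))
```

A diffusion model, by contrast, would start from noise covering the whole canvas and refine every position at once over many steps, rather than committing to the image one position at a time.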

Qwen VLo is capable of interpreting complex natural language editing instructions, allowing users to change backgrounds, insert new objects, modify visual styles, or even merge multiple images into a single composition.

The system accommodates both artistic and technical image alterations. For instance, it can create segmentation maps, carry out edge detection, or generate depth maps with color overlays upon request.

Qwen VLo is designed to handle images with varying resolutions and aspect ratios, including extreme formats such as 4:1 or 1:3, although this capability has not yet been enabled. The model also supports several languages, including Chinese and English.

Currently, Qwen VLo is available as a preview via Qwen Chat, Alibaba’s web interface. The company acknowledges that the model still makes generation errors, does not always stay consistent with the original image when editing, and requires detailed instructions. Alibaba intends to continue improving the model’s reliability and stability.

Until now, Alibaba has been a consistent source of competitive AI language models. For example, in April it released Qwen3 together with its weights, cementing the company’s role as a key player in open-weight AI research. It remains unclear why Qwen VLo has not been released with model weights, and whether this indicates a broader change in Alibaba’s approach to open publication.
