New Era of Animation: Tencent Unveils HunyuanPortrait, a Groundbreaking Open-Source AI for Lifelike Portrait Animation

On Tuesday, Tencent unveiled a new AI model capable of animating static portrait images. Named HunyuanPortrait, the diffusion-based model is designed to create realistic animations from a reference image guided by a driving video. The team behind this innovation emphasized the model’s ability to capture facial details as well as spatial movements, ensuring precise synchronization with the reference image. HunyuanPortrait has been released as open source, allowing users to download it from widely used repositories and run it locally.

Breathing Life into Portraits with Tencent’s HunyuanPortrait

In an announcement on X (previously known as Twitter), Tencent Hunyuan’s official account declared that the HunyuanPortrait model is now accessible to the public. Users can download the AI model from Tencent’s GitHub and Hugging Face repositories. Furthermore, a preprint research paper detailing the model can be found on arXiv. Importantly, the model is licensed for academic and research use, while commercial use is restricted.

The HunyuanPortrait model is capable of producing lifelike animated videos from a static image and a corresponding driving video. It extracts facial expressions and head orientations from the video and projects them onto the static portrait. The company asserts that the synchronization of movements is precise, even capturing subtle variations in facial expressions.

Architecture of HunyuanPortrait
Image Credit: Tencent

 

On the model’s webpage, Tencent’s researchers elaborated on the structure of HunyuanPortrait. The model is built on the framework of Stable Diffusion models, integrated with a conditional control encoder. Pre-trained encoders separate motion data from identity within the driving video, and the resulting control signals are injected into the still portrait through a denoising UNet, which the company claims enhances both spatial precision and temporal coherence in the results.
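The data flow described above can be sketched with a toy example. Everything here is a simplified assumption for illustration: the function names, feature shapes, and the trivial "encoders" and "denoising" update are stand-ins, not Tencent's actual API or architecture.

```python
import numpy as np

def encode_identity(portrait):
    # Stand-in for the pre-trained identity encoder: summarize the
    # portrait's appearance as a fixed-length feature vector.
    return portrait.mean(axis=(0, 1))  # shape (C,)

def encode_motion(frame):
    # Stand-in for the motion encoder: capture per-frame expression /
    # head-pose information, independent of identity.
    return frame.std(axis=(0, 1))  # shape (C,)

def denoise_step(latent, identity_feat, motion_feat, strength=0.5):
    # Toy "denoising" update: pull the latent toward the target implied
    # by the combined identity and motion control signals.
    target = identity_feat + motion_feat
    return latent + strength * (target - latent)

def animate(portrait, driving_frames, steps=10):
    # One control signal per driving frame; each frame's latent is
    # refined by a short denoising loop conditioned on identity + motion.
    identity_feat = encode_identity(portrait)
    outputs = []
    for frame in driving_frames:
        motion_feat = encode_motion(frame)
        latent = np.zeros_like(identity_feat)  # crude stand-in for noise
        for _ in range(steps):
            latent = denoise_step(latent, identity_feat, motion_feat)
        outputs.append(latent)
    return np.stack(outputs)  # one conditioned latent per driving frame

rng = np.random.default_rng(0)
portrait = rng.random((64, 64, 3))
driving = [rng.random((64, 64, 3)) for _ in range(4)]
video_latents = animate(portrait, driving)
print(video_latents.shape)  # (4, 3)
```

The point of the sketch is the separation of concerns the researchers describe: identity comes only from the reference portrait, motion only from the driving frames, and the two signals are merged during the iterative denoising loop rather than in pixel space.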

Tencent asserts that the AI model surpasses current open-source options in terms of temporal coherence and controllability, although these claims have yet to be independently validated.

Such innovative models hold significant promise for the film and animation sectors. Traditionally, animators either manually keyframe facial expressions or rely on costly motion capture technology to achieve realistic character animations. HunyuanPortrait enables creators to input character designs alongside desired movements and expressions, generating the final product effortlessly. This technology could also democratize high-quality animation for smaller studios and independent artists.
