Tencent's Hunyuan-T1: A Compelling Contender to OpenAI's Leading Models

Tencent has announced that its new Hunyuan-T1 model can compete with OpenAI's leading reasoning models.

Like all major reasoning models, Hunyuan-T1 leaned heavily on reinforcement learning during development: according to Tencent, 96.7% of the post-training compute went into improving the model's reasoning abilities and aligning it with human preferences.
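Tencent has not published its post-training pipeline, but the basic mechanism behind reinforcement learning from a reward signal can be illustrated with a minimal REINFORCE-style toy in Python. The candidate answers, reward values, and hyperparameters below are purely hypothetical and stand in for whatever reward model and rollout setup Tencent actually used.

```python
# Toy sketch of reinforcement-learning post-training (illustrative only; not
# Tencent's actual pipeline). A policy over a few candidate answers is nudged
# toward outputs that a reward function prefers, via a REINFORCE-style update.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate answers to a single prompt and a toy reward signal
# (in practice the reward would come from preference or reasoning feedback).
answers = ["wrong", "partially right", "correct with reasoning"]
reward = np.array([0.0, 0.4, 1.0])

logits = np.zeros(len(answers))          # policy parameters
learning_rate = 0.5

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(200):
    probs = softmax(logits)
    a = rng.choice(len(answers), p=probs)         # sample an answer
    baseline = probs @ reward                     # variance-reducing baseline
    advantage = reward[a] - baseline
    grad = -probs
    grad[a] += 1.0                                # d log pi(a) / d logits
    logits += learning_rate * advantage * grad    # REINFORCE update

print({ans: round(p, 3) for ans, p in zip(answers, softmax(logits))})
# After training, most probability mass sits on the highest-reward answer.
```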

On the MMLU-Pro benchmark, which assesses knowledge across 14 subject areas, Hunyuan-T1 scored 87.2, placing second behind OpenAI's o1. On the GPQA-Diamond test, which focuses on scientific reasoning, it scored 69.3.

Tencent claims the model is particularly strong at mathematics, scoring 96.2 on MATH-500, second only to DeepSeek-R1. Other noteworthy results include 64.9 on LiveCodeBench and 91.9 on ArenaHard.

For training, Tencent used a curriculum learning approach, gradually increasing the difficulty of tasks. The company also built a self-rewarding scheme in which earlier versions of the model evaluated the outputs of newer versions, feeding those judgments back as a signal for further improvement.
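Neither the curriculum schedule nor the self-evaluation scheme has been published, so the sketch below is only a toy illustration under assumed details: a logistic classifier is trained on progressively harder synthetic data, and a frozen earlier checkpoint scores the newer model's outputs to decide which of them get recycled as extra training signal.

```python
# Toy sketch of curriculum learning with a self-evaluation step (illustrative
# assumptions only; Tencent has not released its training code). A logistic
# classifier is trained on progressively harder data, and a frozen copy of the
# previous checkpoint scores fresh samples so that only outputs the earlier
# model rates as confident are reused as pseudo-labeled training data.
import numpy as np

rng = np.random.default_rng(1)

def make_data(n, noise):
    """2-D points labeled by a linear rule; label noise sets the difficulty."""
    X = rng.normal(size=(n, 2))
    y = (X @ np.array([2.0, -1.0]) > 0).astype(float)
    flip = rng.random(n) < noise
    y[flip] = 1 - y[flip]
    return X, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(w, X, y, lr=0.1, epochs=50):
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the logistic loss
    return w

w = np.zeros(2)
prev_checkpoint = w.copy()

# Curriculum: stages ordered from easy (little label noise) to hard.
for noise in [0.0, 0.1, 0.25]:
    X, y = make_data(400, noise)
    w = train(w, X, y)

    # Self-evaluation: the earlier checkpoint judges the newer model's
    # predictions on fresh unlabeled data; only confident cases are kept.
    X_new, _ = make_data(400, noise)
    new_pred = (sigmoid(X_new @ w) > 0.5).astype(float)
    prev_conf = sigmoid(X_new @ prev_checkpoint)
    keep = np.abs(prev_conf - 0.5) > 0.3          # earlier model is confident
    if keep.any() and np.any(prev_checkpoint):    # skip the untrained stage 0
        w = train(w, X_new[keep], new_pred[keep], epochs=10)

    prev_checkpoint = w.copy()

X_test, y_test = make_data(1000, 0.0)
acc = ((sigmoid(X_test @ w) > 0.5) == y_test).mean()
print(f"test accuracy on clean data: {acc:.2f}")
```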

The model is built on a hybrid Mamba-Transformer architecture, which Tencent says processes long texts twice as fast as conventional models under comparable conditions. Hunyuan-T1 is accessible via Tencent Cloud, and a demo is available on Hugging Face.
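Hunyuan-T1's exact architecture has not been released either. As a rough illustration of the general idea behind such a hybrid design, the PyTorch sketch below interleaves linear-time recurrent blocks (a heavily simplified stand-in for Mamba's state-space layers, not a real Mamba implementation) with standard self-attention blocks. The layer counts, dimensions, and the `SimpleSSMBlock` itself are assumptions for illustration, not Tencent's implementation.

```python
# Rough sketch of a hybrid Mamba/Transformer-style stack (illustrative only).
# Linear-time recurrent blocks are interleaved with standard attention blocks;
# the recurrent block below is a heavily simplified stand-in, not real Mamba.
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Gated linear recurrence over the sequence: O(L) in sequence length."""
    def __init__(self, d_model):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.out_proj = nn.Linear(d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, seq, d_model)
        residual = x
        x = self.norm(x)
        u = self.in_proj(x)
        h = torch.zeros_like(u[:, 0])
        outs = []
        a = torch.sigmoid(self.decay)          # per-channel decay in (0, 1)
        for t in range(u.size(1)):             # sequential scan (for clarity)
            h = a * h + (1 - a) * u[:, t]
            outs.append(h)
        y = torch.stack(outs, dim=1) * torch.sigmoid(self.gate(x))
        return residual + self.out_proj(y)

class AttentionBlock(nn.Module):
    """Standard pre-norm self-attention block."""
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out

class HybridStack(nn.Module):
    """Interleave cheap recurrent blocks with occasional attention blocks."""
    def __init__(self, d_model=64, n_layers=6, attn_every=3):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0
            else SimpleSSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

if __name__ == "__main__":
    model = HybridStack()
    tokens = torch.randn(2, 128, 64)            # (batch, seq_len, d_model)
    print(model(tokens).shape)                  # torch.Size([2, 128, 64])
```

Placing attention only every few layers is the motivation typically cited for such hybrids: most sequence mixing happens in the cheap recurrent layers, which is what keeps long-context decoding fast.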

This release follows Baidu's recent introduction of its own o1-level model and Alibaba's earlier launch of a comparable reasoning model. All three firms (Alibaba, Baidu, and DeepSeek) are pursuing open-source strategies. Kai-Fu Lee, an AI investor and former head of Google China, has described these developments as an existential threat to OpenAI.

With top models now consistently scoring above 90% on standard benchmarks, Google DeepMind has introduced a harder evaluation called BIG-Bench Extra Hard (BBEH). Even the best models struggle with it; OpenAI's top model, o3-mini (high), reached only 44.8% accuracy.

Another surprising result came from DeepSeek-R1, which, despite strong performance on other evaluations, scored only around seven percent on the new test. The discrepancy suggests that benchmark results do not tell the whole story and often fail to correlate with real-world performance, especially since some teams optimize their models specifically for these benchmarks. Some Chinese models also show particular quirks, such as mixing Chinese characters into English-language answers.

[Source](https://the-decoder.com/tencent-develops-reasoning-model-that-matches-openais-o1-capabilities/)