Stability AI Launches Compact Text-to-Audio Model for Mobile Devices

Stability AI and Arm have introduced a compact text-to-audio model capable of operating on smartphones, generating stereo audio clips of up to 11 seconds in approximately 7 seconds.
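
To put those two figures in perspective, here is a minimal sketch that turns them into a real-time factor (seconds of compute per second of audio). Only the 11-second and 7-second values come from the announcement; the function name is purely illustrative.

```python
def real_time_factor(generation_seconds: float, clip_seconds: float) -> float:
    """Seconds of compute needed per second of audio (below 1.0 means faster than playback)."""
    return generation_seconds / clip_seconds


# Figures quoted in the article for on-device generation.
CLIP_SECONDS = 11.0        # maximum clip length
GENERATION_SECONDS = 7.0   # approximate time to generate that clip on a phone

rtf = real_time_factor(GENERATION_SECONDS, CLIP_SECONDS)
audio_per_compute_second = CLIP_SECONDS / GENERATION_SECONDS

print(f"real-time factor: {rtf:.2f}")
print(f"audio produced per second of compute: {audio_per_compute_second:.2f} s")
```

With the quoted numbers this prints a real-time factor of about 0.64, meaning the phone produces roughly 1.57 seconds of audio for every second it spends generating.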

This model, known as Stable Audio Open Small, employs a technique called "Adversarial Relativistic-Contrastive" (ARC), which was developed by researchers from the University of California, Berkeley, among other institutions. On high-performance hardware like the Nvidia H100 GPU, it can generate 44 kHz stereo audio in just 75 milliseconds, which is fast enough for near real-time audio generation.

The original version of Stable Audio Open was launched last year as a free, open-source model with 1.1 billion parameters. The smaller version uses only 341 million parameters, which makes it far more practical to run on consumer-grade devices. Stability AI and Arm first announced their collaboration in March.

To enable functionality on smartphones, the team restructured the architecture. The system now comprises three components: an autoencoder for compressing audio data, an embedding module for interpreting text prompts, and a diffusion model for generating the final audio output.
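
The way these three components hand data to one another can be sketched schematically. Everything below is a toy stand-in written for illustration: the function names, sizes, and update rule are assumptions, not the model's actual code, and no real signal processing takes place. The sketch only shows the order of the steps: the prompt is embedded, a latent is refined from noise under that conditioning, and the decoder half of the autoencoder expands the latent into stereo frames.

```python
import hashlib
import random

LATENT_LEN = 8   # hypothetical latent sequence length
UPSAMPLE = 4     # hypothetical decoder upsampling ratio


def embed_prompt(prompt: str, dim: int = LATENT_LEN) -> list:
    """Stand-in for the text embedding module: prompt -> fixed-size vector in [-1, 1]."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 * 2.0 - 1.0 for b in digest[:dim]]


def generate_latent(conditioning: list, steps: int = 8, seed: int = 0) -> list:
    """Stand-in for the diffusion model: start from noise, refine it step by step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in conditioning]
    for _ in range(steps):
        # Toy update rule: move halfway toward the conditioning vector each step.
        latent = [x + 0.5 * (c - x) for x, c in zip(latent, conditioning)]
    return latent


def decode_latent(latent: list, upsample: int = UPSAMPLE) -> list:
    """Stand-in for the autoencoder's decoder: expand latents into (left, right) frames."""
    return [(x, x) for x in latent for _ in range(upsample)]


def text_to_audio(prompt: str) -> list:
    conditioning = embed_prompt(prompt)
    latent = generate_latent(conditioning)
    return decode_latent(latent)


frames = text_to_audio("rain on a tin roof")
print(len(frames), "stereo frames")
```

The point of the compressed latent is that the expensive generation step works on a much shorter sequence than the raw waveform; here the toy decoder turns 8 latent values into 32 stereo frames.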

The revamped system does not rely on distillation, yet it cuts memory usage by nearly half, from 6.5 GB to 3.6 GB. That reduction makes it possible to deploy the model on mobile devices for the first time. For testing, the researchers used the Vivo X200 Pro, an Android phone equipped with 12 GB of RAM and a MediaTek Dimensity 9400 chip launched in late 2024.
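
As a quick check on the size claims, the following sketch computes the relative reductions from the figures quoted in this article (6.5 GB to 3.6 GB of memory, 1.1 billion to 341 million parameters). The helper name is illustrative only.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, expressed in percent."""
    return (before - after) / before * 100.0


# Figures quoted in the article.
memory_cut = percent_reduction(6.5, 3.6)       # memory usage in GB
parameter_cut = percent_reduction(1100, 341)   # parameters in millions

print(f"memory usage reduced by {memory_cut:.1f}%")
print(f"parameter count reduced by {parameter_cut:.1f}%")
```

The memory saving works out to about 44.6%, consistent with "nearly half", while the parameter count drops by 69.0%.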

Stability AI states that the model excels at generating sound effects and field recordings. However, it still struggles with music, particularly vocals, and performs best with prompts in English.

The model was trained on around 472,000 clips from the Freesound database, using only material licensed under CC0, CC BY, or CC Sampling+. To prevent copyright issues, the team filtered the data through a series of automated checks.

The software is available under the Stability AI Community license for open-source use, with commercial use governed by separate terms. The code can be found on GitHub, while access to the model weights is available via Hugging Face.

[Source](https://the-decoder.com/stability-ai-releases-a-compact-open-text-to-audio-model-that-runs-on-mobile-devices/)