Stability AI · 🎵 Music Generation

Stable Audio 2.5

Quick reference

Stable Audio 2.5 — TLDR

- 🎵 Text-to-audio model for music, sound effects, and ambient textures.
- 📏 Generates clips from 5 seconds to over 3 minutes.
- ⚡ Stability reports inference in under two seconds on a GPU for tracks up to three minutes.
- 🔧 Adds audio inpainting to extend or remix uploaded clips.
- 🆕 Post-trained with Stability's Adversarial Relativistic-Contrastive (ARC) method.
- 🎯 Improves prompt adherence to mood and musical descriptors.
- 🏢 Trained on fully licensed data for commercial use.
- 🎚️ Builds multi-part compositions with intro, development, and outro.

💰 Pricing

—

📅 On Venice since

Feb 22, 2026

Provider

Stability AI

Stability AI is a UK-based artificial intelligence company best known for creating Stable Diffusion, one of the most widely adopted text-to-image generation models in the AI ecosystem. The company has established itself as a leading force in open-weight…

3 models on Venice

2 image · 1 music

Since Feb 4, 2025

About this model

Stable Audio 2.5 is Stability AI's generative audio model for music and sound design. It turns plain-text prompts into high-fidelity tracks, sound effects, and ambient textures. Stability positions it as its first audio model built specifically for enterprise-grade sound production at scale, with outputs ranging from short cues to tracks longer than three minutes. It supports text-to-audio, audio-to-audio style transfer, and audio inpainting, where users supply a clip and the model continues or fills it using the surrounding context. Within Venice's catalog it sits alongside Stability-derived image models such as Venice SD35 and Lustify SDXL, which generate images rather than audio.

Compared with earlier Stable Audio releases, version 2.5 emphasizes speed and musical structure. Stability reports inference in under two seconds on a GPU for tracks up to three minutes long, and attributes this to post-training with the Adversarial Relativistic-Contrastive (ARC) method developed by the Stable Audio research team. The company also describes stronger musical composition, with multi-part pieces that have intros, developments, and outros, plus improved adherence to mood words like "uplifting" and instrument language like "lush synthesizers."

Audio inpainting is a notable functional addition over prior text-to-audio and audio-to-audio workflows, letting users mark where their input should start so the model fills the remainder coherently. Outputs are delivered at high sample rates suitable for professional mixing, and Stability says it can fine-tune the model on an organization's own sound library to embed signature brand audio.

Sources

Stable Audio 3.0 | Generative Audio Models - Stability AI (stability.ai)

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies — verify critical details against the sources listed above.

Data sources: Venice API · HuggingFace · Wikipedia — enrichment updated 4d ago