Stability AI · 🎵 Music Generation

Stable Audio 2.5

anonymized
Quick reference
Stable Audio 2.5: TLDR
  • 🎵 Text-to-audio model for music, sound effects, and ambient textures.
  • 📏 Generates clips from 5 seconds to over 3 minutes.
  • ⚡ Stability reports under two seconds of inference for full-length tracks on a GPU.
  • 🔧 Adds audio inpainting to extend or remix uploaded clips.
  • 🆕 Post-trained with Stability's Adversarial Relativistic-Contrastive (ARC) method.
  • 🎯 Improved prompt adherence to mood and musical descriptors.
  • 🏢 Trained on fully licensed data for commercial use.
  • 🎚️ Builds multi-part compositions with an intro, development, and outro.
💰 Pricing
Not listed
📅 On Venice since
Feb 22, 2026
101 days ago
Provider

Stability AI is a UK-based artificial intelligence company best known for creating Stable Diffusion, one of the most widely adopted text-to-image generation models in the AI ecosystem. The company has established itself as a leading force in open-weight…

3 models on Venice
2 image · 1 music
Since Feb 4, 2025

About this model

Stable Audio 2.5 is Stability AI's generative audio model for music and sound design, turning plain-text prompts into high-fidelity tracks, sound effects, and ambient textures. Stability positions it as its first audio model built specifically for enterprise-grade sound production at scale, with output lengths ranging from short cues to tracks longer than three minutes. It supports text-to-audio, audio-to-audio style transfer, and audio inpainting, in which users supply a clip and the model continues or fills it using the surrounding context. Within Venice's catalog it sits alongside two Stability-derived image models, Venice SD35 and Lustify SDXL.

Compared with earlier Stable Audio releases, version 2.5 emphasizes speed and musical structure. Stability reports inference times under two seconds on a GPU for tracks up to three minutes long, which it attributes to post-training with Adversarial Relativistic-Contrastive (ARC), a method developed by its Stable Audio research team. The company also claims stronger musical composition, producing multi-part pieces with an intro, development, and outro, and better adherence to mood words such as "uplifting" and instrument descriptions such as "lush synthesizers."
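
A quick ratio puts the speed claim in perspective. It takes Stability's reported figures at face value (three minutes, or 180 seconds, of audio generated in two seconds of GPU time) and ignores network transfer, queueing, and any post-processing on a hosted service such as Venice:

```latex
\text{speed-up over real time}
  = \frac{t_{\text{audio}}}{t_{\text{inference}}}
  = \frac{180\,\text{s}}{2\,\text{s}}
  = 90
```

Because the reported inference time is under two seconds, 90x is a lower bound on the implied factor for a full three-minute track.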

Audio inpainting is a notable addition to the existing text-to-audio and audio-to-audio workflows: users upload a clip, mark the point where generation should begin, and the model fills in the remainder coherently from the preceding audio. Outputs are delivered at high sample rates suitable for professional mixing, and Stability says it can fine-tune the model on an organization's own sound library to embed signature brand audio.
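
To make the "mark a start point" idea concrete, the sketch below models the simplest continuation case of inpainting as two sample ranges: the part of the user's clip kept as context and the part the model generates. Everything in it is an assumption for illustration: `plan_inpaint` and `InpaintRegions` are hypothetical names, the 44.1 kHz default sample rate is assumed rather than taken from Stability's specifications, and the sketch says nothing about how Stable Audio 2.5 actually conditions on the kept audio.

```python
from typing import NamedTuple


class InpaintRegions(NamedTuple):
    kept: range       # sample indices copied from the user's clip as context
    generated: range  # sample indices left for the model to synthesize


def plan_inpaint(clip_s: float, marker_s: float, target_s: float,
                 sample_rate: int = 44_100) -> InpaintRegions:
    """Split a target timeline into kept and generated sample ranges.

    Hypothetical helper for illustration only; it is not part of any
    Stability AI or Venice API. Audio before marker_s is kept; any audio
    in the clip after marker_s is discarded, and everything from marker_s
    up to target_s is left for the model to generate.
    """
    if not 0 <= marker_s <= clip_s:
        raise ValueError("marker must fall inside the uploaded clip")
    if target_s < marker_s:
        raise ValueError("target length must not end before the marker")
    marker = round(marker_s * sample_rate)  # first sample the model writes
    end = round(target_s * sample_rate)     # one past the last output sample
    return InpaintRegions(kept=range(0, marker), generated=range(marker, end))


# A 30 s upload, marked at 20 s, extended to a 60 s track (assumed 44.1 kHz).
regions = plan_inpaint(clip_s=30.0, marker_s=20.0, target_s=60.0)
print(len(regions.kept) / 44_100, len(regions.generated) / 44_100)
# prints: 20.0 40.0
```

Representing the regions as `range` objects keeps the plan cheap even for multi-minute tracks, since no per-sample mask is ever materialized.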

This About section is AI-generated from public sources (Claude Opus 4.8), with no human editing. It may contain inaccuracies; verify critical details against the sources listed below.

Data sources: Venice API · HuggingFace · Wikipedia (enrichment updated 1 day ago)