One-click launch package for ACE-Step, an open-source music generation model.
ACE-Step is a new open-source music generation model 🎶 that combines a diffusion model, a compression autoencoder, and a lightweight linear transformer 🚀 to improve generation speed and musical coherence ✨. It supports features such as text-to-music generation of original songs and voice cloning, giving creators powerful tools 🎤!
ACE-Step: A New-Generation Open-Source Music Generation Model
ACE-Step is an open-source music generation foundation model jointly launched by StepFun AI and ACE Studio on May 8, 2025. Through innovative architecture design, it effectively addresses the bottlenecks of existing music generation technologies and achieves significant improvements in generation speed, music coherence, and controllability.
Technical Innovations
The core of ACE-Step lies in its unique hybrid architecture, which cleverly combines the following technologies:
- Diffusion Model: Responsible for generating high-quality audio.
- Sana’s Deep Compression AutoEncoder (DCAE): Used for efficient audio compression and reconstruction.
- Lightweight Linear Transformer: Handles long-term temporal dependencies in music.
This architecture overcomes the limitations of existing music generation methods.
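To give a feel for why a linear transformer suits long sequences, here is a minimal sketch of generic linear attention in plain Python. It is not ACE-Step's actual code: the ReLU feature map, the non-causal form, and the function names `linear_attention` and `quadratic_attention` are all assumptions chosen for illustration. The point is that the keys and values are summarized once, so the cost grows linearly with sequence length instead of quadratically.

```python
def relu(vec):
    """Element-wise ReLU, used here as the (assumed) feature map."""
    return [max(0.0, x) for x in vec]


def linear_attention(queries, keys, values, eps=1e-6):
    """Non-causal linear attention: one pass over keys/values, one over queries."""
    d = len(keys[0])
    m = len(values[0])
    kv = [[0.0] * m for _ in range(d)]  # running sum of phi(k_j) * v_j^T
    k_sum = [0.0] * d                   # running sum of phi(k_j)
    for k, v in zip(keys, values):
        fk = relu(k)
        for a in range(d):
            k_sum[a] += fk[a]
            for b in range(m):
                kv[a][b] += fk[a] * v[b]
    outputs = []
    for q in queries:
        fq = relu(q)
        norm = sum(fq[a] * k_sum[a] for a in range(d)) + eps
        outputs.append(
            [sum(fq[a] * kv[a][b] for a in range(d)) / norm for b in range(m)]
        )
    return outputs


def quadratic_attention(queries, keys, values, eps=1e-6):
    """Reference version: the same weights computed pairwise, O(N^2) in length."""
    m = len(values[0])
    outputs = []
    for q in queries:
        fq = relu(q)
        weights = [sum(x * y for x, y in zip(fq, relu(k))) for k in keys]
        norm = sum(weights) + eps
        outputs.append(
            [sum(w * v[b] for w, v in zip(weights, values)) / norm for b in range(m)]
        )
    return outputs
```

Both functions return the same result up to floating-point rounding; only the order of the sums differs, which is what removes the pairwise comparison between every query and every key.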
Compared to other models, ACE-Step’s advantages include:
- Ultra-High Efficiency: Synthesizes up to 4 minutes of music in only 20 seconds on an A100 GPU, which is 15 times faster than LLM-based models.
- Excellent Musical Coherence: Excels in melody, harmony, and rhythm, with more accurate lyric alignment.
- Detail Preservation: Retains fine acoustic details and supports advanced control.
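The efficiency figures above can be turned into a quick back-of-the-envelope calculation. The sketch below uses only the numbers quoted in the list; the helper name `real_time_factor` is ours, and the implied time for an LLM-based model assumes the 15x figure refers to the same 4-minute clip.

```python
def real_time_factor(audio_seconds, synthesis_seconds):
    """Seconds of audio produced per second of compute."""
    return audio_seconds / synthesis_seconds


audio_seconds = 4 * 60   # up to 4 minutes of music
ace_step_seconds = 20    # quoted synthesis time on an A100 GPU
speedup_vs_llm = 15      # quoted speedup over LLM-based models

rtf = real_time_factor(audio_seconds, ace_step_seconds)

# Assumption: the 15x speedup applies to the same 4-minute clip.
implied_llm_seconds = ace_step_seconds * speedup_vs_llm

print(f"ACE-Step real-time factor: {rtf:.0f}x")             # 12x
print(f"Implied LLM-based time: {implied_llm_seconds} s")   # 300 s
```

In other words, the quoted numbers correspond to generating audio about 12 times faster than it plays back.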
At the technical level, ACE-Step also uses MERT and mHuBERT to align semantic representations during training (REPA, representation alignment), which leads to rapid convergence. This combined approach addresses a trade-off in existing methods: LLM-based models (such as YuE and SongGen) perform well at lyric alignment but are slow at inference, while diffusion models (such as DiffRhythm) synthesize faster but often lack long-range structural coherence.
One-Click Launch Package User Guide
To make ACE-Step easy for everyone to use, we provide a local one-click launch package. Because everything runs on your own computer, you can try ACE-Step without worrying about privacy leaks or complex environment configuration.
Computer Configuration Requirements
- Windows 10/11 64-bit operating system
- NVIDIA graphics card with 8GB or more of video memory
- CUDA >= 12.1
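If you are unsure whether your machine meets these requirements, a small script along the following lines can help. This is a sketch under stated assumptions, not part of the launch package: it assumes `nvidia-smi` is installed with the NVIDIA driver and available on your PATH, and the function names are ours. Note that the "CUDA Version" shown by `nvidia-smi` is the highest version the driver supports, and that an 8GB card may report slightly less than 8192 MiB, so the threshold below is deliberately a little lower.

```python
import re
import subprocess

# Assumption: nominal 8GB cards can report a bit under 8192 MiB, so allow some slack.
MIN_VRAM_MIB = 8000
MIN_CUDA = (12, 1)


def parse_cuda_version(nvidia_smi_text):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field of nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_vram_mib(query_output):
    """Parse one memory.total value (in MiB) per GPU, one per line."""
    return [int(line.strip()) for line in query_output.splitlines() if line.strip()]


def meets_requirements(vram_mib, cuda_version):
    """True if the video memory and driver-supported CUDA version are sufficient."""
    return (
        cuda_version is not None
        and vram_mib >= MIN_VRAM_MIB
        and cuda_version >= MIN_CUDA
    )


def check_this_machine():
    """Query nvidia-smi; returns False if it is missing or fails."""
    try:
        header = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        ).stdout
        query = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return False
    cuda = parse_cuda_version(header)
    return any(meets_requirements(v, cuda) for v in parse_vram_mib(query))
```

Calling `check_this_machine()` returns `True` when at least one GPU passes both checks. The Windows version requirement is not checked here.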
Download and Usage Tutorial
- Download the compressed package:
  Download address: https://1drv.ms/f/c/D9E7E4EAB666A442/Eu9kAXFFmrdCiySxY75xkRABlGAk0UJwE9vZP7JuzkYYrQ
- Unzip the file:
  - After decompression, please ensure that the file path does not contain non-English characters.
  - Double-click the “run.exe” file to run.
- Browser access:
  The software will automatically open the browser interface, and you can start experiencing ACE-Step!
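If the program fails to start, the unzip location is worth checking first. The sketch below flags the parts of a Windows path that contain non-ASCII characters. Treating "non-English characters" as "non-ASCII characters" is our assumption, and the function names and example paths are purely illustrative.

```python
from pathlib import PureWindowsPath


def non_ascii_parts(path):
    """Return the components of a Windows path that contain non-ASCII characters."""
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


def is_safe_install_path(path):
    """True if every component of the path is plain ASCII."""
    return not non_ascii_parts(path)


# Hypothetical example paths
print(is_safe_install_path(r"D:\Tools\ACE-Step"))   # True
print(non_ascii_parts(r"D:\音乐\ACE-Step"))          # ['音乐']
```

If the second function returns anything, move or rename the folder so that the whole path uses only English letters, digits, and common symbols.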
Main Features and Applications
ACE-Step offers a wealth of features and a wide range of applications:
- Text-to-Music Generation: Generate original music through natural language descriptions, supporting various music genres.
- Advanced Control Capabilities: Supports voice cloning, lyric editing, mixing, and track generation (such as lyrics-to-vocals, singing-to-accompaniment).
- Full Song Generation: Generates complete songs with controllable length.
ACE-Step can be used in creative production, education, and entertainment, giving music artists, producers, and content creators powerful tools that integrate into their existing creative workflows.