Spark-TTS One-Click Launch Package: Personalized Voice Synthesis Made Easy

Spark-TTS is a text-to-speech system based on the Qwen2.5 model. It supports personalized voice synthesis 🎤, adjustable voice features, and zero-shot voice cloning 🔥, which makes it well suited to audiobooks 📚, virtual hosts, and multilingual content 🌍!

Spark-TTS: Making Text-to-Speech More Natural and Efficient

Spark-TTS is an efficient text-to-speech (TTS) system based on the Qwen2.5 model, designed to provide a natural and personalized voice synthesis experience. It supports fine-grained adjustment of voice attributes such as gender, tone, and speed. It can also perform zero-shot voice cloning, reproducing a speaker's voice without any additional training, and it can create new voices from attribute settings alone, without reference audio. The system uses BiCodec for speech encoding, which streamlines the architecture and improves inference efficiency. Because it is built on Qwen2.5, the large language model handles the TTS task directly, eliminating the need for additional acoustic models.

Key Features

  • Zero-shot Text-to-Speech Conversion: No extra training required.
  • Bilingual Support: Enables cross-language speech synthesis.
  • Controllable Voice Generation: Adjustable parameters like timbre and speed to create diverse voice effects.

One-Click Launch Package Usage Guide

Spark-TTS has been packaged as a one-click launcher that runs locally. In a few simple steps you can use it on your own computer. Because everything runs on your machine, you do not need to worry about privacy leaks or environment configuration.

Computer Configuration Requirements

  • Operating system: Windows 10/11, 64-bit
  • Graphics card: NVIDIA GPU with 8 GB of VRAM or more
  • CUDA: 12.1 or later
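
If you are unsure whether your graphics card meets the VRAM requirement, the following Python sketch is one way to check it before downloading. It is not part of the package. It assumes the NVIDIA driver is installed so that the `nvidia-smi` tool is on your PATH, and the function names are made up for illustration. The threshold is set slightly below 8192 MiB on the assumption that some "8 GB" cards report a little less than that. It does not check the CUDA version.

```python
import subprocess

# Assumption: cards sold as "8 GB" may report slightly under 8192 MiB,
# so the threshold is deliberately a little loose.
MIN_VRAM_MIB = 8000


def parse_gpu_memory(csv_output: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' rows (no header, no units) into (name, MiB) pairs."""
    gpus = []
    for line in csv_output.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def meets_vram_requirement(gpus, min_mib: int = MIN_VRAM_MIB) -> bool:
    """True if at least one detected GPU has enough total memory."""
    return any(mem >= min_mib for _, mem in gpus)


def query_gpus() -> list[tuple[str, int]]:
    """Ask nvidia-smi for each GPU's name and total memory in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    try:
        detected = query_gpus()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi was not found or failed; is an NVIDIA driver installed?")
    else:
        for gpu_name, gpu_mem in detected:
            print(f"{gpu_name}: {gpu_mem} MiB")
        print("VRAM requirement met:", meets_vram_requirement(detected))
```

Running the script prints each detected GPU with its total memory, followed by whether any of them reaches the threshold.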

Download and Usage Instructions

  1. Download the Compressed Package
    Link: https://localai.top/29/

  2. Extract Files
    Extract the archive to a folder whose path contains no non-English characters, then double-click the “run.exe” file to start the program.

  3. Browser Access
    The software opens its interface in your browser automatically.

    • Voice Cloning
    • Voice Creation
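
Step 2 above asks for an extraction path without non-English characters. As a rough aid, the Python sketch below lists the folder names in a path that contain non-ASCII characters. Treating "non-English" as "non-ASCII" is an assumption here, and the helper name `non_ascii_parts` is made up for illustration. It is not part of the package.

```python
from pathlib import Path, PureWindowsPath


def non_ascii_parts(path: str) -> list[str]:
    """Return the components of a Windows-style path that contain non-ASCII characters."""
    # PureWindowsPath parses both '\\' and '/' separators on any operating system.
    return [part for part in PureWindowsPath(path).parts if not part.isascii()]


if __name__ == "__main__":
    # Check the folder the script is run from, e.g. the extraction folder.
    current = str(Path.cwd())
    offending = non_ascii_parts(current)
    if offending:
        print(f"Consider moving the folder; non-ASCII parts found: {offending}")
    else:
        print(f"Path looks fine: {current}")
```

For example, `non_ascii_parts(r"D:\软件\Spark-TTS")` returns `["软件"]`, while a path such as `C:\tools\Spark-TTS` returns an empty list.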

Application Scenarios

  • Audiobook Production: Its natural voice generation capability makes it ideal for creating audiobooks.
  • Virtual Streamers: Supports personalized voice generation, providing various voice styles for virtual streamers.
  • Multilingual Content Creation: Its cross-language generation capability meets the needs of multilingual voice synthesis.