logo

AI-Flow

Sound Generation

Speech 2.6 Turbo

Low-latency, multilingual text-to-speech with 300+ voices and expressive emotions—MiniMax Speech 2.6 Turbo on Replicate.

About This Template

MiniMax Speech 2.6 Turbo delivers fast, natural text-to-speech optimized for real-time and interactive applications. Choose from 300+ curated voices (or your own cloned voice), switch emotions on the fly, and synthesize lifelike audio in 40+ languages—all with predictable, usage-based pricing. Key capabilities: - Low-latency synthesis ideal for chat agents, voice bots, and UI feedback - 300+ voices plus support for voice cloning via minimax/voice-cloning - Emotions set to auto or explicitly (happy, calm, surprised, neutral, and more) - 40+ languages with optional language boosting for better pronunciation - Flexible audio controls: speed, pitch, volume, sample rate, bitrate, mono/stereo - Export to mp3, wav, flac, or pcm; optional sentence-level subtitles (non-streaming) Inputs at a glance: - text (required): Up to 10,000 characters. Insert pauses with markers like <#0.5#> - voice_id: Any MiniMax system voice or a cloned voice ID - emotion: auto or a specific emotion - audio_format: mp3, wav, flac, or pcm - sample_rate: 8000–44100 Hz; bitrate options for mp3 - channel: mono or stereo - speed, pitch, volume: Fine-tune speaking rate, semitone shift (−12 to +12), and loudness - language_boost: Automatic or a specific language - english_normalization: Improves number/date reading in English (adds slight latency) - subtitle_enable: Returns sentence timestamps in non-streaming mode Output: - A hosted audio file URL (mp3/wav/flac/pcm), with MiniMax metadata such as character count and optional subtitles. Pricing: - $0.06 per 1,000 input tokens (roughly one token ≈ one character) - Audio output is not billed—only input text is counted When to choose Turbo vs HD: - Use Speech 2.6 Turbo for low-latency, interactive experiences - Prefer Speech 2.6 HD for maximum fidelity in long-form content like audiobooks and high-end voiceovers

Sound Generation
Quick to set up
Fully customizable
Ready to use

Template Workflow

Speech 2.6 Turbo workflow screenshot

How to Use This Template

1

Step 1: Enter your text in 'Text' Node

In the 'Text' node, enter your instructions.

Example :
Once upon a quiet evening, in a small town that most people passed without noticing, something unexpected was about to happen.
2

Step 2: Configure 'Voice' Node

Configure the 'Voice' node as needed.

Example :
Wise_Woman
3

Step 3: Run the Flow

Click the 'Run' button to execute the flow and get the final output.

Who is this for?

Perfect for professionals and creators looking to streamline their workflow

Developers building real-time voice experiences

Create chat agents, live voice assistants, and interactive apps that need low-latency, natural speech in many languages.

Product teams and UX designers

Add dynamic audio prompts, confirmations, and brand voices to product flows with consistent quality and fast response.

Support, IVR, and operations teams

Power multilingual IVR menus, helpdesk handoffs, status updates, and automated customer communications.

Content creators and marketers

Generate quick voiceovers for demos, explainers, and social content; clone voices for consistent brand sound.

Educators and training teams

Produce interactive tutorials and localized lessons across 40+ languages with emotion control and timestamped subtitles.

Ready to Get Started?

Use this template in AI-Flow and start creating in minutes

Frequently Asked Questions

What makes Speech 2.6 Turbo different from the HD variant?

Turbo is tuned for low latency and real-time interactions, like voice bots or live app feedback. HD prioritizes maximum fidelity for long-form, production-grade narration such as audiobooks and premium voiceovers.

How many languages and voices are supported?

MiniMax supports 40+ languages and dialects with optional language boosting. You can choose from 300+ curated voices or use a voice_id from the minimax/voice-cloning model.

How do I control style and emotions?

Set emotion to auto for inferred tone or pick a specific emotion (e.g., happy, calm, surprised, neutral). You can further adjust speed, pitch (−12 to +12 semitones), and volume to fine-tune delivery.

What audio formats and settings are available?

Export audio as mp3, wav, flac, or pcm. Configure sample_rate (8000–44100 Hz), channel (mono or stereo), and for mp3 select bitrate (32k, 64k, 128k, 256k).

Can I insert pauses in the speech?

Yes. Use inline markers like <#0.5#> within your text to pause for the specified number of seconds.

Does it support subtitles or timestamps?

Enable subtitle_enable to return sentence-level timestamps and subtitle metadata (available in non-streaming mode).

What’s the character limit for input text?

You can pass up to 10,000 characters in a single request.

Should I enable english_normalization?

Enable it if your English content contains numbers, dates, or structured text that benefits from improved normalization. It may add a small amount of latency.

What is AI-FLOW and how can it help me?

AI-FLOW is an all-in-one AI platform that allows you to build, integrate, and automate AI-powered workflows using an intuitive drag-and-drop interface. Whether you're a beginner or an expert, you can leverage multiple AI models to create innovative solutions without any coding required.

Is there a free trial available?

Yes, AI-FLOW offers a free trial to get you started. After that, you can purchase credits as needed—no subscription or long-term commitment required.

Can I integrate my API keys from providers like OpenAI and Replicate with AI-FLOW Cloud Version ?

Yes, you can easily integrate your existing API keys with AI-FLOW. If specified, nodes related to the API Key provided will use your API key, significantly reducing your platform credit usage.