Klone V2 Pro

Voice Clone

Klone V2 Pro is our voice-cloning model. Synthesize speech in the style of a Voice Arena community voice using its share_id or, on enterprise tiers, from your own reference audio. Style hints, fixed durations, and telephony-ready output codecs are supported.

Overview

Klone V2 Pro is the voice-cloning model behind the Klone V2 Pro option in the TTS playground. It synthesizes speech in the style of a Voice Arena community voice identified by an 8-character share_id (shown on each voice in the Voice Arena → Mine tab, or found via search).

Key characteristics

Community / cloned voices via share_id - multipart form request.
50+ language codes; billed at 0.5 credits per character (multiplier may apply on paid public voices).
Languages: Use short ISO-style codes (e.g. hi, en, ta).
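
The billing rate above lends itself to a quick cost estimate before sending a request. The sketch below is illustrative only: `estimate_credits` and its `multiplier` argument are hypothetical names, and it assumes every character of the text (spaces and punctuation included) counts toward billing, which this page does not spell out.

```python
def estimate_credits(text: str, multiplier: float = 1.0) -> float:
    """Rough credit estimate for one synthesis request.

    Assumption: all characters in `text` are billed at the documented
    0.5 credits each. `multiplier` is a hypothetical stand-in for the
    surcharge that may apply on paid public voices.
    """
    if multiplier <= 0:
        raise ValueError("multiplier must be positive")
    return len(text) * 0.5 * multiplier


# 25 characters at 0.5 credits each
print(estimate_credits("Hello from voice cloning."))  # 12.5
```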

Looking for standard pre-built system voices instead? See the Fonada V1 documentation.

Reference voice - provide exactly one

Choose the voice to clone using exactly one of the sources below. Sending more than one returns 400 ambiguous_audio_source; sending none returns 400 missing_audio_source.

Field	Type	Tier	Description
share_id	string	All tiers	Catalog / community voice share ID. Uses the stored voice embedding; `audio` / `audio_url` / `audio_text` are ignored.
audio	file	Enterprise	Reference audio upload (WAV/MP3/M4A/FLAC/OGG/WebM). Max 10 MB, max 30 s.
audio_url	string	Enterprise	HTTPS URL of a reference clip. `audio_text` (transcript) is required when using a URL.

Using audio / audio_url on a non-enterprise tier returns 403 enterprise_only.
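
The "exactly one source" rule can be checked on the client before a request is sent. The following is a minimal sketch, not the server's actual logic: `check_reference_source` and the `enterprise` flag are hypothetical, the order of checks is a guess, and the `missing_audio_text` label is invented here because the page does not name the error returned when a URL is sent without a transcript.

```python
from typing import Optional


def check_reference_source(
    share_id: Optional[str] = None,
    audio: Optional[bytes] = None,
    audio_url: Optional[str] = None,
    audio_text: Optional[str] = None,
    enterprise: bool = False,
) -> Optional[str]:
    """Return the documented error code for a bad combination, or None if OK."""
    candidates = (("share_id", share_id), ("audio", audio), ("audio_url", audio_url))
    supplied = [name for name, value in candidates if value]

    if not supplied:
        return "missing_audio_source"
    if len(supplied) > 1:
        return "ambiguous_audio_source"

    source = supplied[0]
    if source != "share_id" and not enterprise:
        return "enterprise_only"
    if source == "audio_url" and not audio_text:
        # Hypothetical label: the docs require a transcript for audio_url
        # but do not state which error code is returned without one.
        return "missing_audio_text"
    return None


print(check_reference_source(share_id="ABCD1234"))  # None
```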

Generate audio

Catalog / community voice

Returns audio inline (WAV by default). Replace YOUR_FONADA_API_KEY with your API key and YOUR_SHARE_ID with the voice's share ID from Voice Arena.

curl

curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
  -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
  -F "share_id=YOUR_SHARE_ID" \
  -F "text=Hello, this is a voice clone test using a catalog voice." \
  -F "language=hi" \
  -o cloned_output.wav

API endpoint

Base URL: https://api.fonada.ai
Endpoint: /v1/voice-clone/chunks
Method: POST
Content-Type: multipart/form-data
Authorization: Bearer YOUR_FONADA_API_KEY

Synthesis parameters (multipart form)

Field	Type	Default	Description
text	string	required	Text to synthesize in the cloned voice. Empty → `400 empty_text`.
language	string	`hi`	ISO code (e.g. `hi`, `en`, `ta`) — not the full language name used in Fonada V1.
speed	float	`1.0`	Speed factor, range `0.25`–`4.0`. Overridden by `duration` if both set.
duration	float	none	Fixed output duration in seconds, range `0.1`–`60.0`. Overrides `speed`; single-chunk text only.
instruct	string	none	Comma-separated style hints (e.g. `"angry, whispering"`), forwarded verbatim.
audio_text	string	none	Transcript of the reference clip. Optional for uploads (auto-transcribed if omitted); required for `audio_url`. Ignored when `share_id` is used.
output_audio_codec	string	`wav`	Output audio format - see Output audio formats below.
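
The ranges in the table above can be validated locally to avoid a round trip that would end in a 400. This is a sketch under assumptions: `validate_synthesis_params` is a hypothetical helper, and it treats both range bounds as inclusive, which the table implies but does not state.

```python
from typing import List, Optional


def validate_synthesis_params(
    text: str,
    speed: float = 1.0,
    duration: Optional[float] = None,
) -> List[str]:
    """Return the documented error codes this input would likely trigger."""
    errors = []
    if not text.strip():
        errors.append("empty_text")  # empty or whitespace-only text
    if not 0.25 <= speed <= 4.0:
        errors.append("invalid_speed")
    if duration is not None and not 0.1 <= duration <= 60.0:
        errors.append("invalid_duration")
    return errors


print(validate_synthesis_params("Hello", speed=5.0))  # ['invalid_speed']
```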

Output audio formats

The model always produces 24 kHz PCM WAV; non-wav values are transcoded server-side. Set the output_audio_codec form field to control the format.

Codec	Aliases	Encoding	Sample rate	Content-Type	Long text
wav	—	PCM WAV (default)	24 kHz	`audio/wav`	Any length
mp3	—	MP3 (128 kbps)	24 kHz	`audio/mpeg`	Max 450 chars
opus	—	Opus in Ogg (128 kbps)	48 kHz¹	`audio/ogg`	Max 450 chars
pcm	`linear16`	Raw headerless 16-bit LE PCM	24 kHz	`audio/pcm`	Any length
mulaw	`ulaw`	μ-law (telephony)	8 kHz	`audio/basic`	Any length
alaw	—	A-law (telephony)	8 kHz	`audio/x-alaw-basic`	Any length

¹ Opus always operates at 48 kHz by design (the encoder upsamples the 24 kHz source).

pcm, mulaw, and alaw are headerless raw streams; see Playing raw output below.

mp3 / opus cannot be concatenated across chunks, so the SDK produces them in a single request capped at 450 characters.
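
The codec table can be mirrored as a small lookup that resolves aliases and enforces the 450-character cap for mp3 and opus. This is a client-side sketch: `CODECS`, `ALIASES`, and `resolve_codec` are hypothetical names, and lower-casing the input is a convenience of this example rather than documented server behaviour.

```python
from typing import Optional, Tuple

# codec -> (Content-Type, sample rate in Hz, max characters or None for any length)
CODECS = {
    "wav": ("audio/wav", 24000, None),
    "mp3": ("audio/mpeg", 24000, 450),
    "opus": ("audio/ogg", 48000, 450),
    "pcm": ("audio/pcm", 24000, None),
    "mulaw": ("audio/basic", 8000, None),
    "alaw": ("audio/x-alaw-basic", 8000, None),
}
ALIASES = {"linear16": "pcm", "ulaw": "mulaw"}


def resolve_codec(name: str, text: str) -> Tuple[str, str, int]:
    """Normalize a codec name and check the text length against its cap."""
    key = name.lower()
    codec = ALIASES.get(key, key)
    if codec not in CODECS:
        raise ValueError("invalid_output_audio_codec")
    content_type, sample_rate, max_chars = CODECS[codec]
    if max_chars is not None and len(text) > max_chars:
        raise ValueError(f"{codec} output is limited to {max_chars} characters")
    return codec, content_type, sample_rate


print(resolve_codec("ulaw", "Welcome to support."))  # ('mulaw', 'audio/basic', 8000)
```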

Response: raw audio bytes in the requested codec (WAV by default). With curl, use -o filename.ext to save them to a file.

Private voices: Only the voice owner (or an admin) may synthesize with a private voice's share_id. Public voices can be used by any authenticated customer with a valid API key.

Response headers

Content-Type — media type for the codec (see table above)
Content-Disposition — inline; filename="voice_clone.<ext>"
X-Output-Codec — normalized codec name (wav, mp3, opus, pcm, mulaw, alaw)
X-Processing-Time-Ms — end-to-end server processing time
X-Upstream-Time-Ms — upstream cloner round-trip time
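
The two timing headers can be combined to see how much time was spent outside the upstream cloner. The sketch below assumes both headers carry plain numeric strings and that the processing time includes the upstream round trip; `timing_summary` is a hypothetical helper, and it works with any mapping of header names to values.

```python
from typing import Mapping


def timing_summary(headers: Mapping[str, str]) -> dict:
    """Summarize the codec and timing headers of a successful response."""
    processing = float(headers["X-Processing-Time-Ms"])
    upstream = float(headers["X-Upstream-Time-Ms"])
    return {
        "codec": headers.get("X-Output-Codec"),
        "processing_ms": processing,
        "upstream_ms": upstream,
        # Assumed to be transcoding plus gateway time, not a documented metric.
        "overhead_ms": processing - upstream,
    }


example = {
    "X-Output-Codec": "mp3",
    "X-Processing-Time-Ms": "1500",
    "X-Upstream-Time-Ms": "1200",
}
print(timing_summary(example)["overhead_ms"])  # 300.0
```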

Playing raw output

pcm, mulaw, and alaw are headerless — tell the player the format and sample rate:

bash

# PCM (s16le, 24 kHz, mono)
ffplay -f s16le -ar 24000 -ac 1 cloned.pcm

# mu-law (8 kHz, mono)
ffplay -f mulaw -ar 8000 -ac 1 cloned.ulaw

# A-law (8 kHz, mono)
ffplay -f alaw -ar 8000 -ac 1 cloned.alaw

# Convert raw mu-law to a playable WAV
ffmpeg -f mulaw -ar 8000 -ac 1 -i cloned.ulaw cloned.wav
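
If ffmpeg is not available, raw pcm output can also be wrapped in a WAV header with Python's standard `wave` module. This is a minimal sketch for the pcm codec only (16-bit little-endian, 24 kHz, mono, as described above); `pcm_to_wav` is a hypothetical helper name, and mulaw / alaw would need decoding first.

```python
import wave


def pcm_to_wav(raw: bytes, path: str, sample_rate: int = 24000) -> None:
    """Write headerless 16-bit mono PCM bytes to a playable WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(raw)
```

For example, `pcm_to_wav(open("cloned.pcm", "rb").read(), "cloned.wav")` would convert a saved response body.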

More REST examples

curl — MP3 output

curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
  -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
  -F "share_id=YOUR_SHARE_ID" \
  -F "text=Hello from voice cloning." \
  -F "language=en" \
  -F "output_audio_codec=mp3" \
  -o cloned.mp3

curl — telephony (mu-law, 8 kHz)

# Telephony mu-law (8 kHz) for IVR / SIP
curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
  -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
  -F "share_id=YOUR_SHARE_ID" \
  -F "text=Welcome to support. How can I help you today?" \
  -F "language=en" \
  -F "output_audio_codec=mulaw" \
  -o cloned.ulaw

curl — Enterprise reference upload

# Enterprise: clone from your own reference audio + style + fixed duration
curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
  -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
  -F "text=This is a custom cloned voice." \
  -F "language=en" \
  -F "audio=@/path/to/reference.wav" \
  -F "audio_text=Exact transcript of the reference recording." \
  -F "instruct=cheerful, energetic" \
  -F "duration=5.0" \
  -o cloned.wav

Python (httpx)

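import httpx

# Passing each field as (None, value) in `files` makes httpx send
# multipart/form-data, the content type this endpoint documents.
fields = {
    "text": "Hello from voice cloning.",
    "language": "en",
    "share_id": "YOUR_SHARE_ID",
    "output_audio_codec": "mp3",
}

resp = httpx.post(
    "https://api.fonada.ai/v1/voice-clone/chunks",
    headers={"Authorization": "Bearer YOUR_FONADA_API_KEY"},
    files={name: (None, value) for name, value in fields.items()},
    timeout=180.0,
)
resp.raise_for_status()  # fail before an empty output file is created
print("codec:", resp.headers.get("X-Output-Codec"))
print("upstream ms:", resp.headers.get("X-Upstream-Time-Ms"))

# Write the audio only after the request has succeeded.
with open("cloned.mp3", "wb") as f:
    f.write(resp.content)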

Python SDK (model v2)

Voice cloning is exposed through the existing TTSClient: set model="v2" and provide a reference voice. A dedicated VoiceCloneClient is also available if you prefer a voice-clone-only entry point. The SDK auto-splits long text, paces requests to the rate limit, and merges the audio for you.

install

pip install fonadalabs

Python — sync

from fonadalabs.tts.client import TTSClient

client = TTSClient(api_key="YOUR_FONADA_API_KEY")

audio_bytes = client.generate_audio(
    text="Hello! This speech is generated with voice cloning.",
    language="English",
    model="v2",
    share_id="YOUR_SHARE_ID",
    output_file="output.wav",
)

print(f"Generated {len(audio_bytes):,} bytes of WAV audio")

Python — async

import asyncio
from fonadalabs.tts.client import TTSClient

async def main():
    client = TTSClient()
    audio = await client.generate_audio_async(
        "Hello from async voice cloning.",
        language="English",
        model="v2",
        share_id="YOUR_SHARE_ID",
    )
    return audio

asyncio.run(main())

Long text & chunking

Input is split on sentence boundaries (soft limit 100, hard limit 200 characters per chunk), so you do not need to chunk text yourself.
A per-API-key request tunnel paces calls at about 10 requests per minute so that long jobs stay within the rate limits. As a rough estimate, minutes ≈ chunks ÷ 10.
wav, pcm, mulaw, and alaw chunks merge cleanly at any length; mp3 and opus are produced in a single request of at most 450 characters.
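
To get a feel for how many chunks a text produces and how long a job might take, the splitting rule can be approximated locally. The sketch below is not the SDK's algorithm, which this page does not describe: `split_text` and `estimate_minutes` are hypothetical, sentences are detected with a simple punctuation regex, sentences are packed up to the soft limit, and anything longer than the hard limit is cut at the last space before it.

```python
import re


def split_text(text: str, soft: int = 100, hard: int = 200) -> list:
    """Approximate sentence-aware chunking with soft and hard length limits."""
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence above the hard limit is cut into pieces.
        while len(sentence) > hard:
            cut = sentence.rfind(" ", 0, hard + 1)
            if cut <= 0:
                cut = hard  # no space to break on: cut mid-word
            piece, sentence = sentence[:cut].strip(), sentence[cut:].strip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > soft:
            chunks.append(current)  # adding this sentence would pass the soft limit
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def estimate_minutes(n_chunks: int, per_minute: int = 10) -> float:
    """minutes ≈ chunks ÷ 10, per the pacing described above."""
    return n_chunks / per_minute


chunks = split_text("One. " * 50)
print(len(chunks), "chunks, about", estimate_minutes(len(chunks)), "minutes")
```

The real chunk count may differ from this approximation, so the result is best treated as an order-of-magnitude estimate.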

Rate limits

Period	Limit
Minute	10 requests
Hour	100 requests
Day	500 requests

In addition, a global concurrency cap of 40 in-flight requests applies. Exceeding a rate limit or the concurrency cap returns 429 rate_limit_exceeded with a retry_after_seconds hint.
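
A client that calls the endpoint directly can use the 429 response to decide how long to wait. The sketch below rests on an assumption: this page does not say where `retry_after_seconds` appears, so the code looks for it inside the `detail` object of the JSON error body. `retry_delay` is a hypothetical helper, and the 6-second fallback simply reflects 10 requests per minute.

```python
from typing import Optional


def retry_delay(
    status_code: int,
    payload: object,
    attempt: int,
    fallback: float = 6.0,
    cap: float = 60.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if a retry will not help."""
    if status_code != 429:
        return None
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if not isinstance(detail, dict):
        return None
    if detail.get("error") != "rate_limit_exceeded":
        return None  # e.g. credits_exhausted does not clear by waiting
    hint = detail.get("retry_after_seconds")  # assumed location of the hint
    if isinstance(hint, (int, float)) and hint > 0:
        return float(hint)
    # No usable hint: exponential backoff, capped.
    return min(fallback * (2 ** attempt), cap)


body = {"detail": {"error": "rate_limit_exceeded", "retry_after_seconds": 12}}
print(retry_delay(429, body, attempt=0))  # 12.0
```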

Errors

All errors return JSON: {"detail": {"error": "<code>", "message": "<text>"}}

HTTP	error	Cause
400	empty_text	`text` was empty / whitespace
400	invalid_output_audio_codec	Codec not in the supported set
400	ambiguous_audio_source	More than one reference source supplied
400	missing_audio_source	No reference source supplied
400	invalid_speed / invalid_duration	`speed` or `duration` out of range
400	audio_too_large / audio_too_long	Reference upload exceeds 10 MB / 30 s
403	enterprise_only	`audio` / `audio_url` on a non-enterprise tier
403	invalid_api_key / inactive_api_key	Bad or disabled API key
404	share not found	Unknown `share_id`
429	credits_exhausted	Insufficient credits / credit limit reached
429	rate_limit_exceeded	Per-user rate limit or concurrency cap reached
500	encoding_failed	Server-side transcode to the requested codec failed
502	upstream_error	Upstream cloner failed, was unreachable, or returned no audio
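
Because every error shares the same JSON envelope, a client can turn it into a single exception type. This is a sketch, not part of the SDK: `FonadaAPIError` and `error_from_response` are hypothetical names, and the fallback for bodies that do not match the envelope is a defensive guess.

```python
class FonadaAPIError(Exception):
    """Hypothetical exception carrying the documented error fields."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def error_from_response(status: int, payload: object) -> FonadaAPIError:
    """Build an exception from {"detail": {"error": ..., "message": ...}}."""
    detail = payload.get("detail") if isinstance(payload, dict) else None
    if isinstance(detail, dict):
        return FonadaAPIError(
            status,
            str(detail.get("error", "unknown")),
            str(detail.get("message", "")),
        )
    # Body did not match the documented envelope; keep whatever was returned.
    return FonadaAPIError(status, "unknown", str(payload))


err = error_from_response(400, {"detail": {"error": "empty_text", "message": "text is empty"}})
print(err)  # 400 empty_text: text is empty
```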

Related endpoints

Endpoint	Purpose
POST /v1/voice-clone/chunks	One-shot synchronous clone (this doc) — returns audio inline
POST /v1/voice-clone/generate	Asynchronous clone with storage chunks + job tracking (for very long text)
GET /v1/voice-clone/languages	List supported language codes

Browser / playground integration: The Fonadalabs web app calls the fonada-api Supabase edge function with action: 'voice-clone-tts' so your API key stays server-side. For long-running jobs the platform may use async chunk delivery; the direct /v1/voice-clone/chunks call above is the synchronous inline-audio response shape for integrations.

See also: Voice Arena documentation for creating voices, share IDs, and pricing.

Voice quality

Klone V2 Pro

Community voices from Voice Arena — clone-style synthesis via share_id and language code.

API endpoint: /v1/voice-clone/chunks
Languages: 50+ (ISO codes)
Voice param: share_id
Billing: 0.5 credits / character
Output: wav, mp3, opus, pcm, mulaw, alaw

FAQ

What is Klone V2 Pro?

Klone V2 Pro is our voice-cloning model. Instead of choosing a fixed system voice, you synthesize speech in the style of a Voice Arena community voice identified by an 8-character share_id, or (on enterprise tiers) from your own reference audio.
