Klone V2 Pro

Voice Clone

Klone V2 Pro is our voice-cloning model. Synthesize speech in the style of a Voice Arena community voice using its share_id, or (on enterprise tiers) from your own reference audio - with style hints, fixed durations, and telephony-ready output codecs.

Overview

Klone V2 Pro is the voice-cloning model used in the TTS playground when you select the Klone V2 Pro model. It synthesizes speech in the style of a Voice Arena community voice identified by an 8-character share_id (visible on each voice in Voice Arena → Mine tab, or via search).

Key characteristics

  • Community / cloned voices via share_id - multipart form request.
  • 50+ language codes; billed at 0.5 credits per character (multiplier may apply on paid public voices).
  • Languages: use short ISO-style codes (e.g. hi, en, ta).

Looking for standard pre-built system voices instead? See the Fonada V1 documentation.
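
The billing rate above can be turned into a quick cost estimate. This is a minimal sketch, assuming every character of the text (spaces and punctuation included) is billed and that a paid public voice applies a plain multiplicative factor. The names estimate_credits and voice_multiplier are illustrative and not part of the API.

```python
CREDITS_PER_CHARACTER = 0.5  # rate stated in this document


def estimate_credits(text: str, voice_multiplier: float = 1.0) -> float:
    """Rough cost estimate; assumes every character is billed (an assumption)."""
    if voice_multiplier <= 0:
        raise ValueError("voice_multiplier must be positive")
    return len(text) * CREDITS_PER_CHARACTER * voice_multiplier


# 13 characters at 0.5 credits each
print(estimate_credits("Hello, world."))  # 6.5
```

Treat the result as an upper-level estimate only; the actual charge is whatever the platform reports for the request.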

Reference voice - provide exactly one

Choose the voice to clone using exactly one of the sources below. Sending more than one returns 400 ambiguous_audio_source; sending none returns 400 missing_audio_source.

| Field | Type | Tier | Description |
| --- | --- | --- | --- |
| share_id | string | All tiers | Catalog / community voice share ID. Uses the stored voice embedding; audio / audio_url / audio_text are ignored. |
| audio | file | Enterprise | Reference audio upload (WAV/MP3/M4A/FLAC/OGG/WebM). Max 10 MB, max 30 s. |
| audio_url | string | Enterprise | HTTPS URL of a reference clip. audio_text (transcript) is required when using a URL. |

Using audio / audio_url on a non-enterprise tier returns 403 enterprise_only.
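
The "exactly one source" rule can be checked on the client before a request is sent. The sketch below mirrors the rules in the table above and raises ValueError with the documented error code as its message. The function name, the enterprise flag, and the order in which the checks run are assumptions; the server may evaluate them differently.

```python
from typing import Optional


def check_reference_source(
    share_id: Optional[str] = None,
    audio: Optional[bytes] = None,
    audio_url: Optional[str] = None,
    audio_text: Optional[str] = None,
    enterprise: bool = False,
) -> str:
    """Return the name of the single reference source, or raise ValueError."""
    candidates = (("share_id", share_id), ("audio", audio), ("audio_url", audio_url))
    supplied = [name for name, value in candidates if value]
    if not supplied:
        raise ValueError("missing_audio_source")
    if len(supplied) > 1:
        raise ValueError("ambiguous_audio_source")
    source = supplied[0]
    # audio and audio_url are documented as enterprise-only.
    if source != "share_id" and not enterprise:
        raise ValueError("enterprise_only")
    # A transcript is documented as required when cloning from a URL.
    if source == "audio_url" and not audio_text:
        raise ValueError("audio_text is required with audio_url")
    return source


print(check_reference_source(share_id="YOUR_SHARE_ID"))  # share_id
```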

Generate audio

Catalog / community voice

Returns audio inline (WAV by default). Replace YOUR_FONADA_API_KEY with your API key and YOUR_SHARE_ID with the voice's share ID from Voice Arena.

curl
    curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
      -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
      -F "share_id=YOUR_SHARE_ID" \
      -F "text=Hello, this is a voice clone test using a catalog voice." \
      -F "language=hi" \
      -o cloned_output.wav

API endpoint

Base URL: https://api.fonada.ai
Endpoint: /v1/voice-clone/chunks
Method: POST
Content-Type: multipart/form-data
Authorization: Bearer YOUR_FONADA_API_KEY

Synthesis parameters (multipart form)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | required | Text to synthesize in the cloned voice. Empty → 400 empty_text. |
| language | string | hi | ISO code (e.g. hi, en, ta), not the full language name used in Fonada V1. |
| speed | float | 1.0 | Speed factor, range 0.25 to 4.0. Overridden by duration if both are set. |
| duration | float | none | Fixed output duration in seconds, range 0.1 to 60.0. Overrides speed; single-chunk text only. |
| instruct | string | none | Comma-separated style hints (e.g. "angry, whispering"), forwarded verbatim. |
| audio_text | string | none | Transcript of the reference clip. Optional for uploads (auto-transcribed if omitted); required for audio_url. Ignored when share_id is used. |
| output_audio_codec | string | wav | Output audio format; see Output audio formats below. |
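
To show how the parameters above interact, here is a small sketch that assembles the form fields for a share_id request and applies the documented rule that duration overrides speed. The helper name build_form is hypothetical, and treating the range bounds as inclusive is an assumption.

```python
from typing import Dict, Optional

SPEED_RANGE = (0.25, 4.0)     # from the parameter table
DURATION_RANGE = (0.1, 60.0)  # seconds, from the parameter table


def build_form(
    text: str,
    share_id: str,
    language: str = "hi",
    speed: Optional[float] = None,
    duration: Optional[float] = None,
    instruct: Optional[str] = None,
    output_audio_codec: str = "wav",
) -> Dict[str, str]:
    """Build multipart form fields; raises ValueError named after the API error."""
    if not text.strip():
        raise ValueError("empty_text")
    form = {
        "text": text,
        "share_id": share_id,
        "language": language,
        "output_audio_codec": output_audio_codec,
    }
    if duration is not None:
        if not DURATION_RANGE[0] <= duration <= DURATION_RANGE[1]:
            raise ValueError("invalid_duration")
        # duration overrides speed, so speed is left out entirely.
        form["duration"] = str(duration)
    elif speed is not None:
        if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
            raise ValueError("invalid_speed")
        form["speed"] = str(speed)
    if instruct:
        form["instruct"] = instruct
    return form


print(build_form("Hello", "YOUR_SHARE_ID", language="en", speed=1.5))
```

Validating locally only saves a round trip; the server remains the authority on what it accepts.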

Output audio formats

The model always produces 24 kHz PCM WAV; non-wav values are transcoded server-side. Set the output_audio_codec form field to control the format.

| Codec | Aliases | Encoding | Sample rate | Content-Type | Long text |
| --- | --- | --- | --- | --- | --- |
| wav | | PCM WAV (default) | 24 kHz | audio/wav | Any length |
| mp3 | | MP3 (128 kbps) | 24 kHz | audio/mpeg | Max 450 chars |
| opus | | Opus in Ogg (128 kbps) | 48 kHz [1] | audio/ogg | Max 450 chars |
| pcm | linear16 | Raw headerless 16-bit LE PCM | 24 kHz | audio/pcm | Any length |
| mulaw | ulaw | μ-law (telephony) | 8 kHz | audio/basic | Any length |
| alaw | | A-law (telephony) | 8 kHz | audio/x-alaw-basic | Any length |

[1] Opus always operates at 48 kHz by design (the encoder upsamples the 24 kHz source).

pcm, mulaw, and alaw are headerless raw streams; see Playing raw output below. mp3 / opus cannot be concatenated across chunks, so the SDK produces them in a single request capped at 450 characters.
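
The codec table can be carried into client code as a lookup, which makes it easy to resolve aliases and to catch over-long text for mp3 / opus before sending. This is a sketch built only from the values in the table; the names CodecInfo and resolve_codec are illustrative, and accepting upper-case names is an assumption.

```python
from typing import NamedTuple, Optional, Tuple


class CodecInfo(NamedTuple):
    content_type: str
    sample_rate_hz: int
    max_chars: Optional[int]  # None means any length
    headerless: bool


ALIASES = {"linear16": "pcm", "ulaw": "mulaw"}
CODECS = {
    "wav": CodecInfo("audio/wav", 24000, None, False),
    "mp3": CodecInfo("audio/mpeg", 24000, 450, False),
    "opus": CodecInfo("audio/ogg", 48000, 450, False),
    "pcm": CodecInfo("audio/pcm", 24000, None, True),
    "mulaw": CodecInfo("audio/basic", 8000, None, True),
    "alaw": CodecInfo("audio/x-alaw-basic", 8000, None, True),
}


def resolve_codec(name: str, text_length: int) -> Tuple[str, CodecInfo]:
    """Map a codec name or alias to its canonical name and properties."""
    key = name.strip().lower()  # case-insensitivity is an assumption
    key = ALIASES.get(key, key)
    if key not in CODECS:
        raise ValueError("invalid_output_audio_codec")
    info = CODECS[key]
    if info.max_chars is not None and text_length > info.max_chars:
        raise ValueError(f"{key} supports at most {info.max_chars} characters")
    return key, info


print(resolve_codec("ulaw", text_length=40))
```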

Response: raw audio bytes in the requested codec (WAV by default). Use -o filename.ext to save the file.

Private voices: Only the voice owner (or an admin) may synthesize with a private voice's share_id. Public voices can be used by any authenticated customer with a valid API key.

Response headers

  • Content-Type - media type for the codec (see table above)
  • Content-Disposition - inline; filename="voice_clone.<ext>"
  • X-Output-Codec - normalized codec name (wav, mp3, opus, pcm, mulaw, alaw)
  • X-Processing-Time-Ms - end-to-end server processing time
  • X-Upstream-Time-Ms - upstream cloner round-trip time
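
The headers listed above can be collected into a small structure for logging or latency tracking. The sketch below takes any mapping of header names to values; the class and function names are hypothetical, and because the exact numeric format of the timing headers is not documented here, values that do not parse as numbers become None.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


def _to_float(value: Optional[str]) -> Optional[float]:
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


@dataclass
class CloneResponseMeta:
    content_type: Optional[str]
    codec: Optional[str]
    processing_ms: Optional[float]
    upstream_ms: Optional[float]

    @property
    def overhead_ms(self) -> Optional[float]:
        """Server time not spent in the upstream cloner, when both are known."""
        if self.processing_ms is None or self.upstream_ms is None:
            return None
        return self.processing_ms - self.upstream_ms


def read_meta(headers: Mapping[str, str]) -> CloneResponseMeta:
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return CloneResponseMeta(
        content_type=lowered.get("content-type"),
        codec=lowered.get("x-output-codec"),
        processing_ms=_to_float(lowered.get("x-processing-time-ms")),
        upstream_ms=_to_float(lowered.get("x-upstream-time-ms")),
    )


meta = read_meta({"Content-Type": "audio/mpeg", "X-Output-Codec": "mp3",
                  "X-Processing-Time-Ms": "1250", "X-Upstream-Time-Ms": "1100"})
print(meta.codec, meta.overhead_ms)  # mp3 150.0
```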

Playing raw output

pcm, mulaw, and alaw are headerless — tell the player the format and sample rate:

bash
    # PCM (s16le, 24 kHz, mono)
    ffplay -f s16le -ar 24000 -ac 1 cloned.pcm

    # mu-law (8 kHz, mono)
    ffplay -f mulaw -ar 8000 -ac 1 cloned.ulaw

    # A-law (8 kHz, mono)
    ffplay -f alaw -ar 8000 -ac 1 cloned.alaw

    # Convert raw mu-law to a playable WAV
    ffmpeg -f mulaw -ar 8000 -ac 1 -i cloned.ulaw cloned.wav
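
Where ffmpeg is not available, the same wrapping can be done with the Python standard library. This is a sketch, not part of the SDK: pcm_to_wav adds a WAV header to raw 16-bit little-endian mono PCM, and mulaw_to_pcm expands μ-law bytes to 16-bit PCM using the standard G.711 decoding formula. Both function names are illustrative.

```python
import io
import struct
import wave


def pcm_to_wav(raw: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap headerless 16-bit LE mono PCM in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(raw)
    return buf.getvalue()


def mulaw_to_pcm(raw: bytes) -> bytes:
    """Decode G.711 mu-law bytes to 16-bit LE PCM."""
    samples = []
    for byte in raw:
        u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
        magnitude = (((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)
        magnitude -= 0x84
        samples.append(-magnitude if u & 0x80 else magnitude)
    return struct.pack(f"<{len(samples)}h", *samples)


# Usage with the files from the examples in this document:
# open("cloned_pcm.wav", "wb").write(pcm_to_wav(open("cloned.pcm", "rb").read()))
# ulaw = open("cloned.ulaw", "rb").read()
# open("cloned_ulaw.wav", "wb").write(pcm_to_wav(mulaw_to_pcm(ulaw), sample_rate=8000))
```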

More REST examples

curl - MP3 output

    curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
      -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
      -F "share_id=YOUR_SHARE_ID" \
      -F "text=Hello from voice cloning." \
      -F "language=en" \
      -F "output_audio_codec=mp3" \
      -o cloned.mp3

curl - telephony (mu-law, 8 kHz)

    # Telephony mu-law (8 kHz) for IVR / SIP
    curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
      -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
      -F "share_id=YOUR_SHARE_ID" \
      -F "text=Welcome to support. How can I help you today?" \
      -F "language=en" \
      -F "output_audio_codec=mulaw" \
      -o cloned.ulaw

curl - Enterprise reference upload

    # Enterprise: clone from your own reference audio + style + fixed duration
    curl -X POST "https://api.fonada.ai/v1/voice-clone/chunks" \
      -H "Authorization: Bearer YOUR_FONADA_API_KEY" \
      -F "text=This is a custom cloned voice." \
      -F "language=en" \
      -F "audio=@/path/to/reference.wav" \
      -F "audio_text=Exact transcript of the reference recording." \
      -F "instruct=cheerful, energetic" \
      -F "duration=5.0" \
      -o cloned.wav

Python (httpx)

    import httpx

    resp = httpx.post(
        "https://api.fonada.ai/v1/voice-clone/chunks",
        headers={"Authorization": "Bearer YOUR_FONADA_API_KEY"},
        data={
            "text": "Hello from voice cloning.",
            "language": "en",
            "share_id": "YOUR_SHARE_ID",
            "output_audio_codec": "mp3",
        },
        timeout=180.0,
    )
    resp.raise_for_status()

    print("codec:", resp.headers.get("X-Output-Codec"))
    print("upstream ms:", resp.headers.get("X-Upstream-Time-Ms"))

    # Write the file only after the request has succeeded,
    # so a failed call does not leave an empty cloned.mp3 behind.
    with open("cloned.mp3", "wb") as f:
        f.write(resp.content)

Python SDK (model v2)

Voice cloning is exposed through the existing TTSClient - set model="v2" and provide a reference voice. A dedicated VoiceCloneClient is also available if you prefer a voice-clone-only entry point. The SDK auto-splits long text, paces requests to the rate limit, and merges the audio for you.

install

    pip install fonadalabs

Python - sync

    from fonadalabs.tts.client import TTSClient

    client = TTSClient(api_key="YOUR_FONADA_API_KEY")

    audio_bytes = client.generate_audio(
        text="Hello! This speech is generated with voice cloning.",
        language="English",
        model="v2",
        share_id="YOUR_SHARE_ID",
        output_file="output.wav",
    )
    print(f"Generated {len(audio_bytes):,} bytes of WAV audio")

Python - async

    import asyncio

    from fonadalabs.tts.client import TTSClient


    async def main():
        client = TTSClient()
        audio = await client.generate_audio_async(
            "Hello from async voice cloning.",
            language="English",
            model="v2",
            share_id="YOUR_SHARE_ID",
        )
        return audio


    asyncio.run(main())

Long text & chunking

  • Input is split sentence-aware (soft 100 / hard 200 chars per chunk) — you do not chunk text yourself.
  • A per-API-key request tunnel paces calls (~10 req/min) so long jobs do not hit rate limits. Estimate: minutes ≈ chunks ÷ 10.
  • wav/pcm/mulaw/alaw merge cleanly at any length; mp3/opus are single-request, max 450 chars.
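
For intuition about what the SDK does with long text, the following sketch shows one way a sentence-aware splitter with a soft 100 / hard 200 character limit could work, along with the time estimate from the list above. It is not the SDK's actual algorithm: the sentence-boundary regex, the greedy packing strategy, and the function names are all assumptions.

```python
import re
from typing import List

SOFT_LIMIT = 100  # preferred chunk size
HARD_LIMIT = 200  # a chunk never exceeds this
_SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")  # includes the Devanagari danda


def _hard_split(sentence: str) -> List[str]:
    """Break one over-long sentence at whitespace so every part fits HARD_LIMIT."""
    parts = []
    while len(sentence) > HARD_LIMIT:
        cut = sentence.rfind(" ", 0, HARD_LIMIT + 1)
        if cut <= 0:
            cut = HARD_LIMIT  # no whitespace available: cut mid-word
        parts.append(sentence[:cut].strip())
        sentence = sentence[cut:].strip()
    if sentence:
        parts.append(sentence)
    return parts


def chunk_text(text: str) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly SOFT_LIMIT characters."""
    chunks: List[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        for piece in _hard_split(sentence.strip()):
            candidate = f"{current} {piece}".strip()
            if current and len(candidate) > SOFT_LIMIT:
                chunks.append(current)
                current = piece
            else:
                current = candidate
    if current:
        chunks.append(current)
    return chunks


def estimate_minutes(chunk_count: int, requests_per_minute: int = 10) -> float:
    """minutes ≈ chunks ÷ 10, as stated in this document."""
    return chunk_count / requests_per_minute


chunks = chunk_text("One. " * 50)
print(len(chunks), estimate_minutes(len(chunks)))  # 3 0.3
```

With the SDK you never call anything like this yourself; the sketch is only useful for estimating how many requests a given text will turn into.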

Rate limits

| Period | Limit |
| --- | --- |
| Minute | 10 requests |
| Hour | 100 requests |
| Day | 500 requests |

Plus a global concurrency cap of 40 in-flight requests. Exceeding either returns 429 rate_limit_exceeded with a retry_after_seconds hint.
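
If you call the REST endpoint directly instead of using the SDK, a client-side pacer can keep you under the three limits above. The sketch below uses sliding windows and takes the current time as an argument so it can be tested without sleeping. Whether the server counts in sliding or fixed windows is not stated here, so treat that as an assumption; the class name RequestPacer is illustrative, and the global concurrency cap cannot be enforced from a single client.

```python
from collections import deque
from typing import Deque, Dict

# window length in seconds -> maximum requests, from the table above
LIMITS: Dict[int, int] = {60: 10, 3600: 100, 86400: 500}


class RequestPacer:
    def __init__(self, limits: Dict[int, int] = LIMITS) -> None:
        self._limits = dict(limits)
        self._sent: Deque[float] = deque()  # send times, oldest first

    def wait_time(self, now: float) -> float:
        """Seconds to wait before another request is allowed at time `now`."""
        longest = max(self._limits)
        while self._sent and now - self._sent[0] >= longest:
            self._sent.popleft()  # forget requests older than the longest window
        wait = 0.0
        for window, limit in self._limits.items():
            recent = [t for t in self._sent if now - t < window]
            if len(recent) >= limit:
                # A slot frees when the limit-th most recent request leaves the window.
                wait = max(wait, recent[-limit] + window - now)
        return wait

    def record(self, now: float) -> None:
        self._sent.append(now)


pacer = RequestPacer()
for second in range(10):
    pacer.record(float(second))  # ten requests, one per second
print(pacer.wait_time(9.5))   # 50.5
print(pacer.wait_time(60.0))  # 0.0
```

In real use, pass time.monotonic() as now, sleep for wait_time(...) seconds before each call, and call record(...) right after sending. A 429 response can still occur, in which case the retry_after_seconds hint takes precedence over the local estimate.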

Errors

All errors return JSON: {"detail": {"error": "<code>", "message": "<text>"}}

| HTTP | error | Cause |
| --- | --- | --- |
| 400 | empty_text | text was empty / whitespace |
| 400 | invalid_output_audio_codec | Codec not in the supported set |
| 400 | ambiguous_audio_source | More than one reference source supplied |
| 400 | missing_audio_source | No reference source supplied |
| 400 | invalid_speed / invalid_duration | speed or duration out of range |
| 400 | audio_too_large / audio_too_long | Reference upload exceeds 10 MB / 30 s |
| 403 | enterprise_only | audio / audio_url on a non-enterprise tier |
| 403 | invalid_api_key / inactive_api_key | Bad or disabled API key |
| 404 | share not found | Unknown share_id |
| 429 | credits_exhausted | Insufficient credits / credit limit reached |
| 429 | rate_limit_exceeded | Per-user rate limit or concurrency cap reached |
| 500 | encoding_failed | Server-side transcode to the requested codec failed |
| 502 | upstream_error | Upstream cloner failed, was unreachable, or returned no audio |
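
Because every error shares the same JSON envelope, a client can turn failures into a single exception type. The sketch below parses the documented {"detail": {"error", "message"}} shape and tolerates bodies that do not match it. The exception name is hypothetical, and the choice of which codes count as retryable is an assumption rather than documented behavior.

```python
import json

# Assumption: only these two codes are worth retrying after a pause.
RETRYABLE_CODES = {"rate_limit_exceeded", "upstream_error"}


class CloneAPIError(Exception):
    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: str) -> CloneAPIError:
    """Build an exception from an error response body."""
    try:
        detail = json.loads(body).get("detail", {})
    except (ValueError, AttributeError):
        detail = {}  # body was not JSON, or not a JSON object
    if not isinstance(detail, dict):
        detail = {"message": str(detail)}
    return CloneAPIError(status, detail.get("error", "unknown"), detail.get("message", body))


err = parse_error(429, '{"detail": {"error": "rate_limit_exceeded", "message": "Slow down"}}')
print(err.code, err.retryable)  # rate_limit_exceeded True
```

Note that both 429 codes share a status, so branching on the error field rather than the HTTP status is what distinguishes a temporary rate limit from exhausted credits.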

Related endpoints

| Endpoint | Purpose |
| --- | --- |
| POST /v1/voice-clone/chunks | One-shot synchronous clone (this doc); returns audio inline |
| POST /v1/voice-clone/generate | Asynchronous clone with storage chunks + job tracking (for very long text) |
| GET /v1/voice-clone/languages | List supported language codes |

Browser / playground integration: The Fonadalabs web app calls the fonada-api Supabase edge function with action: 'voice-clone-tts' so your API key stays server-side. For long-running jobs the platform may use async chunk delivery; the direct /v1/voice-clone/chunks call above is the synchronous inline-audio response shape for integrations.

See also: Voice Arena documentation for creating voices, share IDs, and pricing.

Klone V2 Pro (Voice Clone)

Community voices from Voice Arena: clone-style synthesis via share_id and language code.

  • API endpoint: /v1/voice-clone/chunks
  • Languages: 50+ (ISO codes)
  • Voice param: share_id
  • Billing: 0.5 credits / character
  • Output: wav, mp3, opus, pcm, mulaw, alaw

FAQ

Klone V2 Pro is our voice-cloning model. Instead of choosing a fixed system voice, you synthesize speech in the style of a Voice Arena community voice identified by an 8-character share_id, or (on enterprise tiers) from your own reference audio.