Quickstart
Get started with building AI voice agents on Indian telephony infrastructure in minutes.
This guide will walk you through setting up your first voice agent using FonadaLabs' complete voice pipeline — from acquiring Indian phone numbers to deploying conversational AI with ultra-low latency ASR, TTS, and real-time audio processing.
Get Your API Key
Before you begin, create an API key. You'll need it to authenticate your requests.
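Hard-coding the key in source files makes it easy to leak. The following is a minimal sketch, assuming you keep the key in an environment variable. The variable name `FONADALABS_API_KEY` and the helper `load_api_key` are illustrative choices for this guide, not something the SDK is documented to read on its own.

```python
import os


def load_api_key(var_name: str = "FONADALABS_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key")
    return key
```

You could then pass the result to the client, for example `TTSClient(api_key=load_api_key())`.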
Using the Text to Speech API
Install the SDK
We'll use the fonadalabs library to interact with our TTS API.
pip install fonadalabs
Make the API request
Create a new file named example.py and add the following code:
from fonadalabs import TTSClient

# Replace the placeholder with your own API key
client = TTSClient(api_key="your-api-key")

# Generate audio
audio_data = client.generate_audio(
    text="Hello! This is FonadaLabs Text-to-Speech API.",
    voice="Dhruv",
    language="English",
    output_file="output.wav"
)
print(f"Generated {len(audio_data)} bytes")
Execute the code
python example.py
You should see the audio file saved to output.wav and the byte count printed to the console.
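To confirm that the saved file is a well-formed WAV, you can inspect its header with Python's standard `wave` module. This is a sketch that assumes the `output_file` written by the example is a complete PCM WAV file. `describe_wav` is a hypothetical helper, not part of the SDK.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the header of a WAV file and return its basic properties."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Calling `describe_wav("output.wav")` returns the channel count, sample width, sample rate, and duration. If the file has no valid RIFF/WAVE header, `wave.open` raises `wave.Error`.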
WebSocket Usage
For real-time audio streaming and better performance, you can use our WebSocket endpoint:
📡 WebSocket Endpoint
wss://api.fonada.ai/tts/generate-audio-ws
Request Payload:
{
    "api_key": "your-api-key",
    "input": "Your text here",
    "voice": "Dhruv",
    "language": "Hindi"
}
Python WebSocket Example:
from fonadalabs import TTSClient, TTSError, CreditsExhaustedError, RateLimitError
import struct

# Initialize the client with your API key
client = TTSClient(api_key="your-api-key")

# Helper function to add WAV header
def add_wav_header(raw_audio, sample_rate=24000):
    """Add WAV header to raw PCM audio data (16-bit, mono)"""
    byte_rate = sample_rate * 1 * 16 // 8
    block_align = 1 * 16 // 8
    header = struct.pack('<4sI4s4sIHHIIHH4sI',
                         b'RIFF', 36 + len(raw_audio), b'WAVE', b'fmt ',
                         16, 1, 1, sample_rate, byte_rate, block_align, 16,
                         b'data', len(raw_audio))
    return header + raw_audio

# Define callback functions for real-time updates
def on_chunk(chunk_num, chunk_bytes):
    """Called for each audio chunk received"""
    print(f"Chunk {chunk_num} received: {len(chunk_bytes):,} bytes")

def on_complete(stats):
    """Called when generation is complete"""
    chunks_sent = stats.get('chunks_sent', 'N/A')
    bytes_sent = stats.get('bytes_sent', 'N/A')
    # Apply thousands separators only when a number was reported;
    # formatting the 'N/A' fallback with ',' would raise ValueError
    if isinstance(bytes_sent, int):
        bytes_sent = f"{bytes_sent:,}"
    print(f"Complete! Generated {chunks_sent} chunks, {bytes_sent} bytes")

def on_error(error_msg):
    """Called when an error occurs"""
    print(f"Error: {error_msg}")

try:
    # Generate audio with WebSocket streaming (returns raw PCM)
    audio_data = client.generate_audio_ws(
        text="नमस्ते! FonadaLabs Text-to-Speech API में आपका स्वागत है। "
             "यह उदाहरण प्रगति ट्रैकिंग के साथ वास्तविक समय ऑडियो निर्माण को प्रदर्शित करता है।",
        voice="Dhruv",     # Available: Dhruv, Vaanee, Swastik, etc. (see docs for full list)
        language="Hindi",  # Available: Hindi, English, Tamil, Telugu
        on_chunk=on_chunk,
        on_complete=on_complete,
        on_error=on_error
    )
    # Add WAV header and save
    wav_audio = add_wav_header(audio_data)
    with open("output.wav", "wb") as f:
        f.write(wav_audio)
    print(f"\nAudio saved to output.wav ({len(wav_audio):,} bytes)")
except CreditsExhaustedError as e:
    print("Credits exhausted!")
    print(f"Remaining balance: {e.remaining_balance}")
    print(f"Current usage: {e.current_usage}")
    print(f"Estimated cost: {e.estimated_cost}")
except RateLimitError as e:
    print("Rate limit exceeded!")
    print(f"Rate limit: {e.rate_limit} requests")
    print(f"Retry after: {e.retry_after_seconds} seconds")
except TTSError as e:
    print(f"TTS Error: {e}")
WebSocket Benefits
- Real-time streaming: Audio chunks are delivered as they're generated
- Lower latency: Perfect for interactive applications
- Progress tracking: Monitor generation progress in real-time
- Better user experience: Start playing audio while generation continues
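The progress-tracking benefit can be made concrete with a small callback object. The following is a sketch, not part of the SDK. `ChunkProgress` is a hypothetical helper whose `on_chunk` method matches the `(chunk_num, chunk_bytes)` callback signature used in the WebSocket example. The duration estimate assumes the stream is 24 kHz, 16-bit, mono PCM, the same format the WAV header helper in this guide assumes.

```python
class ChunkProgress:
    """Accumulate streamed PCM chunks and estimate how much audio has arrived."""

    def __init__(self, sample_rate: int = 24000, bytes_per_sample: int = 2, channels: int = 1):
        # Bytes of raw PCM that correspond to one second of audio
        self.bytes_per_second = sample_rate * bytes_per_sample * channels
        self.chunks = []
        self.total_bytes = 0

    def on_chunk(self, chunk_num: int, chunk_bytes: bytes) -> None:
        """Store the chunk and report the running audio duration."""
        self.chunks.append(chunk_bytes)
        self.total_bytes += len(chunk_bytes)
        print(f"Chunk {chunk_num}: {self.seconds_received:.2f} s of audio received so far")

    @property
    def seconds_received(self) -> float:
        return self.total_bytes / self.bytes_per_second

    def audio(self) -> bytes:
        """Return all chunks received so far as one raw PCM byte string."""
        return b"".join(self.chunks)
```

To use it, create `progress = ChunkProgress()` and pass `progress.on_chunk` as the `on_chunk` argument of `generate_audio_ws()`.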
Troubleshooting
WebSocket: "message too big" or "frame exceeds limit"
Problem: WebSocket connection closes with "message too big" error
Cause: Audio chunks larger than the default WebSocket frame size (1MB)
Solution: The SDK handles this automatically by setting a 16MB frame size limit. If very large audio chunks still cause this error, split the input text into smaller segments.
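If you split long input yourself, cutting at sentence boundaries keeps each segment natural to synthesize. The following is a minimal sketch. The helper `split_text` and the 500-character default are assumptions for illustration and do not come from the SDK. It treats `.`, `!`, `?`, and the Devanagari danda `।` as sentence endings.

```python
import re
from typing import List


def split_text(text: str, max_chars: int = 500) -> List[str]:
    """Split text into segments of at most max_chars, preferring sentence boundaries."""
    # Break after sentence-ending punctuation that is followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?।])\s+", text.strip()) if s]
    segments: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that exceeds the limit on its own
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be sent as its own request, and the resulting audio concatenated in order.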
Connection Timeout
Problem: Request times out for long texts
Solution: Increase timeout when initializing the client:
client = TTSClient(timeout=600)  # 10 minutes
WebSocket WAV File Not Playable
Problem: WebSocket output file won't play in audio players
Cause: WebSocket streams raw PCM audio without WAV headers
Solution: Add WAV header to the raw audio:
import struct

def add_wav_header(raw_file, output_file):
    # Read the raw PCM bytes that were saved without a header
    with open(raw_file, 'rb') as f:
        audio = f.read()
    # 24kHz, 16-bit, mono
    header = struct.pack('<4sI4s4sIHHIIHH4sI',
                         b'RIFF', 36 + len(audio), b'WAVE', b'fmt ',
                         16, 1, 1, 24000, 48000, 2, 16, b'data', len(audio))
    with open(output_file, 'wb') as f:
        f.write(header + audio)

add_wav_header('output.wav', 'output_playable.wav')
Best Practices
1. Use appropriate method for text length
- Short texts (<500 chars): Use generate_audio()
- Long texts (>500 chars): Use generate_audio_ws() with progress tracking
- Real-time streaming: Use stream_audio_async()
2. Handle errors gracefully
Always wrap API calls in try-except blocks to handle network issues and API errors.
3. Use async for multiple requests
For generating multiple audio files, use async methods with asyncio.gather() for better performance.
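One way to apply points 2 and 3 together is to run several requests concurrently and collect failures instead of letting the first one interrupt the batch. The sketch below rests on several assumptions. `synthesize_all` and its `synth` parameter are hypothetical. `synth` stands for any blocking callable that takes text and returns audio bytes, for example a small wrapper around `client.generate_audio`. The sketch uses `asyncio.to_thread` (Python 3.9+) rather than the SDK's own async methods, whose signatures are not shown in this guide.

```python
import asyncio
from typing import Callable, List, Union


async def synthesize_all(
    texts: List[str],
    synth: Callable[[str], bytes],
    max_concurrency: int = 4,
) -> List[Union[bytes, BaseException]]:
    """Run a blocking synth callable for every text, a few at a time."""
    limiter = asyncio.Semaphore(max_concurrency)

    async def run_one(text: str) -> bytes:
        async with limiter:
            # Run the blocking call in a worker thread so the event loop stays free
            return await asyncio.to_thread(synth, text)

    # return_exceptions=True puts failures in the result list
    # instead of raising the first one to the caller
    return await asyncio.gather(*(run_one(t) for t in texts), return_exceptions=True)
```

Results come back in the same order as `texts`. Check each entry with `isinstance(result, BaseException)` before writing it to disk. The semaphore keeps the number of in-flight requests bounded, which may help you stay under rate limits.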