Quickstart

Get started with building AI voice agents on Indian telephony infrastructure in minutes.

This guide walks you through setting up your first voice agent with FonadaLabs' complete voice pipeline, from acquiring Indian phone numbers to deploying conversational AI with ultra-low-latency ASR, TTS, and real-time audio processing.

Get Your API Key

Before you begin, create an API key to authenticate your requests.
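
Rather than pasting the key into source files, you can keep it in an environment variable. The sketch below is a minimal helper that assumes a variable named FONADALABS_API_KEY; that name is a hypothetical choice for illustration, not one the SDK is documented here to read on its own.

```python
import os


def load_api_key(var_name: str = "FONADALABS_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it.

    FONADALABS_API_KEY is a hypothetical variable name chosen for this example.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your API key.")
    return key
```

You would then pass the result explicitly, for example TTSClient(api_key=load_api_key()).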

Using the Text-to-Speech API

1. Install the SDK

We'll use the fonadalabs library to interact with our TTS API.

Shell
pip install fonadalabs

2. Make the API request

Create a new file named example.py and add the following code:

Python
from fonadalabs import TTSClient

# Replace with your API key
client = TTSClient(api_key="your-api-key")
# Generate audio
audio_data = client.generate_audio(
    text="Hello! This is FonadaLabs Text-to-Speech API.",
    voice="Dhruv",
    language="English",
    output_file="output.wav"
)
print(f"Generated {len(audio_data)} bytes")

3. Execute the code

Shell
python example.py

You should see the audio saved to output.wav and the byte count printed to the console.
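
To confirm the file is a valid WAV and check its format, Python's standard wave module can read the header. This is a small sketch that assumes generate_audio() wrote an uncompressed PCM WAV file, as the .wav output name suggests; the wave module cannot open compressed formats. describe_wav is a hypothetical helper name.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic format information read from a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

For example, print(describe_wav("output.wav")) reports the channel count, sample width, sample rate, and duration in seconds.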

4. WebSocket Usage

For real-time audio streaming with lower latency, use the WebSocket endpoint:

📡 WebSocket Endpoint

wss://api.fonada.ai/tts/generate-audio-ws

Request Payload:

JSON
{
  "api_key": "your-api-key",
  "input": "Your text here",
  "voice": "Dhruv",
  "language": "Hindi"
}
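
If you call the endpoint without the SDK, the payload above is an ordinary JSON object. The sketch below builds it with the standard json module. It assumes the server expects exactly these four fields sent as one text message, and it does not cover the response framing, which this guide does not document; build_ws_payload is a hypothetical helper name.

```python
import json


def build_ws_payload(api_key: str, text: str, voice: str = "Dhruv", language: str = "Hindi") -> str:
    """Serialize a request with the four fields shown in the payload above."""
    if not text.strip():
        raise ValueError("input text must not be empty")
    payload = {"api_key": api_key, "input": text, "voice": voice, "language": language}
    # ensure_ascii=False keeps Devanagari and other scripts readable in the JSON string
    return json.dumps(payload, ensure_ascii=False)
```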

Python WebSocket Example:

Python
from fonadalabs import TTSClient, TTSError, CreditsExhaustedError, RateLimitError
import struct

# Initialize the client with your API key
client = TTSClient(api_key="your-api-key")


# Helper function to add a WAV header to the raw stream
def add_wav_header(raw_audio, sample_rate=24000):
    """Prepend a 44-byte WAV header to raw 16-bit mono PCM audio data"""
    channels, bits_per_sample = 1, 16
    byte_rate = sample_rate * channels * bits_per_sample // 8
    block_align = channels * bits_per_sample // 8

    # 16 = size of the fmt chunk, 1 = uncompressed PCM
    header = struct.pack('<4sI4s4sIHHIIHH4sI',
        b'RIFF', 36 + len(raw_audio), b'WAVE', b'fmt ',
        16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b'data', len(raw_audio))

    return header + raw_audio


# Define callback functions for real-time updates
def on_chunk(chunk_num, chunk_bytes):
    """Called for each audio chunk received"""
    print(f"Chunk {chunk_num} received: {len(chunk_bytes):,} bytes")


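def on_complete(stats):
    """Called when generation is complete"""
    chunks_sent = stats.get('chunks_sent', 'N/A')
    bytes_sent = stats.get('bytes_sent')
    # Only apply the thousands separator when a numeric byte count is present
    bytes_text = f"{bytes_sent:,}" if isinstance(bytes_sent, int) else "N/A"
    print(f"Complete! Generated {chunks_sent} chunks, {bytes_text} bytes")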


def on_error(error_msg):
    """Called when an error occurs"""
    print(f"Error: {error_msg}")


try:
    # Generate audio with WebSocket streaming (returns raw PCM)
    audio_data = client.generate_audio_ws(
        text="नमस्ते! FonadaLabs Text-to-Speech API में आपका स्वागत है। "
             "यह उदाहरण प्रगति ट्रैकिंग के साथ वास्तविक समय ऑडियो निर्माण को प्रदर्शित करता है।",
        voice="Dhruv",  # Available: Dhruv, Vaanee, Swastik, etc. (see docs for full list)
        language="Hindi",  # Available: Hindi, English, Tamil, Telugu
        on_chunk=on_chunk,
        on_complete=on_complete,
        on_error=on_error
    )
    
    # Add WAV header and save
    wav_audio = add_wav_header(audio_data)
    with open("output.wav", "wb") as f:
        f.write(wav_audio)
    
    print(f"\nAudio saved to output.wav ({len(wav_audio):,} bytes)")

except CreditsExhaustedError as e:
    print("Credits exhausted!")
    print(f"Remaining balance: {e.remaining_balance}")
    print(f"Current usage: {e.current_usage}")
    print(f"Estimated cost: {e.estimated_cost}")

except RateLimitError as e:
    print("Rate limit exceeded!")
    print(f"Rate limit: {e.rate_limit} requests")
    print(f"Retry after: {e.retry_after_seconds} seconds")

except TTSError as e:
    print(f"TTS Error: {e}")
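
The example above collects all audio before writing it. To write chunks to disk as they arrive instead, you can pass a writer's method as the on_chunk callback. This sketch assumes each chunk_bytes value is raw 16-bit mono PCM at 24 kHz (the format add_wav_header() assumes); StreamingWavWriter is a hypothetical name built on the standard wave module.

```python
import wave


class StreamingWavWriter:
    """Append raw PCM chunks to a WAV file as they arrive.

    Assumes 16-bit mono PCM at 24 kHz, matching add_wav_header() above.
    """

    def __init__(self, path: str, sample_rate: int = 24000) -> None:
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(1)   # mono
        self._wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        self._wav.setframerate(sample_rate)
        self.chunks = 0

    def on_chunk(self, chunk_num: int, chunk_bytes: bytes) -> None:
        """Same shape as the on_chunk(chunk_num, chunk_bytes) callback above."""
        # writeframes() also updates the RIFF and data size fields in the header.
        self._wav.writeframes(chunk_bytes)
        self.chunks += 1

    def close(self) -> None:
        self._wav.close()

    def __enter__(self) -> "StreamingWavWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()
```

Usage might look like: with StreamingWavWriter("output_streamed.wav") as writer: client.generate_audio_ws(text=..., voice="Dhruv", language="Hindi", on_chunk=writer.on_chunk). The standard library handles the header's size fields, so no manual struct packing is needed.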

WebSocket Benefits

  • Real-time streaming: Audio chunks are delivered as they're generated
  • Lower latency: Suited to interactive applications such as voice agents
  • Progress tracking: Monitor generation progress in real time
  • Better user experience: Start playing audio while generation continues
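
Progress can also be expressed as seconds of audio rather than bytes. Raw PCM has a fixed byte rate of sample rate × bytes per sample × channels, which for the 24 kHz, 16-bit mono format assumed in add_wav_header() is 24,000 × 2 × 1 = 48,000 bytes per second. The helpers below are a sketch under that format assumption; make_progress_callback is a hypothetical name.

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Seconds of audio contained in num_bytes of raw PCM."""
    bytes_per_second = sample_rate * sample_width * channels  # 48,000 for 24 kHz 16-bit mono
    return num_bytes / bytes_per_second


def make_progress_callback(sample_rate: int = 24000):
    """Build an on_chunk(chunk_num, chunk_bytes) callback that reports audio duration so far."""
    total_bytes = 0

    def on_chunk(chunk_num: int, chunk_bytes: bytes) -> None:
        nonlocal total_bytes
        total_bytes += len(chunk_bytes)
        seconds = pcm_duration_seconds(total_bytes, sample_rate)
        print(f"Chunk {chunk_num}: {seconds:.2f} s of audio received")

    return on_chunk
```

For example, a 96,000-byte chunk holds 2.0 seconds of audio in this format.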

Troubleshooting

WebSocket: "message too big" or "frame exceeds limit"

Problem: WebSocket connection closes with "message too big" error

Cause: Audio chunks larger than the default WebSocket frame size (1MB)

Solution: The SDK automatically raises the frame size limit to 16MB. If you still encounter this error with very large audio chunks, split the input text into smaller segments and send each one as a separate request.
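
One way to split long input is to pack whole sentences into segments under a character budget. The sketch below is an illustration, not an SDK feature: split_text is a hypothetical helper, the 500-character default borrows the threshold from Best Practices, and it treats ., !, ? and the Devanagari danda (।) as sentence ends. A single sentence longer than the budget is cut at the character limit, which may fall mid-word.

```python
import re

# Split on whitespace that follows ".", "!", "?" or the Devanagari danda (U+0964).
_SENTENCE_END = re.compile(r"(?<=[.!?\u0964])\s+")


def split_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    segments: list[str] = []
    current = ""
    for sentence in _SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        # Hard-split any single sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:].lstrip()
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)  # current is non-empty here
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be passed to generate_audio_ws() in turn.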

Connection Timeout

Problem: Request times out for long texts

Solution: Increase the timeout (in seconds) when initializing the client:

Python
client = TTSClient(api_key="your-api-key", timeout=600)  # 10 minutes

WebSocket WAV File Not Playable

Problem: WebSocket output file won't play in audio players

Cause: WebSocket streams raw PCM audio without WAV headers

Solution: Add a WAV header to the raw audio:

Python
import struct

def add_wav_header(raw_file, output_file):
    """Wrap a raw PCM file in a WAV header and write the result to output_file."""
    with open(raw_file, 'rb') as f:
        audio = f.read()

    # 24kHz, 16-bit, mono: byte rate = 24000 * 2 = 48000, block align = 2
    header = struct.pack('<4sI4s4sIHHIIHH4sI',
        b'RIFF', 36 + len(audio), b'WAVE', b'fmt ',
        16, 1, 1, 24000, 48000, 2, 16, b'data', len(audio))

    with open(output_file, 'wb') as f:
        f.write(header + audio)

# Convert a raw PCM file that was saved without a header
add_wav_header('output.wav', 'output_playable.wav')
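
Running this fix on a file that already has a header (for example, the output.wav written by the WebSocket example, which calls add_wav_header() before saving) would leave the old 44-byte header inside the data chunk, where it is treated as audio samples and typically heard as a brief click. Checking the first 12 bytes avoids that. has_wav_header is a hypothetical helper that looks only for the RIFF/WAVE signature and does not validate the rest of the file.

```python
def has_wav_header(path: str) -> bool:
    """Return True if the file starts with a RIFF....WAVE signature."""
    with open(path, "rb") as f:
        head = f.read(12)
    # Bytes 0-3 are "RIFF", bytes 4-7 the chunk size, bytes 8-11 "WAVE".
    return len(head) == 12 and head[:4] == b"RIFF" and head[8:12] == b"WAVE"
```

You could then guard the conversion: if not has_wav_header('output.wav'): add_wav_header('output.wav', 'output_playable.wav').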

Best Practices

1. Use appropriate method for text length

  • Short texts (<500 chars): Use generate_audio()
  • Long texts (>500 chars): Use generate_audio_ws() with progress tracking
  • Real-time streaming: Use stream_audio_async()
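
The rule of thumb above can be captured in a small helper. This is only a sketch: pick_tts_method is hypothetical, it returns the method name as a string because the three methods take different arguments (and stream_audio_async() is presumably a coroutine, judging by its name), and the 500-character cut-off is the guideline above rather than a documented API limit.

```python
def pick_tts_method(text: str, realtime: bool = False, threshold: int = 500) -> str:
    """Return the name of the TTSClient method suggested for this text."""
    if realtime:
        return "stream_audio_async"
    # Short texts: one blocking request. Long texts: WebSocket with progress callbacks.
    return "generate_audio" if len(text) < threshold else "generate_audio_ws"
```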

2. Handle errors gracefully

Always wrap API calls in try-except blocks to handle network issues and API errors.
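
For transient failures such as rate limiting, a retry loop with backoff pairs naturally with those try-except blocks. The sketch below is generic and uses only the standard library: call_with_retry is a hypothetical helper that honors a retry_after_seconds attribute when the exception has one (as RateLimitError does in the WebSocket example) and otherwise doubles a base delay after each failed attempt. Retrying CreditsExhaustedError is unlikely to help, so leave it out of retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exception types.

    Uses the exception's retry_after_seconds attribute as the delay when present,
    otherwise exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller's except blocks handle it
            delay = getattr(exc, "retry_after_seconds", None)
            if delay is None:
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
    raise AssertionError("unreachable")
```

A call might look like call_with_retry(lambda: client.generate_audio(...), retry_on=(RateLimitError,)), still wrapped in the try-except blocks shown earlier.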

3. Use async for multiple requests

For generating multiple audio files, use async methods with asyncio.gather() so the requests run concurrently rather than one after another.
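
The sketch below shows the asyncio.gather() pattern with a cap on concurrency. Because this guide does not document the SDK's async signatures, it wraps a blocking call in asyncio.to_thread() (Python 3.9+) instead. synthesize_all is a hypothetical helper, and max_concurrency=4 is an arbitrary default to tune against your rate limit. If the SDK's async methods fit your use case, you can gather those coroutines directly.

```python
import asyncio
from typing import Callable, List


async def synthesize_all(
    texts: List[str],
    synthesize: Callable[[str], bytes],
    max_concurrency: int = 4,
) -> List[bytes]:
    """Run a blocking synthesize(text) call for every text concurrently.

    A semaphore caps how many calls are in flight at once; results keep input order.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(text: str) -> bytes:
        async with semaphore:
            # Each blocking call runs in the default thread pool executor.
            return await asyncio.to_thread(synthesize, text)

    return await asyncio.gather(*(run_one(t) for t in texts))
```

Here synthesize can be any function that takes a text and returns audio bytes, such as a small wrapper around client.generate_audio(); run the batch with asyncio.run(synthesize_all(texts, synthesize)).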