Speech to Text

You can use our Speech to Text (ASR) API to detect speech and convert it into text. Our model supports 23 languages from across India. This enables transcription services, voice commands, real-time subtitles, and voice-enabled applications, with high accuracy across a range of accents, dialects, and audio formats.

Overview

Our ASR (Automatic Speech Recognition) API converts spoken language into written text with high accuracy and low latency. Our models are trained on diverse datasets covering multiple languages, accents, and speaking styles across India to ensure reliable transcription service across various use cases.

• Real-time speech transcription for live applications
• Batch processing for audio file transcription

Try a sample transcription:

Sample audio: Hindi female voice (00:08, clear audio)

Transcription:

"नमस्ते, मेरा नाम श्रुति है। बताइए, मैं आपकी किस तरह मदद कर सकती हूँ।"

Developer quickstart

Learn how to integrate speech to text into your applications.

API Usage

Use our models to convert speech to text through a REST API endpoint, with simple requests for single-file and batch transcription.

Transcribe Audio File - Basic Request

You can transcribe a single audio file using this endpoint, which returns the transcript as soon as processing completes.

curl

curl -X POST https://api.fonada.ai/v1/asr/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "accept: application/json" \
  -F "file=@audio.wav" \
  -F "language_id=en"

Request Parameters

Parameter	Type	Required	Description
file	file	✓	Audio file to transcribe (.wav, .mp3, .flac, .m4a, .ogg)
language_id	string	✓	Language code (e.g., "hi" for Hindi, "en" for English)
format	string	–	Response format: "text" or "json" (default: "json")
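
Before sending a request, it can help to check the inputs locally. The sketch below assumes the field names used in the curl example (`language_id`) and the extensions and response formats listed in the table. `build_form_fields` is a hypothetical helper for illustration, not part of the API or the SDK.

```python
from pathlib import Path

# Extensions and response formats taken from the parameter table.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}
RESPONSE_FORMATS = {"text", "json"}


def build_form_fields(audio_path: str, language_id: str,
                      response_format: str = "json") -> dict:
    """Validate inputs and return the non-file form fields for a request.

    Hypothetical helper: the field names mirror the curl example and may
    differ from what the live API expects.
    """
    extension = Path(audio_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio extension: {extension or '(none)'}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"Unsupported response format: {response_format}")
    if not language_id:
        raise ValueError("language_id must not be empty")
    return {"language_id": language_id, "format": response_format}
```

Failing fast on an unsupported extension or an empty language code avoids spending a network round trip on a request that cannot succeed.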

Response Format

{
  "engine": "fonadalabs-asr-v1",
  "language": "hi",
  "text": "कभी-कभी छोटी-सी मदद भी बड़ी राहत दे देती है, जैसे समय पर रिमाइंडर सेट करना, कोई महत्वपूर्ण संदेश पढ़कर सुनाना, या फिर बस एक प्यारा-सा गाना चला देना।",
  "timings_ms": {
    "preprocess": 262.97,
    "inference": 23.2,
    "decode": 0.51,
    "total": 290.98
  }
}
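
A response of this shape can be read with the standard `json` module. The sketch below uses a copy of the sample payload with the transcript truncated. In the sample, the three listed stages add up to slightly less than `total`; the `other_ms` field computed here is just that remainder, with no assumption about what it represents.

```python
import json

# Copy of the sample response above; the transcript is truncated for brevity.
raw = '''{
  "engine": "fonadalabs-asr-v1",
  "language": "hi",
  "text": "कभी-कभी छोटी-सी मदद भी बड़ी राहत दे देती है",
  "timings_ms": {"preprocess": 262.97, "inference": 23.2, "decode": 0.51, "total": 290.98}
}'''


def summarize(payload: str) -> dict:
    """Extract the transcript and the time not covered by the listed stages."""
    data = json.loads(payload)
    timings = data["timings_ms"]
    staged = timings["preprocess"] + timings["inference"] + timings["decode"]
    return {
        "text": data["text"],
        "language": data["language"],
        "total_ms": timings["total"],
        "other_ms": round(timings["total"] - staged, 2),
    }


summary = summarize(raw)
print(summary["language"], summary["total_ms"], summary["other_ms"])
```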

Fonadalabs SDK

You can get richer ASR experiences with our official SDK. Install it once and tap into high-level modules for streaming, batch, and custom transcription workflows.

Install the SDK

pip

pip install fonadalabs

Transcribe a Single File

You can use the high-level HTTP client to submit local recordings. The SDK falls back to values from .env, so you can keep credentials out of source control while still overriding settings at runtime.

Python

from fonadalabs import ASRClient

AUDIO_PATH = "your_audio_path"  # path to a local audio file
LANG = "hi"                     # language code, e.g. "hi" for Hindi
API_KEY = "your_api_key"

if __name__ == "__main__":
    client = ASRClient(api_key=API_KEY)
    try:
        # Submit the file and wait for the transcript
        result = client.transcribe_file(AUDIO_PATH, language_id=LANG)
        print("Transcription Complete")
        print("----------------------------")
        print(f"Language : {result.language_id}")
        print(f"Text     : {result.text}")
        print("----------------------------")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Always close the client, even if transcription failed
        client.close()
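
To keep the key out of source code entirely, one option is to resolve it from the environment before constructing the client. This is a minimal sketch: the variable name `FONADA_API_KEY` and the helper `resolve_api_key` are assumptions made for illustration, and the SDK may read a different name from .env.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; check the SDK documentation for the name it reads.
API_KEY_ENV_VAR = "FONADA_API_KEY"


def resolve_api_key(explicit_key: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Return the explicit key if given, otherwise the value from the environment."""
    key = explicit_key or environ.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(
            f"No API key: pass one explicitly or set {API_KEY_ENV_VAR}"
        )
    return key
```

The result can then be passed to the client, for example `ASRClient(api_key=resolve_api_key())`.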

Real-time Streaming

Stream audio frames over WebSockets and receive low-latency transcription responses using the SDK’s ASR client.

Python WebSocket Example

Python

from fonadalabs import ASRWebSocketClient
# Initialize the client with your API key directly
client = ASRWebSocketClient(token="YOUR_API_KEY")
# Audio file to transcribe
audio_file = "your_audio_file.wav"
result = client.transcribe(audio_file, language_id="hi")
print(result)
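
The example above passes a file path, so the SDK presumably handles framing itself. For readers who want to see what "audio frames" means in practice, the sketch below splits WAV data into fixed-duration PCM chunks using only the standard library. The 20 ms frame size and the helper names are illustrative assumptions, not values documented for this service.

```python
import io
import wave
from typing import Iterator


def iter_pcm_frames(wav_bytes: bytes, frame_ms: int = 20) -> Iterator[bytes]:
    """Yield raw PCM chunks of about frame_ms milliseconds from WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        samples_per_chunk = max(1, wav.getframerate() * frame_ms // 1000)
        while True:
            chunk = wav.readframes(samples_per_chunk)
            if not chunk:
                break
            yield chunk


def make_silent_wav(duration_ms: int = 100, rate: int = 16000) -> bytes:
    """Build a short mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * (rate * duration_ms // 1000))
    return buffer.getvalue()


chunks = list(iter_pcm_frames(make_silent_wav()))
print(len(chunks), len(chunks[0]))
```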

Supported Formats

Our ASR service supports multiple audio formats with various quality and compression options.

Format	Sample rates	Channels	Bit depth
MP3	8 kHz - 48 kHz	Mono/Stereo	16-bit/24-bit
WAV	8 kHz - 48 kHz	Mono/Stereo	16-bit/24-bit
FLAC	8 kHz - 48 kHz	Mono/Stereo	16-bit/24-bit
M4A	8 kHz - 48 kHz	Mono/Stereo	16-bit/24-bit
OGG	8 kHz - 48 kHz	Mono/Stereo	16-bit/24-bit
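
For WAV input, these limits can be checked locally before upload with Python's standard `wave` module. The sketch below reads the 16bit/24bit figure as sample width (bit depth), which is an interpretation of the listing above; the helper names are hypothetical.

```python
import wave

MIN_RATE_HZ, MAX_RATE_HZ = 8_000, 48_000
ALLOWED_CHANNELS = {1, 2}          # mono or stereo
ALLOWED_SAMPLE_WIDTHS = {2, 3}     # bytes per sample: 16-bit and 24-bit


def find_problems(rate_hz: int, channels: int, width_bytes: int) -> list:
    """Return a list of mismatches against the listed limits (empty if none)."""
    problems = []
    if not MIN_RATE_HZ <= rate_hz <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate_hz} Hz is outside 8-48 kHz")
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"{channels} channels (expected mono or stereo)")
    if width_bytes not in ALLOWED_SAMPLE_WIDTHS:
        problems.append(f"{width_bytes * 8}-bit samples (expected 16-bit or 24-bit)")
    return problems


def check_wav(path: str) -> list:
    """Read a WAV header from disk and check it with find_problems."""
    with wave.open(path, "rb") as wav:
        return find_problems(wav.getframerate(), wav.getnchannels(),
                             wav.getsampwidth())
```

The `wave` module only reads WAV files; checking MP3, FLAC, M4A, or OGG input would need a third-party decoder.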

Supported Languages

Our ASR models support 23 languages with optimized accuracy for regional accents and dialects.

Language	Code	Native Name
Assamese	as	অসমীয়া
Bengali	bn	বাংলা
Bodo	brx	बोड़ो
Dogri	doi	डोगरी
Gujarati	gu	ગુજરાતી
Hindi	hi	हिन्दी
Kannada	kn	ಕನ್ನಡ
Konkani	kok	कोंकणी
Kashmiri	ks	कश्मीरी
Maithili	mai	मैथिली
Malayalam	ml	മലയാളം
Manipuri	mni	মৈতৈলোন
Marathi	mr	मराठी
Nepali	ne	नेपाली
Odia	or	ଓଡିଆ
Punjabi	pa	ਪੰਜਾਬੀ
Sanskrit	sa	संस्कृत
Santali	sat	ᱥᱟᱱᱛᱟᱲᱤ
Sindhi	sd	सिन्धी
Tamil	ta	தமிழ்
Telugu	te	తెలుగు
Urdu	ur	اردو
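
A small lookup built from this table can catch a mistyped language code before a request is sent. This is an illustrative sketch with a hypothetical helper name. Note that "en" appears in the curl example but not in the table, so it is left out of the mapping here.

```python
# Codes and names copied from the table above.
LANGUAGES = {
    "as": "Assamese", "bn": "Bengali", "brx": "Bodo", "doi": "Dogri",
    "gu": "Gujarati", "hi": "Hindi", "kn": "Kannada", "kok": "Konkani",
    "ks": "Kashmiri", "mai": "Maithili", "ml": "Malayalam", "mni": "Manipuri",
    "mr": "Marathi", "ne": "Nepali", "or": "Odia", "pa": "Punjabi",
    "sa": "Sanskrit", "sat": "Santali", "sd": "Sindhi", "ta": "Tamil",
    "te": "Telugu", "ur": "Urdu",
}


def language_name(code: str) -> str:
    """Return the English name for a language code, ignoring case and spaces."""
    normalized = code.strip().lower()
    if normalized not in LANGUAGES:
        raise ValueError(f"Unknown language code: {code!r}")
    return LANGUAGES[normalized]
```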

Best Practices

Follow these recommendations to achieve optimal transcription accuracy and performance.

Audio Quality

• Use high-quality microphones and recording equipment
• Record in quiet environments with minimal background noise
• Maintain consistent distance from the microphone
• Use a sample rate of 16 kHz or higher for better accuracy

Speaking Guidelines

• Speak clearly and at a moderate pace
• Avoid speaking too fast or too slow
• Use natural pauses between sentences
• Minimize overlapping speech in multi-speaker scenarios

Error Handling

• Implement retry logic for network failures
• Handle partial results in streaming scenarios
• Monitor API rate limits and usage quotas
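
The retry recommendation can be sketched as a small wrapper with exponential backoff. The exceptions the SDK raises on network failures are not documented here, so the built-in `ConnectionError` and `TimeoutError` stand in as placeholders; `with_retries` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3,
                 base_delay: float = 0.5,
                 retry_on: tuple = (ConnectionError, TimeoutError),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the client from the single-file example, a call might look like `with_retries(lambda: client.transcribe_file(AUDIO_PATH, language_id=LANG))`. Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.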
