Speech to Text
You can use our Speech to Text (ASR) API to detect speech and convert it into text. Our model supports 23 languages from across India. This enables transcription services, voice commands, real-time subtitles, and voice-enabled applications, with high accuracy across a range of accents, dialects, and audio formats.
Overview
Our ASR (Automatic Speech Recognition) API converts spoken language into written text with high accuracy and low latency. Our models are trained on diverse datasets covering multiple languages, accents, and speaking styles across India to ensure reliable transcription across a variety of use cases.
- Real-time speech transcription for live applications
- Batch processing for audio file transcription
API Usage
Use our models to convert speech to text through a REST API endpoint, with simple requests for single-file and batch transcription.
Transcribe Audio File - Basic Request
You can transcribe a single audio file using this endpoint, which returns the transcript as soon as processing completes.
curl -X POST https://api.fonada.ai/v1/asr/transcribe \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "accept: application/json" \
  -F "file=@audio.wav" \
  -F "language_id=en"
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | file | ✓ | Audio file to transcribe (.wav, .mp3, .flac, .m4a, .ogg) |
| language | string | ✓ | Language code (e.g., "hi" for Hindi, "en" for English). |
| format | string | ✗ | Response format: "text" or "json" (default: "json") |
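For callers who prefer Python to curl, the same request can be approximated with the standard library alone. This is a minimal sketch, not an official client: it reuses the endpoint, headers, and form field names (`file`, `language_id`) from the curl example, while the helper names `build_multipart` and `transcribe` are made up for illustration.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Endpoint taken from the curl example in this document.
API_URL = "https://api.fonada.ai/v1/asr/transcribe"


def build_multipart(fields: dict, file_field: str, filename: str, file_bytes: bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns a (body, content_type_header) tuple.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, language_id: str, api_key: str) -> dict:
    """POST one audio file and return the decoded JSON response."""
    audio = Path(path)
    # Field names follow the curl example ("file", "language_id"); this is an assumption.
    body, content_type = build_multipart(
        {"language_id": language_id}, "file", audio.name, audio.read_bytes()
    )
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "accept": "application/json",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)

# Example call (performs a real network request):
# print(transcribe("audio.wav", "en", "YOUR_API_KEY"))
```

Keeping the multipart encoding in its own function makes it possible to check the request body without contacting the API.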
Response Format
{
"engine": "fonadalabs-asr-v1",
"language": "hi",
"text": "कभी-कभी छोटी-सी मदद भी बड़ी राहत दे देती है, जैसे समय पर रिमाइंडर सेट करना, कोई महत्वपूर्ण संदेश पढ़कर सुनाना, या फिर बस एक प्यारा-सा गाना चला देना।",
"timings_ms": {
"preprocess": 262.97,
"inference": 23.2,
"decode": 0.51,
"total": 290.98
}
}
Fonadalabs SDK
You can get richer ASR experiences with our official SDK. Install it once and tap into high-level modules for streaming, batch, and custom transcription workflows.
Install the SDK
pip install fonadalabs
Transcribe a Single File
You can use the high-level HTTP client to submit local recordings. The SDK falls back to values from .env, so you can keep credentials out of source control while still overriding settings at runtime.
from fonadalabs import ASRClient

AUDIO_PATH = "your_audio_path"  # path to a local recording
LANG = "hi"                     # language code, see Supported Languages
API_KEY = "your_api_key"

if __name__ == "__main__":
    client = ASRClient(api_key=API_KEY)
    try:
        result = client.transcribe_file(AUDIO_PATH, language_id=LANG)
        print("Transcription Complete")
        print("----------------------------")
        print(f"Language : {result.language_id}")
        print(f"Text : {result.text}")
        print("----------------------------")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Always close the client, even if transcription failed
        client.close()
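To handle several recordings, the single-file call can be wrapped in a loop. The sketch below assumes only the `transcribe_file(path, language_id=...)` method and the `.text` attribute shown in the example above. `transcribe_many` is a hypothetical helper, not part of the SDK, and it processes files one after another rather than using any dedicated batch feature.

```python
from pathlib import Path
from typing import Dict, Iterable

# Extensions listed for the REST endpoint in this document.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def transcribe_many(client, paths: Iterable[str], language_id: str) -> Dict[str, str]:
    """Transcribe each file in turn and map path -> transcript or a status note."""
    results: Dict[str, str] = {}
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() not in AUDIO_SUFFIXES:
            results[str(path)] = "skipped: unsupported extension"
            continue
        try:
            result = client.transcribe_file(str(path), language_id=language_id)
            results[str(path)] = result.text
        except Exception as exc:
            # Record the failure and continue so one bad file does not stop the run.
            results[str(path)] = f"error: {exc}"
    return results
```

Passing the client in as an argument keeps the helper independent of how the client is constructed, and lets it be exercised with a stand-in object during testing.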
Real-time Streaming
Stream audio frames over WebSockets and receive low-latency transcription responses using the SDK’s ASR client.
Python WebSocket Example
from fonadalabs import ASRWebSocketClient

# Initialize the client with your API key directly
client = ASRWebSocketClient(token="YOUR_API_KEY")

# Audio file to transcribe
audio_file = "your_audio_file.wav"

result = client.transcribe(audio_file, language_id="hi")
print(result)
Supported Formats
Our ASR service supports multiple audio formats with various quality and compression options.
| Format | Sample rates | Channels | Bit depth |
|---|---|---|---|
| MP3 | 8 kHz - 48 kHz | Mono/Stereo | 16-bit/24-bit |
| WAV | 8 kHz - 48 kHz | Mono/Stereo | 16-bit/24-bit |
| FLAC | 8 kHz - 48 kHz | Mono/Stereo | 16-bit/24-bit |
| M4A | 8 kHz - 48 kHz | Mono/Stereo | 16-bit/24-bit |
| OGG | 8 kHz - 48 kHz | Mono/Stereo | 16-bit/24-bit |
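Before uploading, it can help to confirm that a recording falls inside the ranges listed above. The following sketch uses Python's standard `wave` module, so it covers WAV files only. The function names are illustrative, and the limits are copied from this section rather than queried from the service.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read the basic audio parameters from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,  # sample width is reported in bytes
        }


def within_documented_limits(info: dict) -> bool:
    """Check the parameters against the ranges listed in this section."""
    return (
        8_000 <= info["sample_rate_hz"] <= 48_000
        and info["channels"] in (1, 2)
        and info["bit_depth"] in (16, 24)
    )
```

A file that fails the check can be resampled or converted with an audio tool before it is sent for transcription.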
Supported Languages
Our ASR models support 23 languages with optimized accuracy for regional accents and dialects.
| Language | Code | Native Name |
|---|---|---|
| Assamese | as | অসমীয়া |
| Bengali | bn | বাংলা |
| Bodo | brx | बोड़ो |
| Dogri | doi | डोगरी |
| Gujarati | gu | ગુજરાતી |
| Hindi | hi | हिन्दी |
| Kannada | kn | ಕನ್ನಡ |
| Konkani | kok | कोंकणी |
| Kashmiri | ks | कश्मीरी |
| Maithili | mai | मैथिली |
| Malayalam | ml | മലയാളം |
| Manipuri | mni | মৈতৈলোন |
| Marathi | mr | मराठी |
| Nepali | ne | नेपाली |
| Odia | or | ଓଡିଆ |
| Punjabi | pa | ਪੰਜਾਬੀ |
| Sanskrit | sa | संस्कृत |
| Santali | sat | ᱥᱟᱱᱛᱟᱲᱤ |
| Sindhi | sd | सिन्धी |
| Tamil | ta | தமிழ் |
| Telugu | te | తెలుగు |
| Urdu | ur | اردو |
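Checking a language code locally before sending a request avoids a round trip for simple typos. The lookup below is a sketch built from the table above. `normalize_language_id` is a hypothetical helper, and the `"en"` entry is included only because the request example in this document uses it, not because it appears in the table.

```python
# Codes copied from the Supported Languages table.
TABLE_CODES = {
    "as": "Assamese", "bn": "Bengali", "brx": "Bodo", "doi": "Dogri",
    "gu": "Gujarati", "hi": "Hindi", "kn": "Kannada", "kok": "Konkani",
    "ks": "Kashmiri", "mai": "Maithili", "ml": "Malayalam", "mni": "Manipuri",
    "mr": "Marathi", "ne": "Nepali", "or": "Odia", "pa": "Punjabi",
    "sa": "Sanskrit", "sat": "Santali", "sd": "Sindhi", "ta": "Tamil",
    "te": "Telugu", "ur": "Urdu",
}

# Assumption: "en" is accepted because the curl example uses it; it is not in the table.
EXTRA_CODES = {"en": "English"}

KNOWN_CODES = {**TABLE_CODES, **EXTRA_CODES}


def normalize_language_id(code: str) -> str:
    """Return the lower-cased, trimmed code, or raise ValueError if it is unknown."""
    key = code.strip().lower()
    if key not in KNOWN_CODES:
        raise ValueError(f"Unknown language code: {code!r}")
    return key
```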
Best Practices
Follow these recommendations to achieve optimal transcription accuracy and performance.
Audio Quality
- Use high-quality microphones and recording equipment
- Record in quiet environments with minimal background noise
- Maintain a consistent distance from the microphone
- Use a sample rate of 16 kHz or higher for better accuracy
Speaking Guidelines
- Speak clearly and at a moderate pace
- Avoid speaking too fast or too slow
- Use natural pauses between sentences
- Minimize overlapping speech in multi-speaker scenarios
Error Handling
- Implement retry logic for network failures
- Handle partial results in streaming scenarios
- Monitor API rate limits and usage quotas
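The retry recommendation can be sketched as a small wrapper with exponential backoff and jitter. This is one common pattern rather than behaviour prescribed by the API. `with_retries` and its parameters are illustrative, and which exceptions count as retryable depends on the HTTP client or SDK in use.

```python
import random
import time


def with_retries(call, attempts=4, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Run call(); on a retryable error wait and try again, up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Double the wait on each failure and add jitter to avoid synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

A call such as `with_retries(lambda: client.transcribe_file(path, language_id="hi"))` would then retry only on the listed network errors and let other failures propagate immediately.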