Speech to Text

You can use our Speech to Text (ASR) API to detect speech and convert it into text. The model supports 23 languages from across India. This enables transcription services, voice commands, real-time subtitles, and voice-enabled applications, with high accuracy across a range of accents, dialects, and audio formats.

Overview

Our ASR (Automatic Speech Recognition) API converts spoken language into written text with high accuracy and low latency. The models are trained on diverse datasets covering multiple languages, accents, and speaking styles across India, so transcription stays reliable across use cases such as:

  • Real-time speech transcription for live applications
  • Batch processing for audio file transcription

Try a sample transcription:

Sample audio: Hindi female voice, 00:08, clear audio
Transcription:
"नमस्ते, मेरा नाम श्रुति है। बताइए, मैं आपकी किस तरह मदद कर सकती हूँ।"

Explore our voice library to find the perfect voice for your project.

API Usage

Use our models to convert speech to text through a REST API endpoint, with simple requests for single-file and batch transcription.

Transcribe Audio File - Basic Request

You can transcribe a single audio file with this endpoint, which returns the transcript as soon as processing completes.

curl
curl -X POST "https://api.fonada.ai/v1/asr/transcribe" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "accept: application/json" \
  -F "file=@audio.wav" \
  -F "language_id=en"
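If you prefer to call the endpoint from Python without extra dependencies, the same multipart request can be sketched with the standard library. This is a minimal sketch that mirrors the curl example: the field names `file` and `language_id` and the URL are copied from it, while `build_transcribe_request` and `send` are hypothetical helper names, not part of any official client.

```python
import io
import json
import urllib.request
import uuid

API_URL = "https://api.fonada.ai/v1/asr/transcribe"  # endpoint from the curl example


def build_transcribe_request(audio_bytes, filename, language_id, api_key):
    """Build (but do not send) a multipart POST that mirrors the curl example."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    # Text field: language_id, as used in the curl example.
    body.write(f"--{boundary}\r\n".encode())
    body.write(b'Content-Disposition: form-data; name="language_id"\r\n\r\n')
    body.write(language_id.encode() + b"\r\n")

    # File field: the audio payload under the field name "file".
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
    )
    body.write(b"Content-Type: application/octet-stream\r\n\r\n")
    body.write(audio_bytes + b"\r\n")
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=body.getvalue(),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def send(request, timeout=30):
    """Send the request and decode the JSON body (performs a real network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, read the audio bytes from disk, pass them to `build_transcribe_request` with your key, and hand the result to `send`. Keeping the build step separate from the network call makes the request easy to inspect before anything is transmitted.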

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| audio | file | | Audio file to transcribe (.wav, .mp3, .flac, .m4a, .ogg) |
| language | string | | Language code (e.g., "hi" for Hindi, "en" for English) |
| format | string | | Response format: "text" or "json" (default: "json") |

Response Format

{
  "engine": "fonadalabs-asr-v1",
  "language": "hi",
  "text": "कभी-कभी छोटी-सी मदद भी बड़ी राहत दे देती है, जैसे समय पर रिमाइंडर सेट करना, कोई महत्वपूर्ण संदेश पढ़कर सुनाना, या फिर बस एक प्यारा-सा गाना चला देना।",
  "timings_ms": {
    "preprocess": 262.97,
    "inference": 23.2,
    "decode": 0.51,
    "total": 290.98
  }
}
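Once you have the JSON body, the fields shown above can be read directly. The sketch below parses a shortened copy of the sample response (the transcript is truncated for brevity) and reports which stage took longest and how much of `total` is not attributed to a named stage. The `summarize` helper is a hypothetical name used for illustration.

```python
import json

# Shortened copy of the sample response; only the "text" value is truncated.
SAMPLE_RESPONSE = """
{
  "engine": "fonadalabs-asr-v1",
  "language": "hi",
  "text": "कभी-कभी छोटी-सी मदद भी बड़ी राहत दे देती है",
  "timings_ms": {"preprocess": 262.97, "inference": 23.2, "decode": 0.51, "total": 290.98}
}
"""


def summarize(raw):
    """Parse a response body and return the transcript plus a timing breakdown."""
    data = json.loads(raw)
    timings = data.get("timings_ms", {})
    # Every key except "total" is treated as an individual processing stage.
    stages = {name: ms for name, ms in timings.items() if name != "total"}
    slowest = max(stages, key=stages.get) if stages else None
    # Time in "total" that no named stage accounts for.
    unattributed = round(timings.get("total", 0.0) - sum(stages.values()), 2)
    return {
        "language": data.get("language"),
        "text": data.get("text", ""),
        "slowest_stage": slowest,
        "unattributed_ms": unattributed,
    }


summary = summarize(SAMPLE_RESPONSE)
print(summary["language"], summary["slowest_stage"], summary["unattributed_ms"])
```

Using `.get` with defaults keeps the parser tolerant if a field is absent, for example when the response format is changed.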

Fonadalabs SDK

You can get richer ASR experiences with our official SDK. Install it once and tap into high-level modules for streaming, batch, and custom transcription workflows.

Install the SDK

pip
pip install fonadalabs

Transcribe a Single File

You can use the high-level HTTP client to submit local recordings. The SDK falls back to values from .env, so you can keep credentials out of source control while still overriding settings at runtime.

Python
from fonadalabs import ASRClient

AUDIO_PATH = "your_audio_path"
LANG = "hi"
your_api = "your_api_key"

if __name__ == "__main__":
    client = ASRClient(api_key=your_api)
    try:
        result = client.transcribe_file(AUDIO_PATH, language_id=LANG)
        print("Transcription Complete")
        print("----------------------------")
        print(f"Language : {result.language_id}")
        print(f"Text : {result.text}")
        print("----------------------------")
    except Exception as e:
        print(f"Error: {e}")
    finally:
        client.close()
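If you pass the key explicitly rather than relying on the SDK's .env fallback, you can still keep it out of source code by reading it from the environment yourself. This is a minimal sketch: the variable name `FONADALABS_API_KEY` and the `load_api_key` helper are assumptions for illustration, not names confirmed by the SDK.

```python
import os

# Hypothetical variable name; check the SDK documentation for the one it reads.
API_KEY_VAR = "FONADALABS_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get(API_KEY_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {API_KEY_VAR} before creating the ASR client")
    return key
```

You would then create the client with `ASRClient(api_key=load_api_key())`, so a missing key fails early with a readable message instead of surfacing later as an authentication error.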

Real-time Streaming

Stream audio frames over WebSockets and receive low-latency transcription responses using the SDK’s ASR client.

Python WebSocket Example

Python
from fonadalabs import ASRWebSocketClient

# Initialize the client with your API key directly
client = ASRWebSocketClient(token="YOUR_API_KEY")

# Audio file to transcribe
audio_file = "your_audio_file.wav"

result = client.transcribe(audio_file, language_id="hi")
print(result)
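The SDK example above takes a file path and handles framing internally. If you need to produce audio frames yourself, for example to feed a live pipeline, splitting PCM audio into short fixed-duration chunks can be sketched with the standard `wave` module. The 20 ms frame length and both helper names are assumptions for illustration; the frame size the service expects is not stated here.

```python
import io
import wave


def iter_pcm_frames(wav_file, frame_ms=20):
    """Yield raw PCM chunks of about frame_ms milliseconds from a WAV file."""
    with wave.open(wav_file, "rb") as wav:
        samples_per_chunk = max(1, wav.getframerate() * frame_ms // 1000)
        while True:
            chunk = wav.readframes(samples_per_chunk)
            if not chunk:
                break
            yield chunk


def make_silent_wav(seconds=1.0, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


chunks = list(iter_pcm_frames(make_silent_wav(1.0), frame_ms=20))
print(len(chunks), len(chunks[0]))
```

`iter_pcm_frames` accepts either a path or a file-like object, and the final chunk may be shorter than the rest when the audio length is not a multiple of the frame size.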

Supported Formats

Our ASR service supports multiple audio formats with various quality and compression options.

| Format | Sample rates | Channels | Bit depth |
|---|---|---|---|
| MP3 | 8 kHz to 48 kHz | Mono/Stereo | 16-bit/24-bit |
| WAV | 8 kHz to 48 kHz | Mono/Stereo | 16-bit/24-bit |
| FLAC | 8 kHz to 48 kHz | Mono/Stereo | 16-bit/24-bit |
| M4A | 8 kHz to 48 kHz | Mono/Stereo | 16-bit/24-bit |
| OGG | 8 kHz to 48 kHz | Mono/Stereo | 16-bit/24-bit |
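You can check a recording against these limits before uploading it. The sketch below covers WAV files only, using the standard `wave` module, and tests for the listed sample-rate range, mono or stereo channels, and 16-bit or 24-bit samples. The helper names are hypothetical, and the service may apply additional checks of its own.

```python
import io
import wave

MIN_RATE_HZ, MAX_RATE_HZ = 8_000, 48_000  # sample-rate range listed above
ALLOWED_CHANNELS = {1, 2}                  # mono or stereo
ALLOWED_SAMPLE_WIDTHS = {2, 3}             # bytes per sample: 16-bit or 24-bit


def check_wav(wav_file):
    """Return a list of problems; an empty list means the file fits the limits."""
    with wave.open(wav_file, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
    problems = []
    if not MIN_RATE_HZ <= rate <= MAX_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is outside 8-48 kHz")
    if channels not in ALLOWED_CHANNELS:
        problems.append(f"{channels} channels (expected mono or stereo)")
    if width not in ALLOWED_SAMPLE_WIDTHS:
        problems.append(f"{width * 8}-bit samples (expected 16-bit or 24-bit)")
    return problems


def make_test_wav(rate, channels, width):
    """Build a tiny in-memory WAV of silence, only to exercise check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (width * channels * 10))
    buf.seek(0)
    return buf


print(check_wav(make_test_wav(16_000, 1, 2)))
print(check_wav(make_test_wav(4_000, 1, 1)))
```

Returning a list of problems rather than raising on the first one lets you report every issue with a file in a single pass.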

Supported Languages

Our ASR models support 23 languages with optimized accuracy for regional accents and dialects.

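| Language | Code | Native Name |
|---|---|---|
| Assamese | as | অসমীয়া |
| Bengali | bn | বাংলা |
| Bodo | brx | बोड़ो |
| Dogri | doi | डोगरी |
| Gujarati | gu | ગુજરાતી |
| Hindi | hi | हिन्दी |
| Kannada | kn | ಕನ್ನಡ |
| Konkani | kok | कोंकणी |
| Kashmiri | ks | कश्मीरी |
| Maithili | mai | मैथिली |
| Malayalam | ml | മലയാളം |
| Manipuri | mni | মৈতৈলোন |
| Marathi | mr | मराठी |
| Nepali | ne | नेपाली |
| Odia | or | ଓଡିଆ |
| Punjabi | pa | ਪੰਜਾਬੀ |
| Sanskrit | sa | संस्कृत |
| Santali | sat | ᱥᱟᱱᱛᱟᱲᱤ |
| Sindhi | sd | सिन्धी |
| Tamil | ta | தமிழ் |
| Telugu | te | తెలుగు |
| Urdu | ur | اردو |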
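Validating the language code on the client side avoids a round trip for a simple typo. The sketch below builds a lookup from the codes in the table above and adds "en", which appears in the request examples but not in the table. The `normalize_language` helper is a hypothetical name, and the service remains the final authority on which codes it accepts.

```python
# Codes copied from the Supported Languages table.
TABLE_LANGUAGES = {
    "as": "Assamese", "bn": "Bengali", "brx": "Bodo", "doi": "Dogri",
    "gu": "Gujarati", "hi": "Hindi", "kn": "Kannada", "kok": "Konkani",
    "ks": "Kashmiri", "mai": "Maithili", "ml": "Malayalam", "mni": "Manipuri",
    "mr": "Marathi", "ne": "Nepali", "or": "Odia", "pa": "Punjabi",
    "sa": "Sanskrit", "sat": "Santali", "sd": "Sindhi", "ta": "Tamil",
    "te": "Telugu", "ur": "Urdu",
}

# "en" is used in the request examples, so it is included here as well.
SUPPORTED_LANGUAGES = {**TABLE_LANGUAGES, "en": "English"}


def normalize_language(code):
    """Return a cleaned, lower-case language code or raise ValueError."""
    cleaned = code.strip().lower()
    if cleaned not in SUPPORTED_LANGUAGES:
        known = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language code {code!r}; expected one of: {known}")
    return cleaned


print(normalize_language(" HI "), SUPPORTED_LANGUAGES[normalize_language("brx")])
```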

Best Practices

Follow these recommendations to achieve optimal transcription accuracy and performance.

Audio Quality

  • Use high-quality microphones and recording equipment
  • Record in quiet environments with minimal background noise
  • Maintain a consistent distance from the microphone
  • Use a sample rate of 16 kHz or higher for better accuracy

Speaking Guidelines

  • Speak clearly and at a moderate pace
  • Avoid speaking too fast or too slow
  • Use natural pauses between sentences
  • Minimize overlapping speech in multi-speaker scenarios

Error Handling

  • Implement retry logic for network failures
  • Handle partial results in streaming scenarios
  • Monitor API rate limits and usage quotas
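Retry logic for network failures is commonly implemented as exponential backoff with jitter. The sketch below wraps any callable, such as a function that performs one transcription request. The `with_retries` helper and its defaults are assumptions for illustration, and the exception types are Python built-ins, since the errors raised by the SDK are not listed here.

```python
import random
import time


def with_retries(func, attempts=4, base_delay=0.5,
                 retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(), retrying on transient errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Give up and re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Delay doubles each attempt: base, 2x base, 4x base, ...
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 2))


# Demonstration with a stand-in for a network call that fails twice.
calls = {"count": 0}


def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


recorded_delays = []
result = with_retries(flaky_request, sleep=recorded_delays.append)
print(result, calls["count"], len(recorded_delays))
```

Only errors listed in `retry_on` are retried; anything else, such as an authentication failure, propagates immediately, because repeating the request would not help.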