


nvidia/Nemotron-3.5-ASR-Streaming-Multilingual-0.6b

$0.00020 / minute
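Because the listed price is per minute of audio, cost scales with input length. A rough sketch, assuming linear billing on audio duration with no per-request minimum or rounding (neither is stated on this page); `estimate_cost_usd` is a hypothetical helper, not part of any DeepInfra SDK.

```python
# Listed rate from this page; billing details beyond this are assumptions.
PRICE_PER_MINUTE_USD = 0.00020


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate cost assuming linear per-minute billing with no rounding or minimum."""
    return audio_seconds / 60.0 * PRICE_PER_MINUTE_USD


# One hour of audio: 60 min * $0.00020/min
print(f"${estimate_cost_usd(3600):.4f}")  # $0.0120
```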

Nemotron 3.5 ASR Streaming Multilingual is an open, 0.6B-parameter, prompt-conditioned, cache-aware FastConformer-RNNT model engineered for low-latency streaming transcription across 40+ languages. It powers real-time captioning, voice agents, and multilingual transcription pipelines, replacing separate per-language Whisper deployments with a single inference pass.


HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -F audio=@my_voice.mp3  \
    'https://api.deepinfra.com/v1/inference/nvidia/Nemotron-3.5-ASR-Streaming-Multilingual-0.6b'
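The same upload can be made from any HTTP client. As a minimal sketch using only the Python standard library, the code below builds a multipart/form-data POST equivalent to the cURL call above: same endpoint, same `audio` form field, same bearer header. The helper names `build_request` and `transcribe` are hypothetical, and the `application/octet-stream` part type is an assumption (cURL may infer a type from the file extension).

```python
import os
import urllib.request
import uuid

# Endpoint taken from the cURL example above.
API_URL = "https://api.deepinfra.com/v1/inference/nvidia/Nemotron-3.5-ASR-Streaming-Multilingual-0.6b"


def build_request(audio: bytes, filename: str, token: str) -> urllib.request.Request:
    """Build a POST mirroring `curl -F audio=@<file>` (hypothetical helper)."""
    # Random boundary; a 32-char hex string is very unlikely to occur in the audio bytes.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        # Assumption: a generic binary type for the uploaded file.
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        API_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str) -> str:
    """Upload a local audio file and return the raw JSON response body."""
    with open(path, "rb") as f:
        req = build_request(f.read(), os.path.basename(path), os.environ["DEEPINFRA_TOKEN"])
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

With `DEEPINFRA_TOKEN` set, `transcribe("my_voice.mp3")` should return the same JSON as the cURL command. Building the multipart body by hand keeps the sketch dependency-free; a library such as `requests` (via its `files=` argument) would shorten it.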

The response looks similar to:

{
  "text": "",
  "segments": [
    {
      "end": 1.0,
      "id": 0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "id": 1,
      "start": 4.0,
      "text": "World"
    }
  ],
  "language": "en",
  "input_length_ms": 0,
  "words": [
    {
      "end": 1.0,
      "start": 0.0,
      "text": "Hello"
    },
    {
      "end": 5.0,
      "start": 4.0,
      "text": "World"
    }
  ],
  "duration": 0.0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0,
    "output_length": 0
  }
}

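In the sample response, the transcript appears in `segments` (and per-word timings in `words`) while the top-level `text` is empty. A minimal sketch for turning such a response into plain text and SRT captions, assuming `start`/`end` are in seconds (the page does not state the unit) and that real responses share the sample's shape; `transcript`, `srt_timestamp`, and `segments_to_srt` are hypothetical helpers.

```python
import json


def transcript(response: dict) -> str:
    """Return the top-level text, or join segment texts when it is empty."""
    text = (response.get("text") or "").strip()
    if text:
        return text
    return " ".join(seg["text"].strip() for seg in response.get("segments", []))


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(response: dict) -> str:
    """Render each segment as a numbered SRT cue (assumes times in seconds)."""
    cues = []
    for index, seg in enumerate(response.get("segments", []), start=1):
        start, end = srt_timestamp(seg["start"]), srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


# Trimmed copy of the sample response above.
sample = json.loads("""
{"text": "", "segments": [
  {"id": 0, "start": 0.0, "end": 1.0, "text": "Hello"},
  {"id": 1, "start": 4.0, "end": 5.0, "text": "World"}]}
""")
print(transcript(sample))  # Hello World
print(segments_to_srt(sample))
```

Falling back to `segments` covers responses like the sample, where `text` is empty; when the top-level field is populated it is used as-is. The `words` array could be rendered the same way for word-level captions.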

Input fields
