sesame/
$7.00 / 1M characters
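As a rough illustration of what the listed rate means in practice, the sketch below estimates the cost of a piece of text. It assumes that billing counts every character of the input text, including spaces and punctuation, which this page does not confirm; `estimate_cost_usd` is a hypothetical helper, not part of any DeepInfra SDK.

```python
# Listed rate on this page: $7.00 per 1M characters.
PRICE_PER_MILLION_CHARS_USD = 7.00


def estimate_cost_usd(text: str) -> float:
    """Hypothetical helper: estimate the cost of synthesizing `text`.

    Assumption: every character (spaces and punctuation included) is billed.
    """
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    script = "Hello there, how are you today?"
    print(f"{len(script)} characters -> ${estimate_cost_usd(script):.6f}")
```

At this rate, one million characters costs $7.00, so short prompts cost a small fraction of a cent.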
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates residual vector quantization (RVQ) audio codes from text and audio inputs. The architecture pairs a Llama backbone with a smaller audio decoder that produces Mimi audio codes.
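To make the term "RVQ audio codes" concrete, here is a minimal, generic sketch of residual vector quantization in plain Python. It is not CSM's or Mimi's actual implementation: the two tiny codebooks are hand-written for illustration, whereas a neural codec learns its codebooks and operates on high-dimensional audio frames. All names here (`rvq_encode`, `rvq_decode`, `CODEBOOKS`) are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]

# Toy, hand-written codebooks (assumption: real codecs learn these).
CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],        # coarse stage
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],  # fine stage, quantizes the residual
]


def nearest(codebook: Codebook, v: Vector) -> int:
    """Index of the codebook entry closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)),
    )


def rvq_encode(v: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize v stage by stage; each stage encodes what the previous one missed."""
    residual = list(v)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct the vector by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    codes = rvq_encode([1.2, 0.8], CODEBOOKS)
    print(codes, rvq_decode(codes, CODEBOOKS))
```

Each frame is thus represented by one integer per codebook stage, which is the general shape of the audio codes that a model in this style predicts.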

© 2026 Deep Infra. All rights reserved.