sesame/
$7.00 / 1M characters
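As a rough illustration of what the listed rate of $7.00 per 1M characters means for a given input, the sketch below estimates cost from a character count. It assumes billing is purely linear in characters, with no minimum charge or rounding; that billing model and the function name `estimate_cost_usd` are assumptions for illustration, not something stated on this page.

```python
from decimal import Decimal

# Listed price: USD per 1,000,000 input characters.
PRICE_PER_MILLION_CHARS = Decimal("7.00")


def estimate_cost_usd(num_chars: int) -> Decimal:
    """Estimate cost in USD, assuming strictly linear per-character billing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return PRICE_PER_MILLION_CHARS * num_chars / Decimal(1_000_000)


if __name__ == "__main__":
    text = "Hello from a conversational speech model."
    print(f"{len(text)} characters -> ${estimate_cost_usd(len(text))}")
```

Under that assumption, a 500-character prompt would come to $0.0035.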
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates residual vector quantization (RVQ) audio codes from text and audio inputs. Its architecture uses a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
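To make the idea of RVQ codes concrete, here is a toy sketch of residual vector quantization in plain Python. Each stage picks the nearest entry in its codebook, then passes the leftover residual to the next stage, so one vector becomes a short list of integer codes. The codebooks, dimensions, and function names are invented for illustration; this is not the Mimi codec or CSM's actual implementation.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks over 2-D vectors (coarse, then fine).
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

if __name__ == "__main__":
    codes = rvq_encode([1.2, 0.9], CODEBOOKS)
    print(codes)                         # [1, 1]
    print(rvq_decode(codes, CODEBOOKS))  # [1.25, 1.0]
```

Real neural codecs learn their codebooks and apply them to encoder latents per audio frame; the sketch only shows how successive codebooks refine one another.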

© 2026 Deep Infra. All rights reserved.