sesame/
$7.00 / 1M characters
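As a rough illustration of the listed rate, the sketch below estimates the cost of synthesizing a piece of text. It assumes billing is strictly linear in the number of input characters, with no minimum charge and no rounding, which the listing does not confirm. The names `PRICE_PER_MILLION_CHARS` and `estimate_cost_usd` are hypothetical and are not part of any Deep Infra API.

```python
from decimal import Decimal

# Listed rate: $7.00 per 1M characters.
PRICE_PER_MILLION_CHARS = Decimal("7.00")


def estimate_cost_usd(text: str) -> Decimal:
    """Estimate cost in USD, assuming linear per-character billing.

    Assumption: no minimum charge, no rounding, and every character
    (including whitespace) is counted. The real billing rules may differ.
    """
    return PRICE_PER_MILLION_CHARS * len(text) / Decimal(1_000_000)


if __name__ == "__main__":
    sample = "Hello from a conversational speech model."
    print(f"{len(sample)} characters cost about ${estimate_cost_usd(sample)}")
```

`Decimal` is used instead of `float` so that small per-request amounts do not pick up binary rounding noise when summed.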
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ (residual vector quantization) audio codes from text and audio inputs. The architecture pairs a Llama backbone with a smaller audio decoder that produces Mimi audio codes.
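To make the term "RVQ audio codes" concrete, here is a minimal toy sketch of residual vector quantization in plain Python. It is not CSM or Mimi code: the codebooks, their sizes, and the function names (`rvq_encode`, `rvq_decode`, `TOY_CODEBOOKS`) are invented for illustration. It only shows the general idea that each codebook quantizes the residual left by the previous one, so a vector is represented by one integer code per codebook.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(codebooks, vec):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    """Reconstruct an approximation by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse stage followed by a finer stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

if __name__ == "__main__":
    codes = rvq_encode(TOY_CODEBOOKS, [1.2, 1.0])
    print(codes, rvq_decode(TOY_CODEBOOKS, codes))
```

In a model like the one described above, the codes are predicted by the network rather than computed from a known target vector; the sketch only illustrates what a stack of residual codes represents.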

© 2026 Deep Infra. All rights reserved.