sesame/
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates residual vector quantization (RVQ) audio codes from text and audio inputs. The architecture uses a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
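To make the term "RVQ audio codes" concrete, here is a minimal pure-Python sketch of residual vector quantization in general. It is not CSM's or Mimi's implementation. The two-stage `TOY_CODEBOOKS`, the 2-dimensional `frame`, and the function names are all made up for illustration. A real codec learns its codebooks and works on much higher-dimensional latent frames.

```python
from typing import List, Sequence

Vector = List[float]
Codebook = List[Vector]


def nearest(codebook: Codebook, x: Sequence[float]) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(x: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Quantize x stage by stage; each stage encodes what the previous one missed."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
TOY_CODEBOOKS: List[Codebook] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

frame = [1.2, 0.9]  # stand-in for one latent audio frame
codes = rvq_encode(frame, TOY_CODEBOOKS)
approx = rvq_decode(codes, TOY_CODEBOOKS)
print(codes, approx)
```

Each frame is therefore represented by one integer per codebook stage, and adding stages refines the reconstruction. That list of integers per frame is what "audio codes" means in the description above.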
© 2025 Deep Infra. All rights reserved.