sesame/
$7.00 / 1M characters
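As a rough illustration of the listed rate, the sketch below estimates the cost of synthesizing a piece of text. It assumes billing is linear in the number of input characters and that a character is counted the way Python's `len()` counts it; the listing does not say how characters are counted, and `estimate_cost` is a hypothetical helper, not part of any DeepInfra API.

```python
PRICE_PER_MILLION_CHARS = 7.00  # USD, taken from the listing above


def estimate_cost(text: str, price_per_million: float = PRICE_PER_MILLION_CHARS) -> float:
    """Estimate the USD cost of synthesizing `text`.

    Assumption: one billed character equals one Python string element
    (a Unicode code point). The provider's actual counting may differ.
    """
    return len(text) * price_per_million / 1_000_000


if __name__ == "__main__":
    sample = "Hello, this is a short test sentence."
    print(f"{len(sample)} characters -> ${estimate_cost(sample):.6f}")
```

At this rate, a 1,000-character passage would come to about $0.007 under the stated assumption.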
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ (residual vector quantization) audio codes from text and audio inputs. The architecture uses a Llama backbone and a smaller audio decoder that produces Mimi audio codes.
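To make the term "RVQ audio codes" concrete, here is a toy sketch of residual vector quantization in general: each stage picks the nearest entry in its codebook, and the next stage quantizes whatever error is left. This is not Mimi's or CSM's actual codec; the two tiny codebooks, the 2-D input vector, and the function names are all made up for illustration.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the previous stage's residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical hand-picked codebooks: a coarse stage followed by a fine stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]

if __name__ == "__main__":
    codes = rvq_encode([1.2, 1.0], CODEBOOKS)
    print(codes, rvq_decode(codes, CODEBOOKS))
```

The output of encoding is one integer index per stage, which is the kind of discrete code sequence a model can be trained to predict; a real audio codec learns much larger codebooks from data.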

z92bPJB1
2025-03-15T02:12:49+00:00
© 2026 DeepInfra. All rights reserved.