XiaomiMiMo/
$0.00 / 1M characters
Converts input text into natural, fluent speech. Parameters such as speech style and voice control how the speech sounds. Voices can also be generated from text descriptions alone, without presets or audio samples.

DeepInfra supports custom voices.
The following curl command creates a voice by uploading an audio file:
curl -X POST "https://api.deepinfra.com/v1/voices/add" \
  -H "Content-Type: multipart/form-data" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -F "audio=@hello.wav" \
  -F "name=John Doe" \
  -F "description=John Doe's voice"
The response will look similar to:
{
  "user_id": "gh:10000000",
  "voice_id": "abcd1234abcd1234abcd",
  "name": "John Doe",
  "description": "John Doe's voice",
  "created_at": 1723851387,
  "updated_at": 1723851387
}
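A script that creates a voice will usually need the voice_id from the response for later requests. The following is a minimal sketch of pulling it out in plain shell, using the sample response above in place of a live call. The helper names sample_response and extract_string_field are hypothetical and not part of any DeepInfra tooling. The sed pattern is deliberately naive: it assumes the key appears once and that the value contains no escaped quotes.

```shell
#!/bin/sh
# Stand-in for the live request. In practice this would be the captured
# output of the curl command, e.g. response=$(curl -s -X POST ...).
sample_response() {
  cat <<'EOF'
{
  "user_id": "gh:10000000",
  "voice_id": "abcd1234abcd1234abcd",
  "name": "John Doe",
  "description": "John Doe's voice",
  "created_at": 1723851387,
  "updated_at": 1723851387
}
EOF
}

# extract_string_field (hypothetical helper): print the value of a
# string-valued field. Arguments: $1 = JSON text, $2 = field name.
# Assumes the key occurs once and the value has no escaped quotes.
extract_string_field() {
  printf '%s\n' "$1" | sed -n 's/.*"'"$2"'": *"\([^"]*\)".*/\1/p'
}

response=$(sample_response)
voice_id=$(extract_string_field "$response" voice_id)
echo "voice_id=$voice_id"
```

For anything beyond a quick script, a real JSON parser is the safer choice, since pattern matching of this kind breaks on nested objects or escaped characters.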
© 2026 DeepInfra. All rights reserved.