Kokoro is a frontier text-to-speech (TTS) model for its size of 82 million parameters (text in, audio out). On 25 Dec 2024, Kokoro v0.19 weights were permissively released in full fp32 precision under an Apache 2.0 license. As of 2 Jan 2025, 10 unique Voicepacks have been released, and a .onnx version of v0.19 is available.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
  -d '{"text": "How could I know? It'\''s an unanswerable question. Like asking an unborn child if they'\''ll lead a good life. They haven'\''t even been born."}' \
  -H "Authorization: bearer $DEEPINFRA_TOKEN" \
  -H 'Content-Type: application/json' \
  'https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M'
which will give you back something similar to:
{
  "audio": null,
  "input_character_length": 0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
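The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the endpoint and headers from the cURL example above. The helper names `synthesize` and `summarize` are hypothetical and not part of any official SDK. The encoding of the `audio` field is not described here, so the sketch only reports whether it is present.

```python
import json
import urllib.request

# Endpoint copied from the cURL example above.
API_URL = "https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M"


def synthesize(text: str, token: str, timeout: float = 60.0) -> dict:
    """POST the text to the endpoint and return the decoded JSON response.

    Hypothetical helper: mirrors the cURL call, nothing more.
    """
    request = urllib.request.Request(
        API_URL,
        # json.dumps escapes quotes and apostrophes, so no shell quoting is needed.
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(response: dict) -> dict:
    """Extract the fields shown in the sample response, tolerating missing keys."""
    status = response.get("inference_status") or {}
    return {
        "has_audio": response.get("audio") is not None,
        "characters": response.get("input_character_length", 0),
        "status": status.get("status", "unknown"),
        "runtime_ms": status.get("runtime_ms", 0),
        "cost": status.get("cost", 0.0),
    }
```

Calling `summarize(synthesize("Hello", token))`, with `token` read from the `DEEPINFRA_TOKEN` environment variable, would perform a real request against the API, so the network call is kept separate from the parsing step.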
output_format (string): Output format for the speech.
Default value: "wav"
Allowed values: mp3, opus, flac, wav, pcm
preset_voice (string): Preset voice name to use for the speech.
Default value: "af"
Allowed values: af, af_bella, af_sarah, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis, af_nicole, af_sky
webhook (file): The webhook to call when inference is done. By default, you will get the output in the response to your inference request.
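To illustrate how the `output_format` and `preset_voice` fields might be combined with `text` in a request body, here is a small sketch that validates both values against the allowed lists above before serializing. The function name `build_payload` is hypothetical, and the sketch assumes these fields sit alongside `text` at the top level of the JSON body, as `text` does in the cURL example.

```python
import json

# Allowed values and defaults copied from the field descriptions above.
OUTPUT_FORMATS = ("mp3", "opus", "flac", "wav", "pcm")
PRESET_VOICES = (
    "af", "af_bella", "af_sarah", "am_adam", "am_michael", "bf_emma",
    "bf_isabella", "bm_george", "bm_lewis", "af_nicole", "af_sky",
)


def build_payload(text: str, output_format: str = "wav", preset_voice: str = "af") -> str:
    """Return a JSON request body, rejecting values outside the documented lists.

    Hypothetical helper; assumes the fields are top-level keys next to "text".
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if preset_voice not in PRESET_VOICES:
        raise ValueError(f"unsupported preset_voice: {preset_voice!r}")
    return json.dumps(
        {"text": text, "output_format": output_format, "preset_voice": preset_voice}
    )
```

Validating locally is a design choice that surfaces typos in a voice or format name before a request is sent. The resulting string can be passed as the request body with any HTTP client.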