
hexgrad/Kokoro-82M

Kokoro is a frontier text-to-speech (TTS) model for its size: 82 million parameters, text in, audio out. On 25 Dec 2024, the Kokoro v0.19 weights were released in full fp32 precision under the permissive Apache 2.0 license. As of 2 Jan 2025, 10 unique Voicepacks have been released, and an ONNX (.onnx) version of v0.19 is available.

Public
Pricing: $5.00 per million characters
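A quick way to budget requests is to scale the input length by the listed rate of $5.00 per million characters. The sketch below assumes bash and awk, and assumes that billing counts every character of the input text as measured by bash's ${#var}; the function name estimate_cost is hypothetical. Compare the estimate with the cost value under inference_status in a real response.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the charge for one request.
# Assumption: billing counts every character of the input text, measured
# here with ${#text} (characters in a UTF-8 locale, bytes in the C locale).
estimate_cost() {
  local text="$1"
  local rate_per_million="${2:-5.00}"   # USD per 1,000,000 characters
  local chars=${#text}
  # bash has no floating point, so awk does the division.
  awk -v n="$chars" -v r="$rate_per_million" \
    'BEGIN { printf "%.6f\n", n * r / 1000000 }'
}

estimate_cost "How could I know? It's an unanswerable question."   # 48 characters -> 0.000240
```

The rate is a parameter so the helper keeps working if the listed price changes.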

HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -d '{"text": "How could I know? It'\''s an unanswerable question. Like asking an unborn child if they'\''ll lead a good life. They haven'\''t even been born."}'  \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -H 'Content-Type: application/json'  \
    'https://api.deepinfra.com/v1/inference/hexgrad/Kokoro-82M'
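Embedding the JSON body in a single-quoted shell string breaks as soon as the text itself contains an apostrophe. A minimal sketch, assuming bash, that builds the body in a variable and escapes it for JSON instead; the helper names json_escape and kokoro_payload are hypothetical, and the field names (text, preset_voice, output_format, speed) come from the input fields documented on this page. Optional fields left empty are omitted so the service defaults apply.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Handles backslashes, double quotes, newlines, carriage returns, and tabs,
# which covers ordinary prose; other control characters are not escaped.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}       # backslash -> \\
  s=${s//\"/\\\"}       # double quote -> \"
  s=${s//$'\n'/\\n}     # newline -> \n
  s=${s//$'\r'/\\r}     # carriage return -> \r
  s=${s//$'\t'/\\t}     # tab -> \t
  printf '%s' "$s"
}

# Hypothetical helper: build the request body.
# Arguments: text [preset_voice] [output_format] [speed]; empty ones are skipped.
kokoro_payload() {
  local text="$1" voice="${2:-}" format="${3:-}" speed="${4:-}"
  local body
  body="{\"text\": \"$(json_escape "$text")\""
  [ -n "$voice" ]  && body+=", \"preset_voice\": \"$(json_escape "$voice")\""
  [ -n "$format" ] && body+=", \"output_format\": \"$(json_escape "$format")\""
  [ -n "$speed" ]  && body+=", \"speed\": $speed"   # speed is a JSON number
  printf '%s}' "$body"
}
```

To use it, set payload=$(kokoro_payload "It's an unanswerable question." af_bella mp3 1.0) and pass -d "$payload" in place of the single-quoted body in the cURL command above.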

This returns a JSON response shaped like the following (the values shown are placeholders):

{
  "audio": null,
  "input_character_length": 0,
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
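To save the generated speech, the audio value has to be decoded. A minimal sketch, assuming bash and a base64 tool that accepts -d, and assuming (not confirmed by this page) that a real response's audio field holds base64-encoded audio, possibly prefixed with a data-URL header such as data:audio/wav;base64, . It also assumes you have already extracted that string, for instance with a JSON tool such as jq. The function name save_audio is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write the "audio" value of a response to a file.
# Assumption: the value is base64, possibly preceded by a data-URL header
# ending in a comma (e.g. "data:audio/wav;base64,"). Everything up to and
# including the first comma is stripped; a value without a comma is used as-is.
save_audio() {
  local audio="$1" out="$2"
  # The sample response shows "audio": null; jq -r prints that as "null".
  if [ -z "$audio" ] || [ "$audio" = "null" ]; then
    echo "no audio in response" >&2
    return 1
  fi
  printf '%s' "${audio#*,}" | base64 -d > "$out"
}
```

With jq installed, a call might look like save_audio "$(curl ... | jq -r .audio)" speech.wav, where the file extension should match the requested output_format.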

Input fields

text (string)

Text to convert to speech


output_format (string)

Output format for the speech

Default value: "wav"

Allowed values: mp3, opus, flac, wav, pcm


preset_voice (string)

Preset voice name to use for the speech

Default value: "af"

Allowed values: af, af_bella, af_sarah, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis, af_nicole, af_sky


speed (number)

Speed of the speech

Range: 0.25 ≤ speed ≤ 4


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
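The documented constraints can be checked client-side before a request is sent, which avoids paying for a round trip that will be rejected. A minimal sketch, assuming bash and awk; the function name validate_kokoro_params is hypothetical, and the allowed values and speed range are copied from the field list above, so re-check them against the live schema if it changes.

```bash
#!/usr/bin/env bash
# Hypothetical client-side check of the documented input constraints.
# Arguments: output_format preset_voice speed (pass "" to skip a check).
# Returns 0 if every non-empty argument is allowed, 1 otherwise.
validate_kokoro_params() {
  local format="$1" voice="$2" speed="$3"
  if [ -n "$format" ]; then
    case "$format" in
      mp3|opus|flac|wav|pcm) ;;
      *) echo "unsupported output_format: $format" >&2; return 1 ;;
    esac
  fi
  if [ -n "$voice" ]; then
    case "$voice" in
      af|af_bella|af_sarah|am_adam|am_michael|bf_emma|bf_isabella|bm_george|bm_lewis|af_nicole|af_sky) ;;
      *) echo "unknown preset_voice: $voice" >&2; return 1 ;;
    esac
  fi
  if [ -n "$speed" ]; then
    # awk does the decimal comparison; non-numeric input fails the regex.
    if ! awk -v s="$speed" \
        'BEGIN { exit !(s ~ /^[0-9]*\.?[0-9]+$/ && s >= 0.25 && s <= 4) }'; then
      echo "speed must be a number in [0.25, 4]: $speed" >&2
      return 1
    fi
  fi
  return 0
}
```

The range check is inclusive at both ends, matching 0.25 ≤ speed ≤ 4.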
