Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been fine-tuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.
You can use cURL or any other HTTP client to run inference:
curl -X POST \
-d '{"input": "The quick brown fox jumps over the lazy dog"}' \
-H "Authorization: bearer $DEEPINFRA_TOKEN" \
-H 'Content-Type: application/json' \
'https://api.deepinfra.com/v1/inference/canopylabs/orpheus-3b-0.1-ft'
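The same request can also carry the optional parameters documented on this page. The bash sketch below assumes the JSON body accepts them under the same names as listed (`response_format`, `max_tokens`, `repetition_penalty`); the helpers `json_escape`, `orpheus_payload` and `orpheus_tts` are hypothetical names for illustration, not part of any DeepInfra tooling. `json_escape` only handles backslashes, double quotes and newlines, so it is not a complete JSON encoder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape text for a JSON string literal.
# Sketch only: handles backslash, double quote and newline, not other control characters.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}     # \  -> \\   (must run first)
  s=${s//\"/\\\"}     # "  -> \"
  s=${s//$'\n'/\\n}   # newline -> \n
  printf '%s' "$s"
}

# Hypothetical helper: build the request body.
# Assumption: field names match the documented parameter names; defaults mirror the documented defaults.
# Usage: orpheus_payload <text> [response_format] [max_tokens] [repetition_penalty]
orpheus_payload() {
  local text=$1 format=${2:-wav} max_tokens=${3:-2000} penalty=${4:-1.1}
  printf '{"input": "%s", "response_format": "%s", "max_tokens": %s, "repetition_penalty": %s}\n' \
    "$(json_escape "$text")" "$format" "$max_tokens" "$penalty"
}

# Hypothetical helper: POST the body (read from stdin via --data-binary @-) and print the raw JSON response.
# Requires DEEPINFRA_TOKEN to be set.
orpheus_tts() {
  orpheus_payload "$@" | curl -sS -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN" \
    -H 'Content-Type: application/json' \
    --data-binary @- \
    'https://api.deepinfra.com/v1/inference/canopylabs/orpheus-3b-0.1-ft'
}
```

For example, `orpheus_tts "Hello there" mp3 1000 1.1 > response.json` would save the raw JSON response. Building the body in a function keeps the quoting in one place instead of hand-editing the `-d` string for every request.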
which returns a response similar to:
{
  "audio": null,
  "input_character_length": 0,
  "output_format": "",
  "words": [
    {
      "text": "Hello",
      "start": 0.0,
      "end": 1.0,
      "confidence": 0.5
    },
    {
      "text": "World",
      "start": 4.0,
      "end": 5.0,
      "confidence": 0.5
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
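The sample response above shows `"audio": null`, so this page does not confirm how the generated speech is encoded. Assuming a real response carries it as a base64 data URI of the form `data:audio/wav;base64,...` (an assumption, not documented here), a hypothetical helper like `save_audio_uri` could write the audio to disk. It relies on GNU coreutils `base64 -d` and rejects anything that does not look like a base64 data URI.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decode a base64 data URI into a file.
# Assumption: the "audio" field looks like data:audio/<format>;base64,<payload>.
# Usage: save_audio_uri <data-uri> <output-file>
save_audio_uri() {
  local uri=$1 out=$2
  case $uri in
    "data:"*";base64,"*) ;;  # expected shape, continue
    *) echo "audio field is not a base64 data URI" >&2; return 1 ;;
  esac
  # Strip everything up to the first comma, then decode the payload (GNU base64).
  printf '%s' "${uri#*,}" | base64 -d > "$out"
}
```

With `jq` installed, usage might look like `save_audio_uri "$(jq -r '.audio' response.json)" speech.wav`; if the field is `null`, `jq -r` prints `null` and the helper fails with an error instead of writing a broken file.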
response_format
Type: string
Select the desired format for the speech output. Supported formats include mp3, opus, flac, wav, and pcm.
Default value: "wav"
Allowed values: mp3, opus, flac, wav, pcm
max_tokens
Type: integer
Maximum number of tokens for the generation.
Default value: 2000
Range: 0 < max_tokens ≤ 4096
repetition_penalty
Type: number
Repetition penalty for the generation.
Default value: 1.1
Range: 0 ≤ repetition_penalty
webhook
Type: file
The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
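Checking values against the documented constraints on the client side can catch mistakes before a request is sent. The bash sketch below is a minimal example; `validate_params` is a hypothetical name, and it only enforces the allowed values and ranges listed above, so the server's own validation still applies.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check arguments against the documented constraints.
# Usage: validate_params <response_format> <max_tokens> <repetition_penalty>
validate_params() {
  local format=$1 max_tokens=$2 penalty=$3

  # response_format: must be one of the allowed values
  case $format in
    mp3|opus|flac|wav|pcm) ;;
    *) echo "response_format must be one of: mp3, opus, flac, wav, pcm" >&2; return 1 ;;
  esac

  # max_tokens: integer with 0 < max_tokens <= 4096 (10# forces base 10 despite leading zeros)
  if ! [[ $max_tokens =~ ^[0-9]+$ ]] || (( 10#$max_tokens <= 0 || 10#$max_tokens > 4096 )); then
    echo "max_tokens must be an integer with 0 < max_tokens <= 4096" >&2; return 1
  fi

  # repetition_penalty: non-negative decimal; the regex rejects a leading minus sign
  if ! [[ $penalty =~ ^[0-9]+([.][0-9]+)?$ ]]; then
    echo "repetition_penalty must be a number >= 0" >&2; return 1
  fi
}
```

For instance, `validate_params wav 2000 1.1` succeeds silently, while `validate_params wav 5000 1.1` prints an error and returns a non-zero status.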