
canopylabs/orpheus-3b-0.1-ft

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

Public
$7.00 per M characters
Project · Paper · License

HTTP/cURL API

You can use cURL or any other HTTP client to run inferences:

curl -X POST \
    -d '{"input": "The quick brown fox jumps over the lazy dog"}'  \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -H 'Content-Type: application/json'  \
    'https://api.deepinfra.com/v1/inference/canopylabs/orpheus-3b-0.1-ft'

which will give you back a response similar to the following (the values shown are schema placeholders, not real output):

{
  "audio": null,
  "input_character_length": 0,
  "output_format": "",
  "words": [
    {
      "text": "Hello",
      "start": 0.0,
      "end": 1.0,
      "confidence": 0.5
    },
    {
      "text": "World",
      "start": 4.0,
      "end": 5.0,
      "confidence": 0.5
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}

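The same request can be made from Python. Below is a minimal sketch using only the standard library; the handling of the `audio` field assumes it is returned as a base64-encoded string (possibly with a `data:` URI prefix), which may differ from the actual response encoding:

```python
import base64
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/canopylabs/orpheus-3b-0.1-ft"

def build_payload(text, voice="tara", response_format="wav"):
    """Assemble the JSON body for the inference request."""
    return {"input": text, "voice": voice, "response_format": response_format}

def synthesize(text, token, out_path="speech.wav"):
    """POST the request and write the decoded audio to disk.

    Assumption: the 'audio' field arrives base64-encoded, optionally
    as a data URI ('data:audio/wav;base64,...'). Adjust if your
    response differs.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(text)).encode(),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    audio = body["audio"]
    if audio.startswith("data:"):  # strip a data-URI prefix if present
        audio = audio.split(",", 1)[1]
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(audio))
    return body.get("inference_status", {})

# Example (requires a valid DEEPINFRA_TOKEN):
# synthesize("The quick brown fox jumps over the lazy dog", token)
```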

Input fields

input (string)

Text to convert to speech


voice (string)

Voice name to use

Default value: "tara"

Allowed values: tara, leah, jess, leo, dan, mia, zac


response_format (string)

Select the desired format for the speech output. Supported formats include mp3, opus, flac, wav, and pcm.

Default value: "wav"

Allowed values: mp3, opus, flac, wav, pcm


temperature (number)

Sampling temperature for the generation

Default value: 0.4

Range: 0 ≤ temperature ≤ 2


top_p (number)

Top p value for the generation

Default value: 0.9

Range: 0 ≤ top_p ≤ 1


max_tokens (integer)

Maximum number of tokens for the generation

Default value: 2000

Range: 0 < max_tokens ≤ 4096


repetition_penalty (number)

Repetition penalty for the generation

Default value: 1.1

Range: 0 ≤ repetition_penalty


stream (boolean)

Whether to stream audio bytes in chunks

Default value: false
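When `stream` is true, the audio can be consumed incrementally instead of waiting for the full response. A minimal sketch, assuming the endpoint sends raw audio bytes in chunks when streaming is enabled (rather than a single JSON document):

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/inference/canopylabs/orpheus-3b-0.1-ft"
CHUNK_SIZE = 8192  # bytes read per iteration; an arbitrary choice

def streaming_payload(text, response_format="pcm"):
    """Request body with streaming enabled."""
    return {"input": text, "response_format": response_format, "stream": True}

def stream_to_file(text, token, out_path="speech.pcm"):
    """Write audio chunks to disk as they arrive.

    Assumption: with 'stream': true, the response body is raw audio
    delivered incrementally; adjust if the API chunks differently.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(streaming_payload(text)).encode(),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        while True:
            chunk = resp.read(CHUNK_SIZE)
            if not chunk:
                break
            f.write(chunk)
```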


webhook (file)

The webhook to call when inference is done; by default you will get the output in the response of your inference request.
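A minimal receiver for such a webhook could look like the sketch below, built on the standard-library `http.server`. It assumes the webhook body carries the same JSON shape as a synchronous inference response; the port and field access are illustrative:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_result(raw: bytes) -> dict:
    """Decode a webhook body; assumed to carry the same JSON
    shape as a synchronous inference response."""
    return json.loads(raw.decode("utf-8"))

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        result = parse_result(self.rfile.read(length))
        status = result.get("inference_status", {}).get("status")
        print(f"inference finished with status: {status}")
        self.send_response(200)
        self.end_headers()

# Example: HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```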

Input Schema

Output Schema