The latest version of the Airoboros model, a fine-tuned version of llama-2-70b trained on the Airoboros dataset. This model is currently running jondurbin/airoboros-l2-70b-2.2.1.
You can POST to our OpenAI-compatible endpoint:
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(deepctl auth token)" \
  -d '{
    "model": "deepinfra/airoboros-70b",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
To which you'd get something like:
{
  "id": "chatcmpl-guMTxWgpFf",
  "object": "chat.completion",
  "created": 1694623155,
  "model": "deepinfra/airoboros-70b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": " Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 16,
    "total_tokens": 31
  }
}
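The same request and response handling can be sketched in Python using only the standard library. The helper names `build_chat_request` and `extract_reply` are hypothetical, and the API token is assumed to be supplied by the caller (for example, the output of `deepctl auth token`). Nothing is sent over the network until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

API_URL = "https://api.deepinfra.com/v1/openai/chat/completions"


def build_chat_request(token, content, model="deepinfra/airoboros-70b"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def extract_reply(response_body):
    """Return (assistant text, total tokens) from a chat.completion JSON body."""
    parsed = json.loads(response_body)
    message = parsed["choices"][0]["message"]["content"]
    return message.strip(), parsed["usage"]["total_tokens"]


# Sending the request (requires a valid token and network access):
# with urllib.request.urlopen(build_chat_request(token, "Hello!")) as resp:
#     text, total_tokens = extract_reply(resp.read())
```

The reply text is stripped because the sample response above begins with a leading space.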
You can also perform a streaming request by passing "stream": true:
curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $(deepctl auth token)" \
  -d '{
    "model": "deepinfra/airoboros-70b",
    "stream": true,
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
To which you'd get a sequence of SSE events, finishing with [DONE].
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "deepinfra/airoboros-70b", "choices": [{"index": 0, "delta": {"role": "assistant", "content": " "}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "deepinfra/airoboros-70b", "choices": [{"index": 0, "delta": {"role": "assistant", "content": " Hi"}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "deepinfra/airoboros-70b", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "!"}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "deepinfra/airoboros-70b", "choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "deepinfra/airoboros-70b", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "</s>"}, "finish_reason": null}]}
data: {"id": "Rc5hsIPHOSfMP3rNSFUw9tfR", "object": "chat.completion.chunk", "created": 1694623354, "model": "deepinfra/airoboros-70b", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]}
data: [DONE]
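A client has to reassemble the reply from these events. The following is a minimal sketch of that step, assuming the stream has already been decoded into text lines; the function name `collect_stream_text` is hypothetical. It concatenates each chunk's `delta.content` and stops at the `[DONE]` sentinel.

```python
import json


def collect_stream_text(lines):
    """Join the content deltas from an iterable of SSE text lines until [DONE]."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            # The final chunk carries an empty delta, so default to "".
            pieces.append(choice.get("delta", {}).get("content", ""))
    return "".join(pieces)
```

Note that in the sample stream above the end-of-sequence marker `</s>` arrives as ordinary content, so a caller may want to strip it from the joined text.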
Currently supported parameters:
temperature - more or less random generation
top_p - controls token sampling
max_tokens - maximum number of generated tokens
stop - up to 4 strings to terminate generation earlier
n - number of sequences to generate (up to 2)
Known caveats:
max_new_tokens
integer. Maximum length of the newly generated text. If not set or None, defaults to the model's max context length minus input length.
Default value: 512
Range: 1 ≤ max_new_tokens ≤ 100000
temperature
number. Temperature to use for sampling. 0 means the output is deterministic. Values greater than 1 encourage more diversity.
Default value: 0.7
Range: 0 ≤ temperature ≤ 100
top_p
number. Sample from the set of tokens with the highest probability such that the sum of probabilities is higher than p. Lower values focus on the most probable tokens. Higher values sample more low-probability tokens.
Default value: 0.9
Range: 0 < top_p ≤ 1
top_k
integer. Sample from the best k (number of) tokens. 0 means off.
Default value: 0
Range: 0 ≤ top_k < 100000
repetition_penalty
number. Repetition penalty. A value of 1 means no penalty, values greater than 1 discourage repetition, and values smaller than 1 encourage repetition.
Default value: 1
Range: 0.01 ≤ repetition_penalty ≤ 5
num_responses
integer. Number of output sequences to return. Incompatible with streaming.
Default value: 1
Range: 1 ≤ num_responses ≤ 2
response_format
object. Optional nested object with "type" set to "json_object".
presence_penalty
number. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Default value: 0
Range: -2 ≤ presence_penalty ≤ 2
frequency_penalty
number. Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.
Default value: 0
Range: -2 ≤ frequency_penalty ≤ 2
webhook
file. The webhook to call when inference is done. By default you will get the output in the response of your inference request.
stream
boolean. Whether to stream tokens; false by default. Currently only supported for Llama 2 text generation models. Token-by-token updates are sent over SSE.
Default value: false
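The ranges listed above can be checked on the client before a request is sent. The sketch below is illustrative only: `RANGES` and `check_params` are hypothetical names, the bounds are copied from the fields above, and "incompatible with streaming" is read as applying when more than one response is requested, which is an assumption.

```python
# (low, high, low_inclusive, high_inclusive), copied from the ranges listed above
RANGES = {
    "max_new_tokens": (1, 100000, True, True),
    "temperature": (0, 100, True, True),
    "top_p": (0, 1, False, True),
    "top_k": (0, 100000, True, False),
    "repetition_penalty": (0.01, 5, True, True),
    "num_responses": (1, 2, True, True),
    "presence_penalty": (-2, 2, True, True),
    "frequency_penalty": (-2, 2, True, True),
}


def check_params(params):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in params.items():
        if name not in RANGES:
            continue  # no documented range (e.g. stream, webhook, response_format)
        low, high, low_inc, high_inc = RANGES[name]
        above = value >= low if low_inc else value > low
        below = value <= high if high_inc else value < high
        if not (above and below):
            problems.append(f"{name}={value} is outside the documented range")
    # Assumption: streaming conflicts only when several responses are requested.
    if params.get("stream") and params.get("num_responses", 1) > 1:
        problems.append("num_responses > 1 is incompatible with streaming")
    return problems
```

For example, `check_params({"top_p": 0})` reports one problem because the lower bound of `top_p` is exclusive.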