Mixtral-8x22B is the latest and largest mixture-of-experts (MoE) large language model (LLM) from Mistral AI. This state-of-the-art model combines eight 22B-parameter experts; during inference, two experts are selected per token. This architecture lets large models stay fast and cheap at inference time. This model is not instruction-tuned.
You can POST to our OpenAI compatible endpoint:
curl "https://api.deepinfra.com/v1/openai/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "mistralai/Mixtral-8x22B-v0.1",
"prompt": "The quick brown fox"
}'
To which you'd get something like:
{
"id": "cmpl-1b8401a68c5141eb825f68944dcea2c1",
"object": "text_completion",
"created": 1700578595,
"model": "mistralai/Mixtral-8x22B-v0.1",
"choices": [
{
"index": 0,
"text": " jumped over the lazy dog",
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 4,
"total_tokens": 9,
"completion_tokens": 5
}
}
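A client only needs a few fields from that response; a minimal Python sketch that parses the sample body shown above (no network call is made here):

```python
import json

# Sample response body in the shape returned by /v1/openai/completions
# (copied from the example above).
response_body = """
{
  "id": "cmpl-1b8401a68c5141eb825f68944dcea2c1",
  "object": "text_completion",
  "created": 1700578595,
  "model": "mistralai/Mixtral-8x22B-v0.1",
  "choices": [
    {"index": 0, "text": " jumped over the lazy dog", "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 4, "total_tokens": 9, "completion_tokens": 5}
}
"""

completion = json.loads(response_body)
text = completion["choices"][0]["text"]
usage = completion["usage"]

# total_tokens is the sum of prompt and completion tokens.
assert usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
print(text)  # → " jumped over the lazy dog"
```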
You can also perform a streaming request by passing "stream": true:
curl "https://api.deepinfra.com/v1/openai/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPINFRA_TOKEN" \
-d '{
"model": "mistralai/Mixtral-8x22B-v0.1",
"stream": true,
"prompt": "The quick brown fox"
}'
to which you'd get a sequence of SSE events, finishing with [DONE]:
data: {"id": "cmpl-8e9b62b4fd924beb848c9d12d1bc86ec", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " jumps", "finish_reason": null}]}
data: {"id": "cmpl-713aa1050f5a456f80743ff340749998", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " over", "finish_reason": null}]}
data: {"id": "cmpl-6c45c4a8f3ed45d5823aee9c913b25e0", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " the", "finish_reason": null}]}
data: {"id": "cmpl-4b78757fcff445509e38ca834c171c1a", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " lazy", "finish_reason": null}]}
data: {"id": "cmpl-59c539a062b740e4aed11c5463f888fc", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " dog", "finish_reason": null}]}
data: {"id": "cmpl-c7a6daa119c846ec9a3e2c1abd253f66", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "", "finish_reason": null}]}
data: {"id": "cmpl-e9804392c4a34188b7e78a59ee1db792", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "", "finish_reason": "stop"}]}
data: [DONE]
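A streaming client reassembles the completion by concatenating the text field of each event until the [DONE] sentinel. A minimal sketch of that loop, run against an abbreviated copy of the sample stream above rather than a live connection:

```python
import json

# Raw SSE lines as delivered by the streaming endpoint (sample from above,
# with ids omitted for brevity).
sse_lines = [
    'data: {"object": "text_completion", "choices": [{"index": 0, "text": " jumps", "finish_reason": null}]}',
    'data: {"object": "text_completion", "choices": [{"index": 0, "text": " over", "finish_reason": null}]}',
    'data: {"object": "text_completion", "choices": [{"index": 0, "text": " the", "finish_reason": null}]}',
    'data: {"object": "text_completion", "choices": [{"index": 0, "text": " lazy", "finish_reason": null}]}',
    'data: {"object": "text_completion", "choices": [{"index": 0, "text": " dog", "finish_reason": "stop"}]}',
    'data: [DONE]',
]

parts = []
for line in sse_lines:
    payload = line[len("data: "):]
    if payload == "[DONE]":   # sentinel terminating the stream
        break
    event = json.loads(payload)
    parts.append(event["choices"][0]["text"])

completion = "".join(parts)
print(completion)  # → " jumps over the lazy dog"
```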
Currently supported parameters:
- temperature - more or less random generation
- top_p - controls token sampling
- max_tokens - maximum number of generated tokens
- stop - up to 4 strings to terminate generation earlier
- n - number of sequences to generate (up to 2)
Known caveats: n != 1 is incompatible with streaming.
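The supported parameters combine into a single request body; a sketch of one such body (the parameter values here are illustrative, not recommendations):

```python
import json

# Illustrative request body using the currently supported sampling parameters.
payload = {
    "model": "mistralai/Mixtral-8x22B-v0.1",
    "prompt": "The quick brown fox",
    "temperature": 0.7,   # 0 ≤ temperature ≤ 2; lower is more deterministic
    "top_p": 0.9,         # nucleus-sampling cutoff
    "max_tokens": 64,     # cap on generated tokens
    "stop": ["\n\n"],     # up to 4 stop strings
    "n": 1,               # n != 1 is incompatible with streaming
}

body = json.dumps(payload)
print(body)
```

This is the JSON you would pass as the -d argument of the curl commands above.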
max_tokens
integer. The maximum number of tokens to generate in the completion. The total length of input tokens and generated tokens is limited by the model's context length. If not set (or None), it defaults to the model's max context length minus the input length.
Default value: 512
Range: 0 < max_tokens ≤ 100000
temperature
number. What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
Default value: 1
Range: 0 ≤ temperature ≤ 2
n
integer. The number of sequences to return. n != 1 is incompatible with streaming.
Default value: 1
Range: 1 ≤ n ≤ 2
presence_penalty
number. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
Default value: 0
Range: -2 ≤ presence_penalty ≤ 2
frequency_penalty
number. Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.
Default value: 0
Range: -2 ≤ frequency_penalty ≤ 2
repetition_penalty
number. An alternative penalty for repetition that is multiplicative instead of additive (> 1 penalizes, < 1 encourages).
Default value: 1
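The difference between the additive penalties (presence_penalty, frequency_penalty) and the multiplicative repetition_penalty can be sketched on a single logit. This is an illustrative model of the arithmetic only, not the server's exact implementation:

```python
def additive_penalty(logit, count, presence_penalty, frequency_penalty):
    # presence_penalty applies once if the token has appeared at all;
    # frequency_penalty scales with how many times it has appeared.
    if count > 0:
        logit -= presence_penalty
    logit -= frequency_penalty * count
    return logit

def multiplicative_penalty(logit, repetition_penalty):
    # repetition_penalty > 1 penalizes seen tokens, < 1 encourages them.
    # Dividing positive logits (and multiplying negative ones) always
    # moves the token's score down for penalties > 1.
    return logit / repetition_penalty if logit > 0 else logit * repetition_penalty

print(additive_penalty(2.0, count=2, presence_penalty=0.5, frequency_penalty=0.25))  # → 1.0
print(multiplicative_penalty(2.0, repetition_penalty=1.25))   # → 1.6
print(multiplicative_penalty(-2.0, repetition_penalty=1.25))  # → -2.5
```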