mistralai/Mixtral-8x22B-v0.1

Mixtral-8x22B is the latest and largest mixture-of-experts (MoE) large language model (LLM) from Mistral AI. It is a state-of-the-art model built from a mixture of 8 expert models of 22B parameters each; during inference, 2 experts are selected for every token. This sparse architecture lets a very large model stay fast and cheap at inference time. Note that this model is not instruction-tuned.
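
To make the routing idea concrete, here is a minimal, illustrative sketch of top-2 expert selection in plain numpy. The shapes, names, and tiny linear "experts" are assumptions for illustration only, not the model's actual implementation:

import numpy as np

NUM_EXPERTS, TOP_K, HIDDEN = 8, 2, 16
rng = np.random.default_rng(0)

# Each "expert" here is just a small linear layer; in Mixtral each expert
# is a full feed-forward block.
experts = [rng.standard_normal((HIDDEN, HIDDEN)) * 0.02 for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((HIDDEN, NUM_EXPERTS)) * 0.02

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route a single token activation through the 2 highest-scoring experts."""
    logits = x @ router_w                      # one router score per expert
    top = np.argsort(logits)[-TOP_K:]          # indices of the 2 best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                   # softmax over the selected experts only
    # Only the chosen experts run, so per-token compute scales with 2 experts, not 8.
    return sum(w * (experts[i] @ x) for w, i in zip(weights, top))

print(moe_layer(rng.standard_normal(HIDDEN)).shape)  # (16,)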

Public · fp16 · 64k context

OpenAI-compatible HTTP API

You can POST to our OpenAI-compatible endpoint:

curl "https://api.deepinfra.com/v1/openai/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
      "model": "mistralai/Mixtral-8x22B-v0.1",
      "prompt": "The quick brown fox"
    }'

To which you'd get something like:

{
    "id": "cmpl-1b8401a68c5141eb825f68944dcea2c1",
    "object": "text_completion",
    "created": 1700578595,
    "model": "mistralai/Mixtral-8x22B-v0.1",
    "choices": [
        {
            "index": 0,
            "text": " jumped over the lazy dog",
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 4,
        "total_tokens": 9,
        "completion_tokens": 5
    }
}
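
Since the endpoint speaks the OpenAI protocol, you can make the same request with the official openai Python client instead of curl. A minimal sketch, assuming the openai (>= 1.0) package is installed and your token is in the DEEPINFRA_TOKEN environment variable:

import os
from openai import OpenAI

# Point the standard OpenAI client at DeepInfra's OpenAI-compatible base URL.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

completion = client.completions.create(
    model="mistralai/Mixtral-8x22B-v0.1",
    prompt="The quick brown fox",
)

print(completion.choices[0].text)       # e.g. " jumped over the lazy dog"
print(completion.usage.total_tokens)    # prompt_tokens + completion_tokens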

You can also perform a streaming request by passing "stream": true:

curl "https://api.deepinfra.com/v1/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPINFRA_TOKEN" \
  -d '{
      "model": "mistralai/Mixtral-8x22B-v0.1",
      "stream": true,
      "prompt": "The quick brown fox"
    }'

To which you'd get a sequence of SSE events, finishing with [DONE].

data: {"id": "cmpl-8e9b62b4fd924beb848c9d12d1bc86ec", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " jumps", "finish_reason": null}]}

data: {"id": "cmpl-713aa1050f5a456f80743ff340749998", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " over", "finish_reason": null}]}

data: {"id": "cmpl-6c45c4a8f3ed45d5823aee9c913b25e0", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " the", "finish_reason": null}]}

data: {"id": "cmpl-4b78757fcff445509e38ca834c171c1a", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " lazy", "finish_reason": null}]}

data: {"id": "cmpl-59c539a062b740e4aed11c5463f888fc", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": " dog", "finish_reason": null}]}

data: {"id": "cmpl-c7a6daa119c846ec9a3e2c1abd253f66", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "", "finish_reason": null}]}

data: {"id": "cmpl-e9804392c4a34188b7e78a59ee1db792", "object": "text_completion", "created": 1700578719, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "", "finish_reason": "stop"}]}

data: [DONE]
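
Any SSE client can consume these events. As one option (a sketch, assuming the openai >= 1.0 Python package), the openai client handles the data: framing and the final [DONE] marker for you when you pass stream=True:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

stream = client.completions.create(
    model="mistralai/Mixtral-8x22B-v0.1",
    prompt="The quick brown fox",
    stream=True,                          # receive SSE chunks instead of one JSON body
)

for chunk in stream:
    # Each chunk corresponds to one "data:" event shown above.
    print(chunk.choices[0].text, end="", flush=True)
print()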

Currently supported parameters (combined in the example request after this list):

  • temperature - higher values make the output more random, lower values more deterministic
  • top_p - nucleus sampling: only tokens within the top_p probability mass are considered
  • max_tokens - maximum number of generated tokens
  • stop - strings that terminate generation early (see the stop field below)
  • n - number of sequences to generate (up to 2)
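
As a rough illustration (the values are arbitrary examples, not recommendations), the request below combines these parameters using the openai client from the earlier sketch:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_TOKEN"],
    base_url="https://api.deepinfra.com/v1/openai",
)

completion = client.completions.create(
    model="mistralai/Mixtral-8x22B-v0.1",
    prompt="The quick brown fox",
    temperature=0.7,     # lower = more focused, higher = more random
    top_p=0.9,           # nucleus sampling cutoff
    max_tokens=64,       # cap on generated tokens
    stop=["\n\n"],       # stop early at a blank line
    n=2,                 # two independent completions (not allowed with stream=True)
)

for choice in completion.choices:
    print(choice.index, choice.text)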

Known caveats:

  • if the generation is terminated due to a stop sequence, the stop sequence is present in the output (but in OpenAI it is not).
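
If you need OpenAI-matching behavior, you can strip a trailing stop sequence on the client side. A small hypothetical helper (not part of the API):

def strip_stop(text: str, stop_sequences: list[str]) -> str:
    """Drop a trailing stop sequence so the output matches OpenAI's behavior."""
    for stop in stop_sequences:
        if stop and text.endswith(stop):
            return text[: -len(stop)]
    return text

print(strip_stop("The quick brown fox jumped over the lazy dog.", ["."]))
# -> "The quick brown fox jumped over the lazy dog"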

Input fields

model (string)

model name


prompt (string)

input prompt - a single string is currently supported


max_tokens (integer)

The maximum number of tokens to generate in the completion. The total length of input tokens and generated tokens is limited by the model's context length. If not set or None, it defaults to the model's max context length minus the input length.

Default value: 512

Range: 0 < max_tokens ≤ 100000


temperature (number)

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

Default value: 1

Range: 0 ≤ temperature ≤ 2


top_p (number)

nucleus sampling: the model considers only the tokens comprising the top_p probability mass

Default value: 1

Range: 0 < top_p ≤ 1


n (integer)

number of sequences to return; n != 1 is incompatible with streaming

Default value: 1

Range: 1 ≤ n ≤ 2


stream (boolean)

whether to stream the output via SSE or return the full response

Default value: false


logprobs (integer)

return top tokens and their log-probabilities


echo (boolean)

return the prompt as part of the response


stop (string)

up to 16 sequences where the API will stop generating further tokens


presence_penalty (number)

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

Default value: 0

Range: -2 ≤ presence_penalty ≤ 2


frequency_penalty (number)

Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.

Default value: 0

Range: -2 ≤ frequency_penalty ≤ 2


response_format (object)

The format of the response. Currently, only json is supported.


repetition_penalty (number)

Alternative penalty for repetition, but multiplicative instead of additive (> 1 penalizes, < 1 encourages)

Default value: 1
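
Fields that are not part of the standard OpenAI parameter set, such as repetition_penalty or echo, can be sent by POSTing the JSON body directly. A minimal sketch with the requests library, assuming your token is in the DEEPINFRA_TOKEN environment variable:

import os
import requests

resp = requests.post(
    "https://api.deepinfra.com/v1/openai/completions",
    headers={"Authorization": f"Bearer {os.environ['DEEPINFRA_TOKEN']}"},
    json={
        "model": "mistralai/Mixtral-8x22B-v0.1",
        "prompt": "The quick brown fox",
        "max_tokens": 64,
        "repetition_penalty": 1.1,  # multiplicative penalty; > 1 discourages repetition
        "echo": True,               # include the prompt in the returned text
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])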
