Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

Llama-3.3-70B-Instruct

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

Llama-3.3-70B-Instruct-Turbo

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

phi-4

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B, 70B, and 405B sizes.

Meta-Llama-3.1-70B-Instruct

Meta-Llama-3.1-8B-Instruct

Meta-Llama-3.1-405B-Instruct

QwQ is an experimental research model developed by the Qwen Team, designed to advance AI reasoning capabilities. This model embodies the spirit of philosophical inquiry, approaching problems with genuine wonder and doubt. QwQ demonstrates impressive analytical abilities, achieving scores of 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench. Its contemplative approach is paired with exceptional performance on complex problems.

QwQ-32B-Preview

Meta-Llama-3.1-8B-Instruct-Turbo

Meta-Llama-3.1-70B-Instruct-Turbo

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It brings significant improvements in code generation, code reasoning, and code fixing, and provides a more comprehensive foundation for real-world applications such as code agents. It enhances coding capabilities while maintaining its strengths in mathematics and general competencies.

Qwen2.5-Coder-32B-Instruct

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo.  As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

Llama-3.1-Nemotron-70B-Instruct

Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor Qwen2. The model also features enhanced capabilities in generating long texts, understanding structured data, and generating structured outputs, while supporting multilingual capabilities for over 29 languages.

Qwen2.5-72B-Instruct

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks.  This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.

Llama-3.2-90B-Vision-Instruct

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis.  Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

Llama-3.2-11B-Vision-Instruct

At 8 billion parameters, with superior quality and prompt adherence, this base model is the most powerful in the Stable Diffusion family. This model is ideal for professional use cases at 1 megapixel resolution.

sd3.5

Black Forest Labs' latest state-of-the-art proprietary model, offering top-of-the-line prompt following, visual quality, detail, and output diversity.

FLUX-1.1-pro

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. This model offers cutting-edge output quality and competitive prompt following, matching the performance of closed source alternatives. Trained using latent adversarial diffusion distillation, FLUX.1 [schnell] can generate high-quality images in only 1 to 4 steps. 

FLUX-1-schnell

FLUX.1-dev is a state-of-the-art 12 billion parameter rectified flow transformer developed by Black Forest Labs. This model excels in text-to-image generation, providing highly accurate and detailed outputs. It is particularly well-regarded for its ability to follow complex prompts and generate anatomically accurate images, especially with challenging details like hands and faces.

FLUX-1-dev

Black Forest Labs' first flagship model, based on Flux latent rectified flow transformers.

FLUX-pro

At 2.5 billion parameters, with improved MMDiT-X architecture and training methods, this model is designed to run “out of the box” on consumer hardware, striking a balance between quality and ease of customization. It is capable of generating images ranging between 0.25 and 2 megapixel resolution.

sd3.5-medium

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoding layers has been reduced from 32 to 4. As a result, the model is much faster, at the expense of a minor quality degradation.

whisper-large-v3-turbo

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.

whisper-large-v3

Distil-Whisper was proposed in the paper "Robust Knowledge Distillation via Large-Scale Pseudo Labelling". This is the third and final installment of the Distil-Whisper English series. It is the knowledge-distilled version of OpenAI's Whisper large-v3, the latest and most performant Whisper model to date. Compared to previous Distil-Whisper models, the distillation procedure for distil-large-v3 has been adapted to give superior long-form transcription accuracy with OpenAI's sequential long-form algorithm.

distil-large-v3

Text-to-Speech (TTS) technology converts written text into spoken words using advanced speech synthesis. TTS systems are used in applications like virtual assistants, accessibility tools for visually impaired users, and language learning software, enabling seamless human-computer interaction.

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models.

WizardLM-2-8x22B

Mixtral-8x22B is the latest and largest mixture-of-experts (MoE) large language model (LLM) from Mistral AI. It is a state-of-the-art model built from a mixture of 8 experts, each a 22B model. During inference, 2 experts are selected. This architecture allows large models to be fast and cheap at inference. This model is not instruction tuned.

This is an advanced and more complex API. We strongly recommend that you use OpenAI Chat Completions instead.

#### Simple prompt

To query this model you need to provide a properly formatted input string.

```bash
# The prompt text is illustrative.
curl -X POST \
    -d '{"input": "Hello!"}' \
    -H "Authorization: bearer $DEEPINFRA_TOKEN" \
    -H 'Content-Type: application/json' \
    'https://api.deepinfra.com/v1/inference/mistralai/Mixtral-8x22B-v0.1'
```

That will respond with

```json
{
    "request_id": "RWZDRhS5kdoM1XWwXLEshynO",
    "inference_status": {
        "status": "succeeded",
        "runtime_ms": 243,
        "cost": 0.0000436,
        "tokens_input": 12,
        "tokens_generated": 25
    },
    "results": [
        {
            "generated_text": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
        }
    ],
    "num_tokens": 25,
    "num_input_tokens":12
}
```
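For callers who prefer Python over curl, the request and the response handling can be sketched with the standard library alone. The endpoint URL pattern, the `build_request` and `extract_text` helper names, and the sample prompt are assumptions made for illustration; the response fields are the ones shown in the JSON above.

```python
import json
import urllib.request

# Assumed endpoint pattern; check it against the API reference for your account.
API_URL = "https://api.deepinfra.com/v1/inference/{model}"


def build_request(model: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the prompt as "input"."""
    body = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(model=model),
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_text(response: dict) -> str:
    """Return the generated text, or raise if the inference did not succeed."""
    status = response["inference_status"]["status"]
    if status != "succeeded":
        raise RuntimeError(f"inference status: {status}")
    return response["results"][0]["generated_text"]


# Parse a response shaped like the documented one (no network call needed).
sample = json.loads(
    '{"inference_status": {"status": "succeeded"},'
    ' "results": [{"generated_text": "Hello!"}]}'
)
print(extract_text(sample))
```

To actually send the request, pass the object returned by `build_request` to `urllib.request.urlopen` and feed the decoded JSON body to `extract_text`.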

#### Conversations

The OpenAI Chat Completions API is better suited for chat-like conversations; use it instead.

However, you can still hold a conversation through this API if you really need to. You have to include every previous user prompt and every model response in each request, as one properly formatted input string, so that the model understands the current context. You can tweak the behavior further by providing a system message.

```bash
# Illustrative sketch: the "input" string carries the whole conversation so far.
# The bracketed parts are placeholders for your own text.
curl -X POST \
    -d '{"input": "<system message>\n<user prompt 1>\n<model response 1>\n<user prompt 2>\n"}' \
    -H "Authorization: bearer $DEEPINFRA_TOKEN" \
    -H 'Content-Type: application/json' \
    'https://api.deepinfra.com/v1/inference/mistralai/Mixtral-8x22B-v0.1'
```

The conversation above might return something like the following

```json
{
    "request_id": "RWZDRhS5kdoM1XWwXLEshynO",
    "inference_status": {
        "status": "succeeded",
        "runtime_ms": 243,
        "cost": 0.000436,
        "tokens_input": 149,
        "tokens_generated": 338
    },
    "results": [
        {
            "generated_text": "Sous le Sable, my friend! It's an ancient technique that's been used for centuries in the Middle East and North Africa. The name itself..."
        }
    ],
    "num_tokens": 338,
    "num_input_tokens": 149
}
```

The longer the conversation gets, the more time it takes the model to generate the response. The conversation is limited by the context size of a model. Larger models also usually take more time to respond.
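Because every request has to carry the whole history, a client typically keeps the turns in a list and rebuilds the input string each time, dropping the oldest turns once the prompt grows too long. The sketch below is one way to do that. The `User:`/`Assistant:` labels, the `build_prompt` helper, and the character budget (a crude stand-in for a real token count) are assumptions for illustration, not a format the model requires.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text); role is "user" or "assistant"


def build_prompt(system: str, turns: List[Turn], max_chars: int = 8000) -> str:
    """Join the system message and turns into one string, dropping oldest turns first."""
    labels = {"user": "User", "assistant": "Assistant"}
    kept = list(turns)
    while True:
        lines = [system] if system else []
        lines += [f"{labels[role]}: {text}" for role, text in kept]
        lines.append("Assistant:")  # cue the model to write the next response
        prompt = "\n".join(lines)
        # Stop once the prompt fits, or when only the newest turn is left.
        if len(prompt) <= max_chars or len(kept) <= 1:
            return prompt
        kept = kept[1:]  # drop the oldest turn and try again


history = [
    ("user", "Name two ways to cook lamb."),
    ("assistant", "Roasting and braising."),
    ("user", "Tell me more about the second one."),
]
print(build_prompt("You are a helpful chef.", history))
```

After each response, append it to `history` as an `("assistant", ...)` turn before adding the next user prompt, so the following request carries the full exchange.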

<br>

### Streaming


To do a streaming request, just pass `"stream": true`:

```bash
# The prompt text is illustrative.
curl -X POST \
    -d '{"input": "Hello!", "stream": true}' \
    -H "Authorization: bearer $DEEPINFRA_TOKEN" \
    -H 'Content-Type: application/json' \
    'https://api.deepinfra.com/v1/inference/mistralai/Mixtral-8x22B-v0.1'
```

which outputs:

```json
data: {"token": {"id": null, "text": "Hello", "logprob": 0.0, "special": false}, "generated_text": "", "details": null, "estimated_cost": null}

data: {"token": {"id": null, "text": "!", "logprob": 0.0, "special": false}, "generated_text": "", "details": null, "estimated_cost": null}

data: {"token": {"id": null, "text": " It", "logprob": 0.0, "special": false}, "generated_text": "", "details": null, "estimated_cost": null}

data: {"token": {"id": null, "text": "'s", "logprob": 0.0, "special": false}, "generated_text": "", "details": null, "estimated_cost": null}

....

data: {"token": {"id": null, "text": "", "logprob": 0.0, "special": false}, "generated_text": null, "details": {"finish_reason": "stop"}, "num_output_tokens": 25, "num_input_tokens": 12, "estimated_cost": 0.0000386}
```
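A client has to reassemble the text from these events. A minimal sketch, assuming each event arrives as a `data:` line carrying one JSON object shaped like the ones above; the `collect_stream` helper name is hypothetical, and it reads from any iterable of lines so it can be tried without a network connection.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Concatenate token texts from `data:` lines; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not an event
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore non-JSON payloads rather than fail mid-stream
        token = event.get("token") or {}
        parts.append(token.get("text") or "")
        details = event.get("details")
        if details:  # the final event carries the finish reason
            finish_reason = details.get("finish_reason")
            break
    return "".join(parts), finish_reason


sample = [
    'data: {"token": {"id": null, "text": "Hello", "logprob": 0.0, "special": false}, "generated_text": "", "details": null, "estimated_cost": null}',
    "",
    'data: {"token": {"id": null, "text": "!", "logprob": 0.0, "special": false}, "generated_text": "", "details": null, "estimated_cost": null}',
    "",
    'data: {"token": {"id": null, "text": "", "logprob": 0.0, "special": false}, "generated_text": null, "details": {"finish_reason": "stop"}, "estimated_cost": 0.0000386}',
]
print(collect_stream(sample))
```

In a real client the same function can be fed the decoded lines of the HTTP response body as they arrive.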

#### Input format

This model is not instruction tuned and has no chat template, so the input string is passed to the model as plain text. Bear in mind that newlines often matter.

Conversation prompts contain the history of the exchanged prompts and responses. A system prompt, if you want one, is included in the same input string.



You can use our command-line tool [deepctl](/docs/getting-started) to run inferences:

This is an advanced and more complex API. We strongly recommend that you use OpenAI Chat Completions instead.

#### Simple prompt

To query this model you need to provide a properly formatted input string.

```bash
# The prompt text is illustrative.
deepctl infer \
    -m 'mistralai/Mixtral-8x22B-v0.1' \
    -i input="Hello!"
```

That will respond with

```json
{
    "inference_status": {
        "status": "succeeded",
        "runtime_ms": 243,
        "cost": 0.0000436,
        "tokens_input": 12,
        "tokens_generated": 25
    },
    "results": [
        {
            "generated_text": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
        }
    ],
    "num_tokens": 25,
    "num_input_tokens":12
}
```

#### Conversations

The OpenAI Chat Completions API is better suited for chat-like conversations; use it instead.

However, you can still hold a conversation through this API if you really need to. You have to include every previous user prompt and every model response in each request, as one properly formatted input string, so that the model understands the current context. You can tweak the behavior further by providing a system message.

```bash
# Illustrative sketch: the input string carries the whole conversation so far.
# The bracketed parts are placeholders for your own text.
deepctl infer \
    -m 'mistralai/Mixtral-8x22B-v0.1' \
    -i input="<system message> <user prompt 1> <model response 1> <user prompt 2>"
```

The conversation above might return something like the following

```json
{
    "inference_status": {
        "status": "succeeded",
        "runtime_ms": 243,
        "cost": 0.000436,
        "tokens_input": 149,
        "tokens_generated": 338
    },
    "results": [
        {
            "generated_text": "Sous le Sable, my friend! It's an ancient technique that's been used for centuries in the Middle East and North Africa. The name itself..."
        }
    ],
    "num_tokens": 338,
    "num_input_tokens": 149
}
```

The longer the conversation gets, the more time it takes the model to generate the response. The conversation is limited by the context size of a model. Larger models also usually take more time to respond.

#### Input format

You can see below the basic format of the input. Bear in mind that newlines often matter.

```text
PromptGenerator(example=<deepinfra.lib.examples.BasicExample object at 0x7f17cced8280>, model_info=ModelInfo(model_name='mistralai/Mixtral-8x22B-v0.1', provider=<ModelProvider.DEEPINFRA: 'deepinfra'>, type='text-generation', tags=['completions', 'openai'], reported_type='text-generation', model_id='di:mistralai:Mixtral-8x22B-v0.1:42a1ba7ede3e491b47fc3fdc4a61b7ebff9442e1', disabled=0, private=0, owner_uid=None, streamable=1, pricing_history=[ModelPricing(ptype=<PricingType.TOKENS: 'tokens'>, cents_per_sec=None, cents_per_gtok=6.5e-05, cents_per_ctok=6.5e-05, cents_per_input_sec=None, cents_per_input_chars=None, cents_per_image_unit=None, t2i_default_width=None, t2i_default_height=None, t2i_default_iterations=None, active=TimeInterval(fr=0, to=4000000000000))], max_tokens=65536, featured=0, description='Mixtral-8x22B is the latest and largest mixture of expert large language model (LLM) from Mistral AI. This is state of the art machine learning model using a mixture 8 of experts (MoE) 22b models. During inference 2 expers are selected. This architecture allows large models to be fast and cheap at inference.  This model is not instruction tuned. ', github_url='', paper_url='', license_url='', readme='https://shared.deepinfra.com/models/mistralai/Mixtral-8x22B-v0.1/readme.6fe63ef19cfcfef5a611663c9ecc761e198f23927625ea081ab5977890b3470e.md', cover_img_url='https://shared.deepinfra.com/models/mistralai/Mixtral-8x22B-v0.1/cover_image.eb92d1199149a5d7fa5e7b2dc17dc991f7398301747b92bd60032c3b7fc77a0f.webp', virtual=0, replaced_by='mistralai/Mixtral-8x7B-Instruct-v0.1', deprecated=1726522208, quantization='fp16', mmlu=None, expected=None, stop_token_ids=[], stop_words=[], lora_supported=False, chat_template_raw=None, chat_template_extra={}, routing_algo=<RoutingAlgo.UID: 'uid'>), api_key='', stream=False)
```

Conversation prompts contain the history of the exchanged prompts and responses.

```text
PromptGenerator(example=<deepinfra.lib.examples.PromptConversationExample object at 0x7f17ccedb910>, model_info=ModelInfo(model_name='mistralai/Mixtral-8x22B-v0.1', provider=<ModelProvider.DEEPINFRA: 'deepinfra'>, type='text-generation', tags=['completions', 'openai'], reported_type='text-generation', model_id='di:mistralai:Mixtral-8x22B-v0.1:42a1ba7ede3e491b47fc3fdc4a61b7ebff9442e1', disabled=0, private=0, owner_uid=None, streamable=1, pricing_history=[ModelPricing(ptype=<PricingType.TOKENS: 'tokens'>, cents_per_sec=None, cents_per_gtok=6.5e-05, cents_per_ctok=6.5e-05, cents_per_input_sec=None, cents_per_input_chars=None, cents_per_image_unit=None, t2i_default_width=None, t2i_default_height=None, t2i_default_iterations=None, active=TimeInterval(fr=0, to=4000000000000))], max_tokens=65536, featured=0, description='Mixtral-8x22B is the latest and largest mixture of expert large language model (LLM) from Mistral AI. This is state of the art machine learning model using a mixture 8 of experts (MoE) 22b models. During inference 2 expers are selected. This architecture allows large models to be fast and cheap at inference.  This model is not instruction tuned. ', github_url='', paper_url='', license_url='', readme='https://shared.deepinfra.com/models/mistralai/Mixtral-8x22B-v0.1/readme.6fe63ef19cfcfef5a611663c9ecc761e198f23927625ea081ab5977890b3470e.md', cover_img_url='https://shared.deepinfra.com/models/mistralai/Mixtral-8x22B-v0.1/cover_image.eb92d1199149a5d7fa5e7b2dc17dc991f7398301747b92bd60032c3b7fc77a0f.webp', virtual=0, replaced_by='mistralai/Mixtral-8x7B-Instruct-v0.1', deprecated=1726522208, quantization='fp16', mmlu=None, expected=None, stop_token_ids=[], stop_words=[], lora_supported=False, chat_template_raw=None, chat_template_extra={}, routing_algo=<RoutingAlgo.UID: 'uid'>), api_key='', stream=False)
```

To add a system prompt, format the input like this:

```text
[System prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```



We recommend using our Node.js client: https://github.com/deepinfra/deepinfra-node.

You can install it with:

```bash
npm install deepinfra
```

#### Simple prompt

To query this model you need to provide a properly formatted input string.

```javascript
// [Basic example for mistralai/Mixtral-8x22B-v0.1, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]

// Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
```

#### Conversations

The OpenAI Chat Completions API is better suited for chat-like conversations; use it instead.

However, you can still hold a conversation through this API if you really need to. Every request must include each previous user prompt and each model response.
The input string must be properly formatted so that the model understands the current context; see the example below.
You can adjust the behavior further by providing a system message.

```javascript
// [Conversation example for mistralai/Mixtral-8x22B-v0.1, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]

// Sous le Sable! It's an ancient technique that never goes out of style, n'est-ce pas? Literally ...
```
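The requirement above, resending every earlier prompt and response with each request, can be sketched in Python as follows. This is only an illustration: the `System:`/`User:`/`Assistant:` labels and the `build_prompt` helper are hypothetical, because this page does not show the exact input format the model expects.

```python
from typing import List, Tuple

# Hypothetical turn labels; the real input format for the model may differ.
USER, ASSISTANT = "User", "Assistant"


def build_prompt(history: List[Tuple[str, str]], next_user_message: str, system: str = "") -> str:
    """Flatten the full conversation into one input string.

    `history` holds (user_prompt, model_response) pairs from earlier requests.
    """
    lines = []
    if system:
        lines.append(f"System: {system}")
    for user_prompt, model_response in history:
        lines.append(f"{USER}: {user_prompt}")
        lines.append(f"{ASSISTANT}: {model_response}")
    lines.append(f"{USER}: {next_user_message}")
    # End with the assistant label so the model continues from there.
    lines.append(f"{ASSISTANT}:")
    return "\n".join(lines)


history = [("Hello!", "Hi, how can I help?")]
prompt = build_prompt(history, "Suggest a cooking technique.", system="Respond like a chef.")
print(prompt)
```

After each response, append the new (prompt, response) pair to `history` so that the next request carries the whole exchange.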

The longer the conversation gets, the more time it takes the model to generate the response.
The number of messages that you can have in a conversation is limited by the context size of a model.
Larger models also usually take more time to respond.

#### Input format

You can see below the basic format of the input. Bear in mind that newlines often matter.

```text
[Basic prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```

Conversation prompts contain the history of the exchanged prompts and responses.

```text
[Conversation prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```

To add a system prompt, format the input like this:

```text
[System prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```



You can POST to our OpenAI Completions compatible endpoint.

However, this is an advanced and more complex API. We strongly recommend that you use OpenAI Chat Completions instead.
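As a rough illustration of such a POST, the Python sketch below builds a completion request without sending it. The endpoint URL, the Bearer authorization header, and the `build_completion_request` helper are assumptions that follow the usual OpenAI-compatible conventions; they are not taken from this page, so check them against the API reference before use.

```python
import json
import urllib.request

# Assumed endpoint URL; confirm it against the provider's API reference.
API_URL = "https://api.deepinfra.com/v1/openai/completions"


def build_completion_request(prompt: str, token: str, stream: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for a text completion."""
    payload = {
        "model": "mistralai/Mixtral-8x22B-v0.1",
        "prompt": prompt,
        "stream": stream,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: a Bearer token, as in OpenAI-style APIs.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_completion_request("Hello!", token="YOUR_DEEPINFRA_TOKEN")
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```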

#### Simple prompt

To query this model you need to provide a properly formatted input string.

```bash
# [Basic curl request for mistralai/Mixtral-8x22B-v0.1, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]
```

The response looks something like this:

```json
{
    "id": "cmpl-1b8401a68c5141eb825f68944dcea2c1",
    "object": "text_completion",
    "created": 1700578595,
    "model": "mistralai/Mixtral-8x22B-v0.1",
    "choices": [
        {
            "index": 0,
            "text": "Hello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?",
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 4,
        "total_tokens": 9,
        "completion_tokens": 5,
        "estimated_cost": 0.00035493
    }
}
```
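A response of this shape can be read with the standard `json` module. The sketch below uses a trimmed copy of the sample above (the `id`, `created`, and `estimated_cost` fields are omitted and the text is shortened) and pulls out the generated text and the token usage.

```python
import json

# Trimmed copy of the sample response shown above.
raw = """
{
    "object": "text_completion",
    "model": "mistralai/Mixtral-8x22B-v0.1",
    "choices": [
        {"index": 0, "text": "Hello! It's nice to meet you.", "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 4, "total_tokens": 9, "completion_tokens": 5}
}
"""

response = json.loads(raw)
first_choice = response["choices"][0]
usage = response["usage"]

print(first_choice["text"])
print(first_choice["finish_reason"])
# In this sample, total_tokens is the sum of prompt and completion tokens.
print(usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"])
```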

#### Conversations

The OpenAI Chat Completions API is better suited for chat-like conversations; use it instead.

However, you can still hold a conversation through this API if you really need to. Every request must include each previous user prompt and each model response.
The input string must be properly formatted so that the model understands the current context; see the example below.
You can adjust the behavior further by providing a system message.

```bash
# [Conversation curl request for mistralai/Mixtral-8x22B-v0.1, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]
```

The conversation above might return something like the following:

```json
{
    "id": "cmpl-b23a3fb60cde42ce8f24bb980b4dee87",
    "object": "text_completion",
    "created": 1715688169,
    "model": "mistralai/Mixtral-8x22B-v0.1",
    "choices": [
        {
            "index": 0,
            "text": "Sous le Sable, my friend! It's an ancient technique that's been used for centuries in the Middle East and North Africa. The name itself...",
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 149,
        "total_tokens": 487,
        "completion_tokens": 338,
        "estimated_cost": 0.00035493
    }
}
```

The longer the conversation gets, the more time it takes the model to generate the response. The conversation is limited by the context size of a model. Larger models also usually take more time to respond.
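One common way to stay inside the context limit is to drop the oldest turns before sending a request. The Python sketch below assumes a hypothetical `trim_history` helper and a crude word-based token count; a real client would count tokens with the model's own tokenizer and use the model's actual context size as the budget.

```python
from typing import Callable, List


def trim_history(turns: List[str], max_tokens: int, count_tokens: Callable[[str], int]) -> List[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    kept: List[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def rough_count(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())


turns = ["User: Hello there", "Assistant: Hi, how can I help?", "User: Tell me a joke"]
print(trim_history(turns, max_tokens=12, count_tokens=rough_count))
```

With a budget of 12 words, the oldest turn is dropped and the two most recent turns are kept.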

<br>

### Streaming

You can also perform a streaming request by passing `"stream": true`:

```bash
# [Streaming curl request for mistralai/Mixtral-8x22B-v0.1 with "stream": true, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]
```

The response is a sequence of [SSE](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events) events, finishing with `[DONE]`.

```
data: {"id": "cmpl-158cbf94cef043c2955172e8062ded3d", "object": "text_completion", "created": 1694623354, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "Hi", "finish_reason": null}]}

data: {"id": "cmpl-158cbf94cef043c2955172e8062ded3d", "object": "text_completion", "created": 1694623354, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "!", "finish_reason": null}]}

data: {"id": "cmpl-158cbf94cef043c2955172e8062ded3d", "object": "text_completion", "created": 1694623354, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "", "finish_reason": null}]}

data: {"id": "cmpl-158cbf94cef043c2955172e8062ded3d", "object": "text_completion", "created": 1694623354, "model": "mistralai/Mixtral-8x22B-v0.1", "choices": [{"index": 0, "text": "", "finish_reason": "stop"}]}

data: [DONE]
```
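A client reassembles the completion by concatenating the `text` field of each event until it sees `[DONE]`. The Python sketch below shows that loop on a shortened copy of the events above; `collect_stream_text` is a hypothetical helper, and the lines would normally come from the HTTP response body rather than a list.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the `text` fragments from SSE `data:` lines until `[DONE]`."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        parts.append(event["choices"][0]["text"])
    return "".join(parts)


sample = [
    'data: {"choices": [{"index": 0, "text": "Hi", "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "text": "!", "finish_reason": null}]}',
    "",
    'data: {"choices": [{"index": 0, "text": "", "finish_reason": "stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream_text(sample))  # prints "Hi!"
```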

#### Input format

You can see below the basic format of the input. Bear in mind that newlines often matter.

```text
[Basic prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```

Conversation prompts contain the history of the exchanged prompts and responses.

```text
[Conversation prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```

To add a system prompt, format the input like this:

```text
[System prompt format for mistralai/Mixtral-8x22B-v0.1: template output not rendered]
```



You can use the official OpenAI Python client to run inference with us.

However, this is an advanced and more complex API. We strongly recommend that you use OpenAI Chat Completions instead.

#### Simple prompt

To query this model you need to provide a properly formatted input string.

```python
# [Basic example for mistralai/Mixtral-8x22B-v0.1, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]

# Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
# 11 25
```

#### Conversations

The OpenAI Chat Completions API is better suited for chat-like conversations; use it instead.

However, you can still hold a conversation through this API if you really need to. Every request must include each previous user prompt and each model response.
The input string must be properly formatted so that the model understands the current context; see the example below.
You can adjust the behavior further by providing a system message.

```python
# [Conversation example for mistralai/Mixtral-8x22B-v0.1, authenticated with $DEEPINFRA_TOKEN: snippet not rendered]

# Sous le Sable! It's an ancient technique that never goes out of style, n'est-ce pas? Literally ...
# 149 324
```

The longer the conversation gets, the more time it takes the model to generate the response.
The number of messages that you can have in a conversation is limited by the context size of a model.
Larger models also usually take more time to respond.
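
Because the whole history is resent with every request, one common way to stay within the context size is to drop the oldest turns first. The sketch below is only an illustration: `estimate_tokens` and `trim_history` are hypothetical helpers, and the 4-characters-per-token estimate is a rough assumption (use the model's tokenizer for real counts).

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(turns: list[str], max_input_tokens: int) -> list[str]:
    """Keep the most recent turns whose estimated size fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the latest context is preserved.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > max_input_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 200, "c" * 40]
print(trim_history(history, max_input_tokens=60))
```

With this budget the oldest turn is dropped and the two most recent ones are kept in their original order.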


<br>

### Streaming

Streaming completions is supported by adding the `stream=True` option.

```python
from openai import OpenAI

# A minimal sketch: the prompt below is a placeholder for the formatted input string.
openai = OpenAI(
    api_key="$DEEPINFRA_TOKEN",
    base_url="https://api.deepinfra.com/v1/openai",
)

completion = openai.completions.create(
    model="mistralai/Mixtral-8x22B-v0.1",
    prompt="...",
    stream=True,
)

for event in completion:
    # Each event carries the next fragment of generated text.
    if event.choices and event.choices[0].text:
        print(event.choices[0].text)
    # Token usage, when present, is printed as "prompt completion".
    if event.usage:
        print(event.usage.prompt_tokens, event.usage.completion_tokens)

# Hello
# !
# It
# 's
# nice
# ...
# 11 25
```
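
If you consume the stream without a client library, the token-by-token updates arrive as server-sent events. The following sketch assumes the OpenAI-style wire format (`data: {json}` lines ending with `data: [DONE]`, each chunk carrying `choices[0].text`). That format and the helper name `iter_sse_text` are assumptions for illustration, not a description of the service's guaranteed output.

```python
import json
from typing import Iterable, Iterator


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragment from each `data:` event until `[DONE]`."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices and choices[0].get("text"):
            yield choices[0]["text"]


# Hand-written sample events, not captured from a real response.
sample = [
    'data: {"choices": [{"text": "Hello"}]}',
    "",
    'data: {"choices": [{"text": "!"}]}',
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))
```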

#### Input format

You can see below the basic format of the input. Bear in mind that newlines often matter.

```text
[basic prompt for mistralai/Mixtral-8x22B-v0.1; the rendered prompt text is not available]
```

Conversation prompts contain the history of the exchanged prompts and responses.

```text
[conversation prompt for mistralai/Mixtral-8x22B-v0.1; the rendered prompt text is not available]
```

To add a system prompt, format the input like this:

```text
[prompt with a system message for mistralai/Mixtral-8x22B-v0.1; the rendered prompt text is not available]
```



You can make requests from JavaScript, either in the browser or in Node.js.

However, this is an advanced and more complex API. We strongly recommend that you use OpenAI Chat Completions instead.

```bash
npm install openai
```

#### Simple prompt

To query this model you need to provide a properly formatted input string.

```javascript
import OpenAI from "openai";

// A minimal sketch: the prompt below is a placeholder for the formatted input string.
const openai = new OpenAI({
  apiKey: "$DEEPINFRA_TOKEN",
  baseURL: "https://api.deepinfra.com/v1/openai",
});

async function main() {
  const completion = await openai.completions.create({
    model: "mistralai/Mixtral-8x22B-v0.1",
    prompt: "...",
    stream: false,
  });

  console.log(completion.choices[0].text);
  console.log(completion.usage.prompt_tokens, completion.usage.completion_tokens);
}

main();

// Hello! It's nice to meet you. Is there something I can help you with, or would you like to chat?
// 11 25
```

#### Conversations

The OpenAI Chat Completions API is better suited for chat-like conversations, so use it instead where you can.

If you really need to use this API for conversations, you must include every previous user prompt and every model response in each request.
The input string has to be properly formatted so that the model understands the current context. The example below shows one such request.
You can tweak the behavior further by providing a system message.

```javascript
import OpenAI from "openai";

// A minimal sketch: the prompt below is a placeholder for the formatted
// conversation string (all earlier prompts and responses).
const openai = new OpenAI({
  apiKey: "$DEEPINFRA_TOKEN",
  baseURL: "https://api.deepinfra.com/v1/openai",
});

async function main() {
  const completion = await openai.completions.create({
    model: "mistralai/Mixtral-8x22B-v0.1",
    prompt: "...",
    stream: false,
  });

  console.log(completion.choices[0].text);
  console.log(completion.usage.prompt_tokens, completion.usage.completion_tokens);
}

main();

// Sous le Sable! It's an ancient technique that never goes out of style, n'est-ce pas? Literally ...
// 149 324
```

The longer the conversation gets, the more time it takes the model to generate the response.
The number of messages that you can have in a conversation is limited by the context size of a model.
Larger models also usually take more time to respond.


<br>

### Streaming

Streaming completions is supported by adding the `stream: true` option.

```javascript
import OpenAI from "openai";

// A minimal sketch: the prompt below is a placeholder for the formatted input string.
const openai = new OpenAI({
  apiKey: "$DEEPINFRA_TOKEN",
  baseURL: "https://api.deepinfra.com/v1/openai",
});

async function main() {
  const completion = await openai.completions.create({
    model: "mistralai/Mixtral-8x22B-v0.1",
    prompt: "...",
    stream: true,
  });

  for await (const chunk of completion) {
    // Each chunk carries the next fragment of generated text.
    const text = chunk.choices[0]?.text;
    if (text) console.log(text);
    // Token usage, when present, is printed as "prompt completion".
    if (chunk.usage) {
      console.log(chunk.usage.prompt_tokens, chunk.usage.completion_tokens);
    }
  }
}

main();

// Hello
// !
// It
// 's
// nice
// ...
// 11 25
```

#### Input format

You can see below the basic format of the input. Bear in mind that newlines often matter.

```text
[basic prompt for mistralai/Mixtral-8x22B-v0.1; the rendered prompt text is not available]
```

Conversation prompts contain the history of the exchanged prompts and responses.

```text
[conversation prompt for mistralai/Mixtral-8x22B-v0.1; the rendered prompt text is not available]
```

To add a system prompt, format the input like this:

```text
[prompt with a system message for mistralai/Mixtral-8x22B-v0.1; the rendered prompt text is not available]
```



- `input`
- `max_new_tokens`: Maximum length of the newly generated text. If explicitly set to None, it will be the model's max context length minus the input length.
- `temperature`: Temperature to use for sampling. 0 means the output is deterministic. Values greater than 1 encourage more diversity.
- `top_p`: Sample from the set of tokens with the highest probability such that the sum of probabilities is higher than p. Lower values focus on the most probable tokens. Higher values sample more low-probability tokens.
- `min_p`: Float that represents the minimum probability for a token to be considered, relative to the probability of the most likely token. Must be in [0, 1]. Set to 0 to disable this.
- `top_k`: Sample from the best k (number of) tokens. 0 means off.
- `repetition_penalty`: Repetition penalty. A value of 1 means no penalty, values greater than 1 discourage repetition, and values smaller than 1 encourage repetition.
- `stop`: Up to 16 strings that will terminate generation immediately.
- `num_responses`: Number of output sequences to return. Incompatible with streaming.
- `response_format`: Optional nested object with "type" set to "json_object".
- `presence_penalty`: Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- `frequency_penalty`: Positive values penalize new tokens based on how many times they appear in the text so far, increasing the model's likelihood to talk about new topics.
- `user`: A unique identifier representing your end-user, which can help monitor and detect abuse. Avoid sending us any identifying information. We recommend hashing user identifiers.
- `seed`: Seed for the random number generator. If not provided, a random seed is used. Determinism is not guaranteed.
- `webhook`: The webhook to call when inference is done. By default, you will get the output in the response of your inference request.
- `stream`: Whether to stream tokens; false by default. Currently only supported for Llama 2 text generation models. Token-by-token updates are sent over SSE.
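
Several of the parameters above carry constraints that are easy to check on the client before sending a request. The following is a minimal sketch of such a check, using only the field names listed above. The helper name `build_generation_request` is hypothetical, and treating "incompatible with streaming" as "more than one response while streaming" is an assumption.

```python
from typing import Any, Optional


def build_generation_request(
    input_text: str,
    *,
    top_p: Optional[float] = None,
    min_p: Optional[float] = None,
    stop: Optional[list[str]] = None,
    num_responses: Optional[int] = None,
    stream: bool = False,
) -> dict[str, Any]:
    """Build a JSON-serializable request body, rejecting documented conflicts."""
    if min_p is not None and not 0.0 <= min_p <= 1.0:
        raise ValueError("min_p must be in [0, 1]")
    if stop is not None and len(stop) > 16:
        raise ValueError("stop accepts up to 16 strings")
    # Assumption: only multiple responses conflict with streaming.
    if stream and num_responses is not None and num_responses > 1:
        raise ValueError("num_responses is incompatible with streaming")

    payload: dict[str, Any] = {"input": input_text, "stream": stream}
    optional = {
        "top_p": top_p,
        "min_p": min_p,
        "stop": stop,
        "num_responses": num_responses,
    }
    # Send only the options the caller actually set.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


print(build_generation_request("Hello!", min_p=0.05, stop=["\n\n"]))
```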

TextGenerationIn

I have this dream about the day I got a job at a tech company. I just woke up on a plane. I sat down on the floor and started getting work done. After getting up around 6 p.m., I looked around and

Generated Text

GeneratedText

estimated cost billed for the request in USD

Cost

Runtime Ms

Status

Tokens Generated

Tokens Input

InferenceReplyStatus

Object containing the status of the inference request

Inference Status

Num Input Tokens

number of generated tokens, excluding prompt

Num Tokens

Request Id

Results

TextGenerationOut

- `model`
- `messages`: Conversation messages: (user,assistant,tool)*,user including one system message anywhere.
- `stream`: Whether to stream the output via SSE or return the full response.
- `temperature`: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- `top_p`: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
- `max_tokens`: The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
- `stop`: Up to 16 sequences where the API will stop generating further tokens.
- `tools`: A list of tools the model may call. Currently, only functions are supported as a tool.
- `tool_choice`: Controls which (if any) function is called by the model. `none` means the model will not call a function and instead generates a message. `auto` means the model can pick between generating a message or calling a function. Specifying a particular function choice is not currently supported. `none` is the default when no functions are present; `auto` is the default if functions are present.
- `response_format`: The format of the response. Currently, only json is supported.
- `repetition_penalty`: Alternative penalty for repetition, but multiplicative instead of additive (> 1 penalize, < 1 encourage).
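
To make the nucleus sampling description concrete, here is a small worked sketch on a toy next-token distribution. The function name `top_p_filter` and the probabilities are made up for illustration. Real implementations operate on the model's full vocabulary and may differ in tie-breaking and boundary handling.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept: list[tuple[str, float]] = []
    mass = 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= top_p:
            break  # the nucleus is complete
    return {token: p / mass for token, p in kept}


# Toy distribution, not taken from any real model.
toy = {"the": 0.5, "a": 0.3, "an": 0.15, "zebra": 0.05}
print(top_p_filter(toy, top_p=0.75))  # keeps "the" and "a"
print(top_p_filter(toy, top_p=0.1))   # keeps only "the"
```

Lower `top_p` values shrink the candidate set toward the single most likely token, which is why they make the output more focused.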

ChatCompletionAssistantMessage

ChatCompletionContentPartImage

ChatCompletionContentPartText

ChatCompletionMessageToolCall

ChatCompletionSystemMessage

ChatCompletionToolMessage

ChatCompletionUserMessage

ChatTools

Function

FunctionDefinition

ImageURL

OpenAIChatCompletionsIn

FinishReason

OpenAIChatCompletionChoice

UsageInfo

a list of chat completion choices, can be more than one

Choices

Created

the object type, which is always chat.completion

Object

Usage

OpenAIChatCompletionOut

ChatMessageRole

OpenAIChatCompletionStreamChoice

OpenAIDeltaMessage

OpenAIDeltaToolCall

OpenAIDeltaToolCallFunction

the object type, which is always chat.completion.chunk

OpenAIChatCompletionStreamOut

- `prompt`: Input prompt. A single string is currently supported.
- `max_tokens`: The maximum number of tokens to generate in the completion. The total length of input tokens and generated tokens is limited by the model's context length. If explicitly set to None, it will be the model's max context length minus the input length.
- `logprobs`: Return top tokens and their log-probabilities.
- `echo`
- `user`: A unique identifier representing your end-user, which can help monitor and detect abuse. Avoid sending us any identifying information. We recommend hashing user identifiers.
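
One way to follow the recommendation to hash user identifiers is to send a SHA-256 digest instead of the raw value. This is a minimal sketch: the helper name `hashed_user_id` and the optional application-side salt are assumptions, and nothing here is required by the API.

```python
import hashlib


def hashed_user_id(raw_user_id: str, salt: str = "") -> str:
    """Return a stable hex digest that can be sent as the `user` field."""
    data = (salt + raw_user_id).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# The same input always maps to the same digest, so abuse monitoring still
# works per user without exposing the original identifier.
user_field = hashed_user_id("alice@example.com", salt="my-app")
print(len(user_field))
```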

OpenAICompletionsIn

LogProbs

OpenAICompletionChoice

a list of completion choices, can be more than one

the object type, which is always text_completion

OpenAICompletionOut

OpenAICompletionStreamChoice

OpenAICompletionStreamOut

A system message that will be part of the prompt

system

conversation messages: (user,assistant,tool)*, including one system message anywhere. You can either use `prompt` or `messages` but not both.

topP

topK

maxTokens

If set, the model will stop generating text when one of the stop sequences is generated.