We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

Viewing all

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

Category/all

meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo cover image

fp8

128k

$0.10/$0.28 in/out Mtoken

meta-llama/

Meta-Llama-3.1-70B-Instruct-Turbo

text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-8B-Instruct cover image

bfloat16

128k

$0.03/$0.05 in/out Mtoken

meta-llama/

Meta-Llama-3.1-8B-Instruct

text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo cover image

fp8

128k

$0.015/$0.02 in/out Mtoken

meta-llama/

Meta-Llama-3.1-8B-Instruct-Turbo

text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

microsoft/Phi-3-medium-4k-instruct cover image

bfloat16

Replaced

microsoft/

Phi-3-medium-4k-instruct

text-generation

The Phi-3-Medium-4K-Instruct is a powerful and lightweight language model with 14 billion parameters, trained on high-quality data to excel in instruction following and safety measures. It demonstrates exceptional performance across benchmarks, including common sense, language understanding, and logical reasoning, outperforming models of similar size.

microsoft/Phi-4-multimodal-instruct cover image

bfloat16

128k

Replaced

microsoft/

Phi-4-multimodal-instruct

text-generation

Phi-4-multimodal-instruct is a lightweight open multimodal foundation model that leverages the language, vision, and speech research and datasets used for Phi-3.5 and 4.0 models. The model processes text, image, and audio inputs, generating text outputs, and comes with 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning, direct preference optimization and RLHF (Reinforcement Learning from Human Feedback) to support precise instruction adherence and safety measures. The languages that each modal supports are the following: - Text: Arabic, Chinese, Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Hungarian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian - Vision: English - Audio: English, Chinese, German, French, Italian, Japanese, Spanish, Portuguese

fp16

32k

Replaced

microsoft/

WizardLM-2-7B

text-generation

WizardLM-2 7B is the smaller variant of Microsoft AI's latest Wizard model. It is the fastest and achieves comparable performance with existing 10x larger open-source leading models

bfloat16

64k

$$0.005 / Mtoken

microsoft/

WizardLM-2-8x22B

text-generation

WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to those leading proprietary models.

bfloat16

32k

Replaced

microsoft/

phi-4-reasoning-plus

text-generation

Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model finetuned from Phi-4 using supervised fine-tuning on a dataset of chain-of-thought traces and reinforcement learning. The supervised fine-tuning dataset includes a blend of synthetic prompts and high-quality filtered data from public domain websites, focused on math, science, and coding skills as well as alignment data for safety and Responsible AI. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. Phi-4-reasoning-plus has been trained additionally with Reinforcement Learning, hence, it has higher accuracy but generates on average 50% more tokens, thus having higher latency.

mistralai/Devstral-Small-2505 cover image

bfloat16

125k

Replaced

mistralai/

Devstral-Small-2505

text-generation

Devstral is an agentic LLM for software engineering tasks. Devstral excels at using tools to explore codebases, editing multiple files and power software engineering agents.

mistralai/Mistral-7B-Instruct-v0.1 cover image

fp16

32k

Replaced

mistralai/

Mistral-7B-Instruct-v0.1

text-generation

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a instruct fine-tuned version of the Mistral-7B-v0.1 generative text model using a variety of publicly available conversation datasets.

mistralai/Mistral-7B-Instruct-v0.2 cover image

fp16

32k

Replaced

mistralai/

Mistral-7B-Instruct-v0.2

text-generation

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is a instruct fine-tuned version of the Mistral-7B-v0.2 generative text model using a variety of publicly available conversation datasets.

mistralai/Mistral-7B-Instruct-v0.3 cover image

bfloat16

32k

$0.028/$0.054 in/out Mtoken

mistralai/

Mistral-7B-Instruct-v0.3

text-generation

Mistral-7B-Instruct-v0.3 is an instruction-tuned model, next iteration of of Mistral 7B that has larger vocabulary, newer tokenizer and supports function calling.

mistralai/Mistral-Nemo-Instruct-2407 cover image

fp8

128k

$0.02/$0.04 in/out Mtoken

mistralai/

Mistral-Nemo-Instruct-2407

text-generation

12B model trained jointly by Mistral AI and NVIDIA, it significantly outperforms existing models smaller or similar in size.

mistralai/Mistral-Small-24B-Instruct-2501 cover image

fp8

32k

$0.05/$0.08 in/out Mtoken

mistralai/

Mistral-Small-24B-Instruct-2501

text-generation

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.

mistralai/Mistral-Small-3.1-24B-Instruct-2503 cover image

fp8

125k

$0.05/$0.10 in/out Mtoken

mistralai/

Mistral-Small-3.1-24B-Instruct-2503

text-generation

Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and extends context capabilities up to 128K tokens while maintaining top-tier text performance. Its 24 billion parameters and instruction fine-tuning deliver fast, local deployment for both text and vision tasks.

mistralai/Mixtral-8x22B-Instruct-v0.1 cover image

bfloat16

64k

Replaced

mistralai/

Mixtral-8x22B-Instruct-v0.1

text-generation

This is the instruction fine-tuned version of Mixtral-8x22B - the latest and largest mixture of experts large language model (LLM) from Mistral AI. This state of the art machine learning model uses a mixture 8 of experts (MoE) 22b models. During inference 2 experts are selected. This architecture allows large models to be fast and cheap at inference.

mistralai/Mixtral-8x7B-Instruct-v0.1 cover image

fp8

32k

$0.08/$0.24 in/out Mtoken

mistralai/

Mixtral-8x7B-Instruct-v0.1

text-generation

Mixtral is mixture of expert large language model (LLM) from Mistral AI. This is state of the art machine learning model using a mixture 8 of experts (MoE) 7b models. During inference 2 expers are selected. This architecture allows large models to be fast and cheap at inference. The Mixtral-8x7B outperforms Llama 2 70B on most benchmarks.

nvidia/Llama-3.1-Nemotron-70B-Instruct cover image

fp8

128k

$0.12/$0.30 in/out Mtoken

nvidia/

Llama-3.1-Nemotron-70B-Instruct

text-generation

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales Get Started

Latest Models

openchat/

openchat_3.5

bigcode/

starcoder2-15b

openai/

whisper-tiny

Phind/

Phind-CodeLlama-34B-v2

Gryphe/

MythoMax-L2-13b

Featured Models

deepseek-ai/

DeepSeek-R1-0528

zai-org/

GLM-4.5V

google/

gemma-3-27b-it

openai/

gpt-oss-20b

google/

gemma-3-12b-it

mistralai/

Devstral-Small-2507

Company

Pricing

Docs

Compare

DeepStart

About

Careers

Trust Center

Privacy

Terms

Have questions or need a custom solution?

Contact Sales