Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
featured
fp8
1024k
$0.15/$0.60 in/out Mtoken
  • text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17 billion parameter model with 128 experts.
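
Every model listed here can be called programmatically. The snippet below is a minimal sketch of querying the Maverick endpoint above through an OpenAI-compatible client; the base URL and the DEEPINFRA_API_KEY environment variable name are assumptions, so check the model's API page for the exact values.

```python
# Minimal sketch (not an official snippet): call a listed text-generation
# model through an OpenAI-compatible client.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],          # assumed env var name
    base_url="https://api.deepinfra.com/v1/openai",   # assumed OpenAI-compatible base URL
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```

At the listed $0.15/$0.60 per million input/output tokens, a request with 1,000 input and 500 output tokens costs roughly $0.00015 + $0.00030 = $0.00045.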

meta-llama/Llama-4-Scout-17B-16E-Instruct
featured
bfloat16
320k
$0.08/$0.30 in/out Mtoken
  • text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17 billion parameter model with 16 experts.

deepseek-ai/DeepSeek-R1-0528
featured
fp4
160k
$0.50/$2.15 in/out Mtoken
  • text-generation

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528.

deepseek-ai/DeepSeek-V3-0324
featured
fp4
160k
$0.28/$0.88 in/out Mtoken
  • text-generation

DeepSeek-V3-0324 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token; it is an improved iteration over DeepSeek-V3.

mistralai/Devstral-Small-2507
featured
fp8
125k
$0.07/$0.28 in/out Mtoken
  • text-generation

Devstral is an agentic LLM trained for software engineering tasks, which makes it a strong foundation for software engineering agents.

mistralai/Mistral-Small-3.2-24B-Instruct-2506
featured
fp8
125k
$0.05/$0.10 in/out Mtoken
  • text-generation

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface—while otherwise matching or slightly improving on all previous text and vision benchmarks.

meta-llama/Llama-Guard-4-12B
featured
bfloat16
160k
$0.05 / Mtoken
  • text-generation

Llama Guard 4 is a natively multimodal safety classifier with 12 billion parameters, trained jointly on text and multiple images. Llama Guard 4 is a dense architecture pruned from the Llama 4 Scout pre-trained model and fine-tuned for content safety classification. Like previous versions, it can be used to classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM itself: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, which content categories were violated.
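
Because Llama Guard 4 responds like an ordinary LLM, prompt classification can be done with a plain chat call. The sketch below reuses the assumed OpenAI-compatible endpoint from the earlier example; the exact verdict format (a safe/unsafe line plus category codes) comes from the model's card and may differ.

```python
# Minimal sketch: prompt classification with Llama Guard 4. Per the
# description above, the model's text output indicates "safe" or "unsafe"
# and, if unsafe, the violated categories. Endpoint and env var names are
# assumptions, as in the earlier example.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],
    base_url="https://api.deepinfra.com/v1/openai",
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-Guard-4-12B",
    messages=[{"role": "user", "content": "How do I pick a strong passphrase?"}],
)
print(resp.choices[0].message.content)  # expected: "safe", or "unsafe" plus category codes
```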

Qwen/QwQ-32B
featured
bfloat16
128k
$0.075/$0.15 in/out Mtoken
  • text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

anthropic/claude-4-opus
featured
195k
$16.50/$82.50 in/out Mtoken
  • text-generation

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.

anthropic/claude-4-sonnet
featured
195k
$3.30/$16.50 in/out Mtoken
  • text-generation

Anthropic's mid-size model with superior intelligence for high-volume use cases in coding, in-depth research, agents, and more.

google/gemini-2.5-flash
featured
976k
$0.21/$1.75 in/out Mtoken
  • text-generation

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It is capable of reasoning through its thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash is best for balancing reasoning and speed.

google/gemini-2.5-pro
featured
976k
$0.875/$7.00 in/out Mtoken
  • text-generation

Gemini 2.5 Pro is Google's most advanced thinking model, designed to tackle increasingly complex problems. It leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

google/gemma-3-27b-it
featured
bfloat16
128k
$0.09/$0.17 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model and the successor to Gemma 2.

google/gemma-3-12b-it
featured
bfloat16
128k
$0.05/$0.10 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is Google's latest open source model and the successor to Gemma 2.

google/gemma-3-4b-it
featured
bfloat16
128k
$0.02/$0.04 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 4B is Google's latest open source model and the successor to Gemma 2.

hexgrad/Kokoro-82M
featured
$0.62 per M characters
  • text-to-speech

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
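
The text-to-speech models are billed per million characters rather than per token. The sketch below shows one plausible way to call Kokoro through a generic inference endpoint; the URL pattern, request field, and response layout are assumptions, so treat the model's API page as authoritative.

```python
# Rough sketch: text-to-speech with Kokoro-82M. The endpoint pattern, the
# "text" request field, and the response shape are assumptions; consult the
# model's API page for the real interface.
import os
import requests

MODEL = "hexgrad/Kokoro-82M"
url = f"https://api.deepinfra.com/v1/inference/{MODEL}"   # assumed endpoint pattern

r = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}"},
    json={"text": "Hello from a lightweight open-weight TTS model."},  # assumed field name
    timeout=60,
)
r.raise_for_status()
print(list(r.json().keys()))  # inspect the response to locate the audio payload field
```

At the listed $0.62 per million characters, a short sentence like the one above costs a small fraction of a cent.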

canopylabs/orpheus-3b-0.1-ft
featured
$7.00 per M characters
  • text-to-speech

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

sesame/csm-1b
featured
$7.00 per M characters
  • text-to-speech

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.