

Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

featured

text-generation

Mistral-Small-3.2-24B-Instruct-2506

mistralai/Mistral-Small-3.2-24B-Instruct-2506

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half as many infinite-generation errors, and a more robust function-calling interface, while otherwise matching or slightly improving on the 3.1 release's text and vision benchmark results.

$0.075 per 1M input tokens, $0.20 per 1M output tokens
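Every listing on this page is priced per 1M tokens, usually with separate input and output rates. As a rough sketch, assuming billing is strictly linear in token counts (no minimums, caching discounts, or rounding), the cost of one request can be estimated as follows. `request_cost_usd` is a hypothetical helper for illustration, not part of any DeepInfra SDK.

```python
# Hypothetical helper: estimate the USD cost of one request from per-1M-token prices.
# Assumption: cost is linear in token counts; prices are copied from this listing and may change.


def request_cost_usd(input_tokens: int, output_tokens: int,
                     price_in_per_m: float, price_out_per_m: float) -> float:
    """Return the estimated cost in USD for a request billed per 1M tokens."""
    return (input_tokens * price_in_per_m + output_tokens * price_out_per_m) / 1_000_000


# Mistral-Small-3.2-24B-Instruct-2506: $0.075 per 1M input, $0.20 per 1M output tokens.
cost = request_cost_usd(input_tokens=10_000, output_tokens=2_000,
                        price_in_per_m=0.075, price_out_per_m=0.20)
print(f"${cost:.6f}")  # prints $0.001150
```

The flat-rate listings further down (for example, $0.06 per 1M tokens) fit the same formula with equal input and output prices.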

featured

text-generation

anthropic/claude-4-opus

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.

$16.50 per 1M input tokens, $82.50 per 1M output tokens

featured

text-generation

claude-4-sonnet

anthropic/claude-4-sonnet

Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, and more.

$3.30 per 1M input tokens, $16.50 per 1M output tokens

featured

text-generation

gemini-2.5-flash

google/gemini-2.5-flash

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It can reason through its thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash is best suited to workloads that need a balance of reasoning and speed.

$0.30 per 1M input tokens, $2.50 per 1M output tokens

featured

text-generation

google/gemini-2.5-pro

Gemini 2.5 Pro is Google's most advanced thinking model, designed to tackle increasingly complex problems. Gemini 2.5 Pro leads common benchmarks by meaningful margins and shows strong reasoning and coding capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

$1.25 per 1M input tokens, $10.00 per 1M output tokens

featured

text-generation

google/gemma-3-27b-it

Gemma 3 introduces multimodality, supporting vision-language input and text output. It handles context windows of up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model and the successor to Gemma 2.

$0.09 per 1M input tokens, $0.16 per 1M output tokens

featured

text-generation

google/gemma-3-12b-it

Gemma 3 introduces multimodality, supporting vision-language input and text output. It handles context windows of up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is Google's latest open source model and the successor to Gemma 2.

$0.04 per 1M input tokens, $0.13 per 1M output tokens

featured

text-generation

google/gemma-3-4b-it

Gemma 3 introduces multimodality, supporting vision-language input and text output. It handles context windows of up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 4B is Google's latest open source model and the successor to Gemma 2.

$0.04 per 1M input tokens, $0.08 per 1M output tokens

featured

text-generation

DeepSeek-R1-Distill-Llama-70B

deepseek-ai/DeepSeek-R1-Distill-Llama-70B

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that uses knowledge distillation to achieve strong reasoning performance. It distills the reasoning patterns of the much larger DeepSeek-R1 into a smaller Llama-based model, producing strong results on benchmarks such as AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B balances accuracy and efficiency, making it a good fit for a wide range of natural language processing tasks.

$0.50 per 1M input tokens, $1.00 per 1M output tokens

featured

text-generation

deepseek-ai/DeepSeek-V3

DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

$0.38 per 1M input tokens, $0.89 per 1M output tokens
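The 671B/37B split is what lets an MoE model run cheaper than a dense model of the same total size: only the routed experts' weights take part in each token's forward pass. A back-of-the-envelope sketch, assuming the common approximation of about 2 FLOPs per active parameter per token (which ignores attention and routing overhead):

```python
# Rough sketch: per-token compute for DeepSeek-V3's active vs. total parameters.
# Assumption: forward pass ~= 2 FLOPs per active parameter per token (attention/routing ignored).

TOTAL_PARAMS = 671e9   # total parameters, from the model description
ACTIVE_PARAMS = 37e9   # parameters activated per token, from the model description


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token for a model with `active_params` active weights."""
    return 2 * active_params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense model of equal size

print(f"active fraction: {active_fraction:.1%}")  # prints 5.5%
print(f"MoE FLOPs/token:   {moe_flops:.2e}")
print(f"dense FLOPs/token: {dense_flops:.2e}")
print(f"compute ratio (dense / MoE): {dense_flops / moe_flops:.1f}x")
```

Note that all 671B parameters still have to be stored; the MoE design saves compute per token, not weight memory.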

featured

text-generation

Llama-3.3-70B-Instruct-Turbo

meta-llama/Llama-3.3-70B-Instruct-Turbo

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.13 per 1M input tokens, $0.38 per 1M output tokens
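FP8 stores each weight in one byte instead of the two bytes used by BF16, which roughly halves the memory the weights occupy and the data that must be read per generated token. A minimal sketch of that arithmetic, assuming a nominal 70B parameter count and counting weights only (KV cache, activations, and any FP8 scale factors are ignored):

```python
# Sketch: weight-only memory footprint of a nominal 70B-parameter model at two precisions.
# Assumptions: exactly 70e9 parameters; KV cache, activations and FP8 scale factors ignored.

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}
NUM_PARAMS = 70e9


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return weight storage in GB (10**9 bytes) for the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


footprints = {dtype: weight_memory_gb(NUM_PARAMS, dtype) for dtype in BYTES_PER_PARAM}
for dtype, gb in footprints.items():
    print(f"{dtype}: {gb:.0f} GB")  # prints bf16: 140 GB, then fp8: 70 GB
```

Because small-batch decoding is often limited by memory bandwidth, moving half as many weight bytes is one plausible source of the speedup; the minor accuracy trade-off mentioned above is the cost.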

featured

text-generation

Llama-3.3-70B-Instruct

meta-llama/Llama-3.3-70B-Instruct

Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.23 per 1M input tokens, $0.40 per 1M output tokens

featured

text-generation

microsoft/phi-4

Phi-4 is a model trained on a blend of synthetic datasets, data from filtered public-domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to train a small yet capable model on data selected for high quality and advanced reasoning.

$0.07 per 1M input tokens, $0.14 per 1M output tokens

text-generation

MythoMax-L2-13b

Gryphe/MythoMax-L2-13b

$0.06 per 1M tokens

text-generation

Hermes-3-Llama-3.1-405B

NousResearch/Hermes-3-Llama-3.1-405B

Hermes 3 is a cutting-edge language model that offers advanced capabilities in roleplaying, reasoning, and conversation. It's a fine-tuned version of the Llama-3.1 405B foundation model, designed to align with user needs and provide powerful control. Key features include reliable function calling, structured output, generalist assistant capabilities, and improved code generation. Hermes 3 is competitive with Llama-3.1 Instruct models, with its own strengths and weaknesses.

$1.00 per 1M tokens

text-generation

Hermes-3-Llama-3.1-70B

NousResearch/Hermes-3-Llama-3.1-70B

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

$0.30 per 1M tokens

text-generation

Qwen2.5-72B-Instruct

Qwen/Qwen2.5-72B-Instruct

Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor, Qwen2. The model also has enhanced capabilities in generating long texts, understanding structured data, and generating structured outputs, and it supports over 29 languages.

$0.12 per 1M input tokens, $0.39 per 1M output tokens

text-generation

Qwen2.5-VL-32B-Instruct

Qwen/Qwen2.5-VL-32B-Instruct

$0.20 per 1M input tokens, $0.60 per 1M output tokens

text-generation

Qwen3-VL-235B-A22B-Instruct

Qwen/Qwen3-VL-235B-A22B-Instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.30 per 1M input tokens, $1.49 per 1M output tokens

text-generation

Qwen3-VL-235B-A22B-Thinking

Qwen/Qwen3-VL-235B-A22B-Thinking

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.45 per 1M input tokens, $3.49 per 1M output tokens

text-generation

Qwen3-VL-30B-A3B-Instruct

Qwen/Qwen3-VL-30B-A3B-Instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.29 per 1M input tokens, $0.99 per 1M output tokens

text-generation

Qwen3-VL-30B-A3B-Thinking

Qwen/Qwen3-VL-30B-A3B-Thinking

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.29 per 1M input tokens, $0.99 per 1M output tokens

text-generation

Qwen3-VL-4B-Instruct

Qwen/Qwen3-VL-4B-Instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.10 per 1M input tokens, $0.60 per 1M output tokens

text-generation

Qwen3-VL-4B-Thinking

Qwen/Qwen3-VL-4B-Thinking

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.

$0.10 per 1M input tokens, $1.00 per 1M output tokens

SOC 2 Certified

ISO 27001 Certified


Latest Models

moonshotai/Kimi-K2-Instruct-0905, zai-org/GLM-4.6, deepseek-ai/DeepSeek-V3.1, anthropic/claude-3-7-sonnet-latest, deepseek-ai/DeepSeek-V3.2-Exp

Featured Models

hexgrad/Kokoro-82M, anthropic/claude-4-opus, openai/gpt-oss-20b, deepseek-ai/DeepSeek-V3.2-Exp, Qwen/Qwen3-235B-A22B-Thinking-2507

Built With Love in Palo Alto

© 2025 Deep Infra. All rights reserved.
