
Browse deepinfra models:

All categories and models you can try out and use directly on deepinfra:

moonshotai/Kimi-K2-Instruct
featured · fp8 · 117k context · $0.55 in / $2.20 out per Mtoken · text-generation

Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks.
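Since these models can be used directly, a minimal sketch of what a call might look like, assuming DeepInfra's OpenAI-compatible chat-completions endpoint (the base URL, the `DEEPINFRA_API_KEY` environment variable, and the helper names here are illustrative, not authoritative):

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible base URL; check the deepinfra docs for the real one.
BASE_URL = "https://api.deepinfra.com/v1/openai"


def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for the given model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, model: str = "moonshotai/Kimi-K2-Instruct") -> str:
    """Send the payload and return the assistant's reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            # Illustrative env-var name for the API key.
            "Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same payload shape should work for any of the text-generation models listed below, swapping in the model's full `org/name` identifier.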

deepseek-ai/DeepSeek-R1-0528-Turbo
featured · fp4 · 32k context · $1.00 in / $3.00 out per Mtoken · text-generation

DeepSeek R1 0528 Turbo is a state-of-the-art reasoning model optimized for very fast response generation.

Qwen/Qwen3-235B-A22B
featured · fp8 · 40k context · $0.13 in / $0.60 out per Mtoken · text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-30B-A3B
featured · fp8 · 40k context · $0.08 in / $0.29 out per Mtoken · text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-32B
featured · fp8 · 40k context · $0.10 in / $0.30 out per Mtoken · text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

Qwen/Qwen3-14B
featured · fp8 · 40k context · $0.06 in / $0.24 out per Mtoken · text-generation

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built on extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support.

meta-llama/Llama-4-Maverick-17B-128E-Instruct-Turbo
featured · fp8 · 8k context · $0.50 per Mtoken · text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17-billion-parameter model with 128 experts.

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
featured · fp8 · 1024k context · $0.15 in / $0.60 out per Mtoken · text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17-billion-parameter model with 128 experts.

meta-llama/Llama-4-Scout-17B-16E-Instruct
featured · bfloat16 · 320k context · $0.08 in / $0.30 out per Mtoken · text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17-billion-parameter model with 16 experts.

deepseek-ai/DeepSeek-R1-0528
featured · fp4 · 160k context · $0.50 in / $2.15 out per Mtoken · text-generation

The DeepSeek R1 model has undergone a minor version upgrade; the current version is DeepSeek-R1-0528.

deepseek-ai/DeepSeek-V3-0324
featured · fp4 · 160k context · $0.28 in / $0.88 out per Mtoken · text-generation

DeepSeek-V3-0324 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, 37B of which are activated for each token; it is an improved iteration of DeepSeek-V3.

mistralai/Devstral-Small-2507
featured · fp8 · 125k context · $0.07 in / $0.28 out per Mtoken · text-generation

Devstral is an agentic LLM designed for software engineering tasks, making it a great choice for powering coding agents.

mistralai/Mistral-Small-3.2-24B-Instruct-2506
featured · fp8 · 125k context · $0.05 in / $0.10 out per Mtoken · text-generation

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface, while otherwise matching or slightly improving on all previous text and vision benchmarks.

microsoft/phi-4-reasoning-plus
featured · bfloat16 · 32k context · $0.07 in / $0.35 out per Mtoken · text-generation

Phi-4-reasoning-plus is a state-of-the-art open-weight reasoning model finetuned from Phi-4 using supervised fine-tuning on a dataset of chain-of-thought traces, followed by reinforcement learning. The supervised fine-tuning dataset blends synthetic prompts with high-quality filtered data from public-domain websites, focused on math, science, and coding skills, as well as alignment data for safety and Responsible AI. The goal of this approach was to ensure that small, capable models were trained on high-quality data emphasizing advanced reasoning. Because of the additional reinforcement-learning training, Phi-4-reasoning-plus achieves higher accuracy but generates on average 50% more tokens, resulting in higher latency.

meta-llama/Llama-Guard-4-12B
featured · bfloat16 · 160k context · $0.05 per Mtoken · text-generation

Llama Guard 4 is a natively multimodal safety classifier with 12 billion parameters, trained jointly on text and multiple images. It has a dense architecture pruned from the Llama 4 Scout pre-trained model and fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM itself: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, lists the content categories violated.
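Because the classifier is itself an LLM, both classification modes can be framed as ordinary chat completions. A sketch of the two request shapes and a parser for the safe/unsafe verdict, assuming an OpenAI-style messages format (the helper names and the exact output format are illustrative; consult the Llama Guard 4 model card for the authoritative schema):

```python
from typing import Optional


def build_guard_request(user_prompt: str,
                        assistant_reply: Optional[str] = None) -> dict:
    """Build a chat payload for Llama Guard 4.

    Prompt classification sends only the user turn; response classification
    additionally includes the assistant turn to be judged.
    """
    messages = [{"role": "user", "content": user_prompt}]
    if assistant_reply is not None:
        messages.append({"role": "assistant", "content": assistant_reply})
    return {"model": "meta-llama/Llama-Guard-4-12B", "messages": messages}


def is_safe(guard_output: str) -> bool:
    """Interpret the classifier's generated text.

    Assumed format: first line is 'safe' or 'unsafe'; when unsafe,
    subsequent lines list the violated content categories.
    """
    first_line = guard_output.strip().splitlines()[0].strip().lower()
    return first_line == "safe"
```

A typical pipeline would run `build_guard_request(prompt)` before calling the main model and `build_guard_request(prompt, reply)` after, refusing to serve the reply when `is_safe` returns `False`.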

Qwen/QwQ-32B
featured · bfloat16 · 128k context · $0.075 in / $0.15 out per Mtoken · text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, achieves significantly enhanced performance on downstream tasks, especially hard problems. QwQ-32B is a medium-sized reasoning model capable of competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

anthropic/claude-4-opus
featured · 195k context · $16.50 in / $82.50 out per Mtoken · text-generation

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.