We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

DeepInfra raises $107M Series B to scale the inference cloud — read the announcement

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

text-generation

XiaomiMiMo/MiMo-V2.5 cover image

MiMo-V2.5 is a native omnimodal model with strong agentic capabilities, supporting text, image, video, and audio understanding within a unified architecture. Built upon the MiMo-V2-Flash backbone and extended with dedicated vision and audio encoders, it delivers robust performance across multimodal perception, long-context reasoning, and agentic workflows.

$0.08 cached, $0.40 in, $2.00 out / 1M

text-generation

anthropic/claude-fable-5 cover image

Claude Fable 5 is Anthropic's next generation of intelligence for the hardest knowledge work and coding problems. It works independently for longer than any prior generally available Claude model: run it in an agent harness and it can work for days at a time, planning across stages, delegating to sub-agents, and checking its own work.

$10.00 in, $50.00 out / 1M

text-generation

claude-haiku-4-5

anthropic/claude-haiku-4-5 cover image

The next generation of Anthropic's fastest and most cost-effective model, optimal for use cases where speed and affordability matter.

$1.00 in, $5.00 out / 1M

text-generation

claude-opus-4-7

anthropic/claude-opus-4-7 cover image

Anthropic's most capable production model yet, advancing performance across coding, enterprise workflows, and long-running agentic tasks.

$5.00 in, $25.00 out / 1M

text-generation

claude-opus-4-8

anthropic/claude-opus-4-8 cover image

Claude Opus 4.8 is our most intelligent Opus model and the best generally available model for coding and agents, with deeper reasoning for enterprise workflows.

$5.00 in, $25.00 out / 1M

text-generation

anthropic/claude-opus-5 cover image

Claude Opus 5 is Anthropic's most advanced Opus model, powering long-running agents while delivering improvements in coding and professional work.

$5.00 in, $25.00 out / 1M

text-generation

claude-sonnet-4-6

anthropic/claude-sonnet-4-6 cover image

Claude Sonnet 4.6 delivers frontier intelligence at scale—built for coding, agents, and enterprise workflows.

$3.00 in, $15.00 out / 1M

text-generation

claude-sonnet-5

anthropic/claude-sonnet-5 cover image

Claude Sonnet 5 is Anthropic's most capable Sonnet model yet, built for coding, agents, and professional work at scale. It brings near-Opus intelligence to the model teams run at scale every day, with the same balance of capability, cost, and speed teams already rely on Sonnet for.

$2.00 in, $10.00 out / 1M

text-generation

DeepSeek-R1-0528

deepseek-ai/DeepSeek-R1-0528 cover image

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528.

$0.35 cached, $0.50 in, $2.15 out / 1M

text-generation

deepseek-ai/DeepSeek-V3 cover image

DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

$0.32 in, $0.89 out / 1M

text-generation

DeepSeek-V3-0324

deepseek-ai/DeepSeek-V3-0324 cover image

DeepSeek-V3-0324, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token, an improved iteration over DeepSeek-V3.

$0.135 cached, $0.24 in, $0.90 out / 1M

text-generation

deepseek-ai/DeepSeek-V3.1 cover image

DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.

$0.13 cached, $0.25 in, $0.95 out / 1M

text-generation

DeepSeek-V3.1-Terminus

deepseek-ai/DeepSeek-V3.1-Terminus cover image

DeepSeek-V3.1 Terminus is an update to DeepSeek V3.1 that maintains the model's original capabilities while addressing issues reported by users, including language consistency and agent capabilities, further optimizing the model's performance in coding and search agents. It is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes. It extends the DeepSeek-V3 base with a two-phase long-context training process. Users can control the reasoning behaviour with the reasoning enabled boolean. Learn more in our docs The model improves tool use, code generation, and reasoning efficiency, achieving performance comparable to DeepSeek-R1 on difficult benchmarks while responding more quickly. It supports structured tool calling, code agents, and search agents, making it suitable for research, coding, and agentic workflows.

$0.13 cached, $0.27 in, $0.95 out / 1M

text-generation

gemini-1.5-flash

google/gemini-1.5-flash cover image

Gemini 1.5 Flash is Google's foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots. Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter.

text-generation

gemini-1.5-flash-8b

google/gemini-1.5-flash-8b cover image

text-generation

gemini-2.5-flash

google/gemini-2.5-flash cover image

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It's capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash: best for balancing reasoning and speed.

$0.30 in, $2.50 out / 1M

text-generation

google/gemini-2.5-pro cover image

Gemini 2.5 Pro is Google's the most advanced thinking model, designed to tackle increasingly complex problems. Gemini 2.5 Pro leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

$1.25 in, $10.00 out / 1M

text-generation

gemini-3.1-flash-lite

google/gemini-3.1-flash-lite cover image

Bring any idea to life with state-of-the-art reasoning to help you learn, build, and plan anything. Best for high-volume tasks that need efficiency and intelligence.

$0.25 in, $1.50 out / 1M

text-generation

google/gemini-3.1-pro cover image

Bring any idea to life with state-of-the-art reasoning to help you learn, build, and plan anything. Best for complex tasks and bringing creative concepts to life.

$2.00 in, $12.00 out / 1M

text-generation

gemini-3.5-flash

google/gemini-3.5-flash cover image

Gemini 3.5 Flash delivers near-Pro intelligence at Flash-tier cost and speed: Pro-level coding proficiency, parallel agentic execution, all at a much lower price.

$1.50 in, $9.00 out / 1M

text-generation

google/gemma-3-12b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2

$0.05 in, $0.15 out / 1M

text-generation

google/gemma-3-27b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2

$0.08 in, $0.16 out / 1M

text-generation

google/gemma-3-4b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2

$0.05 in, $0.10 out / 1M

text-generation

gemma-4-31B-it-Ultra

google/gemma-4-31B-it-Ultra cover image

Ultra speed version of gemma-4-31B-it

$0.27 in, $0.76 out / 1M

SOC 2 Certified

ISO 27001 Certified

Have questions or need a custom solution?

Company

Latest Models

deepseek-ai/DeepSeek-V4-Flash-0731 thinkingmachines/Inkling-Small google/nano-banana-2-lite google/nano-banana-2 google/nano-banana-pro

Featured Models

deepseek-ai/DeepSeek-V4-Flash Qwen/Qwen3-Max moonshotai/Kimi-K2.6 moonshotai/Kimi-K2.7-Code deepseek-ai/DeepSeek-V3.2

Built With Love in Palo Alto

© 2026 DeepInfra. All rights reserved.

Privacy Policy Terms of Service