We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

🚀 New models by Bria.ai, generate and edit images at scale 🚀

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

text-generation

automatic-speech-recognition

zero-shot-image-classification

featured

text-generation

DeepSeek-R1-0528

deepseek-ai/DeepSeek-R1-0528 cover image

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528.

$0.40 cached, $0.50 in, $2.15 out / 1M

featured

text-generation

DeepSeek-V3-0324

deepseek-ai/DeepSeek-V3-0324 cover image

DeepSeek-V3-0324, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token, an improved iteration over DeepSeek-V3.

$0.25 in, $0.88 out / 1M

featured

text-generation

Mistral-Small-3.2-24B-Instruct-2506

mistralai/Mistral-Small-3.2-24B-Instruct-2506 cover image

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface—while otherwise matching or slightly improving on all previous text and vision benchmarks.

$0.075 in, $0.20 out / 1M

featured

text-generation

anthropic/claude-4-opus cover image

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.

$16.50 in, $82.50 out / 1M

featured

text-generation

claude-4-sonnet

anthropic/claude-4-sonnet cover image

Anthropic's mid-size model with superior intelligence for high-volume uses in coding, in-depth research, agents, & more.

$3.30 in, $16.50 out / 1M

featured

text-generation

gemini-2.5-flash

google/gemini-2.5-flash cover image

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It's capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash: best for balancing reasoning and speed.

$0.30 in, $2.50 out / 1M

featured

text-generation

google/gemini-2.5-pro cover image

Gemini 2.5 Pro is Google's the most advanced thinking model, designed to tackle increasingly complex problems. Gemini 2.5 Pro leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

$1.25 in, $10.00 out / 1M

featured

text-generation

google/gemma-3-27b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model, successor to Gemma 2

$0.09 in, $0.16 out / 1M

featured

text-generation

google/gemma-3-12b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2

$0.04 in, $0.13 out / 1M

featured

text-generation

google/gemma-3-4b-it cover image

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3-12B is Google's latest open source model, successor to Gemma 2

$0.04 in, $0.08 out / 1M

featured

hexgrad/Kokoro-82M cover image

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

$0.62 per 1M characters

featured

orpheus-3b-0.1-ft

canopylabs/orpheus-3b-0.1-ft cover image

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performances.

$7.00 per 1M characters

featured

sesame/csm-1b cover image

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

$7.00 per 1M characters

featured

text-generation

DeepSeek-R1-Distill-Llama-70B

deepseek-ai/DeepSeek-R1-Distill-Llama-70B cover image

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

$0.50 in, $1.00 out / 1M

featured

text-generation

deepseek-ai/DeepSeek-V3 cover image

DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

$0.38 in, $0.89 out / 1M

featured

text-generation

Llama-3.3-70B-Instruct-Turbo

meta-llama/Llama-3.3-70B-Instruct-Turbo cover image

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.13 in, $0.38 out / 1M

featured

text-generation

microsoft/phi-4 cover image

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

$0.07 in, $0.14 out / 1M

featured

speech-recognition

whisper-large-v3-turbo

openai/whisper-large-v3-turbo cover image

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it's the exact same model, except that the number of decoding layers have reduced from 32 to 4. As a result, the model is way faster, at the expense of a minor quality degradation.

$0.00020 / minute

SOC 2 Certified

ISO 27001 Certified

Have questions or need a custom solution?

Company

Latest Models

deepseek-ai/DeepSeek-V3.1 zai-org/GLM-4.6 deepseek-ai/DeepSeek-V3.2-Exp moonshotai/Kimi-K2-Instruct-0905 anthropic/claude-3-7-sonnet-latest

Featured Models

deepseek-ai/DeepSeek-V3.2-Exp openai/whisper-large-v3-turbo deepseek-ai/DeepSeek-R1-0528-Turbo deepseek-ai/DeepSeek-V3.1 moonshotai/Kimi-K2-Instruct-0905

Built With Love in Palo Alto

© 2025 Deep Infra. All rights reserved.

Privacy Policy Terms of Service