Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
featured
fp8
1024k
$0.15/$0.60 in/out Mtoken
  • text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Maverick is a 17 billion parameter model with 128 experts.
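
Every model listed here can be called programmatically. The snippet below is a minimal sketch of querying the Maverick endpoint above through an OpenAI-compatible client; the base URL and the DEEPINFRA_API_KEY environment variable name are assumptions, so check the model's API page for the exact values.

```python
# Minimal sketch (not an official snippet): call a listed text-generation
# model through an OpenAI-compatible client.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],          # assumed env var name
    base_url="https://api.deepinfra.com/v1/openai",   # assumed OpenAI-compatible base URL
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    messages=[{"role": "user", "content": "Explain mixture-of-experts in two sentences."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```

At the listed $0.15/$0.60 per million input/output tokens, a request with 1,000 input and 500 output tokens costs roughly $0.00015 + $0.00030 = $0.00045.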

meta-llama/Llama-4-Scout-17B-16E-Instruct
featured
bfloat16
320k
$0.08/$0.30 in/out Mtoken
  • text-generation

The Llama 4 collection comprises natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding. Llama 4 Scout is a 17 billion parameter model with 16 experts.

deepseek-ai/DeepSeek-R1-0528
featured
fp4
160k
$0.50/$2.15 in/out Mtoken
  • text-generation

The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528.

deepseek-ai/DeepSeek-V3-0324
featured
fp4
160k
$0.28/$0.88 in/out Mtoken
  • text-generation

DeepSeek-V3-0324 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated per token; it is an improved iteration over DeepSeek-V3.

mistralai/Devstral-Small-2507
featured
fp8
125k
$0.07/$0.28 in/out Mtoken
  • text-generation

Devstral is an agentic LLM trained for software engineering tasks, which makes it a strong foundation for software engineering agents.

mistralai/Mistral-Small-3.2-24B-Instruct-2506
featured
fp8
125k
$0.05/$0.10 in/out Mtoken
  • text-generation

Mistral-Small-3.2-24B-Instruct is a drop-in upgrade over the 3.1 release, with markedly better instruction following, roughly half the infinite-generation errors, and a more robust function-calling interface—while otherwise matching or slightly improving on all previous text and vision benchmarks.

meta-llama/Llama-Guard-4-12B
featured
bfloat16
160k
$0.05 / Mtoken
  • text-generation

Llama Guard 4 is a natively multimodal safety classifier with 12 billion parameters, trained jointly on text and multiple images. Llama Guard 4 is a dense architecture pruned from the Llama 4 Scout pre-trained model and fine-tuned for content safety classification. Like previous versions, it can be used to classify content in both LLM inputs (prompt classification) and LLM responses (response classification). It acts as an LLM itself: it generates text indicating whether a given prompt or response is safe or unsafe and, if unsafe, which content categories were violated.
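
Because Llama Guard 4 responds like an ordinary LLM, prompt classification can be done with a plain chat call. The sketch below reuses the assumed OpenAI-compatible endpoint from the earlier example; the exact verdict format (a safe/unsafe line plus category codes) comes from the model's card and may differ.

```python
# Minimal sketch: prompt classification with Llama Guard 4. Per the
# description above, the model's text output indicates "safe" or "unsafe"
# and, if unsafe, the violated categories. Endpoint and env var names are
# assumptions, as in the earlier example.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],
    base_url="https://api.deepinfra.com/v1/openai",
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-Guard-4-12B",
    messages=[{"role": "user", "content": "How do I pick a strong passphrase?"}],
)
print(resp.choices[0].message.content)  # expected: "safe", or "unsafe" plus category codes
```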

Qwen/QwQ-32B
featured
bfloat16
128k
$0.075/$0.15 in/out Mtoken
  • text-generation

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.

anthropic/claude-4-opus
featured
195k
$16.50/$82.50 in/out Mtoken
  • text-generation

Anthropic’s most powerful model yet and the state-of-the-art coding model. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Claude Opus 4 is ideal for powering frontier agent products and features.

anthropic/claude-4-sonnet
featured
195k
$3.30/$16.50 in/out Mtoken
  • text-generation

Anthropic's mid-size model with superior intelligence for high-volume use cases in coding, in-depth research, agents, and more.

google/gemini-2.5-flash
featured
976k
$0.21/$1.75 in/out Mtoken
  • text-generation

Gemini 2.5 Flash is Google's latest thinking model, designed to tackle increasingly complex problems. It is capable of reasoning through its thoughts before responding, resulting in enhanced performance and improved accuracy. Gemini 2.5 Flash is best for balancing reasoning and speed.

google/gemini-2.5-pro
featured
976k
$0.875/$7.00 in/out Mtoken
  • text-generation

Gemini 2.5 Pro is Google's most advanced thinking model, designed to tackle increasingly complex problems. It leads common benchmarks by meaningful margins and showcases strong reasoning and code capabilities. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy. The Gemini 2.5 Pro model is now available on DeepInfra.

google/gemma-3-27b-it
featured
bfloat16
128k
$0.09/$0.17 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 27B is Google's latest open source model and the successor to Gemma 2.

google/gemma-3-12b-it
featured
bfloat16
128k
$0.05/$0.10 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 12B is Google's latest open source model and the successor to Gemma 2.

google/gemma-3-4b-it
featured
bfloat16
128k
$0.02/$0.04 in/out Mtoken
  • text-generation

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling. Gemma 3 4B is Google's latest open source model and the successor to Gemma 2.

hexgrad/Kokoro-82M
featured
$0.62 per M characters
  • text-to-speech

Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.
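
The text-to-speech models are billed per million characters rather than per token. The sketch below shows one plausible way to call Kokoro through a generic inference endpoint; the URL pattern, request field, and response layout are assumptions, so treat the model's API page as authoritative.

```python
# Rough sketch: text-to-speech with Kokoro-82M. The endpoint pattern, the
# "text" request field, and the response shape are assumptions; consult the
# model's API page for the real interface.
import os
import requests

MODEL = "hexgrad/Kokoro-82M"
url = f"https://api.deepinfra.com/v1/inference/{MODEL}"   # assumed endpoint pattern

r = requests.post(
    url,
    headers={"Authorization": f"Bearer {os.environ['DEEPINFRA_API_KEY']}"},
    json={"text": "Hello from a lightweight open-weight TTS model."},  # assumed field name
    timeout=60,
)
r.raise_for_status()
print(list(r.json().keys()))  # inspect the response to locate the audio payload field
```

At the listed $0.62 per million characters, a short sentence like the one above costs a small fraction of a cent.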

canopylabs/orpheus-3b-0.1-ft
featured
$7.00 per M characters
  • text-to-speech

Orpheus TTS is a state-of-the-art, Llama-based Speech-LLM designed for high-quality, empathetic text-to-speech generation. This model has been finetuned to deliver human-level speech synthesis, achieving exceptional clarity, expressiveness, and real-time streaming performance.

sesame/csm-1b
featured
$7.00 per M characters
  • text-to-speech

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.