Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:
Search

Category/all

deepseek-ai/DeepSeek-R1 cover image
featured
64k
$0.75/$2.40 in/out Mtoken
  • text-generation

We introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

deepseek-ai/DeepSeek-R1-Distill-Llama-70B cover image
featured
bfloat16
128k
$0.23/$0.69 in/out Mtoken
  • text-generation

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

deepseek-ai/DeepSeek-V3 cover image
featured
64k
$0.49/$0.89 in/out Mtoken
  • text-generation

DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

meta-llama/Llama-3.3-70B-Instruct-Turbo cover image
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

meta-llama/Llama-3.3-70B-Instruct cover image
featured
bfloat16
128k
$0.23/$0.40 in/out Mtoken
  • text-generation

Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

mistralai/Mistral-Small-24B-Instruct-2501 cover image
featured
fp8
32k
$0.07/$0.14 in/out Mtoken
  • text-generation

Mistral Small 3 is a 24B-parameter language model optimized for low-latency performance across common AI tasks. Released under the Apache 2.0 license, it features both pre-trained and instruction-tuned versions designed for efficient local deployment. The model achieves 81% accuracy on the MMLU benchmark and performs competitively with larger models like Llama 3.3 70B and Qwen 32B, while operating at three times the speed on equivalent hardware.

deepseek-ai/DeepSeek-R1-Distill-Qwen-32B cover image
featured
fp8
128k
$0.12/$0.18 in/out Mtoken
  • text-generation

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on Qwen 2.5 32B, using outputs from DeepSeek R1. It outperforms OpenAI's o1-mini across various benchmarks, achieving new state-of-the-art results for dense models. Other benchmark results include: AIME 2024: 72.6 | MATH-500: 94.3 | CodeForces Rating: 1691.

microsoft/phi-4 cover image
featured
bfloat16
16k
$0.07/$0.14 in/out Mtoken
  • text-generation

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

meta-llama/Meta-Llama-3.1-70B-Instruct cover image
featured
bfloat16
128k
$0.23/$0.40 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-8B-Instruct cover image
featured
bfloat16
128k
$0.03/$0.05 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-405B-Instruct cover image
featured
fp8
32k
$0.80 / Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo cover image
featured
fp8
128k
$0.02/$0.05 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo cover image
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

Qwen/Qwen2.5-Coder-32B-Instruct cover image
featured
fp8
32k
$0.07/$0.16 in/out Mtoken
  • text-generation

Qwen2.5-Coder is the latest series of Code-Specific Qwen large language models (formerly known as CodeQwen). It has significant improvements in code generation, code reasoning and code fixing. A more comprehensive foundation for real-world applications such as Code Agents. Not only enhancing coding capabilities but also maintaining its strengths in mathematics and general competencies.

nvidia/Llama-3.1-Nemotron-70B-Instruct cover image
featured
fp8
128k
$0.12/$0.30 in/out Mtoken
  • text-generation

Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA to improve the helpfulness of LLM generated responses to user queries. This model reaches Arena Hard of 85.0, AlpacaEval 2 LC of 57.6 and GPT-4-Turbo MT-Bench of 8.98, which are known to be predictive of LMSys Chatbot Arena Elo. As of 16th Oct 2024, this model is #1 on all three automatic alignment benchmarks (verified tab for AlpacaEval 2 LC), edging out strong frontier models such as GPT-4o and Claude 3.5 Sonnet.

Qwen/Qwen2.5-72B-Instruct cover image
featured
fp8
32k
$0.13/$0.40 in/out Mtoken
  • text-generation

Qwen2.5 is a model pretrained on a large-scale dataset of up to 18 trillion tokens, offering significant improvements in knowledge, coding, mathematics, and instruction following compared to its predecessor Qwen2. The model also features enhanced capabilities in generating long texts, understanding structured data, and generating structured outputs, while supporting multilingual capabilities for over 29 languages.

meta-llama/Llama-3.2-90B-Vision-Instruct cover image
featured
bfloat16
32k
$0.35/$0.40 in/out Mtoken
  • text-generation

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.

meta-llama/Llama-3.2-11B-Vision-Instruct cover image
featured
bfloat16
128k
$0.055 / Mtoken
  • text-generation

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.