We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

Viewing all

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

Category/featured

Our most popular AI models used by thousands of users in their apps and research. What will you create today?

$7.00 per M characters

sesame/

csm-1b

text-to-speech

CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs. The model architecture employs a Llama backbone and a smaller audio decoder that produces Mimi audio codes.

$0.10/$0.40 in/out Mtoken

deepseek-ai/

DeepSeek-R1-Distill-Llama-70B

text-generation

DeepSeek-R1-Distill-Llama-70B is a highly efficient language model that leverages knowledge distillation to achieve state-of-the-art performance. This model distills the reasoning patterns of larger models into a smaller, more agile architecture, resulting in exceptional results on benchmarks like AIME 2024, MATH-500, and LiveCodeBench. With 70 billion parameters, DeepSeek-R1-Distill-Llama-70B offers a unique balance of accuracy and efficiency, making it an ideal choice for a wide range of natural language processing tasks.

$0.38/$0.89 in/out Mtoken

deepseek-ai/

DeepSeek-V3

text-generation

DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2.

$0.038/$0.12 in/out Mtoken

meta-llama/

Llama-3.3-70B-Instruct-Turbo

text-generation

Llama 3.3-70B Turbo is a highly optimized version of the Llama 3.3-70B model, utilizing FP8 quantization to deliver significantly faster inference speeds with a minor trade-off in accuracy. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.23/$0.40 in/out Mtoken

meta-llama/

Llama-3.3-70B-Instruct

text-generation

Llama 3.3-70B is a multilingual LLM trained on a massive dataset of 15 trillion tokens, fine-tuned for instruction-following and conversational dialogue. The model is designed to be helpful, safe, and flexible, with a focus on responsible deployment and mitigating potential risks such as bias, toxicity, and misinformation. It achieves state-of-the-art performance on various benchmarks, including conversational tasks, language translation, and text generation.

$0.07/$0.14 in/out Mtoken

microsoft/

phi-4

text-generation

Phi-4 is a model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.

$0.00020 / minute

openai/

whisper-large-v3-turbo

automatic-speech-recognition

Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on >5M hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a finetuned version of a pruned Whisper large-v3. In other words, it's the exact same model, except that the number of decoding layers have reduced from 32 to 4. As a result, the model is way faster, at the expense of a minor quality degradation.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales Get Started

Latest Models

bigcode/

starcoder2-15b

openai/

whisper-tiny

Phind/

Phind-CodeLlama-34B-v2

openchat/

openchat_3.5

Gryphe/

MythoMax-L2-13b

Featured Models

mistralai/

Voxtral-Mini-3B-2507

openai/

gpt-oss-120b

deepseek-ai/

DeepSeek-R1-0528-Turbo

deepseek-ai/

DeepSeek-V3-0324-Turbo

zai-org/

GLM-4.5-Air

google/

gemini-2.5-flash

Company

Pricing

Docs

Compare

DeepStart

About

Careers

Trust Center

Privacy

Terms

Have questions or need a custom solution?

Contact Sales