We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

Viewing all

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

Category/all

bfloat16

Replaced

google/

gemma-2-9b-it

text-generation

Gemma is a family of lightweight, state-of-the-art open models from Google. The 9B Gemma 2 model delivers class-leading performance, outperforming Llama 3 8B and other open models in its size category.

512

$0.005 / Mtoken

intfloat/

e5-base-v2

embeddings

Text Embeddings by Weakly-Supervised Contrastive Pre-training. Model has 24 layers and 1024 out dim.

512

$0.010 / Mtoken

intfloat/

e5-large-v2

embeddings

Text Embeddings by Weakly-Supervised Contrastive Pre-training. Model has 24 layers and 1024 out dim.

fp32

512

$0.010 / Mtoken

intfloat/

multilingual-e5-large

embeddings

The Multilingual-E5-large model is a 24-layer text embedding model with an embedding size of 1024, trained on a mixture of multilingual datasets and supporting 100 languages.

512

$0.0005 / sec

intfloat/

multilingual-e5-large-instruct

embeddings

The Multilingual-E5 models, initialized from XLM-RoBERTa, support up to 512 tokens per input — any longer text will be silently truncated. To ensure optimal performance, always prefix inputs with “query:” or “passage:”, as the model was explicitly trained with this format.

fp16

Replaced

lizpreciatior/

lzlv_70b_fp16_hf

text-generation

A Mythomax/MLewd_13B-style merge of selected 70B models A multi-model merge of several LLaMA2 70B finetunes for roleplaying and creative work. The goal was to create a model that combines creativity with intelligence for an enhanced experience.

mattshumer/Reflection-Llama-3.1-70B cover image

bfloat16

Replaced

mattshumer/

Reflection-Llama-3.1-70B

text-generation

Reflection Llama-3.1 70B is trained with a new technique called Reflection-Tuning that teaches a LLM to detect mistakes in its reasoning and correct course. The model was trained on synthetic data.

meta-llama/Llama-2-13b-chat-hf cover image

fp16

Replaced

meta-llama/

Llama-2-13b-chat-hf

text-generation

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

meta-llama/Llama-2-70b-chat-hf cover image

fp16

Replaced

meta-llama/

Llama-2-70b-chat-hf

text-generation

LLaMa 2 is a collections of LLMs trained by Meta. This is the 70B chat optimized version. This endpoint has per token pricing.

meta-llama/Llama-3.2-11B-Vision-Instruct cover image

bfloat16

128k

$$0.00 / Mtoken

meta-llama/

Llama-3.2-11B-Vision-Instruct

text-generation

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and visual question answering, bridging the gap between language generation and visual reasoning. Pre-trained on a massive dataset of image-text pairs, it performs well in complex, high-accuracy image analysis. Its ability to integrate visual understanding with language processing makes it an ideal solution for industries requiring comprehensive visual-linguistic AI applications, such as content creation, AI-driven customer service, and research.

meta-llama/Llama-3.2-1B-Instruct cover image

bfloat16

128k

Replaced

meta-llama/

Llama-3.2-1B-Instruct

text-generation

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out).

meta-llama/Llama-3.2-3B-Instruct cover image

bfloat16

128k

$0.012/$0.024 in/out Mtoken

meta-llama/

Llama-3.2-3B-Instruct

text-generation

The Meta Llama 3.2 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models in 1B and 3B sizes (text in/text out)

meta-llama/Llama-3.2-90B-Vision-Instruct cover image

bfloat16

32k

$0.35/$0.40 in/out Mtoken

meta-llama/

Llama-3.2-90B-Vision-Instruct

text-generation

The Llama 90B Vision model is a top-tier, 90-billion-parameter multimodal model designed for the most challenging visual reasoning and language tasks. It offers unparalleled accuracy in image captioning, visual question answering, and advanced image-text comprehension. Pre-trained on vast multimodal datasets and fine-tuned with human feedback, the Llama 90B Vision is engineered to handle the most demanding image-based AI tasks. This model is perfect for industries requiring cutting-edge multimodal AI capabilities, particularly those dealing with complex, real-time visual and textual analysis.

bfloat16

128k

$$0.001 / Mtoken

meta-llama/

Llama-Guard-3-8B

text-generation

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

meta-llama/Meta-Llama-3-70B-Instruct cover image

bfloat16

$0.30/$0.40 in/out Mtoken

meta-llama/

Meta-Llama-3-70B-Instruct

text-generation

Model Details Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.

meta-llama/Meta-Llama-3-8B-Instruct cover image

bfloat16

$0.03/$0.06 in/out Mtoken

meta-llama/

Meta-Llama-3-8B-Instruct

text-generation

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.

meta-llama/Meta-Llama-3.1-405B-Instruct cover image

fp8

32k

Replaced

meta-llama/

Meta-Llama-3.1-405B-Instruct

text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

meta-llama/Meta-Llama-3.1-70B-Instruct cover image

bfloat16

128k

$0.23/$0.40 in/out Mtoken

meta-llama/

Meta-Llama-3.1-70B-Instruct

text-generation

Meta developed and released the Meta Llama 3.1 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8B, 70B and 405B sizes

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales Get Started

Latest Models

Phind/

Phind-CodeLlama-34B-v2

openchat/

openchat_3.5

openai/

whisper-tiny

bigcode/

starcoder2-15b

Gryphe/

MythoMax-L2-13b

Featured Models

mistralai/

Mistral-Small-3.2-24B-Instruct-2506

mistralai/

Voxtral-Mini-3B-2507

sesame/

csm-1b

google/

gemini-2.5-pro

deepseek-ai/

DeepSeek-V3

Qwen/

QwQ-32B

Company

Pricing

Docs

Compare

DeepStart

About

Careers

Trust Center

Privacy

Terms

Have questions or need a custom solution?

Contact Sales