We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

Viewing all

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

Category/all

fp16

Replaced

Austism/

chronos-hermes-13b-v2

text-generation

This offers the imaginative writing style of chronos while still retaining coherency and being capable. Outputs are long and utilize exceptional prose. Supports a maxium context length of 4096. The model follows the Alpaca prompt format.

512

$0.005 / Mtoken

BAAI/

bge-base-en-v1.5

embeddings

BGE embedding is a general Embedding Model. It is pre-trained using retromae and trained on large-scale pair data using contrastive learning. Note that the goal of pre-training is to reconstruct the text, and the pre-trained model cannot be used for similarity calculation directly, it needs to be fine-tuned

$0.010 / Mtoken

BAAI/

bge-en-icl

embeddings

A LLM-based embedding model with in-context learning capabilities that achieves SOTA performance on BEIR and AIR-Bench. It leverages few-shot examples to enhance task performance.

512

$0.010 / Mtoken

BAAI/

bge-large-en-v1.5

embeddings

fp32

$0.010 / Mtoken

BAAI/

bge-m3

embeddings

BGE-M3 is a versatile text embedding model that supports multi-functionality, multi-linguality, and multi-granularity, allowing it to perform dense retrieval, multi-vector retrieval, and sparse retrieval in over 100 languages and with input sizes up to 8192 tokens. The model can be used in a retrieval pipeline with hybrid retrieval and re-ranking to achieve higher accuracy and stronger generalization capabilities. BGE-M3 has shown state-of-the-art performance on several benchmarks, including MKQA, MLDR, and NarritiveQA, and can be used as a drop-in replacement for other embedding models like DPR and BGE-v1.5.

$0.010 / Mtoken

BAAI/

bge-m3-multi

embeddings

BGE-M3 is a multilingual text embedding model developed by BAAI, distinguished by its Multi-Linguality (supporting 100+ languages), Multi-Functionality (unified dense, multi-vector, and sparse retrieval), and Multi-Granularity (handling inputs from short queries to 8192-token documents). It achieves state-of-the-art retrieval performance across diverse benchmarks while maintaining a single model for multiple retrieval modes.

$1.200 / Mtoken

ByteDance/

SeeDance-T2V

text-to-video

Replaced

CompVis/

stable-diffusion-v1-4

text-to-image

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input.

fp16

$0.06 / Mtoken

Gryphe/

MythoMax-L2-13b

text-generation

fp8

Replaced

Gryphe/

MythoMax-L2-13b-turbo

text-generation

Faster version of Gryphe/MythoMax-L2-13b running on multiple H100 cards in fp8 precision. Up to 160 tps.

KoboldAI/LLaMA2-13B-Tiefighter cover image

fp16

Replaced

KoboldAI/

LLaMA2-13B-Tiefighter

text-generation

LLaMA2-13B-Tiefighter is a highly creative and versatile language model, fine-tuned for storytelling, adventure, and conversational dialogue. It combines the strengths of multiple models and datasets, including retro-rodeo and choose-your-own-adventure, to generate engaging and imaginative content. With its ability to improvise and adapt to different styles and formats, Tiefighter is perfect for writers, creators, and anyone looking to spark their imagination.

NousResearch/Hermes-3-Llama-3.1-405B cover image

fp8

128k

$0.70/$0.80 in/out Mtoken

NousResearch/

Hermes-3-Llama-3.1-405B

text-generation

Hermes 3 is a cutting-edge language model that offers advanced capabilities in roleplaying, reasoning, and conversation. It's a fine-tuned version of the Llama-3.1 405B foundation model, designed to align with user needs and provide powerful control. Key features include reliable function calling, structured output, generalist assistant capabilities, and improved code generation. Hermes 3 is competitive with Llama-3.1 Instruct models, with its own strengths and weaknesses.

NousResearch/Hermes-3-Llama-3.1-70B cover image

fp8

128k

$0.10/$0.28 in/out Mtoken

NousResearch/

Hermes-3-Llama-3.1-70B

text-generation

Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.

NovaSky-AI/Sky-T1-32B-Preview cover image

fp16

32k

Deprecated

NovaSky-AI/

Sky-T1-32B-Preview

text-generation

This is a 32B reasoning model trained from Qwen2.5-32B-Instruct with 17K data. The performance is on par with o1-preview model on both math and coding.

Phind/Phind-CodeLlama-34B-v2 cover image

fp16

Replaced

Phind/

Phind-CodeLlama-34B-v2

text-generation

Phind-CodeLlama-34B-v2 is an open-source language model that has been fine-tuned on 1.5B tokens of high-quality programming-related data and achieved a pass@1 rate of 73.8% on HumanEval. It is multi-lingual and proficient in Python, C/C++, TypeScript, Java, and more. It has been trained on a proprietary dataset of instruction-answer pairs instead of code completion examples. The model is instruction-tuned on the Alpaca/Vicuna format to be steerable and easy-to-use. It accepts the Alpaca/Vicuna instruction format and can generate one completion for each prompt.

bfloat16

31k

Replaced

Qwen/

QVQ-72B-Preview

text-generation

QVQ-72B-Preview is an experimental research model developed by the Qwen team, focusing on enhancing visual reasoning capabilities. QVQ-72B-Preview has achieved remarkable performance on various benchmarks. It scored a remarkable 70.3% on the Multimodal Massive Multi-task Understanding (MMMU) benchmark

bfloat16

32k

Replaced

Qwen/

QwQ-32B-Preview

text-generation

QwQ is an experimental research model developed by the Qwen Team, designed to advance AI reasoning capabilities. This model embodies the spirit of philosophical inquiry, approaching problems with genuine wonder and doubt. QwQ demonstrates impressive analytical abilities, achieving scores of 65.2% on GPQA, 50.0% on AIME, 90.6% on MATH-500, and 50.0% on LiveCodeBench. With its contemplative approach and exceptional performance on complex problems.