We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

Viewing all

featured

text-generation

automatic-speech-recognition

text-to-speech

embeddings

text-to-video

text-to-image

reranker

zero-shot-image-classification

multimodal

Category/all

Replaced

openai/

whisper-small.en

automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labelled data without the need for fine-tuning. It is a Transformer based encoder-decoder model, trained on either English-only or multilingual data, and is available in five configurations of varying model sizes. The models were trained on the tasks of speech recognition and speech translation, predicting transcriptions in the same or different languages as the audio.

Replaced

openai/

whisper-timestamped-medium

automatic-speech-recognition

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This version has implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

Replaced

openai/

whisper-timestamped-medium.en

automatic-speech-recognition

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This variant contains implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

Replaced

openai/

whisper-tiny.en

automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labeled data without fine-tuning. It's a Transformer based encoder-decoder model, trained on English-only or multilingual data, predicting transcriptions in the same or different language as the audio. Whisper checkpoints come in five configurations of varying model sizes.

Replaced

openbmb/

MiniCPM-Llama3-V-2_5

text-generation

bfloat16

Replaced

openchat/

openchat-3.6-8b

text-generation

Openchat 3.6 is a LLama-3-8b fine tune that outperforms it on multiple benchmarks.

fp16

Replaced

openchat/

openchat_3.5

text-generation

OpenChat is a library of open-source language models that have been fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning. These models can learn from mixed-quality data without preference labels and have achieved exceptional performance comparable to ChatGPT. The developers of OpenChat are dedicated to creating a high-performance, commercially viable, open-source large language model and are continuously making progress towards this goal.

Replaced

run-diffusion/

Juggernaut-Flux

text-to-image

A drop-in replacement for Flux [Dev] that delivers sharper details, richer colors, and enhanced realism, while instantly boosting LoRAs and LyCORIS with full compatibility.

Replaced

run-diffusion/

Juggernaut-Lightning-Flux

text-to-image

Blazing-fast, high-quality images rendered at five times the speed of Flux. Perfect for mood boards and mass ideation, this model excels in both realism and prompt adherence.

Replaced

runwayml/

stable-diffusion-v1-5

text-to-image

Most widely used version of Stable Diffusion. Trained on 512x512 images, it can generate realistic images given text description

512

$0.005 / Mtoken

sentence-transformers/

all-MiniLM-L12-v2

embeddings

We present a sentence transformation model that generates semantically similar sentences. Our model is based on the Sentence-Transformers architecture and was trained on a large dataset of sentence pairs. We evaluate the effectiveness of our model by measuring its ability to generate similar sentences that are close to the original sentence in meaning.

512

$0.005 / Mtoken

sentence-transformers/

all-MiniLM-L6-v2

embeddings

We present a sentence transformation model that achieves state-of-the-art results on various NLP tasks without requiring task-specific architectures or fine-tuning. Our approach leverages contrastive learning and utilizes a variety of datasets to learn robust sentence representations. We evaluate our model on several benchmarks and demonstrate its effectiveness in various applications such as text classification, sentiment analysis, named entity recognition, and question answering.

512

$0.005 / Mtoken

sentence-transformers/

all-mpnet-base-v2

embeddings

A sentence transformation model that has been trained on a wide range of datasets, including but not limited to S2ORC, WikiAnwers, PAQ, Stack Exchange, and Yahoo! Answers. Our model can be used for various NLP tasks such as clustering, sentiment analysis, and question answering.

$0.005 / Mtoken

sentence-transformers/

clip-ViT-B-32

embeddings

The CLIP model maps text and images to a shared vector space, enabling various applications such as image search, zero-shot image classification, and image clustering. The model can be used easily after installation, and its performance is demonstrated through zero-shot ImageNet validation set accuracy scores. Multilingual versions of the model are also available for 50+ languages.

512

$0.005 / Mtoken

sentence-transformers/

clip-ViT-B-32-multilingual-v1

embeddings

This model is a multilingual version of the OpenAI CLIP-ViT-B32 model, which maps text and images to a common dense vector space. It includes a text embedding model that works for 50+ languages and an image encoder from CLIP. The model was trained using Multilingual Knowledge Distillation, where a multilingual DistilBERT model was trained as a student model to align the vector space of the original CLIP image encoder across many languages.

512

$0.005 / Mtoken

sentence-transformers/

multi-qa-mpnet-base-dot-v1

embeddings

We present a sentence transformation model that maps sentences and paragraphs to a 768-dimensional dense vector space, suitable for semantic search tasks. The model is trained on 215 million question-answer pairs from various sources, including WikiAnswers, PAQ, Stack Exchange, MS MARCO, GOOAQ, Amazon QA, Yahoo Answers, Search QA, ELI5, and Natural Questions. Our model uses a contrastive learning objective.

512

$0.005 / Mtoken

sentence-transformers/

paraphrase-MiniLM-L6-v2

embeddings

We present a sentence similarity model based on the Sentence Transformers architecture, which maps sentences to a 384-dimensional dense vector space. The model uses a pre-trained BERT encoder and applies mean pooling on top of the contextualized word embeddings to obtain sentence embeddings. We evaluate the model on the Sentence Embeddings Benchmark.

512

$0.005 / Mtoken

shibing624/

text2vec-base-chinese

embeddings

A sentence similarity model that can be used for various NLP tasks such as text classification, sentiment analysis, named entity recognition, question answering, and more. It utilizes the CoSENT architecture, which consists of a transformer encoder and a pooling module, to encode input texts into vectors that capture their semantic meaning. The model was trained on the nli_zh dataset and achieved high performance on various benchmark datasets.

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.

Contact Sales Get Started

Latest Models

openchat/

openchat_3.5

Phind/

Phind-CodeLlama-34B-v2

bigcode/

starcoder2-15b

Gryphe/

MythoMax-L2-13b

openai/

whisper-tiny

Featured Models

mistralai/

Voxtral-Mini-3B-2507

Qwen/

Qwen3-Coder-480B-A35B-Instruct-Turbo

meta-llama/

Llama-4-Maverick-17B-128E-Instruct-FP8

deepseek-ai/

DeepSeek-V3-0324

meta-llama/

Llama-3.3-70B-Instruct

google/

gemini-2.5-flash

Company

Pricing

Docs

Compare

DeepStart

About

Careers

Trust Center

Privacy

Terms

Have questions or need a custom solution?

Contact Sales