Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:
Search

Category/all

openai/whisper-medium cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labeled data and demonstrates strong abilities to generalize to various datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture.

openai/whisper-medium.en cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without fine-tuning. The primary intended users of these models are AI researchers studying robustness, generalisation, and capabilities of the current model.

openai/whisper-small cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without the need for fine-tuning. The model is based on a Transformer architecture and uses a large-scale weak supervision technique.

openai/whisper-small.en cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labelled data without the need for fine-tuning. It is a Transformer based encoder-decoder model, trained on either English-only or multilingual data, and is available in five configurations of varying model sizes. The models were trained on the tasks of speech recognition and speech translation, predicting transcriptions in the same or different languages as the audio.

openai/whisper-timestamped-medium cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This version has implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

openai/whisper-timestamped-medium.en cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a set of multi-lingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot originally predict word timestamps. This variant contains implementation to predict word timestamps and provide a more accurate estimation of speech segments when transcribing with Whisper models.

openai/whisper-tiny cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning. Whisper is a Transformer-based encoder-decoder model trained on English-only or multilingual data. The English-only models were trained on speech recognition, while the multilingual models were trained on both speech recognition and machine translation.

openai/whisper-tiny.en cover image
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation, trained on 680k hours of labeled data without fine-tuning. It's a Transformer based encoder-decoder model, trained on English-only or multilingual data, predicting transcriptions in the same or different language as the audio. Whisper checkpoints come in five configurations of varying model sizes.

openchat/openchat_3.5 cover image
fp16
8k
$0.07 / Mtoken
  • text-generation

OpenChat is a library of open-source language models that have been fine-tuned with C-RLFT, a strategy inspired by offline reinforcement learning. These models can learn from mixed-quality data without preference labels and have achieved exceptional performance comparable to ChatGPT. The developers of OpenChat are dedicated to creating a high-performance, commercially viable, open-source large language model and are continuously making progress towards this goal.

prompthero/openjourney cover image
$0.0005 / sec
  • text-to-image

Text to image model based on Stable Diffusion.

runwayml/stable-diffusion-v1-5 cover image
$0.0005 / sec
  • text-to-image

Most widely used version of Stable Diffusion. Trained on 512x512 images, it can generate realistic images given text description

sentence-transformers/all-MiniLM-L12-v2 cover image
512
$0.005 / Mtoken
  • embeddings

We present a sentence transformation model that generates semantically similar sentences. Our model is based on the Sentence-Transformers architecture and was trained on a large dataset of sentence pairs. We evaluate the effectiveness of our model by measuring its ability to generate similar sentences that are close to the original sentence in meaning.

sentence-transformers/all-MiniLM-L6-v2 cover image
512
$0.005 / Mtoken
  • embeddings

We present a sentence transformation model that achieves state-of-the-art results on various NLP tasks without requiring task-specific architectures or fine-tuning. Our approach leverages contrastive learning and utilizes a variety of datasets to learn robust sentence representations. We evaluate our model on several benchmarks and demonstrate its effectiveness in various applications such as text classification, sentiment analysis, named entity recognition, and question answering.

sentence-transformers/all-mpnet-base-v2 cover image
512
$0.005 / Mtoken
  • embeddings

A sentence transformation model that has been trained on a wide range of datasets, including but not limited to S2ORC, WikiAnwers, PAQ, Stack Exchange, and Yahoo! Answers. Our model can be used for various NLP tasks such as clustering, sentiment analysis, and question answering.

sentence-transformers/clip-ViT-B-32 cover image
512
$0.005 / Mtoken
  • embeddings

The CLIP model maps text and images to a shared vector space, enabling various applications such as image search, zero-shot image classification, and image clustering. The model can be used easily after installation, and its performance is demonstrated through zero-shot ImageNet validation set accuracy scores. Multilingual versions of the model are also available for 50+ languages.

sentence-transformers/clip-ViT-B-32-multilingual-v1 cover image
512
$0.005 / Mtoken
  • embeddings

This model is a multilingual version of the OpenAI CLIP-ViT-B32 model, which maps text and images to a common dense vector space. It includes a text embedding model that works for 50+ languages and an image encoder from CLIP. The model was trained using Multilingual Knowledge Distillation, where a multilingual DistilBERT model was trained as a student model to align the vector space of the original CLIP image encoder across many languages.

sentence-transformers/multi-qa-mpnet-base-dot-v1 cover image
512
$0.005 / Mtoken
  • embeddings

We present a sentence transformation model that maps sentences and paragraphs to a 768-dimensional dense vector space, suitable for semantic search tasks. The model is trained on 215 million question-answer pairs from various sources, including WikiAnswers, PAQ, Stack Exchange, MS MARCO, GOOAQ, Amazon QA, Yahoo Answers, Search QA, ELI5, and Natural Questions. Our model uses a contrastive learning objective.

sentence-transformers/paraphrase-MiniLM-L6-v2 cover image
512
$0.005 / Mtoken
  • embeddings

We present a sentence similarity model based on the Sentence Transformers architecture, which maps sentences to a 384-dimensional dense vector space. The model uses a pre-trained BERT encoder and applies mean pooling on top of the contextualized word embeddings to obtain sentence embeddings. We evaluate the model on the Sentence Embeddings Benchmark.