Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

openai/whisper-medium
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labeled data and demonstrates strong abilities to generalize to various datasets and domains without fine-tuning. The model is based on a Transformer encoder-decoder architecture.
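As a rough illustration of the per-second price shown for this entry, the sketch below estimates what a transcription job might cost. The function name `estimate_asr_cost` is hypothetical. It assumes billing is strictly linear in the number of billed seconds, with no minimum charge or rounding. The listing does not say whether billed seconds are seconds of audio or seconds of compute, so the argument is deliberately named `billed_seconds`.

```python
from decimal import Decimal

# Listed price for the Whisper entries on this page (USD per billed second).
PRICE_PER_SECOND = Decimal("0.0005")


def estimate_asr_cost(billed_seconds: int,
                      price_per_second: Decimal = PRICE_PER_SECOND) -> Decimal:
    """Return an estimated cost in USD, assuming purely linear billing.

    Hypothetical helper: the real billing rules (rounding, minimums, what
    counts as a billed second) are not stated in the listing.
    """
    if billed_seconds < 0:
        raise ValueError("billed_seconds must be non-negative")
    return price_per_second * Decimal(billed_seconds)


if __name__ == "__main__":
    # Ten minutes of billed time at the listed rate.
    print(estimate_asr_cost(600))
```

`Decimal` is used instead of `float` so that small per-second prices add up without binary rounding noise.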

openai/whisper-medium.en
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains without fine-tuning. The primary intended users of these models are AI researchers studying robustness, generalisation, and capabilities of the current model.

openai/whisper-small
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without the need for fine-tuning. The model is based on a Transformer architecture and was trained with large-scale weak supervision.

openai/whisper-small.en
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, it generalises to many datasets and domains without the need for fine-tuning. It is a Transformer-based encoder-decoder model, trained on either English-only or multilingual data, and is available in five configurations of varying model sizes. The models were trained on the tasks of speech recognition and speech translation, predicting transcriptions either in the same language as the audio or in a different one.

openai/whisper-timestamped-medium
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot natively predict word timestamps. This version adds an implementation that predicts word timestamps and gives a more accurate estimate of speech segments when transcribing with Whisper models.

openai/whisper-timestamped-medium.en
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot natively predict word timestamps. This variant adds an implementation that predicts word timestamps and gives a more accurate estimate of speech segments when transcribing with Whisper models.

openai/whisper-tiny
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was trained on 680k hours of labelled data and demonstrates a strong ability to generalize to many datasets and domains without fine-tuning. Whisper is a Transformer-based encoder-decoder model trained on English-only or multilingual data. The English-only models were trained on speech recognition, while the multilingual models were trained on both speech recognition and speech translation.

openai/whisper-tiny.en
$0.0005 / sec
  • automatic-speech-recognition

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labeled data, it generalizes to many datasets and domains without fine-tuning. It is a Transformer-based encoder-decoder model, trained on English-only or multilingual data, that predicts transcriptions either in the same language as the audio or in a different one. Whisper checkpoints come in five configurations of varying model sizes.

rajpurkarlab/gilbert
$0.0005 / sec
  • token-classification

A model for removing references to priors from radiology reports based on a fine-tuned BioBERT model.

roberta-base
$0.0005 / sec
  • fill-mask

The RoBERTa model was pretrained on a dataset created by combining several sources including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. It uses a tokenization scheme with a vocabulary size of 50,000 and replaces 15% of the tokens with either a special masking token or a random token. The model achieved impressive results when fine-tuned on various downstream NLP tasks, outperforming its predecessor BERT in many areas.
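The masking step described above can be sketched in a few lines. This is a minimal illustration, not the model's actual preprocessing code: the function name `mask_tokens`, the `<mask>` string, and the toy vocabulary are hypothetical, and the 80/10/10 split between mask token, random token, and unchanged token is the usual BERT-style convention, assumed here rather than taken from the listing.

```python
import random

MASK_TOKEN = "<mask>"  # placeholder string, assumed for illustration


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, target_positions).

    Each position is selected for prediction with probability `mask_prob`.
    A selected token is replaced by MASK_TOKEN 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged otherwise
    (assumed BERT-style split).
    """
    rng = rng or random.Random()
    corrupted = list(tokens)
    targets = []
    for i in range(len(tokens)):
        if rng.random() >= mask_prob:
            continue  # position not selected
        targets.append(i)
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)
        # otherwise keep the original token; it is still a prediction target
    return corrupted, targets


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    toy_vocab = ["cat", "tree", "runs", "blue"]
    print(mask_tokens(sentence, toy_vocab, rng=random.Random(0)))
```

Drawing a fresh mask every time a sequence is fed to the model, rather than fixing the mask once during preprocessing, is what is usually meant by dynamic masking.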

roberta-large
$0.0005 / sec
  • fill-mask

The RoBERTa model was pre-trained on a dataset consisting of 11,038 books, English Wikipedia, 63 million news articles, and a dataset containing a subset of Common Crawl data. It achieved state-of-the-art results on GLUE, SuperGLUE, and multi-task benchmarks while exhibiting less sensitivity to hyperparameter tuning than BERT. RoBERTa uses a robust optimization approach and dynamic masking, in which the masked positions change over the course of pre-training, whereas BERT uses a fixed mask.

sberbank-ai/ruRoberta-large
$0.0005 / sec
  • fill-mask

The ruRoberta-large model is an encoder trained by the SberDevices team for mask-filling tasks; it uses a BBPE tokenizer. It has 355 million parameters and was trained on 250GB of data. The NLP Core Team RnD, including Dmitry Zmitrovich, contributed to its development.

sentence-transformers/all-MiniLM-L12-v2
512
$0.005 / Mtoken
  • embeddings

We present a sentence embedding model that maps sentences and paragraphs to a dense vector space, so that texts with similar meaning end up close together. Our model is based on the Sentence-Transformers architecture and was trained on a large dataset of sentence pairs. We evaluate the model by how close the embeddings of semantically similar sentences are to each other.

sentence-transformers/all-MiniLM-L6-v2
512
$0.005 / Mtoken
  • embeddings

We present a sentence embedding model that achieves state-of-the-art results on various NLP tasks without requiring task-specific architectures or fine-tuning. Our approach leverages contrastive learning and utilizes a variety of datasets to learn robust sentence representations. We evaluate our model on several benchmarks and demonstrate its effectiveness in various applications such as text classification, sentiment analysis, named entity recognition, and question answering.
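To make the contrastive learning idea concrete, here is a minimal sketch of an in-batch-negatives objective on toy vectors. It is an illustration under assumptions, not the model's training code: the function names, the use of cosine similarity, and the `scale` value are hypothetical choices that the listing does not specify.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over a batch of (anchor, positive) pairs.

    positives[i] is treated as the true match for anchors[i]; every other
    positive in the batch acts as a negative. `scale` is an assumed
    temperature-like multiplier on the similarities.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]
    swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(in_batch_contrastive_loss(anchors, matched))
    print(in_batch_contrastive_loss(anchors, swapped))
```

The loss is small when each anchor is most similar to its own positive and large when the pairing is wrong, which is the signal that pulls matching sentences together in the embedding space.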

sentence-transformers/all-mpnet-base-v2
512
$0.005 / Mtoken
  • embeddings

A sentence embedding model trained on a wide range of datasets, including S2ORC, WikiAnswers, PAQ, Stack Exchange, and Yahoo! Answers. It can be used for various NLP tasks such as clustering, sentiment analysis, and question answering.

sentence-transformers/clip-ViT-B-32
512
$0.005 / Mtoken
  • embeddings

The CLIP model maps text and images to a shared vector space, enabling various applications such as image search, zero-shot image classification, and image clustering. The model can be used easily after installation, and its performance is demonstrated through zero-shot ImageNet validation set accuracy scores. Multilingual versions of the model are also available for 50+ languages.
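Zero-shot image classification with a shared text-image space comes down to comparing one image vector against one text vector per candidate label. The sketch below shows that comparison on hand-written toy vectors; in practice the vectors would come from the model's image and text encoders. The function names and the three-dimensional vectors are hypothetical and for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_embedding, label_embeddings):
    """Pick the label whose text embedding is closest to the image embedding.

    label_embeddings maps a label prompt (str) to its embedding (list of
    floats). Returns (best_label, scores).
    """
    scores = {
        label: cosine(image_embedding, vec)
        for label, vec in label_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy stand-ins for real encoder outputs.
    image_vec = [0.9, 0.1, 0.0]
    labels = {
        "a photo of a cat": [1.0, 0.0, 0.0],
        "a photo of a dog": [0.0, 1.0, 0.0],
    }
    best, scores = zero_shot_classify(image_vec, labels)
    print(best)
```

No label-specific training is involved: adding a new class only requires embedding one more text prompt.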

sentence-transformers/clip-ViT-B-32-multilingual-v1
512
$0.005 / Mtoken
  • embeddings

This model is a multilingual version of the OpenAI CLIP-ViT-B32 model, which maps text and images to a common dense vector space. It includes a text embedding model that works for 50+ languages and an image encoder from CLIP. The model was trained using Multilingual Knowledge Distillation, where a multilingual DistilBERT model was trained as a student model to align the vector space of the original CLIP image encoder across many languages.
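The distillation objective can be sketched as follows, assuming the commonly described form in which the student is pushed to reproduce the teacher's embedding of a source sentence both for that sentence and for its translation. The function names and toy vectors are hypothetical, and the exact loss used to train this model is not stated in the listing.

```python
def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(u, v)) / len(u)


def distillation_loss(teacher_source, student_source, student_translation):
    """Assumed multilingual distillation loss for one sentence pair.

    teacher_source:      teacher embedding of the source sentence
    student_source:      student embedding of the same source sentence
    student_translation: student embedding of the translated sentence
    Both student embeddings are pulled toward the single teacher embedding.
    """
    return (mse(student_source, teacher_source)
            + mse(student_translation, teacher_source))


if __name__ == "__main__":
    teacher = [1.0, 0.0]
    print(distillation_loss(teacher, [1.0, 0.0], [0.0, 0.0]))
```

Because the teacher's vector space is left untouched, the original image encoder can keep being used unchanged alongside the new multilingual text encoder.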

sentence-transformers/multi-qa-mpnet-base-dot-v1
512
$0.005 / Mtoken
  • embeddings

We present a sentence embedding model that maps sentences and paragraphs to a 768-dimensional dense vector space, suitable for semantic search tasks. The model is trained on 215 million question-answer pairs from various sources, including WikiAnswers, PAQ, Stack Exchange, MS MARCO, GOOAQ, Amazon QA, Yahoo Answers, Search QA, ELI5, and Natural Questions. Our model uses a contrastive learning objective.
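Semantic search over such embeddings amounts to scoring every passage vector against the query vector and keeping the best matches. The sketch below uses dot-product scoring, as the "dot" in the model name suggests, on three-dimensional toy vectors standing in for the 768-dimensional ones. The function names and the example data are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def rank_by_dot_product(query_vec, passages, top_k=3):
    """Return the top_k (score, text) pairs, highest score first.

    passages is a list of (text, vector) tuples whose vectors have the same
    dimensionality as query_vec.
    """
    scored = [(dot(query_vec, vec), text) for text, vec in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy stand-ins for real query and passage embeddings.
    query = [1.0, 0.2, 0.0]
    corpus = [
        ("Paris is the capital of France.", [0.9, 0.1, 0.0]),
        ("Bread is baked in an oven.", [0.0, 0.2, 0.9]),
        ("France's capital city is Paris.", [0.8, 0.3, 0.1]),
    ]
    for score, text in rank_by_dot_product(query, corpus, top_k=2):
        print(round(score, 2), text)
```

A linear scan like this is fine for small collections; larger ones typically call for an approximate nearest-neighbour index, which is outside the scope of this sketch.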