Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

bert-large-cased
$0.0005 / sec
  • fill-mask

A transformer-based language model pre-trained on a large corpus of English data using a masked language modeling objective. It was introduced in a research paper by Google researchers and achieved state-of-the-art results on various natural language processing tasks. The model is cased, meaning it distinguishes "English" from "english", and has 24 layers, 1024 hidden dimensions, 16 attention heads, and 336M parameters.
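Fill-mask models like this one can be called over DeepInfra's HTTP API. The sketch below only builds the request rather than sending it; the endpoint path and the `input` payload field are assumptions about the API shape, so check the official docs before relying on them:

```python
import json

# Hypothetical request builder for a fill-mask call. The base URL and the
# "input" field name are assumptions, not confirmed API details.
API_BASE = "https://api.deepinfra.com/v1/inference"

def build_fill_mask_request(model: str, sentence: str) -> tuple[str, str]:
    """Return (url, json_body) for a masked-sentence prediction call."""
    url = f"{API_BASE}/{model}"
    body = json.dumps({"input": sentence})
    return url, body

url, body = build_fill_mask_request(
    "bert-large-cased", "The capital of France is [MASK]."
)
```

Sending `body` as a POST to `url` (with an API key header) would return the model's candidate fillings for the `[MASK]` token.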

bert-large-uncased
$0.0005 / sec
  • fill-mask

BERT is a transformer model pretrained on a large corpus of English data. It supports masked language modeling and next sentence prediction.

bert-large-uncased-whole-word-masking-finetuned-squad
$0.0005 / sec
  • question-answering

A transformer-based language model pretrained with whole-word masking on a large corpus of English data, then fine-tuned on SQuAD for question answering. During pretraining, 15% of the tokens in each sentence were randomly masked and the model had to predict the missing tokens. Fine-tuning on the SQuAD dataset yields high scores on both F1 and exact-match metrics.
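Extractive QA models of this kind emit per-token start and end logits, and the predicted answer is the span that maximizes their sum. A minimal sketch of that span selection in plain Python (no model required; the logit values below are made up):

```python
def best_span(start_logits, end_logits, max_len=15):
    """Pick the (start, end) token span maximizing start + end logit,
    with end >= start and span length capped at max_len tokens."""
    best = (0, 0)
    best_score = float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    return best

# Token 1 is the most likely start, token 2 the most likely end:
span = best_span([0.1, 5.0, 0.2, 0.3], [0.1, 0.2, 4.0, 0.3])
```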

bigcode/starcoder
8k
$0.0005 / sec
  • text-generation

A 15.5B-parameter model trained on 80+ programming languages from The Stack (v1.2) dataset, using a GPT-2 architecture with multi-query attention and a Fill-in-the-Middle objective. The model can generate code snippets given some context, but the generated code is not guaranteed to work as intended and may contain bugs or exploits. The model is licensed under the BigCode OpenRAIL-M v1 license agreement.
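The Fill-in-the-Middle objective lets the model complete a gap between a known prefix and suffix. A small sketch of assembling such a prompt using StarCoder's FIM sentinel tokens; the token names follow the StarCoder model card, but verify them against the tokenizer before use:

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a Fill-in-the-Middle prompt: the model generates the text
    that belongs between prefix and suffix after the <fim_middle> marker."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = fim_prompt(
    "def add(a, b):\n    ",      # code before the gap
    "\n    return result",        # code after the gap
)
```

Feeding `prompt` to the model would make it generate the missing middle, e.g. a line computing `result`.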

camembert-base
$0.0005 / sec
  • fill-mask

We extract contextual embedding features from CamemBERT, a fill-mask language model, for the task of sentiment analysis. We tokenize and encode a sentence into a numerical representation, then feed it into the CamemBERT model to get contextual embeddings. We extract the embeddings from all 12 self-attention layers plus the input embedding layer, giving 13 layers of features for each sentence.
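The layer-wise feature extraction described above can be sketched in plain Python: mean-pool the per-token vectors within each layer, yielding one pooled vector per layer (13 for CamemBERT: the input embeddings plus 12 self-attention layers). The nested-list layout here is purely illustrative, not the model's actual output format:

```python
def mean_pool(vectors):
    """Average a list of equal-length token vectors into one vector."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

def sentence_features(layer_outputs):
    """One pooled vector per layer; layer_outputs is a list of layers,
    each a list of per-token embedding vectors."""
    return [mean_pool(layer) for layer in layer_outputs]

# Toy example: 2 layers, 2 tokens, 2-dim embeddings.
feats = sentence_features([
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
])
```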

cardiffnlp/twitter-roberta-base-emotion
$0.0005 / sec
  • text-classification

A model trained on approximately 58 million tweets and fine-tuned for emotion recognition using the TweetEval benchmark. The model achieves high accuracy across the TweetEval classification tasks, including emoji, emotion, hate, irony, offensive, sentiment, stance/abortion, stance/atheism, stance/climate, stance/feminist, and stance/hillary.
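A text-classification head turns one logit per label into a probability via softmax, and the predicted emotion is the argmax. A minimal sketch with made-up logits and labels:

```python
import math

def classify(logits, labels):
    """Softmax the logits and return the (label, probability) with the
    highest score. Subtracting the max logit keeps exp() stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(zip(labels, probs), key=lambda t: t[1])

# Toy logits for three emotion labels:
label, prob = classify([1.0, 3.0, 0.5], ["anger", "joy", "sadness"])
```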

cardiffnlp/twitter-roberta-base-sentiment
$0.0005 / sec
  • text-classification

A RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. This model is suitable for English.

codellama/CodeLlama-34b-Instruct-hf
4k
$0.60 / Mtoken
  • text-generation

Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural-language prompts. This particular instance is the 34B Instruct variant.
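Note the two pricing schemes in this catalog: most models here bill at $0.0005 / sec of inference time, while this one bills at $0.60 per million tokens. A quick cost-estimate sketch, with rates taken from the listings above:

```python
def cost_per_second(seconds: float, rate: float = 0.0005) -> float:
    """Cost of a run billed by inference time ($/sec models above)."""
    return seconds * rate

def cost_per_mtoken(tokens: int, rate_per_mtoken: float = 0.60) -> float:
    """Cost of a run billed by tokens ($/Mtoken models like this one)."""
    return tokens / 1_000_000 * rate_per_mtoken

# 50k tokens through CodeLlama-34b vs. 2 minutes of per-second billing:
token_cost = cost_per_mtoken(50_000)   # 0.03 dollars
time_cost = cost_per_second(120)       # 0.06 dollars
```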

csarron/bert-base-uncased-squad-v1
$0.0005 / sec
  • question-answering

We present a fine-tuned BERT-base uncased model for question answering on the SQuAD v1 dataset. Our model achieves an exact match score of 80.9104 and an F1 score of 88.2302 without any hyperparameter search.
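The exact-match and F1 scores quoted throughout this catalog are the standard SQuAD metrics: EM is a normalized string comparison, and F1 is computed over the tokens shared between prediction and gold answer. A simplified sketch (SQuAD's official script also strips articles and punctuation, which this omits):

```python
def exact_match(pred: str, gold: str) -> int:
    """1 if the answers match after trivial normalization, else 0."""
    return int(pred.strip().lower() == gold.strip().lower())

def f1_score(pred: str, gold: str) -> float:
    """Token-overlap F1 between predicted and gold answer strings."""
    p_toks, g_toks = pred.lower().split(), gold.lower().split()
    g_counts: dict[str, int] = {}
    for t in g_toks:
        g_counts[t] = g_counts.get(t, 0) + 1
    common = 0
    for t in p_toks:
        if g_counts.get(t, 0) > 0:
            common += 1
            g_counts[t] -= 1
    if common == 0:
        return 0.0
    precision = common / len(p_toks)
    recall = common / len(g_toks)
    return 2 * precision * recall / (precision + recall)
```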

deepset/bert-large-uncased-whole-word-masking-squad2
$0.0005 / sec
  • question-answering

We present a BERT-based language model called bert-large-uncased-whole-word-masking-squad2, trained on the SQuAD2.0 dataset for extractive question answering. The model achieves high scores on exact match and F1 metrics.

deepset/minilm-uncased-squad2
$0.0005 / sec
  • question-answering

Microsoft's MiniLM-L12-H384-uncased language model achieved state-of-the-art results on the SQuAD 2.0 question-answering benchmark, with exact match and F1 scores of 76.13% and 79.54%, respectively. The model was trained on the SQuAD 2.0 dataset using a batch size of 12, learning rate of 4e-5, and 4 epochs. The authors suggest using their model as a starting point for building large language models for downstream NLP tasks.

deepset/roberta-base-squad2
$0.0005 / sec
  • question-answering

A pre-trained language model based on RoBERTa, fine-tuned on the SQuAD2.0 dataset for extractive question answering. It achieved scores of 79.87% exact match and 82.91% F1 score on the SQuAD2.0 dev set. Deepset is the company behind the open-source NLP framework Haystack, and offers other resources such as Distilled roberta-base-squad2, German BERT, and GermanQuAD datasets and models.

deepset/roberta-base-squad2-covid
$0.0005 / sec
  • question-answering

We present a RoBERTa-based question answering model, roberta-base-squad2-covid, for extractive QA on COVID-19-related texts. The model was trained on SQuAD-style CORD-19 annotations and achieved promising results in 5-fold cross-validation.

deepset/roberta-large-squad2
$0.0005 / sec
  • question-answering

This is the roberta-large model, fine-tuned using the SQuAD2.0 dataset.

deepset/tinyroberta-squad2
$0.0005 / sec
  • question-answering

Deepset presents tinyroberta-squad2, a distilled version of their roberta-base-squad2 model that achieves similar performance while being faster. The model is trained on SQuAD 2.0 and uses Haystack's infrastructure with 4x Tesla V100 GPUs. It achieved 78.69% exact match and 81.92% F1 score on the SQuAD 2.0 dev set.

distilbert-base-cased-distilled-squad
$0.0005 / sec
  • question-answering

The DistilBERT model is a small, fast, cheap, and lightweight Transformer model trained by distilling BERT base. It has 40% fewer parameters than the original BERT model and runs 60% faster, preserving over 95% of BERT's performance. The model was fine-tuned using knowledge distillation on the SQuAD v1.1 dataset and achieved an F1 score of 87.1 on the dev set.
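Knowledge distillation, as used to train DistilBERT, has the student match the teacher's temperature-softened output distribution rather than hard labels. A sketch of the softened softmax; the logit and temperature values are illustrative:

```python
import math

def soft_targets(logits, temperature=2.0):
    """Softmax over logits / T. Higher T flattens the distribution,
    exposing the teacher's relative preferences among wrong classes."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

sharp = soft_targets([2.0, 0.5, 0.1], temperature=1.0)
soft = soft_targets([2.0, 0.5, 0.1], temperature=4.0)
```

The student is trained against `soft` (plus the usual hard-label loss), which carries more information per example than a one-hot target.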