Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

albert-base-v1
$0.0005 / sec
  • fill-mask

The ALBERT model is a transformer-based language model developed by Google researchers, designed for self-supervised learning of language representations. The model uses a combination of masked language modeling and sentence order prediction objectives, trained on a large corpus of English text data. Fine-tuning the model on specific downstream tasks can lead to improved performance, and various pre-trained versions are available for different NLP tasks.
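
As a hedged illustration of the fill-mask task, the sketch below loads the checkpoint with the Hugging Face transformers pipeline and asks it to fill in a masked token; the example sentence is invented, and running the model locally rather than through the DeepInfra API is an assumption made for brevity.

from transformers import pipeline

# Load the checkpoint for masked-token prediction (fill-mask).
fill_mask = pipeline("fill-mask", model="albert-base-v1")

# ALBERT uses [MASK] as its mask token; the pipeline returns the top candidates with scores.
for prediction in fill_mask("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 4))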

albert-base-v2
$0.0005 / sec
  • fill-mask

The ALBERT model is a parameter-efficient variant of BERT that shares weights across its transformer layers to keep the model small; version 2 was trained longer and with additional data than v1, improving performance on downstream NLP tasks. It was pretrained on a combination of BookCorpus and English Wikipedia and achieves strong results on several benchmark datasets. Fine-tuning ALBERT on specific tasks can further improve its performance.

aubmindlab/bert-base-arabertv02
$0.0005 / sec
  • fill-mask

An Arabic pretrained language model based on Google's BERT architecture, released in two versions: AraBERTv1 and AraBERTv2. It uses the same BERT-Base configuration and was trained on a large corpus of around 200 million words, including OSCAR-unshuffled, Arabic Wikipedia, and Assafir news articles. The model is available for TensorFlow 1.x and on the Hugging Face models repository.

bert-base-cased
$0.0005 / sec
  • fill-mask

A transformer-based language model developed by Google Research that achieved state-of-the-art results on a wide range of NLP tasks. The model was pre-trained on a large corpus of English text, including BookCorpus and English Wikipedia, using a masked language modeling objective. Fine-tuned versions of the model are available for various downstream tasks, and the model has been shown to achieve excellent results on tasks such as question answering, sentiment analysis, and named entity recognition.

bert-base-chinese
$0.0005 / sec
  • fill-mask

A pre-trained language model for Chinese released by the HuggingFace team. It was trained on a large corpus of Chinese text with a masked language modeling (fill-mask) objective and can be used for natural language processing tasks such as masked token prediction, achieving state-of-the-art results on certain benchmarks. Like other language models, it comes with risks, limitations, and biases, including the potential to perpetuate harmful stereotypes present in its training data; users are advised to carefully evaluate and mitigate these risks.

bert-base-german-cased
$0.0005 / sec
  • fill-mask

A pre-trained German language model built with Google's BERT TensorFlow code and trained on a single Cloud TPU v2. The model was trained for 810k steps with a batch size of 1024 at sequence length 128, then for a further 30k steps at sequence length 512. The authors used a variety of data sources, including German Wikipedia, OpenLegalData, and news articles, and employed spaCy v2.1 for data cleaning and segmentation. The model achieved good performance on downstream tasks such as GermEval18 (fine and coarse), GermEval14, CoNLL03, and 10kGNAD without extensive hyperparameter tuning. The authors also found that even a randomly initialized BERT can achieve good performance when trained exclusively on labeled downstream datasets.

bert-base-multilingual-cased
$0.0005 / sec
  • fill-mask

A pre-trained multilingual model that uses a masked language modeling objective to learn a bidirectional representation of languages. It was trained on the 104 languages with the largest Wikipedias, and its inputs take the form [CLS] Sentence A [SEP] Sentence B [SEP]. The model is primarily intended to be fine-tuned on tasks that use the whole sentence, potentially masked, to make decisions.
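
To make that input format concrete, here is a small sketch (assuming the Hugging Face transformers library) that tokenizes a sentence pair and prints the resulting [CLS] ... [SEP] ... [SEP] layout; the two sentences are invented for illustration.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Passing two sentences produces the pair format: [CLS] Sentence A [SEP] Sentence B [SEP]
encoding = tokenizer("How is the weather today?", "Wie ist das Wetter heute?")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))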

bert-base-multilingual-uncased
$0.0005 / sec
  • fill-mask

A transformer-based language model trained on the 102 languages with the largest Wikipedias. It was introduced in a research paper by Google Research and has been widely used for various natural language processing tasks. The model is trained with a masked language modeling objective: 15% of the tokens are masked and the model predicts the missing tokens.

bert-base-uncased
$0.0005 / sec
  • fill-mask

A transformers model pretrained on a large corpus of English data in a self-supervised fashion. It was trained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia, excluding lists, tables, and headers. The model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.

bert-large-cased
$0.0005 / sec
  • fill-mask

A transformer-based language model pre-trained on a large corpus of English data using a masked language modeling objective. It was introduced in a research paper by Google researchers and achieved state-of-the-art results on various natural language processing tasks. The model is cased, meaning it distinguishes between "english" and "English", and has 24 layers, a hidden size of 1024, 16 attention heads, and 336M parameters.
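
The architecture numbers quoted above can be read straight from the model configuration; a minimal sketch, assuming the transformers library is installed:

from transformers import AutoConfig

# Fetch only the configuration (no weights are downloaded).
config = AutoConfig.from_pretrained("bert-large-cased")

# Expect 24 layers, a hidden size of 1024, and 16 attention heads.
print(config.num_hidden_layers, config.hidden_size, config.num_attention_heads)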

bert-large-uncased
$0.0005 / sec
  • fill-mask

BERT-large (uncased) is a transformer model pretrained on a large corpus of English data. It supports masked language modeling and next-sentence prediction.

bert-large-uncased-whole-word-masking-finetuned-squad
$0.0005 / sec
  • question-answering

A BERT-large model pretrained with whole-word masking on a large corpus of English data and fine-tuned on SQuAD for question answering. During pretraining, 15% of the tokens in each sentence were randomly masked and the model had to predict the missing tokens; after fine-tuning on the SQuAD dataset it achieves high F1 and exact-match scores.
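
A hedged usage sketch with the transformers question-answering pipeline; the question and context strings are invented for illustration.

from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="bert-large-uncased-whole-word-masking-finetuned-squad",
)

# The pipeline extracts the answer span from the supplied context.
result = qa(
    question="What objective was used during pretraining?",
    context="The model was pretrained with a masked language modeling objective on English text.",
)
print(result["answer"], round(result["score"], 4))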

bigcode/starcoder2-15b
16k
$0.40 / Mtoken
  • text-generation

StarCoder2-15B is a 15B-parameter model trained on 600+ programming languages. It specializes in code completion.
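
Since this model is billed per Mtoken, a typical way to use it is through DeepInfra's hosted inference rather than locally. The sketch below assumes an OpenAI-compatible completions endpoint at api.deepinfra.com/v1/openai and an API key stored in a DEEPINFRA_API_KEY environment variable; both are assumptions to verify against the current DeepInfra docs.

import os
from openai import OpenAI

# Assumed OpenAI-compatible base URL and environment variable name; check the docs.
client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],
    base_url="https://api.deepinfra.com/v1/openai",
)

# Plain completion request: the model continues the code prompt.
completion = client.completions.create(
    model="bigcode/starcoder2-15b",
    prompt="def fibonacci(n):",
    max_tokens=64,
)
print(completion.choices[0].text)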

camembert-base
$0.0005 / sec
  • fill-mask

CamemBERT is a French language model trained with the fill-mask (masked language modeling) objective. Its contextual embeddings can be used as features for downstream tasks such as sentiment analysis: a sentence is tokenized and encoded into token ids, passed through the model, and embeddings are collected from the input embedding layer and all 12 self-attention layers, giving 13 layers of features for each sentence.
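
A minimal sketch of that feature-extraction recipe, assuming the transformers and torch packages are installed; the French example sentence is invented.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("camembert-base")
model = AutoModel.from_pretrained("camembert-base", output_hidden_states=True)

# Tokenize and encode the sentence into tensors the model accepts.
inputs = tokenizer("J'aime beaucoup ce film.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the input embedding layer plus all 12 self-attention layers: 13 in total.
hidden_states = outputs.hidden_states
print(len(hidden_states), hidden_states[-1].shape)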

cardiffnlp/twitter-roberta-base-emotion
$0.0005 / sec
  • text-classification

A RoBERTa-base model trained on approximately 58 million tweets and fine-tuned for emotion recognition with the TweetEval benchmark. TweetEval covers tasks such as emoji prediction, emotion, hate, irony, offensive language, sentiment, and stance detection (abortion, atheism, climate, feminist, Hillary); this checkpoint targets the emotion task.

cardiffnlp/twitter-roberta-base-sentiment
$0.0005 / sec
  • text-classification

A RoBERTa-base model trained on ~124M tweets from January 2018 to December 2021 and fine-tuned for sentiment analysis with the TweetEval benchmark. This model is suitable for English.
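
A short usage sketch with the transformers text-classification pipeline; the tweet is invented, and the label-to-sentiment mapping in the comment follows the TweetEval convention (verify it against the model card).

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-roberta-base-sentiment",
)

# Returns a label and confidence score; LABEL_0/1/2 correspond to negative/neutral/positive.
print(classifier("I love the new update!"))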

codellama/CodeLlama-34b-Instruct-hf
4k
$0.60 / Mtoken
  • text-generation

Code Llama is a state-of-the-art LLM capable of generating code, and natural language about code, from both code and natural language prompts. This particular instance is the 34B instruct-tuned variant.
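
Instruct-tuned variants expect a chat-style request. The sketch below reuses the same assumed OpenAI-compatible DeepInfra endpoint as in the StarCoder2 example; the base URL and environment variable name are assumptions to check against the docs.

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPINFRA_API_KEY"],
    base_url="https://api.deepinfra.com/v1/openai",  # assumed; check the docs
)

# Chat-style request to the instruct-tuned variant.
chat = client.chat.completions.create(
    model="codellama/CodeLlama-34b-Instruct-hf",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    max_tokens=128,
)
print(chat.choices[0].message.content)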

codellama/CodeLlama-70b-Instruct-hf
4k
Replaced
  • text-generation

CodeLlama-70B is the largest and latest code-generation model in the Code Llama collection.