Browse deepinfra models:

All categories and models you can try out and use directly on deepinfra:

Category: fill-mask

GroNLP/bert-base-dutch-cased
$0.0005 / sec
  • fill-mask

BERTje is a Dutch pre-trained BERT model developed at the University of Groningen. It achieved state-of-the-art results on several NLP tasks such as named entity recognition and part-of-speech tagging, and its authors provide a detailed comparison with other pre-trained models such as mBERT and RobBERT.

KB/bert-base-swedish-cased
$0.0005 / sec
  • fill-mask

The National Library of Sweden has released three pre-trained language models based on BERT and ALBERT for Swedish text. The models include a BERT base model, a BERT fine-tuned for named entity recognition, and an experimental ALBERT model. They were trained on approximately 15-20 GB of text data from various sources such as books, news, government publications, Swedish Wikipedia, and internet forums.

Rostlab/prot_bert
$0.0005 / sec
  • fill-mask

A pre-trained language model developed specifically for protein sequences using a masked language modeling (MLM) objective. It achieved impressive results when fine-tuned on downstream tasks such as secondary structure prediction and sub-cellular localization. The model was trained on uppercase amino acids only and used a vocabulary size of 21, with inputs of the form "[CLS] Protein Sequence A [SEP] Protein Sequence B [SEP]".
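
As a quick illustration of how a fill-mask model like this can be queried (a minimal sketch assuming local use of the Hugging Face transformers library rather than the DeepInfra API; the sequence shown is illustrative):

```python
# Predict the masked residue in a protein fragment with Rostlab/prot_bert.
# ProtBert expects uppercase amino acids separated by spaces.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="Rostlab/prot_bert")

predictions = unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T")
for p in predictions:
    print(p["token_str"], round(p["score"], 4))
```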

Rostlab/prot_bert_bfd
$0.0005 / sec
  • fill-mask

A language model pretrained on protein sequences using a masked language modeling objective. It achieved high scores on various downstream tasks such as secondary structure prediction and localization. The model was trained on a large corpus of protein sequences in a self-supervised fashion, without human labeling, using a BERT-style architecture with a vocabulary size of 21.

albert-base-v1
$0.0005 / sec
  • fill-mask

The ALBERT model is a transformer-based language model developed by Google researchers, designed for self-supervised learning of language representations. The model uses a combination of masked language modeling and sentence order prediction objectives, trained on a large corpus of English text data. Fine-tuning the model on specific downstream tasks can lead to improved performance, and various pre-trained versions are available for different NLP tasks.

albert-base-v2
$0.0005 / sec
  • fill-mask

The ALBERT model is a parameter-efficient variant of BERT that shares parameters across layers and factorizes the embedding matrix to keep the model small; the v2 release was trained on more data and for longer than v1, improving performance on downstream NLP tasks. It was trained on a combination of BookCorpus and English Wikipedia, and achieved state-of-the-art results on several benchmark datasets. Fine-tuning ALBERT on specific tasks can further improve its performance.

aubmindlab/bert-base-arabertv02
$0.0005 / sec
  • fill-mask

An Arabic pretrained language model based on Google's BERT architecture, released in two versions: AraBERTv1 and AraBERTv2. It uses the same BERT-Base configuration and is trained on a large Arabic corpus of around 200 million lines of text, including OSCAR-unshuffled, Arabic Wikipedia, and Assafir news articles. The model is available in TensorFlow 1.x and in the Hugging Face models repository.

bert-base-cased
$0.0005 / sec
  • fill-mask

A transformer-based language model developed by Google Research that achieved state-of-the-art results on a wide range of NLP tasks. The model was pre-trained on a large corpus of English text, including BookCorpus and English Wikipedia, using a masked language modeling objective. Fine-tuned versions of the model are available for various downstream tasks, and the model has been shown to achieve excellent results on tasks such as question answering, sentiment analysis, and named entity recognition.
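
To make the masked language modeling objective concrete, here is a minimal sketch (assuming local use of transformers and PyTorch; the sentence is illustrative) of scoring candidate fillers for a [MASK] token:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased")

inputs = tokenizer("Paris is the [MASK] of France.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the five highest-scoring tokens.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_index].topk(5).indices[0].tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```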

bert-base-chinese
$0.0005 / sec
  • fill-mask

A pre-trained language model developed by the HuggingFace team for the Chinese language. It uses a fill-mask approach and has been trained on a large corpus of Chinese text data. The model can be used for various natural language processing tasks such as masked language modeling and has been shown to achieve state-of-the-art results in certain benchmarks. However, like other language models, it also comes with risks, limitations, and biases, including perpetuating harmful stereotypes and biases present in the data it was trained on. Users are advised to carefully evaluate and mitigate these risks when using the model.

bert-base-german-cased
$0.0005 / sec
  • fill-mask

A pre-trained language model developed using Google's TensorFlow code and trained on a single cloud TPU v2. The model was trained for 810k steps with a batch size of 1024 and sequence length of 128, and then fine-tuned for 30k steps with sequence length of 512. The authors used a variety of data sources, including German Wikipedia, OpenLegalData, and news articles, and employed spaCy v2.1 for data cleaning and segmentation. The model achieved good performance on various downstream tasks, such as GermEval18 Fine, GermEval18 Coarse, GermEval14, CoNLL03, and 10kGNAD, without extensive hyperparameter tuning. Additionally, the authors found that even a randomly initialized BERT can achieve good performance when trained exclusively on labeled downstream datasets.

bert-base-multilingual-cased
$0.0005 / sec
  • fill-mask

A pre-trained multilingual model that uses a masked language modeling objective to learn a bidirectional representation of languages. It was trained on 104 languages with the largest Wikipedias, and its inputs are in the form of [CLS] Sentence A [SEP] Sentence B [SEP]. The model is primarily aimed at being fine-tuned on tasks that use the whole sentence, potentially masked, to make decisions.
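
The "[CLS] Sentence A [SEP] Sentence B [SEP]" format can be reproduced with the model's tokenizer (a small sketch assuming the transformers library; the sentence pair is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Passing two sentences builds a [CLS] A [SEP] B [SEP] pair encoding.
encoded = tokenizer("Das ist ein Haus.", "C'est une maison.")
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Prints something like: ['[CLS]', 'Das', 'ist', ..., '[SEP]', ..., '[SEP]']
```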

bert-base-multilingual-uncased
$0.0005 / sec
  • fill-mask

A transformer-based language model trained on the 102 languages with the largest Wikipedias. It was introduced in a research paper by Google Research and has been widely used for various natural language processing tasks. The model is trained using a masked language modeling objective, where 15% of the tokens are masked and the model predicts the missing tokens.
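
The 15% masking scheme can be reproduced with the standard MLM data collator (a hedged sketch assuming the transformers library; the example sentence is illustrative):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-uncased")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Each token is masked with probability 0.15; labels keep the original ids
# at masked positions and -100 everywhere else.
batch = collator([tokenizer("paris is the capital of france")])
print(batch["input_ids"])
print(batch["labels"])
```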

bert-base-uncased
$0.0005 / sec
  • fill-mask

A transformers model pretrained on a large corpus of English data in a self-supervised fashion. It was trained on BookCorpus, a dataset consisting of 11,038 unpublished books, and English Wikipedia, excluding lists, tables, and headers. The model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks.

bert-large-cased
$0.0005 / sec
  • fill-mask

A transformer-based language model pre-trained on a large corpus of English data using a masked language modeling objective. It was introduced in a research paper by Google researchers and achieved state-of-the-art results on various natural language processing tasks. The model is cased, meaning it differentiates between English and english, and has a configuration of 24 layers, 1024 hidden dimensions, 16 attention heads, and 336M parameters.

bert-large-uncased
$0.0005 / sec
  • fill-mask

BERT is a transformers model pretrained on a large corpus of English data. It supports masked language modeling and next sentence prediction.

camembert-base
$0.0005 / sec
  • fill-mask

CamemBERT is a French fill-mask language model from which contextual embedding features can be extracted, for example for sentiment analysis. A sentence is tokenized and encoded into a numerical representation, then fed into the CamemBERT model to obtain contextual embeddings. Taking the embeddings from all 12 self-attention layers plus the input embedding layer yields 13 layers of features for each sentence.
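
A minimal sketch of this feature-extraction recipe (assuming transformers and PyTorch; the example sentence is illustrative):

```python
import torch
from transformers import CamembertModel, CamembertTokenizer

tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
model = CamembertModel.from_pretrained("camembert-base", output_hidden_states=True)

inputs = tokenizer("J'aime le camembert !", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the input embedding layer plus all 12 self-attention
# layers: a tuple of 13 tensors of shape (batch, sequence_length, 768).
hidden_states = outputs.hidden_states
print(len(hidden_states), hidden_states[0].shape)
```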

distilbert-base-multilingual-cased
$0.0005 / sec
  • fill-mask

The DistilBERT model is a distilled version of the BERT base multilingual model, trained on 104 languages and featuring 6 layers, 768 dimensions, and 12 heads. It is designed for masked language modeling and next sentence prediction tasks, with potential applications in natural language processing and downstream tasks. However, it should not be used to intentionally create hostile or alienating environments for people, and users should be aware of its risks, biases, and limitations.

distilbert-base-uncased
$0.0005 / sec
  • fill-mask

DistilBERT is a smaller, faster, and cheaper version of BERT, a popular language model. It was trained on the same data as BERT, including BookCorpus and English Wikipedia, but with a few key differences in the preprocessing and training procedures. Despite its smaller size, DistilBERT achieves similar results to BERT on various natural language processing tasks.