Browse DeepInfra models:

All the categories and models you can try out and use directly on DeepInfra:

Category: fill-mask

nlpaueb/legal-bert-small-uncased
$0.0005 / sec
  • fill-mask

We present LEGAL-BERT, a family of BERT models for the legal domain, designed to assist legal NLP research, computational law, and legal technology applications. Our models are trained on a large corpus of legal texts and demonstrate improved performance compared to using BERT out of the box. We release our models and pre-training corpora to facilitate further research and development in this field.
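Since the checkpoints listed here are standard Hugging Face models, a quick way to try one is the `transformers` fill-mask pipeline. The sketch below is illustrative (the example sentence is made up); BERT-style models such as LEGAL-BERT expect the `[MASK]` placeholder:

```python
from transformers import pipeline

# Load the fill-mask pipeline for the legal-domain checkpoint.
fill = pipeline("fill-mask", model="nlpaueb/legal-bert-small-uncased")

# BERT-style models mark the blank with the [MASK] token.
for pred in fill("The contract was deemed [MASK] by the court."):
    print(f'{pred["token_str"]:>12}  {pred["score"]:.3f}')
```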

nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large
$0.0005 / sec
  • fill-mask

A multilingual MiniLMv2 model trained on 16 languages, using a shared vocabulary and language-specific embeddings. The model is based on the transformer architecture and was developed by Microsoft Research. It includes support for various natural language processing tasks such as language translation, question answering, and text classification.

roberta-base
$0.0005 / sec
  • fill-mask

The RoBERTa model was pretrained on a dataset created by combining several sources, including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. It uses a byte-level BPE tokenizer with a vocabulary of about 50,000 tokens, and during pre-training it masks 15% of the tokens, replacing most of them with a special masking token and the rest with a random token or leaving them unchanged. When fine-tuned on downstream NLP tasks, the model outperforms its predecessor BERT in many areas.
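Hosted inference against this model can be sketched as a plain HTTP request. The endpoint path and the `input` field below are assumptions about DeepInfra's per-model inference URLs; check the model's API page for the exact request schema. Note that RoBERTa uses `<mask>` rather than `[MASK]`:

```python
import os
import requests

MODEL = "roberta-base"

# Assumed endpoint shape and payload field; verify on the model's API page.
resp = requests.post(
    f"https://api.deepinfra.com/v1/inference/{MODEL}",
    headers={"Authorization": f"bearer {os.environ['DEEPINFRA_API_KEY']}"},
    json={"input": "The capital of France is <mask>."},  # RoBERTa's mask token
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```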

roberta-large
$0.0005 / sec
  • fill-mask

The RoBERTa-large model was pre-trained on a corpus combining 11,038 books (BookCorpus), English Wikipedia, 63 million news articles (CC-News), and a subset of Common Crawl data. It achieved state-of-the-art results on GLUE, SuperGLUE, and other multi-task benchmarks while being less sensitive to hyperparameter choices than BERT. RoBERTa uses a robustly optimized training recipe with dynamic masking, in which the masked positions change across training passes, unlike BERT's static masking.

sberbank-ai/ruRoberta-large
$0.0005 / sec
  • fill-mask

The ruRoberta-large model was trained by the SberDevices team for mask-filling tasks, using a transformer encoder and a BBPE tokenizer. It has 355 million parameters and was trained on 250 GB of data. The NLP Core R&D team, including Dmitry Zmitrovich, contributed to its development.

smanjil/German-MedBERT
$0.0005 / sec
  • fill-mask

German-MedBERT is a German BERT model fine-tuned for the medical domain, achieving improved performance on the NTS-ICD-10 text classification task. The model was trained with PyTorch and the Hugging Face library on a Colab GPU, using standard parameter settings and up to 25 epochs for classification. Evaluation shows significant improvements in micro precision, recall, and F1 score over the base German BERT model.

uer/albert-base-chinese-cluecorpussmall
$0.0005 / sec
  • fill-mask

We present a Chinese version of ALBERT, the popular BERT-derived language model, trained on the CLUECorpusSmall dataset using the UER-py toolkit. Our model achieves state-of-the-art results on various NLP tasks. The model is trained in two stages: first with a sequence length of 128 and then with a sequence length of 512.

xlm-roberta-base
$0.0005 / sec
  • fill-mask

The XLM-RoBERTa model is a multilingual version of RoBERTa, pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. It was introduced in the paper "Unsupervised Cross-lingual Representation Learning at Scale" by Conneau et al. The model learns an inner representation of the 100 languages that can be used to extract features useful for downstream tasks.
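Because a single checkpoint covers all 100 languages, the same fill-mask call works across languages without switching models. A minimal sketch using the `transformers` pipeline (the example sentences are illustrative); XLM-RoBERTa also uses the `<mask>` token:

```python
from transformers import pipeline

# One multilingual checkpoint handles masked tokens in many languages.
fill = pipeline("fill-mask", model="xlm-roberta-base")

print(fill("Paris est la <mask> de la France.")[0]["token_str"])       # French
print(fill("Berlin ist die <mask> von Deutschland.")[0]["token_str"])  # German
```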