Browse DeepInfra models:

All the categories and models you can try out and use directly on DeepInfra:

Category: fill-mask

nlpaueb/legal-bert-small-uncased
$0.0005 / sec
  • fill-mask

We present LEGAL-BERT, a family of BERT models for the legal domain, designed to assist legal NLP research, computational law, and legal technology applications. Our models are trained on a large corpus of legal texts and demonstrate improved performance compared to using BERT out of the box. We release our models and pre-training corpora to facilitate further research and development in this field.
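Since the checkpoints listed here are standard Hugging Face models, a quick way to try one is the `transformers` fill-mask pipeline. The sketch below is illustrative (the example sentence is made up); BERT-style models such as LEGAL-BERT expect the `[MASK]` placeholder:

```python
from transformers import pipeline

# Load the fill-mask pipeline for the legal-domain checkpoint.
fill = pipeline("fill-mask", model="nlpaueb/legal-bert-small-uncased")

# BERT-style models mark the blank with the [MASK] token.
for pred in fill("The contract was deemed [MASK] by the court."):
    print(f'{pred["token_str"]:>12}  {pred["score"]:.3f}')
```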

nreimers/mMiniLMv2-L12-H384-distilled-from-XLMR-Large
$0.0005 / sec
  • fill-mask

A multilingual MiniLMv2 model trained on 16 languages, using a shared vocabulary and language-specific embeddings. The model is based on the transformer architecture and was developed by Microsoft Research. It includes support for various natural language processing tasks such as language translation, question answering, and text classification.

roberta-base
$0.0005 / sec
  • fill-mask

The RoBERTa model was pretrained on a dataset created by combining several sources, including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. It uses a byte-level BPE tokenizer with a vocabulary of about 50,000 tokens, and during pre-training it masks 15% of the tokens, replacing most of them with a special masking token and the rest with a random token or leaving them unchanged. When fine-tuned on downstream NLP tasks, the model outperforms its predecessor BERT in many areas.
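Hosted inference against this model can be sketched as a plain HTTP request. The endpoint path and the `input` field below are assumptions about DeepInfra's per-model inference URLs; check the model's API page for the exact request schema. Note that RoBERTa uses `<mask>` rather than `[MASK]`:

```python
import os
import requests

MODEL = "roberta-base"

# Assumed endpoint shape and payload field; verify on the model's API page.
resp = requests.post(
    f"https://api.deepinfra.com/v1/inference/{MODEL}",
    headers={"Authorization": f"bearer {os.environ['DEEPINFRA_API_KEY']}"},
    json={"input": "The capital of France is <mask>."},  # RoBERTa's mask token
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```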

roberta-large
$0.0005 / sec
  • fill-mask

The RoBERTa-large model was pre-trained on a corpus combining 11,038 books (BookCorpus), English Wikipedia, 63 million news articles (CC-News), and a subset of Common Crawl data. It achieved state-of-the-art results on GLUE, SuperGLUE, and other multi-task benchmarks while being less sensitive to hyperparameter choices than BERT. RoBERTa uses a robustly optimized training recipe with dynamic masking, in which the masked positions change across training passes, unlike BERT's static masking.

sberbank-ai/ruRoberta-large
$0.0005 / sec
  • fill-mask

The ruRoberta-large model was trained by the SberDevices team for mask-filling tasks, using a transformer encoder and a BBPE tokenizer. It has 355 million parameters and was trained on 250 GB of data. The NLP Core R&D team, including Dmitry Zmitrovich, contributed to its development.

smanjil/German-MedBERT
$0.0005 / sec
  • fill-mask

German-MedBERT is a German BERT model fine-tuned for the medical domain, achieving improved performance on the NTS-ICD-10 text classification task. The model was trained with PyTorch and the Hugging Face library on a Colab GPU, using standard parameter settings and up to 25 epochs for classification. Evaluation shows significant improvements in micro precision, recall, and F1 score over the base German BERT model.

uer/albert-base-chinese-cluecorpussmall
$0.0005 / sec
  • fill-mask

We present a Chinese version of ALBERT, the popular BERT-derived language model, trained on the CLUECorpusSmall dataset using the UER-py toolkit. Our model achieves state-of-the-art results on various NLP tasks. The model is trained in two stages: first with a sequence length of 128 and then with a sequence length of 512.

xlm-roberta-base
$0.0005 / sec
  • fill-mask

The XLM-RoBERTa model is a multilingual version of RoBERTa, pre-trained on 2.5 TB of filtered CommonCrawl data covering 100 languages. It was introduced in the paper "Unsupervised Cross-lingual Representation Learning at Scale" by Conneau et al. The model learns an inner representation of the 100 languages that can be used to extract features useful for downstream tasks.
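Because a single checkpoint covers all 100 languages, the same fill-mask call works across languages without switching models. A minimal sketch using the `transformers` pipeline (the example sentences are illustrative); XLM-RoBERTa also uses the `<mask>` token:

```python
from transformers import pipeline

# One multilingual checkpoint handles masked tokens in many languages.
fill = pipeline("fill-mask", model="xlm-roberta-base")

print(fill("Paris est la <mask> de la France.")[0]["token_str"])       # French
print(fill("Berlin ist die <mask> von Deutschland.")[0]["token_str"])  # German
```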