Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

Category: fill-mask

distilroberta-base
$0.0005 / sec
  • fill-mask

DistilRoBERTa is a distilled version of the RoBERTa-base model, with 6 layers, 768 dimensions, and 12 heads, totaling 82M parameters. It is trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, and achieves comparable performance to RoBERTa while being twice as fast. The model is designed for masked language modeling and can be fine-tuned for downstream tasks, but it also comes with potential biases and limitations, including significant gender and ethnicity biases in its predictions.
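
As an illustration of the fill-mask task these models serve, the short sketch below runs distilroberta-base with the Hugging Face transformers pipeline. This is an assumed local setup for illustration (not DeepInfra's hosted endpoint), and the example sentence is invented; RoBERTa-derived checkpoints expect the literal <mask> token.

    # Minimal local fill-mask sketch; assumes transformers and torch are installed.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="distilroberta-base")

    # RoBERTa-derived checkpoints use "<mask>" as the mask token.
    for prediction in unmasker("The goal of a language model is to <mask> the next token."):
        print(prediction["token_str"], round(prediction["score"], 3))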

dmis-lab/biobert-base-cased-v1.2
$0.0005 / sec
  • fill-mask

BioBERT is a pre-trained biomedical language representation model for biomedical text mining, based on the original BERT architecture and trained by DMIS-LAB.

emilyalsentzer/Bio_ClinicalBERT
$0.0005 / sec
  • fill-mask

The Bio+Clinical BERT model is initialized from BioBERT and trained on all MIMIC notes. The notes were pre-processed with a rules-based section splitter and the SciSpacy tokenizer, and the model was pre-trained with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.

emilyalsentzer/Bio_Discharge_Summary_BERT
$0.0005 / sec
  • fill-mask

The Bio+Discharge Summary BERT model is initialized from BioBERT and trained only on discharge summaries from MIMIC. As with Bio+Clinical BERT, the notes were pre-processed with a rules-based section splitter and the SciSpacy tokenizer, and the model was pre-trained with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.

hfl/chinese-bert-wwm-ext
$0.0005 / sec
  • fill-mask

Chinese pre-trained BERT with Whole Word Masking, which can be used for various NLP tasks such as question answering, sentiment analysis, and named entity recognition. It is based on the original BERT model, but during pre-training it masks all sub-word pieces of a Chinese word together rather than individual pieces, which yields better word-level representations.

hfl/chinese-roberta-wwm-ext
$0.0005 / sec
  • fill-mask

Chinese pre-trained BERT with Whole Word Masking, an extension of the original BERT model tailored for Chinese natural language processing tasks. Whole word masking does not replace sub-word tokenization; instead, all sub-word pieces of a word are masked together during pre-training, which improves the model's language understanding capabilities.
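
Both hfl models above are BERT-style checkpoints and therefore use the [MASK] token. A comparable local sketch (again via transformers rather than the hosted endpoint, with an invented example sentence):

    # Fill-mask sketch for a Chinese whole-word-masking checkpoint.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="hfl/chinese-roberta-wwm-ext")

    # "北京是中国的[MASK]都。" -- a reasonable model should rank "首" highly ("首都" = capital).
    for prediction in unmasker("北京是中国的[MASK]都。"):
        print(prediction["token_str"], round(prediction["score"], 3))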

huggingface/CodeBERTa-small-v1
$0.0005 / sec
  • fill-mask

CodeBERTa is a RoBERTa-like model trained on the CodeSearchNet dataset from GitHub. Supported languages: go, java, javascript, php, python, ruby.
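
Since CodeBERTa is RoBERTa-like, the same fill-mask interface applies directly to source code with the <mask> token. A hedged local sketch with an invented snippet (the same pattern also fits microsoft/codebert-base-mlm further down):

    # Masked-token prediction over source code with CodeBERTa.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="huggingface/CodeBERTa-small-v1")

    # Mask a keyword in a small Python snippet; "return" should score highly.
    for prediction in fill_mask("def add(a, b): <mask> a + b"):
        print(prediction["token_str"], round(prediction["score"], 3))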

jackaduma/SecBERT
$0.0005 / sec
  • fill-mask

SecBERT is a pretrained language model for cyber security text, trained on a dataset of papers from various sources, including APTnotes, Stucco-Data, and CASIE. The model has its own wordpiece vocabulary, secvocab, and is available in two versions, SecBERT and SecRoBERTa. The model can improve downstream tasks such as NER, text classification, semantic understanding, and Q&A in the cyber security domain.

klue/bert-base
$0.0005 / sec
  • fill-mask

KLUE BERT base is a BERT model pre-trained on Korean-language text. It was released as part of the KLUE (Korean Language Understanding Evaluation) benchmark and is licensed under cc-by-sa-4.0. The model can be used for various tasks such as topic classification, semantic textual similarity, natural language inference, and named entity recognition.

microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
$0.0005 / sec
  • fill-mask

PubMedBERT is a pretrained language model specifically designed for biomedical natural language processing tasks. It was trained from scratch using abstracts and full-text articles from PubMed and PubMedCentral, and achieved state-of-the-art performance on various biomedical NLP tasks.

microsoft/codebert-base-mlm
$0.0005 / sec
  • fill-mask

CodeBERT is a pre-trained language model designed to handle both programming languages and natural language. The original model is trained with a hybrid objective combining masked language modeling and replaced token detection, and achieves state-of-the-art results on various code understanding tasks while also performing well on natural language processing benchmarks; this codebert-base-mlm checkpoint was trained with the masked language modeling objective, which is what enables fill-mask usage. The authors analyze the effects of different design choices and demonstrate CodeBERT's potential as a versatile tool for a wide range of applications involving both code and natural language understanding.

microsoft/deberta-base
$0.0005 / sec
  • fill-mask

DeBERTa is a variant of BERT that uses disentangled attention and an enhanced mask decoder to improve performance on natural language understanding (NLU) tasks. In a study, DeBERTa outperformed BERT and RoBERTa on most NLU tasks with only 80GB of training data. The model showed particularly strong results on the SQuAD 1.1/2.0 and MNLI tasks.

microsoft/deberta-v2-xlarge
$0.0005 / sec
  • fill-mask

DeBERTa (Decoding-Enhanced BERT with Disentangled Attention) is a novel language model that improves upon BERT and RoBERTa using disentangled attention and enhanced mask decoding. It achieves state-of-the-art results on various NLU tasks while requiring less computational resources than its predecessors.

microsoft/deberta-v3-base
$0.0005 / sec
  • fill-mask

DeBERTaV3 is an improved version of the DeBERTa model that uses ELECTRA-style pre-training with gradient-disentangled embedding sharing. The new model significantly improves performance on downstream tasks compared to DeBERTa, and achieves state-of-the-art results on SQuAD 2.0 and MNLI tasks. DeBERTaV3 has a hidden size of 768 and 86 million backbone parameters, and was trained using a vocabulary of 128K tokens.

naver/splade-cocondenser-ensembledistil
$0.0005 / sec
  • fill-mask

The SPLADE CoCondenser EnsembleDistil model is a passage retrieval system based on sparse neural IR models, which achieves state-of-the-art performance on the MS MARCO dev set with MRR@10 of 38.3 and R@1000 of 98.3. The model uses a combination of distillation and hard negative sampling techniques to improve its effectiveness.
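
Unlike the other fill-mask entries here, SPLADE uses the masked-language-modeling head to build sparse lexical vectors for retrieval rather than to fill a blank. The sketch below shows the usual SPLADE pooling, log(1 + ReLU(logits)) max-pooled over token positions, run locally with transformers; the query text and variable names are illustrative assumptions.

    # Sketch: deriving a SPLADE-style sparse vector from the MLM head logits.
    # Term weight for vocabulary entry j: max over positions i of log(1 + relu(logit_ij)).
    import torch
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    model_id = "naver/splade-cocondenser-ensembledistil"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForMaskedLM.from_pretrained(model_id)

    inputs = tokenizer("what is the capital of france", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits                     # (1, seq_len, vocab_size)

    # Saturate, mask out padding, and max-pool over the sequence.
    weights = torch.log1p(torch.relu(logits)) * inputs["attention_mask"].unsqueeze(-1)
    sparse_vec = weights.max(dim=1).values.squeeze(0)       # (vocab_size,)

    # The highest-weighted vocabulary terms form the query's sparse expansion.
    top = torch.topk(sparse_vec, k=10)
    for score, idx in zip(top.values, top.indices):
        print(tokenizer.convert_ids_to_tokens([idx.item()])[0], round(score.item(), 2))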

neuralmind/bert-base-portuguese-cased
$0.0005 / sec
  • fill-mask

A pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity, and Recognizing Textual Entailment. The model is available in two sizes, Base and Large, and can be used for various NLP tasks such as masked language modeling and embedding generation.

neuralmind/bert-large-portuguese-cased
$0.0005 / sec
  • fill-mask

BERTimbau Large is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performance on three downstream NLP tasks. It is available in two sizes, Base and Large, and can be used for various NLP tasks such as masked language modeling and producing BERT embeddings.
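
For the embedding use mentioned for both BERTimbau sizes, one common recipe is to mean-pool the encoder's last hidden state over non-padding tokens. The sketch below assumes that recipe and invented example sentences, and runs locally with transformers.

    # Sketch: sentence embeddings from BERTimbau via mean pooling.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_id = "neuralmind/bert-base-portuguese-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)

    sentences = ["Tinha uma pedra no meio do caminho.", "No meio do caminho tinha uma pedra."]
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**batch).last_hidden_state           # (batch, seq_len, hidden)

    # Mean-pool over real (non-padding) tokens to get one vector per sentence.
    mask = batch["attention_mask"].unsqueeze(-1)
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    # Cosine similarity between the two sentence vectors.
    print(torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=0).item())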

nlpaueb/legal-bert-base-uncased
$0.0005 / sec
  • fill-mask

LEGAL-BERT is a family of BERT models for the legal domain, designed to assist legal NLP research, computational law, and legal technology applications. It includes five variants, among them LEGAL-BERT-BASE, which achieved better performance than other models on several downstream tasks. The authors suggest possible applications such as question answering systems over databases, ontologies, document collections, and the web; natural language generation from databases and ontologies; text classification; information extraction and opinion mining; and machine learning in natural language processing.
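
The downstream applications listed above (classification, extraction, question answering) typically start from the pre-trained encoder with a freshly initialized task head that still needs fine-tuning on labeled data. A minimal sketch of that starting point, assuming the standard transformers sequence-classification wrapper and an invented two-label setup:

    # Sketch: preparing LEGAL-BERT for a downstream text-classification task.
    # The classification head is randomly initialized and must be fine-tuned
    # (e.g. with the transformers Trainer) before its outputs are meaningful.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    model_id = "nlpaueb/legal-bert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

    batch = tokenizer(
        ["The lessee shall pay rent on the first day of each month."],
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**batch).logits   # (1, 2); arbitrary until fine-tuned
    print(logits)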