fill-mask
DistilRoBERTa is a distilled version of the RoBERTa-base model, with 6 layers, 768 dimensions, and 12 heads, totaling 82M parameters. It is trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, and achieves comparable performance to RoBERTa while being twice as fast. The model is designed for masked language modeling and can be fine-tuned for downstream tasks, but it also comes with potential biases and limitations, including significant gender and ethnicity biases in its predictions.
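Every entry in this list exposes the same fill-mask interface: the model's masked-LM head produces one logit per vocabulary item at the masked position, and candidate fills are ranked by softmax probability. A minimal sketch of that ranking step, using an invented toy vocabulary and invented logits in place of a real model's output:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def top_k_fills(logits, vocab, k=2):
    """Rank candidate fills for a masked position by softmax probability."""
    probs = softmax(np.asarray(logits, dtype=float))
    order = np.argsort(probs)[::-1][:k]
    return [(vocab[i], float(probs[i])) for i in order]

# Toy vocabulary and logits standing in for a masked-LM head's output.
vocab = ["paris", "london", "banana", "rome"]
logits = [4.0, 3.0, -1.0, 2.5]
print(top_k_fills(logits, vocab, k=2))  # "paris" ranks first
```

In practice the same ranking is what a `transformers` fill-mask pipeline returns, just over a full subword vocabulary.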
fill-mask
BioBERT is a pre-trained biomedical language representation model for biomedical text mining. It is based on the original BERT architecture and was trained by DMIS-LAB.
fill-mask
The Bio+Clinical BERT model is initialized from BioBERT and trained on all MIMIC notes. It was pre-trained using a rules-based section splitter and the SciSpacy tokenizer, with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.
fill-mask
The Bio+Discharge Summary BERT model is initialized from BioBERT and trained only on discharge summaries from MIMIC. It was pre-trained using a rules-based section splitter and the SciSpacy tokenizer, with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.
fill-mask
Chinese pre-trained BERT with Whole Word Masking can be used for various NLP tasks such as question answering, sentiment analysis, and named entity recognition. The work is based on the original BERT model but adds whole word masking, which masks all the pieces of a word together during pre-training so the model must predict complete words rather than isolated fragments.
fill-mask
We present Chinese pre-trained BERT with Whole Word Masking, an extension of the original BERT model tailored for Chinese natural language processing tasks. Rather than masking individual subword or character tokens independently, this variant masks all the pieces of a segmented word together during pre-training, which strengthens the model's language understanding capabilities.
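The whole-word-masking idea can be sketched in a few lines. This toy uses English BERT WordPiece conventions (where `##` marks a continuation piece) rather than the Chinese word segmenter the actual model relies on, but the grouping-then-masking logic is the same; the example tokens are invented:

```python
def group_words(tokens):
    """Group BERT WordPiece tokens into whole words ('##' marks continuations)."""
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    return words

def whole_word_mask(tokens, word_idx, mask_token="[MASK]"):
    """Replace every piece of the word_idx-th whole word with mask_token."""
    masked = list(tokens)
    for i in group_words(tokens)[word_idx]:
        masked[i] = mask_token
    return masked

tokens = ["the", "phil", "##ammon", "sang"]
print(whole_word_mask(tokens, 1))  # → ['the', '[MASK]', '[MASK]', 'sang']
```

Plain BERT pre-training could mask `##ammon` alone, letting the model cheat from `phil`; whole word masking removes that shortcut.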
fill-mask
CodeBERTa is a RoBERTa-like model trained on the CodeSearchNet dataset from GitHub. Supported languages: go, java, javascript, php, python, ruby.
fill-mask
SecBERT is a pretrained language model for cyber security text, trained on a dataset of papers from various sources, including APTnotes, Stucco-Data, and CASIE. The model has its own wordpiece vocabulary, secvocab, and is available in two versions, SecBERT and SecRoBERTa. The model can improve downstream tasks such as NER, text classification, semantic understanding, and Q&A in the cyber security domain.
fill-mask
The KLUE BERT base is a pre-trained BERT model for the Korean language. It was developed by the KLUE (Korean Language Understanding Evaluation) benchmark team and is licensed under cc-by-sa-4.0. The model can be used for various tasks like topic classification, semantic textual similarity, natural language inference, named entity recognition, and others.
fill-mask
PubMedBERT is a pretrained language model specifically designed for biomedical natural language processing tasks. It was trained from scratch using abstracts and full-text articles from PubMed and PubMedCentral, and achieved state-of-the-art performance on various biomedical NLP tasks.
fill-mask
CodeBERT is a pre-trained language model designed to handle both programming languages and natural languages. Trained with a hybrid objective that combines masked language modeling and replaced token detection, CodeBERT achieves state-of-the-art results on various code understanding tasks while also performing well on natural language processing benchmarks. We analyze the effects of different design choices and provide insights into the behavior of CodeBERT, demonstrating its potential as a versatile tool for a wide range of applications involving both coding and natural language understanding.
fill-mask
DeBERTa is a variant of BERT that uses disentangled attention and an enhanced mask decoder to improve performance on natural language understanding (NLU) tasks. In the authors' evaluation, DeBERTa outperformed BERT and RoBERTa on most NLU tasks while using only 80GB of training data. The model showed particularly strong results on the SQuAD 1.1/2.0 and MNLI tasks.
fill-mask
DeBERTa (Decoding-Enhanced BERT with Disentangled Attention) is a novel language model that improves upon BERT and RoBERTa using disentangled attention and enhanced mask decoding. It achieves state-of-the-art results on various NLU tasks while requiring less computational resources than its predecessors.
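Disentangled attention scores each token pair with three terms: content-to-content, content-to-relative-position, and relative-position-to-content. A simplified single-head sketch of that decomposition (random vectors stand in for learned embeddings, and the relative-position handling is condensed relative to the paper's clipped-distance scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8  # sequence length, head dimension

H = rng.normal(size=(n, d))          # content vectors
P = rng.normal(size=(2 * n - 1, d))  # one embedding per relative distance

Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wq_r, Wk_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))

Qc, Kc = H @ Wq, H @ Wk   # content queries/keys
Qr, Kr = P @ Wq_r, P @ Wk_r  # relative-position queries/keys

A = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        c2c = Qc[i] @ Kc[j]                       # content-to-content
        c2p = Qc[i] @ Kr[(j - i) + (n - 1)]       # content-to-position
        p2c = Kc[j] @ Qr[(i - j) + (n - 1)]       # position-to-content
        A[i, j] = (c2c + c2p + p2c) / np.sqrt(3 * d)

# Row-wise softmax turns scores into attention weights.
weights = np.exp(A - A.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
```

Standard BERT attention fuses content and position into one vector before computing a single dot product; keeping them disentangled is what lets the position terms above be modeled separately.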
fill-mask
DeBERTaV3 is an improved version of the DeBERTa model that uses ELECTRA-style pre-training with gradient-disentangled embedding sharing. The new model significantly improves performance on downstream tasks compared to DeBERTa, and achieves state-of-the-art results on SQuAD 2.0 and MNLI tasks. DeBERTaV3 has a hidden size of 768 and 86 million backbone parameters, and was trained using a vocabulary of 128K tokens.
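In ELECTRA-style pre-training, a small generator fills masked positions and the main model is trained as a discriminator to label every token as original or replaced. The target construction is simple; this sketch uses an invented example sentence:

```python
def rtd_labels(original, corrupted):
    """ELECTRA-style replaced-token-detection targets: 1 where the generator's
    sample differs from the original token, 0 where it reproduced it."""
    return [int(o != c) for o, c in zip(original, corrupted)]

original  = ["the", "cat", "sat", "on", "the", "mat"]
corrupted = ["the", "dog", "sat", "on", "a",   "mat"]
print(rtd_labels(original, corrupted))  # → [0, 1, 0, 0, 1, 0]
```

Because the discriminator gets a learning signal from every token rather than only the ~15% that were masked, this objective is markedly more sample-efficient than plain masked language modeling.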
fill-mask
The SPLADE CoCondenser EnsembleDistil model is a sparse neural IR model for passage retrieval, achieving state-of-the-art performance on the MS MARCO dev set with an MRR@10 of 38.3 and R@1000 of 98.3. The model uses a combination of distillation and hard negative sampling techniques to improve its effectiveness.
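SPLADE builds its sparse representation from the masked-LM head: each vocabulary term's weight is the log-saturated, ReLU-clipped logit, max-pooled over the input token positions. A minimal numpy sketch with invented logits for a 3-token input over a 5-term vocabulary:

```python
import numpy as np

def splade_pool(mlm_logits):
    """SPLADE pooling: w_v = max_i log(1 + relu(logit_{i,v})),
    max-pooled over input token positions i."""
    return np.log1p(np.maximum(mlm_logits, 0.0)).max(axis=0)

# Toy MLM logits: rows are input token positions, columns are vocabulary terms.
logits = np.array([
    [ 2.0, -1.0, 0.0,  0.5, -3.0],
    [-0.5,  3.0, 0.0, -2.0,  1.0],
    [ 0.0,  0.0, 0.0,  4.0, -1.0],
])
weights = splade_pool(logits)  # term 2 gets weight 0, so the vector is sparse
```

The ReLU zeroes out negative logits, which is what makes the resulting vocabulary-sized vector sparse enough to index with an inverted file.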
fill-mask
A pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks: Named Entity Recognition, Sentence Textual Similarity and Recognizing Textual Entailment. The model is available in two sizes: Base and Large, and can be used for various NLP tasks such as masked language modeling and embedding generation.
fill-mask
BERTimbau Large is a pretrained BERT model for Brazilian Portuguese that achieves state-of-the-art performances on three downstream NLP tasks. It is available in two sizes: Base and Large. The model can be used for NLP tasks such as masked language modeling and extracting BERT embeddings.
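A common way to turn BERT's per-token vectors into the single "embedding" these cards mention is masked mean pooling: average the token vectors while ignoring padding. A small sketch with invented embeddings and an attention mask:

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Masked mean over token embeddings: average only the real (non-padding)
    positions to get one fixed-size sentence vector."""
    mask = np.asarray(attention_mask, dtype=float)[:, None]
    summed = (token_embeddings * mask).sum(axis=0)
    return summed / mask.sum()

emb = np.array([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]])
mask = [1, 1, 0]  # third position is padding
print(mean_pool(emb, mask))  # → [2. 3.]
```

With a real model the `token_embeddings` would be the last hidden state and `attention_mask` would come from the tokenizer; the pooling step itself is unchanged.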
fill-mask
LEGAL-BERT is a family of BERT models for the legal domain, designed to assist legal NLP research, computational law, and legal technology applications. The family includes several variants; among them, LEGAL-BERT-BASE achieved better performance than other models on several downstream tasks. The authors suggest possible applications such as question answering systems over databases, ontologies, document collections, and the web; natural language generation from databases and ontologies; text classification; information extraction and opinion mining; and machine learning in natural language processing.