DistilRoBERTa is a distilled version of the RoBERTa-base model. It has 6 layers, a hidden size of 768, and 12 attention heads, for a total of 82M parameters. It is trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, and is reported to perform comparably to RoBERTa-base while running about twice as fast. The model is intended for masked language modeling and can be fine-tuned for downstream tasks. It has known limitations, including significant gender and ethnicity biases in its predictions.
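The 82M figure can be roughly checked from the architecture numbers in the summary. The sketch below is a back-of-the-envelope count, not the library's own accounting. It assumes values that the summary does not state and that are taken from the usual RoBERTa-base configuration: a vocabulary of 50265 tokens, 514 position embeddings, a single token-type embedding, a feed-forward width of 3072, and a pooler layer. The function name `encoder_params` is hypothetical.

```python
# Back-of-the-envelope parameter count for a RoBERTa-style encoder.
# Assumed (not stated in the summary): vocab=50265, max_pos=514,
# ffn=3072, one token-type embedding row, and a pooler on top.
# The number of attention heads does not change the parameter count.

def encoder_params(layers, hidden=768, ffn=3072, vocab=50265, max_pos=514):
    layer_norm = 2 * hidden  # scale and shift vectors

    # Token, position, and token-type embeddings, followed by a LayerNorm.
    embeddings = vocab * hidden + max_pos * hidden + hidden + layer_norm

    # Query, key, value, and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)

    # Two dense layers: hidden -> ffn and ffn -> hidden, each with a bias.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)

    # Each transformer layer has one LayerNorm after attention and one after the FFN.
    per_layer = attention + layer_norm + feed_forward + layer_norm

    # Dense pooler applied to the first token's hidden state.
    pooler = hidden * hidden + hidden

    return embeddings + layers * per_layer + pooler


distil = encoder_params(layers=6)   # DistilRoBERTa: 6 layers
base = encoder_params(layers=12)    # RoBERTa-base: 12 layers

print(f"6-layer encoder:  {distil / 1e6:.1f}M parameters")
print(f"12-layer encoder: {base / 1e6:.1f}M parameters")
```

Under these assumptions the 6-layer count comes to about 82M. Because the embedding table accounts for roughly 39M parameters in both cases, halving the number of layers shrinks the model by about a third rather than by half.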
d5411c3ee9e1793fd9ef58390b40a80a4c10df32
2023-03-03T02:36:55+00:00