The RoBERTa model was pretrained on a dataset created by combining several sources including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. It uses a tokenization scheme with a vocabulary size of 50,000 and replaces 15% of the tokens with either a special masking token or a random token. The model achieved impressive results when fine-tuned on various downstream NLP tasks, outperforming its predecessor BERT in many areas.
The RoBERTa model was pretrained on a dataset created by combining several sources including BookCorpus, English Wikipedia, CC-News, OpenWebText, and Stories. It uses a tokenization scheme with a vocabulary size of 50,000 and replaces 15% of the tokens with either a special masking token or a random token. The model achieved impressive results when fine-tuned on various downstream NLP tasks, outperforming its predecessor BERT in many areas.
ff46155979338ff8063cdad90908b498ab91b181
2023-03-03T03:37:57+00:00