
roberta-large

The RoBERTa model was pre-trained on BookCorpus (a dataset of 11,038 unpublished books), English Wikipedia, CC-News (63 million English news articles), and Stories (a dataset containing a subset of Common Crawl data). It achieved state-of-the-art results on the multi-task GLUE and SuperGLUE benchmarks while being less sensitive to hyperparameter choices than BERT. RoBERTa (a Robustly optimized BERT pre-training approach) also uses dynamic masking, in which the masked positions change across pre-training epochs, unlike BERT's static masking, which is fixed once during data preprocessing.
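As a quick illustration of what the checkpoint does (it is a masked language model, not a text generator), the sketch below fills in a masked token using the roberta-large weights via the Hugging Face transformers library. This is a generic usage example under that assumption, not a guide to Deep Infra's demoapi endpoint.

```python
# Minimal sketch: query roberta-large as a masked language model
# with the Hugging Face transformers fill-mask pipeline.
from transformers import pipeline

# Load the fill-mask pipeline with the roberta-large checkpoint.
unmasker = pipeline("fill-mask", model="roberta-large")

# RoBERTa's mask token is <mask> (BERT uses [MASK]).
predictions = unmasker(
    "The goal of language model pre-training is to learn general <mask> representations."
)

# Each prediction includes the candidate token and its probability.
for p in predictions:
    print(f"{p['token_str']!r}: {p['score']:.3f}")
```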


Visibility: Public
Price: $0.0005/sec
Type: demoapi
Version: 5069d8a2a32a7df4c69ef9b56348be04152a2341
Updated: 2023-03-03T03:08:28+00:00

