A transformer-based language model pre-trained on a large corpus of English data using a masked language modeling objective. It was introduced in a research paper by Google researchers and achieved state-of-the-art results on various natural language processing tasks. The model is cased, meaning it differentiates between English and english, and has a configuration of 24 layers, 1024 hidden dimensions, 16 attention heads, and 336M parameters.
A transformer-based language model pre-trained on a large corpus of English data using a masked language modeling objective. It was introduced in a research paper by Google researchers and achieved state-of-the-art results on various natural language processing tasks. The model is cased, meaning it differentiates between English and english, and has a configuration of 24 layers, 1024 hidden dimensions, 16 attention heads, and 336M parameters.
d9238236d8326ce4bc117132bb3b7e62e95f3a9a
2023-03-03T06:38:45+00:00