A transformer-based language model pre-trained on a large corpus of English data using a masked language modeling objective. It was introduced in a research paper by Google researchers and achieved state-of-the-art results on various natural language processing tasks. The model is cased, meaning it distinguishes between "English" and "english", and has a configuration of 24 layers, 1024 hidden dimensions, 16 attention heads, and 336M parameters.
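This configuration matches a BERT-large-style architecture. As a rough sketch, assuming the model follows the BERT architecture as implemented in Hugging Face `transformers`, the description above corresponds to a configuration like the following (other settings such as vocabulary size are assumptions and may differ for the actual checkpoint):

```python
from transformers import BertConfig

# Hypothetical configuration mirroring the description above
# (24 layers, 1024 hidden dimensions, 16 attention heads).
config = BertConfig(
    num_hidden_layers=24,
    hidden_size=1024,
    num_attention_heads=16,
)
```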
Text prompt; it should include exactly one [MASK] token.
where is my father? (0.09)
where is my mother? (0.08)
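A minimal usage sketch, assuming the checkpoint is available through the Hugging Face `transformers` fill-mask pipeline; the model identifier `bert-large-cased` and the prompt `"where is my [MASK]?"` are assumptions chosen to mirror the example outputs above:

```python
from transformers import pipeline

# "bert-large-cased" is an assumed model identifier; substitute the actual
# checkpoint name for this model.
fill_mask = pipeline("fill-mask", model="bert-large-cased")

# The prompt must contain exactly one [MASK] token.
predictions = fill_mask("where is my [MASK]?")

for p in predictions:
    # Each prediction includes the filled-in sequence and its score,
    # e.g. "where is my father? (0.09)".
    print(f"{p['sequence']} ({p['score']:.2f})")
```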