bert-base-german-cased

A pre-trained German BERT language model, built with Google's TensorFlow code and trained on a single Cloud TPU v2. It was trained for 810k steps with a batch size of 1024 at a sequence length of 128, followed by a further 30k steps at a sequence length of 512. The training data comprised German Wikipedia, OpenLegalData, and news articles, cleaned and segmented with spaCy v2.1. Without extensive hyperparameter tuning, the model performed well on the downstream tasks GermEval18Fine, GermEval18Coarse, GermEval14, CoNLL03, and 10kGNAD. The authors also found that even a randomly initialized BERT can reach good performance when trained only on labeled downstream datasets.
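
As a rough sanity check on the training schedule described above, the number of token positions processed is approximately steps × batch size × sequence length. The sketch below is illustrative only: the helper name `token_positions` is hypothetical, the figure counts padded positions rather than real tokens, and the batch size for the 512-length phase is assumed to be 1024 because the description does not state it separately.

```python
def token_positions(steps: int, batch_size: int, seq_len: int) -> int:
    """Upper bound on token positions seen: every sequence counted at full length."""
    return steps * batch_size * seq_len


# Phase 1: figures taken directly from the model description.
short_phase = token_positions(steps=810_000, batch_size=1024, seq_len=128)

# Phase 2: steps and sequence length are from the description;
# the batch size of 1024 is an ASSUMPTION (not stated for this phase).
long_phase = token_positions(steps=30_000, batch_size=1024, seq_len=512)

print(f"sequence length 128 phase: {short_phase:,} token positions")
print(f"sequence length 512 phase: {long_phase:,} token positions (assumed batch size)")
```

Under these assumptions, the short-sequence phase accounts for the large majority of the positions processed.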

Public
$0.0005/sec

Input

text prompt, should include exactly one [MASK] token
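
The input requirement can be checked on the client side before a request is sent. The following is a minimal sketch, not part of any official client: the function name `validate_prompt` and the example sentence are hypothetical, and the check assumes the mask is written literally as `[MASK]`.

```python
MASK_TOKEN = "[MASK]"


def validate_prompt(prompt: str) -> str:
    """Return the prompt unchanged if it contains exactly one [MASK] token."""
    count = prompt.count(MASK_TOKEN)
    if count != 1:
        raise ValueError(
            f"prompt must contain exactly one {MASK_TOKEN} token, found {count}"
        )
    return prompt


# Hypothetical German example prompt with a single masked position.
checked = validate_prompt("Berlin ist die [MASK] von Deutschland.")
print(checked)
```

Failing early with a clear error message avoids spending a paid request on a prompt the model cannot use.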

Output

where is my father? (0.09)

where is my mother? (0.08)
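
Each output line above pairs a completed sentence with a score in parentheses. The sketch below shows one way such lines could be produced from a list of predictions. It assumes each prediction is a dictionary with `sequence` and `score` keys, which is the shape returned by the Hugging Face `fill-mask` pipeline; whether this service returns the same shape is an assumption, and `format_predictions` is a hypothetical helper.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def format_predictions(predictions: List[Prediction]) -> List[str]:
    """Render predictions as 'sentence (score)' lines, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [f"{p['sequence']} ({p['score']:.2f})" for p in ranked]


# Sample data copied from the output shown on this page.
sample = [
    {"sequence": "where is my mother?", "score": 0.08},
    {"sequence": "where is my father?", "score": 0.09},
]

for line in format_predictions(sample):
    print(line)
```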

 


© 2023 Deep Infra. All rights reserved.
