microsoft/deberta-v3-base

DeBERTaV3 is an improved version of the DeBERTa model that uses ELECTRA-style pre-training with gradient-disentangled embedding sharing. It significantly improves on DeBERTa's downstream-task performance and achieves state-of-the-art results on the SQuAD 2.0 and MNLI benchmarks. This base model has a hidden size of 768 and 86 million backbone parameters, and was trained with a vocabulary of 128K tokens.
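
To put the figures above in perspective, the short sketch below estimates how many parameters the token-embedding matrix adds on top of the backbone. It assumes that "128K" means exactly 128,000 tokens (the real tokenizer size may differ slightly) and that the 86 million backbone figure excludes the embedding layer. The function and constant names are illustrative only.

```python
# Rough parameter arithmetic for the figures quoted in the description.
HIDDEN_SIZE = 768              # hidden size, from the description
VOCAB_SIZE = 128_000           # assumption: "128K" read as exactly 128,000
BACKBONE_PARAMS = 86_000_000   # backbone parameters, from the description


def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """A token-embedding matrix holds one hidden_size-wide row per token."""
    return vocab_size * hidden_size


emb = embedding_params(VOCAB_SIZE, HIDDEN_SIZE)
total = BACKBONE_PARAMS + emb  # assumes the backbone count excludes embeddings

print(f"embedding parameters: {emb:,}")
print(f"backbone + embedding: {total:,}")
```

Under these assumptions the embedding matrix is slightly larger than the backbone itself, which is why the backbone count is quoted separately from the vocabulary size.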

Public
$0.0005/sec

8ccc9b6f36199bec6961081d44eb72fb3f7353f3

2023-03-03T22:37:22+00:00


© 2023 Deep Infra. All rights reserved.
