DeBERTaV3 is an improved version of the DeBERTa model that uses ELECTRA-style pre-training with gradient-disentangled embedding sharing. The new model significantly improves performance on downstream tasks compared to DeBERTa, and achieves state-of-the-art results on the SQuAD 2.0 and MNLI benchmarks. DeBERTaV3 has a hidden size of 768 and 86 million backbone parameters, and was trained with a vocabulary of 128K tokens.
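As a minimal sketch of using the model, the base checkpoint can be loaded with Hugging Face Transformers; the `microsoft/deberta-v3-base` identifier is an assumption here, chosen because its 768 hidden size and 86M backbone parameters match the figures above:

```python
# Sketch: loading a DeBERTaV3 base checkpoint with Hugging Face Transformers.
# The checkpoint name "microsoft/deberta-v3-base" is assumed, not stated above.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

# Encode a sentence and run a forward pass to get contextual embeddings.
inputs = tokenizer("DeBERTaV3 improves on DeBERTa.", return_tensors="pt")
outputs = model(**inputs)

# The last hidden state has shape (batch, sequence_length, hidden_size=768).
print(outputs.last_hidden_state.shape)
```

For downstream tasks such as MNLI or SQuAD 2.0, the same checkpoint would typically be loaded with a task-specific head (e.g. `AutoModelForSequenceClassification` or `AutoModelForQuestionAnswering`) and fine-tuned.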