The DistilBERT model is a small, fast, cheap, and lightweight Transformer model trained by distilling BERT base. It has 40% fewer parameters than the original BERT model and runs 60% faster, preserving over 95% of BERT's performance. The model was fine-tuned using knowledge distillation on the SQuAD v1.1 dataset and achieved an F1 score of 87.1 on the dev set.
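As a minimal usage sketch, the model can be queried for extractive question answering through the `transformers` pipeline API. The checkpoint identifier used below is an assumption (the SQuAD-distilled DistilBERT checkpoint on the Hugging Face Hub); substitute the identifier of this model card if it differs.

```python
from transformers import pipeline

# Assumed checkpoint name; replace with this model card's actual identifier.
qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

context = (
    "DistilBERT is a small, fast, cheap, and lightweight Transformer model "
    "trained by distilling BERT base. It has 40% fewer parameters than BERT."
)

# The pipeline returns the extracted answer span and a confidence score.
result = qa(question="What was DistilBERT distilled from?", context=context)
print(result["answer"], round(result["score"], 2))
```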