The DistilBERT model is a small, fast, cheap, and lightweight Transformer model trained by distilling BERT base. It has 40% fewer parameters than the original BERT model and runs 60% faster, preserving over 95% of BERT's performance. The model was fine-tuned using knowledge distillation on the SQuAD v1.1 dataset and achieved an F1 score of 87.1 on the dev set.
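As a minimal usage sketch, the model can be queried for extractive question answering through the `transformers` pipeline API. The checkpoint identifier used below is an assumption (the SQuAD-distilled DistilBERT checkpoint on the Hugging Face Hub); substitute the identifier of this model card if it differs.

```python
from transformers import pipeline

# Assumed checkpoint name; replace with this model card's actual identifier.
qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

context = (
    "DistilBERT is a small, fast, cheap, and lightweight Transformer model "
    "trained by distilling BERT base. It has 40% fewer parameters than BERT."
)

# The pipeline returns the extracted answer span and a confidence score.
result = qa(question="What was DistilBERT distilled from?", context=context)
print(result["answer"], round(result["score"], 2))
```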