Browse DeepInfra models:

All categories and models you can try out and use directly on DeepInfra:

csarron/bert-base-uncased-squad-v1
$0.0005 / sec
  • question-answering

We present a fine-tuned BERT-base uncased model for question answering on the SQuAD v1 dataset. Our model achieves an exact match score of 80.9104 and an F1 score of 88.2302 without any hyperparameter search.
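
As a rough local sketch of how the question-answering models in this list can be queried, the snippet below uses the Hugging Face transformers pipeline; the question and context strings are illustrative placeholders, and the same models can also be called through DeepInfra's hosted inference API.

```python
# Hedged sketch: extractive question answering with the transformers
# "question-answering" pipeline. The question/context strings below are
# illustrative placeholders, not part of the model card.
from transformers import pipeline

qa = pipeline("question-answering", model="csarron/bert-base-uncased-squad-v1")

result = qa(
    question="Which dataset was the model fine-tuned on?",
    context="The model was fine-tuned on the SQuAD v1 dataset for extractive question answering.",
)
print(result["answer"], result["score"])
```

Any of the other SQuAD-style models below can be substituted for the model name in the same call.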

deepset/bert-large-uncased-whole-word-masking-squad2
$0.0005 / sec
  • question-answering

We present a BERT-based language model called bert-large-uncased-whole-word-masking-squad2, trained on the SQuAD2.0 dataset for extractive question answering. The model achieves high scores on exact match and F1 metrics.

deepset/minilm-uncased-squad2
$0.0005 / sec
  • question-answering

Microsoft's MiniLM-L12-H384-uncased language model achieved state-of-the-art results on the SQuAD 2.0 question-answering benchmark, with exact match and F1 scores of 76.13% and 79.54%, respectively. The model was trained on the SQuAD 2.0 dataset using a batch size of 12, a learning rate of 4e-5, and 4 epochs. The authors suggest using the model as a starting point for fine-tuning on downstream NLP tasks.

deepset/roberta-base-squad2
$0.0005 / sec
  • question-answering

A pre-trained language model based on RoBERTa, fine-tuned on the SQuAD2.0 dataset for extractive question answering. It achieves 79.87% exact match and 82.91% F1 on the SQuAD2.0 dev set. Deepset, the company behind the open-source NLP framework Haystack, also offers related resources such as a distilled roberta-base-squad2, German BERT, and GermanQuAD datasets and models.

deepset/roberta-base-squad2-covid
$0.0005 / sec
  • question-answering

We present roberta-base-squad2-covid, a RoBERTa-based question answering model for extractive QA on COVID-19 related texts. The model was trained on SQuAD-style CORD-19 annotations and achieved promising results in 5-fold cross-validation.

deepset/roberta-large-squad2
$0.0005 / sec
  • question-answering

This is the roberta-large model, fine-tuned using the SQuAD2.0 dataset.

deepset/tinyroberta-squad2
$0.0005 / sec
  • question-answering

Deepset presents tinyroberta-squad2, a distilled version of their roberta-base-squad2 model that achieves similar performance while being faster. The model is trained on SQuAD 2.0 and uses Haystack's infrastructure with 4x Tesla V100 GPUs. It achieved 78.69% exact match and 81.92% F1 score on the SQuAD 2.0 dev set.

distilbert-base-cased-distilled-squad
$0.0005 / sec
  • question-answering

The DistilBERT model is a small, fast, cheap, and lightweight Transformer model trained by distilling BERT base. It has 40% fewer parameters than the original BERT model and runs 60% faster, preserving over 95% of BERT's performance. The model was fine-tuned using knowledge distillation on the SQuAD v1.1 dataset and achieved an F1 score of 87.1 on the dev set.

distilbert-base-multilingual-cased
$0.0005 / sec
  • fill-mask

The DistilBERT model is a distilled version of the BERT base multilingual model, trained on 104 languages and featuring 6 layers, 768 dimensions, and 12 heads. It is designed for masked language modeling and next sentence prediction tasks, with potential applications in natural language processing and downstream tasks. However, it should not be used to intentionally create hostile or alienating environments for people, and users should be aware of its risks, biases, and limitations.
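
For the fill-mask models listed here, a minimal local sketch using the transformers pipeline looks like the following; the example sentence is an illustrative placeholder, and [MASK] is the mask token used by BERT-style checkpoints.

```python
# Hedged sketch: masked-token prediction with the transformers
# "fill-mask" pipeline. The input sentence is an illustrative placeholder.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="distilbert-base-multilingual-cased")

for prediction in unmasker("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```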

distilbert-base-uncased
$0.0005 / sec
  • fill-mask

DistilBERT is a smaller, faster, and cheaper version of BERT, a popular language model. It was trained on the same data as BERT, including BookCorpus and English Wikipedia, but with a few key differences in the preprocessing and training procedures. Despite its smaller size, DistilBERT achieves results similar to BERT's on various natural language processing tasks.

distilbert-base-uncased-distilled-squad
$0.0005 / sec
  • question-answering

DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark. This model is a fine-tuned checkpoint of DistilBERT-base-uncased, trained using (a second step of) knowledge distillation on SQuAD v1.1.

distilbert-base-uncased-finetuned-sst-2-english
$0.0005 / sec
  • text-classification

DistilBERT-base-uncased-finetuned-sst-2-english achieved an accuracy of 0.91 on the GLUE SST-2 dataset, with a loss of 0.39 and an F1 score of 0.91. The model was trained on the SST-2 dataset (default configuration, train split).
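
As a minimal sketch, the text-classification pipeline from transformers can run this sentiment model locally; the input sentence is an illustrative placeholder.

```python
# Hedged sketch: sentiment classification with the transformers
# "text-classification" pipeline. The input sentence is illustrative.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("This model catalog is easy to browse."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```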

distilroberta-base
$0.0005 / sec
  • fill-mask

DistilRoBERTa is a distilled version of the RoBERTa-base model, with 6 layers, 768 dimensions, and 12 heads, totaling 82M parameters. It is trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, and achieves comparable performance to RoBERTa while being twice as fast. The model is designed for masked language modeling and can be fine-tuned for downstream tasks, but it also comes with potential biases and limitations, including significant gender and ethnicity biases in its predictions.

dmis-lab/biobert-base-cased-v1.2
$0.0005 / sec
  • fill-mask

BioBERT is a pre-trained biomedical language representation model for biomedical text mining, based on the original BERT architecture and trained by DMIS-LAB.

dslim/bert-base-NER
$0.0005 / sec
  • token-classification

The bert-base-NER model is a fine-tuned BERT model that achieves state-of-the-art performance on the CoNLL-2003 Named Entity Recognition task. It was trained on the English version of the standard CoNLL-2003 dataset and recognizes four types of entities: location, organization, person, and miscellaneous. The model occasionally tags subword tokens as entities and post-processing of results may be necessary to handle these cases.
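
A minimal local sketch of token classification with this model, using the transformers pipeline; setting aggregation_strategy="simple" groups subword pieces into whole-entity spans, which is one way to handle the subword-token caveat noted above. The example sentence is an illustrative placeholder.

```python
# Hedged sketch: named entity recognition with the transformers
# "token-classification" pipeline. aggregation_strategy="simple" merges
# subword tokens into entity spans. The sentence is illustrative.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",
)

for entity in ner("Wolfgang lives in Berlin and works for Deepset."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```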

dslim/bert-large-NER
$0.0005 / sec
  • token-classification

A fine-tuned BERT model that achieves state-of-the-art performance on the CoNLL-2003 Named Entity Recognition task. The model was trained on the English version of the standard CoNLL-2003 dataset and distinguishes between four types of entities: location, organization, person, and miscellaneous.