Browse deepinfra models:

All categories and models you can try out and directly use in deepinfra:

distilbert-base-multilingual-cased
$0.0005 / sec
  • fill-mask

The DistilBERT model is a distilled version of the BERT base multilingual model, trained on 104 languages and featuring 6 layers, 768 dimensions, and 12 heads. It is designed for masked language modeling and next sentence prediction tasks, with potential applications in natural language processing and downstream tasks. However, it should not be used to intentionally create hostile or alienating environments for people, and users should be aware of its risks, biases, and limitations.
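
To make the fill-mask task concrete, here is a minimal local sketch using the Hugging Face transformers pipeline (an illustration only, not the DeepInfra API itself; it assumes transformers and a PyTorch backend are installed):

    # Fill-mask: predict the token hidden behind [MASK].
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="distilbert-base-multilingual-cased")

    # The pipeline returns the top candidate tokens with their scores.
    for candidate in fill_mask("Paris is the [MASK] of France."):
        print(candidate["token_str"], round(candidate["score"], 3))

The same call pattern applies to the other fill-mask checkpoints on this page; only the model name changes.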

distilbert-base-uncased
$0.0005 / sec
  • fill-mask

DistilBERT is a smaller, faster, and cheaper version of BERT, a popular language model. It was trained on the same data as BERT, including BookCorpus and English Wikipedia, but with a few key differences in the preprocessing and training procedures. Despite its smaller size, DistilBERT achieves results similar to BERT's on various natural language processing tasks.

distilbert-base-uncased-distilled-squad
$0.0005 / sec
  • question-answering

DistilBERT is a small, fast, cheap, and light Transformer model trained by distilling BERT base. It has 40% fewer parameters than bert-base-uncased and runs 60% faster while preserving over 95% of BERT's performance as measured on the GLUE language understanding benchmark. This model is a checkpoint of DistilBERT-base-uncased, fine-tuned using (a second step of) knowledge distillation on SQuAD v1.1.
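
As a sketch of the question-answering task, the transformers pipeline (assumed here for illustration rather than the DeepInfra HTTP API) extracts an answer span from a given context:

    # Extractive question answering: the model selects a span from the context.
    from transformers import pipeline

    qa = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

    result = qa(
        question="Which benchmark measured the preserved performance?",
        context="DistilBERT runs 60% faster while preserving over 95% of BERT's "
                "performance as measured on the GLUE language understanding benchmark.",
    )
    print(result["answer"], round(result["score"], 3))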

distilbert-base-uncased-finetuned-sst-2-english
$0.0005 / sec
  • text-classification

DistilBERT-base-uncased-finetuned-sst-2-english achieves an accuracy of 0.91 on the GLUE benchmark, with a loss of 0.39 and an F1 score of 0.91. The model was trained on the SST-2 dataset using the default configuration and the train split.
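
A minimal sentiment-classification sketch with this checkpoint, again using the transformers pipeline purely for illustration:

    # Binary sentiment classification: labels are POSITIVE or NEGATIVE.
    from transformers import pipeline

    classify = pipeline(
        "text-classification",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )

    # Returns a list like [{"label": "POSITIVE", "score": ...}].
    print(classify("This catalog makes it easy to compare models."))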

distilroberta-base
$0.0005 / sec
  • fill-mask

DistilRoBERTa is a distilled version of the RoBERTa-base model, with 6 layers, 768 dimensions, and 12 heads, totaling 82M parameters. It is trained on OpenWebTextCorpus, a reproduction of OpenAI's WebText dataset, and achieves comparable performance to RoBERTa while being twice as fast. The model is designed for masked language modeling and can be fine-tuned for downstream tasks, but it also comes with potential biases and limitations, including significant gender and ethnicity biases in its predictions.

dmis-lab/biobert-base-cased-v1.2
$0.0005 / sec
  • fill-mask

BioBERT is a pre-trained biomedical language representation model for biomedical text mining, based on the original BERT architecture and trained by DMIS-Lab.

dslim/bert-base-NER
$0.0005 / sec
  • token-classification

The bert-base-NER model is a fine-tuned BERT model that achieves state-of-the-art performance on the CoNLL-2003 Named Entity Recognition task. It was trained on the English version of the standard CoNLL-2003 dataset and recognizes four types of entities: location, organization, person, and miscellaneous. The model occasionally tags subword tokens as entities, so post-processing of the results may be necessary to handle these cases.
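
The subword issue mentioned above can be handled directly in the transformers pipeline (used here as an illustrative sketch) by asking it to aggregate token pieces into whole entities:

    # Named entity recognition; aggregation_strategy="simple" merges subword
    # pieces into complete entities, covering the post-processing noted above.
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="dslim/bert-base-NER",
        aggregation_strategy="simple",
    )

    for entity in ner("Wolfgang lives in Berlin and works for Siemens."):
        print(entity["entity_group"], entity["word"], round(entity["score"], 3))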

dslim/bert-large-NER
$0.0005 / sec
  • token-classification

A fine-tuned BERT model that achieves state-of-the-art performance on the CoNLL-2003 Named Entity Recognition task. The model was trained on the English version of the standard CoNLL-2003 dataset and distinguishes between four types of entities: location, organization, person, and miscellaneous.

emilyalsentzer/Bio_ClinicalBERT
$0.0005 / sec
  • fill-mask

The Bio+Clinical BERT model is initialized from BioBERT and trained on all MIMIC notes. The model was pre-trained using a rules-based section splitter and the SciSpacy tokenizer, with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.

emilyalsentzer/Bio_Discharge_Summary_BERT
$0.0005 / sec
  • fill-mask

The Bio+Discharge Summary BERT model is initialized from BioBERT and trained only on discharge summaries from MIMIC. The model was pre-trained using a rules-based section splitter and the SciSpacy tokenizer, with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.

google/flan-t5-large
$0.0005 / sec
  • text2text-generation

The FLAN-T5 large language model is a variant of the T5 model, trained on a mix of tasks and fine-tuned on over 1000 additional tasks covering multiple languages. It achieved state-of-the-art results on several benchmarks, including few-shot learning tasks, and demonstrates improved performance and usability compared to its predecessor.
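
A minimal text2text-generation sketch with this checkpoint (transformers pipeline assumed for illustration; the smaller flan-t5-small below can be substituted if memory is limited):

    # Instruction-following text-to-text generation with FLAN-T5.
    from transformers import pipeline

    t2t = pipeline("text2text-generation", model="google/flan-t5-large")

    prompt = "Answer the question: What is the capital of France?"
    print(t2t(prompt, max_new_tokens=20)[0]["generated_text"])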

google/flan-t5-small
$0.0005 / sec
  • text2text-generation

FLAN-T5 is a family of instruction-finetuned T5 models; the same instruction-finetuning recipe has also been applied to models of up to 540B parameters. The models are trained on more than 1000 tasks across over 100 diverse domains and cover multiple languages. FLAN-T5 outperforms its predecessor T5 on various NLP tasks while remaining computationally efficient.

google/flan-t5-xl
$0.0005 / sec
  • text2text-generation

A T5 model fine-tuned on a collection of datasets phrased as instructions.

google/flan-t5-xxl
$0.0005 / sec
  • text2text-generation

Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. The authors also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models.

google/vit-base-patch16-224
$0.0005 / sec
  • image-classification

The Vision Transformer (ViT) is a transformer encoder model pre-trained on ImageNet-21k and fine-tuned on ImageNet, achieving state-of-the-art results in image classification. The model presents images as a sequence of fixed-size patches and adds a CLS token for classification tasks. The authors recommend using fine-tuned versions of the model for specific tasks.
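
A minimal image-classification sketch (transformers pipeline assumed for illustration; the image file name is a hypothetical placeholder):

    # ImageNet classification with ViT; the pipeline accepts a local path,
    # a URL, or a PIL image.
    from transformers import pipeline

    classify = pipeline("image-classification", model="google/vit-base-patch16-224")

    # "cat.jpg" is a placeholder file used only for illustration.
    for prediction in classify("cat.jpg", top_k=3):
        print(prediction["label"], round(prediction["score"], 3))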

google/vit-base-patch16-384
$0.0005 / sec
  • image-classification

The Vision Transformer (ViT) model, pre-trained on ImageNet-21k and fine-tuned on ImageNet, achieves state-of-the-art results on image classification tasks. The model uses a transformer encoder architecture and presents images as a sequence of fixed-size patches, adding a [CLS] token for classification tasks. The pre-trained model can be used for downstream tasks such as extracting features and training standard classifiers.

gpt2
$0.0005 / sec
  • text-generation

GPT-2 is a transformer-based language model developed by OpenAI that uses a causal language modeling (CLM) objective. It was trained on a 40GB dataset called WebText, which consists of text from various websites, excluding Wikipedia. Without fine-tuning, GPT-2 achieved impressive zero-shot results on several benchmark datasets such as LAMBADA, CBT-CN, CBT-NE, WikiText2, PTB, enwik8, and text8.
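
A minimal text-generation sketch with GPT-2 (transformers pipeline assumed for illustration):

    # Causal language modeling: continue a prompt left to right.
    from transformers import pipeline

    generate = pipeline("text-generation", model="gpt2")

    output = generate(
        "The quick brown fox",
        max_new_tokens=30,
        do_sample=True,  # sample instead of greedy decoding for more varied text
    )
    print(output[0]["generated_text"])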