Browse DeepInfra models:

All categories and models that you can try out and use directly on DeepInfra are listed below.
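
Each model can be called over HTTPS with an API token. Below is a minimal Python sketch, assuming the DeepInfra inference endpoint at https://api.deepinfra.com/v1/inference/<model> and a DEEPINFRA_TOKEN environment variable; the exact request and response fields vary by model family.

    import os
    import requests

    # Quick call to one of the listed models; swap MODEL for any model name
    # below and adapt the payload to that model's expected input format.
    API_TOKEN = os.environ["DEEPINFRA_TOKEN"]
    MODEL = "emilyalsentzer/Bio_ClinicalBERT"

    resp = requests.post(
        f"https://api.deepinfra.com/v1/inference/{MODEL}",
        headers={"Authorization": f"bearer {API_TOKEN}"},
        json={"input": "The patient was given [MASK] for the infection."},
    )
    resp.raise_for_status()
    print(resp.json())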

emilyalsentzer/Bio_ClinicalBERT
$0.0005 / sec
  • fill-mask

The Bio+Clinical BERT model is initialized from BioBERT and trained on all MIMIC notes. The model was pre-trained using a rules-based section splitter and the SciSpacy tokenizer, with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.
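
For a quick local check of this fill-mask model, here is a sketch with the Hugging Face transformers pipeline (the example sentence is made up; BERT-style checkpoints mark the blank with the [MASK] token):

    from transformers import pipeline

    # Fill in the masked token and print the top-scoring candidates.
    fill_mask = pipeline("fill-mask", model="emilyalsentzer/Bio_ClinicalBERT")

    for candidate in fill_mask("The patient was administered [MASK] for the pain."):
        print(candidate["token_str"], round(candidate["score"], 3))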

emilyalsentzer/Bio_Discharge_Summary_BERT
$0.0005 / sec
  • fill-mask

The Bio+Discharge Summary BERT model is initialized from BioBERT and trained only on discharge summaries from MIMIC. The model was pre-trained using a rules-based section splitter and the SciSpacy tokenizer, with a batch size of 32, a maximum sequence length of 128, and a learning rate of 5·10^-5 for 150,000 steps.

google/vit-base-patch16-224
$0.0005 / sec
  • image-classification

The Vision Transformer (ViT) is a transformer encoder model pre-trained on ImageNet-21k and fine-tuned on ImageNet, achieving state-of-the-art results in image classification. The model presents images as a sequence of fixed-size patches and adds a [CLS] token for classification tasks. The authors recommend using fine-tuned versions of the model for specific tasks.
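
A sketch of image-classification usage with the transformers pipeline (cat.jpg is a placeholder for any local image path or URL):

    from transformers import pipeline

    # Classify an image into one of the 1,000 ImageNet classes.
    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

    for pred in classifier("cat.jpg"):  # placeholder image path
        print(pred["label"], round(pred["score"], 3))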

google/vit-base-patch16-384
$0.0005 / sec
  • image-classification

The Vision Transformer (ViT) model, pre-trained on ImageNet-21k and fine-tuned on ImageNet, achieves state-of-the-art results on image classification tasks. The model uses a transformer encoder architecture and presents images as a sequence of fixed-size patches, adding a [CLS] token for classification tasks. The pre-trained model can be used for downstream tasks such as extracting features and training standard classifiers.

hfl/chinese-bert-wwm-ext
$0.0005 / sec
  • fill-mask

Chinese pre-trained BERT with Whole Word Masking, which can be used for various NLP tasks such as question answering, sentiment analysis, and named entity recognition. This work is based on the original BERT model but applies whole word masking during pre-training, masking all characters of a Chinese word together rather than individual characters, to improve language understanding.

hfl/chinese-roberta-wwm-ext
$0.0005 / sec
  • fill-mask

Chinese pre-trained BERT with Whole Word Masking, an extension of the original BERT model tailored for Chinese natural language processing tasks. This variant applies whole word masking during pre-training, masking every character of a Chinese word at once rather than masking characters independently, which strengthens its language understanding capabilities.

huggingface/CodeBERTa-small-v1
$0.0005 / sec
  • fill-mask

CodeBERTa is a RoBERTa-like model trained on the CodeSearchNet dataset from GitHub. Supported languages: go, java, javascript, php, python, ruby.

hustvl/yolos-base
$0.0005 / sec
  • object-detection

A Vision Transformer (ViT) trained for object detection using the DETR loss.

hustvl/yolos-small
$0.0005 / sec
  • object-detection

The YOLOS model is a Vision Transformer (ViT) trained for object detection using the DETR loss: a bipartite matching loss compares the predicted classes and bounding boxes to the ground-truth annotations. Despite its simplicity, a base-sized YOLOS model reaches 42 AP on COCO 2017 validation, similar to DETR and more complex frameworks such as Faster R-CNN. This small variant, fine-tuned on the COCO 2017 object detection dataset (118k/5k annotated images for training/validation), achieves 36.1 AP on the validation set.
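
A sketch of object-detection usage with the transformers pipeline (street_scene.jpg is a placeholder image path); each result carries a COCO label, a confidence score, and a pixel bounding box:

    from transformers import pipeline

    # Detect objects and print label, score, and bounding-box coordinates.
    detector = pipeline("object-detection", model="hustvl/yolos-small")

    for det in detector("street_scene.jpg"):  # placeholder image path
        print(det["label"], round(det["score"], 2), det["box"])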

hustvl/yolos-tiny
$0.0005 / sec
  • object-detection

The YOLOS model, a Vision Transformer (ViT) trained using the DETR loss, achieves an AP of 28.7 on COCO 2017 validation while being much simpler than most detection frameworks. It was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. The model uses a bipartite matching loss and is trained with standard cross-entropy for the classes and a linear combination of the L1 and generalized IoU losses for the bounding boxes.

intfloat/e5-base-v2
512
$0.005 / Mtoken
  • embeddings

Text Embeddings by Weakly-Supervised Contrastive Pre-training. The model has 12 layers and an embedding dimension of 768.

intfloat/e5-large-v2
512
$0.010 / Mtoken
  • embeddings

Text Embeddings by Weakly-Supervised Contrastive Pre-training. The model has 24 layers and an embedding dimension of 1024.
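
Both E5 checkpoints are served as embedding models. Here is a sketch against DeepInfra's OpenAI-compatible endpoint, assuming it is exposed at https://api.deepinfra.com/v1/openai; note that the E5 model cards ask for "query: " / "passage: " prefixes on the input text:

    import os
    from openai import OpenAI

    # The api_key here is a DeepInfra API token, not an OpenAI key.
    client = OpenAI(
        api_key=os.environ["DEEPINFRA_TOKEN"],
        base_url="https://api.deepinfra.com/v1/openai",
    )

    resp = client.embeddings.create(
        model="intfloat/e5-large-v2",
        input=[
            "query: what is contrastive pre-training?",
            "passage: E5 embeddings are trained with weakly-supervised contrastive pre-training.",
        ],
    )
    print(len(resp.data), len(resp.data[0].embedding))  # 2 vectors, 1024 floats each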

jackaduma/SecBERT
$0.0005 / sec
  • fill-mask

SecBERT is a pretrained language model for cyber security text, trained on a dataset of papers from various sources, including APTnotes, Stucco-Data, and CASIE. The model has its own wordpiece vocabulary, secvocab, and is available in two versions, SecBERT and SecRoBERTa. The model can improve downstream tasks such as NER, text classification, semantic understanding, and Q&A in the cyber security domain.

klue/bert-base
$0.0005 / sec
  • fill-mask

KLUE BERT base is a BERT model pre-trained on Korean text, developed as part of the KLUE (Korean Language Understanding Evaluation) benchmark and released under the cc-by-sa-4.0 license. The model can be used for tasks such as topic classification, semantic textual similarity, natural language inference, and named entity recognition.

meta-llama/Llama-2-13b-chat-hf
4k
$0.13 / Mtoken
  • text-generation

Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the 13B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format.
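
A dialogue sketch against DeepInfra's OpenAI-compatible chat endpoint (assumed at https://api.deepinfra.com/v1/openai), billed per token as listed above:

    import os
    from openai import OpenAI

    # The api_key here is a DeepInfra API token, not an OpenAI key.
    client = OpenAI(
        api_key=os.environ["DEEPINFRA_TOKEN"],
        base_url="https://api.deepinfra.com/v1/openai",
    )

    chat = client.chat.completions.create(
        model="meta-llama/Llama-2-13b-chat-hf",
        messages=[{"role": "user", "content": "In one sentence, what is a context window?"}],
        max_tokens=128,
    )
    print(chat.choices[0].message.content)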

microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
$0.0005 / sec
  • fill-mask

PubMedBERT is a pretrained language model specifically designed for biomedical natural language processing tasks. It was trained from scratch using abstracts and full-text articles from PubMed and PubMedCentral, and achieved state-of-the-art performance on various biomedical NLP tasks.

microsoft/beit-base-patch16-224-pt22k-ft22k
$0.0005 / sec
  • image-classification

The BEiT model is a Vision Transformer (ViT) pre-trained on ImageNet-21k, a dataset of 14 million images and 21,841 classes, using a self-supervised approach. The model was fine-tuned on the same dataset and achieved state-of-the-art performance on various image classification benchmarks. The BEiT model uses relative position embeddings and mean-pools the final hidden states of the patch embeddings for classification.

microsoft/codebert-base-mlm
$0.0005 / sec
  • fill-mask

A pre-trained language model designed to handle both programming languages and natural languages. CodeBERT is trained with a hybrid objective that combines masked language modeling and replaced token detection, and achieves state-of-the-art results on various code understanding tasks while also performing well on natural language processing benchmarks; this checkpoint is the variant trained with the masked language modeling objective. The authors analyze the effects of different design choices and provide insights into the behavior of CodeBERT, demonstrating its potential as a versatile tool for a wide range of applications involving both code and natural language understanding.
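
Because CodeBERT builds on a RoBERTa-style architecture, the fill-mask blank is written <mask> rather than the BERT-style [MASK]. A local sketch with the transformers pipeline (the code snippet is arbitrary):

    from transformers import pipeline

    # Predict the masked token inside a line of Python code.
    fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

    for candidate in fill_mask("if x is not <mask>:\n    return x"):
        print(candidate["token_str"], round(candidate["score"], 3))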