Browse deepinfra models:

All categories and models you can try out and use directly in deepinfra.

Category: image-classification

Image classification models are a type of artificial intelligence (AI) system that analyzes an image and assigns it one or more predefined category labels based on its content. These models have many applications, including facial recognition, medical imaging, autonomous vehicles, and image search engines.
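
A classifier typically outputs one raw score (logit) per class. A softmax turns these scores into probabilities that sum to 1, and the highest-probability labels are reported as the prediction. The minimal sketch below assumes made-up labels and logits purely for illustration; it is not tied to any particular model listed on this page.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max score for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(labels, logits, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and logits, as a model's classification head might emit for one image.
labels = ["cat", "dog", "car", "tree"]
logits = [2.0, 1.0, 0.1, -1.0]
for label, p in top_k(labels, logits, k=2):
    print(f"{label}: {p:.3f}")
```

Model-serving APIs for image classification commonly return this kind of ranked label/score list.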

To train an image classification AI model, a large dataset of labeled images is required. During training, the model learns to recognize patterns and features within the images and associate them with the appropriate class labels. Once the model is trained, it can be used to classify new, unseen images based on their features and patterns.
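
The train-then-classify workflow can be illustrated with a deliberately tiny stand-in: a nearest-centroid classifier over hand-made 2-D feature vectors. Real image classification models learn their features directly from pixels with deep neural networks; here the feature vectors and labels are hypothetical, and nearest-centroid is a swapped-in technique chosen only because it fits in a few lines.

```python
import math

def train(examples):
    """'Training': learn one centroid (mean feature vector) per class label."""
    grouped = {}
    for features, label in examples:
        grouped.setdefault(label, []).append(features)
    centroids = {}
    for label, vectors in grouped.items():
        n = len(vectors)
        centroids[label] = [sum(col) / n for col in zip(*vectors)]
    return centroids

def classify(centroids, features):
    """Assign an unseen example to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Hypothetical labeled training data: (feature vector, class label).
training_data = [
    ([0.9, 0.1], "cat"),
    ([0.8, 0.2], "cat"),
    ([0.1, 0.9], "dog"),
    ([0.2, 0.7], "dog"),
]
model = train(training_data)
print(classify(model, [0.85, 0.15]))  # a new, unseen example
```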

However, image classification AI models do have some limitations. They rely heavily on high-quality, labeled training data, and can be susceptible to bias and adversarial attacks. It is important to address these limitations and ensure that the models are fair, transparent, and secure.

Image classification AI models offer many benefits, such as their ability to rapidly and accurately process and analyze large volumes of visual data. As the amount of visual data continues to grow, image classification AI models will become increasingly important for a variety of industries, including healthcare, retail, and manufacturing.

google/vit-base-patch16-224
$0.0005 / sec
  • image-classification

The Vision Transformer (ViT) is a transformer encoder model pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k, both at resolution 224x224, achieving state-of-the-art results in image classification. The model represents each image as a sequence of fixed-size 16x16 patches, which are linearly embedded, and prepends a [CLS] token whose final hidden state is used for classification. The authors recommend using fine-tuned versions of the model for specific tasks.
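
For google/vit-base-patch16-224, a 224x224 image split into 16x16 patches gives (224/16)^2 = 196 patches; with the [CLS] token prepended, the encoder sees a sequence of 197 tokens. The sketch below shows only the splitting and token counting on a plain nested-list "image". The linear patch embedding, position embeddings, and the transformer layers are omitted, and the `"[CLS]"` string is a placeholder for the learned [CLS] embedding.

```python
def to_patches(image, patch_size):
    """Split an H x W image (list of rows) into non-overlapping square patches
    in row-major order, flattening each patch into a single list of pixels."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[r][c]
                for r in range(top, top + patch_size)
                for c in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

def token_sequence(image, patch_size):
    """Prepend a placeholder for the [CLS] token to the patch sequence."""
    return ["[CLS]"] + to_patches(image, patch_size)

# A dummy single-channel 224x224 image; real inputs have 3 color channels.
image = [[0] * 224 for _ in range(224)]
tokens = token_sequence(image, patch_size=16)
print(len(tokens) - 1, "patches + 1 [CLS] =", len(tokens), "tokens")
```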

google/vit-base-patch16-384
$0.0005 / sec
  • image-classification

The Vision Transformer (ViT) model, pre-trained on ImageNet-21k at resolution 224x224 and fine-tuned on ImageNet-1k at resolution 384x384, achieves state-of-the-art results on image classification tasks. The model uses a transformer encoder architecture and represents each image as a sequence of fixed-size 16x16 patches, prepending a [CLS] token for classification. The pre-trained encoder can also be used for downstream tasks, for example as a feature extractor on top of which a standard classifier is trained.
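
Using a pre-trained encoder as a feature extractor means keeping it frozen and training only a small classifier on its outputs (for ViT, typically the final hidden state of the [CLS] token). The sketch below assumes those feature vectors have already been extracted: the 2-D vectors and labels are made-up stand-ins (ViT-Base features actually have 768 dimensions), and the classifier is plain binary logistic regression trained with batch gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit weights w and bias b so that sigmoid(w . x + b) approximates P(label == 1),
    using gradient descent on the mean binary cross-entropy loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += error * x[i] / n
            grad_b += error / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return class 1 if the predicted probability is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0

# Hypothetical frozen-encoder features for a two-class downstream dataset.
features = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 1.0]]
labels = [1, 1, 0, 0]
w, b = train_logistic_regression(features, labels)
print([predict(w, b, x) for x in features])
```

Only the classifier's few parameters are trained here, which is what makes this approach cheap compared with fine-tuning the whole encoder.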

microsoft/beit-base-patch16-224-pt22k-ft22k
$0.0005 / sec
  • image-classification

The BEiT model is a Vision Transformer (ViT) pre-trained in a self-supervised fashion (masked image modeling) on ImageNet-21k, a dataset of 14 million images and 21,841 classes, at resolution 224x224. It was then fine-tuned on the same dataset at the same resolution and achieved state-of-the-art performance on various image classification benchmarks. Unlike the original ViT, BEiT uses relative position embeddings and classifies images by mean-pooling the final hidden states of the patches instead of using the final hidden state of the [CLS] token.
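
The difference between the ViT and BEiT classification readouts can be shown on an encoder's final hidden states: ViT passes the [CLS] token's vector to the classification head, whereas BEiT averages the vectors of the patch tokens. The sketch below uses a tiny made-up hidden-state sequence (position 0 plays the role of the [CLS] token) and a hypothetical 2-class linear head. It illustrates the pooling step only; normalization layers, relative position embeddings, and real model weights are omitted.

```python
def cls_pool(hidden_states):
    """ViT-style readout: the final hidden state of the [CLS] token at position 0."""
    return hidden_states[0]

def mean_pool_patches(hidden_states):
    """BEiT-style readout: average the final hidden states of the patch tokens,
    skipping the [CLS] token at position 0."""
    patches = hidden_states[1:]
    return [sum(dim) / len(patches) for dim in zip(*patches)]

def linear_head(pooled, weights, biases):
    """One logit per class: dot product of the pooled vector with each weight row, plus bias."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, biases)]

# Made-up final hidden states: one [CLS] vector followed by three patch vectors (dim 2).
hidden_states = [[0.0, 1.0], [3.0, 0.0], [0.0, 1.5], [3.0, 1.5]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-class head
biases = [0.0, 0.0]

print(linear_head(cls_pool(hidden_states), weights, biases))
print(linear_head(mean_pool_patches(hidden_states), weights, biases))
```

With these toy numbers the two readouts favor different classes, which is why the choice of pooling matters when attaching a classification head.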

microsoft/resnet-50
$0.0005 / sec
  • image-classification

ResNet-50 is a convolutional neural network (CNN) that uses residual (skip) connections, pre-trained on ImageNet-1k at resolution 224x224 for image classification.
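
ResNet's defining idea is the residual (skip) connection: each block learns a residual function F(x) and outputs relu(F(x) + x), so the input can pass through largely unchanged when F contributes little. The sketch below applies this to a plain vector, with a hypothetical F made of two fully connected layers. Real ResNet-50 blocks are bottleneck stacks of 1x1, 3x3, and 1x1 convolutions with batch normalization, all of which are omitted here.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights):
    """Matrix-vector product, one output per weight row (a stand-in for conv layers)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    f = dense(relu(dense(x, w1)), w2)
    return relu([fi + xi for fi, xi in zip(f, x)])

x = [1.0, -2.0, 3.0]
zero = [[0.0] * 3 for _ in range(3)]
# With F's weights at zero the block reduces to relu(x): the skip path carries the input.
print(residual_block(x, zero, zero))
```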