Browse DeepInfra models:

All categories and models that you can try out and use directly on DeepInfra:

featured

text-generation

text-to-image

automatic-speech-recognition

embeddings

token-classification

fill-mask

text-classification

question-answering

image-classification

object-detection

custom

zero-shot-image-classification

Category/image-classification

Image classification models take an image as input and assign it a label from a predefined set of classes, typically with a confidence score for each candidate class. Applications include facial recognition, medical imaging, autonomous vehicles, and image search engines.

To train an image classification AI model, a large dataset of labeled images is required. During training, the model learns to recognize patterns and features within the images and associate them with the appropriate class labels. Once the model is trained, it can be used to classify new, unseen images based on their features and patterns.
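The train-then-classify workflow described above can be illustrated with a deliberately tiny stand-in. This is only a sketch: it uses a nearest-centroid rule on hand-made two-number feature vectors instead of a neural network on real pixels, and the names `train`, `classify`, and the toy data are assumptions made for illustration, not part of any model listed on this page.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (mean feature vector) per label from (features, label) pairs."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        # Accumulate feature values per label so we can average them afterwards.
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new, unseen feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled training data: two features per "image".
labeled = [
    ([0.9, 0.1], "cat"),
    ([0.8, 0.2], "cat"),
    ([0.1, 0.9], "dog"),
    ([0.2, 0.8], "dog"),
]

model = train(labeled)
print(classify(model, [0.7, 0.3]))  # prints "cat"
```

Real image classifiers learn their features from pixels rather than receiving them ready-made, but the two phases are the same: fit parameters on labeled examples, then apply them to new inputs.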

Image classification models also have limitations. They depend heavily on high-quality labeled training data, can inherit biases present in that data, and can be fooled by adversarial inputs. Addressing these limitations is necessary for the models to be fair, transparent, and secure.

The main benefit of image classification models is that they can process and label large volumes of visual data quickly. As the amount of visual data continues to grow, they are becoming increasingly important in industries such as healthcare, retail, and manufacturing.

google/vit-base-patch16-224 (image-classification, $0.0005 / sec)

The Vision Transformer (ViT) is a transformer encoder model pre-trained on ImageNet-21k and fine-tuned on ImageNet at resolution 224x224; it reported state-of-the-art image classification results at the time of publication. The model represents each image as a sequence of fixed-size 16x16 patches and adds a [CLS] token for classification tasks. The authors recommend using fine-tuned versions of the model for specific tasks.
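As a small worked example of the patch representation, the sketch below computes how many tokens the encoder sees for a square image. It assumes non-overlapping square patches plus a single [CLS] token; the function name `vit_sequence_length` is hypothetical and chosen only for this illustration.

```python
def vit_sequence_length(image_size, patch_size=16):
    """Number of tokens for a square image: one per patch, plus one [CLS] token."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


print(vit_sequence_length(224))  # 14 * 14 patches + 1 = 197
print(vit_sequence_length(384))  # 24 * 24 patches + 1 = 577
```

The 384-pixel variant therefore processes roughly three times as many tokens per image as the 224-pixel variant.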

google/vit-base-patch16-384 (image-classification, $0.0005 / sec)

This Vision Transformer (ViT) model is pre-trained on ImageNet-21k and fine-tuned on ImageNet at resolution 384x384; it reported state-of-the-art image classification results at the time of publication. The model uses a transformer encoder architecture, represents each image as a sequence of fixed-size 16x16 patches, and adds a [CLS] token for classification tasks. The pre-trained model can also be used for downstream tasks, for example by extracting features and training a standard classifier on top of them.

microsoft/beit-base-patch16-224-pt22k-ft22k (image-classification, $0.0005 / sec)

The BEiT model is a Vision Transformer (ViT) pre-trained with a self-supervised approach on ImageNet-21k, a dataset of 14 million images and 21,841 classes. It was then fine-tuned on the same dataset at resolution 224x224 and reported state-of-the-art performance on several image classification benchmarks at the time of publication. Unlike the original ViT, BEiT uses relative position embeddings and classifies by mean-pooling the final hidden states of the patch embeddings rather than reading the [CLS] token.
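Mean-pooling here simply means averaging the per-patch vectors into a single vector that a classification layer can consume. The following is a minimal sketch in plain Python with made-up numbers; the helper name `mean_pool` and the tiny 2-dimensional hidden states are assumptions for illustration and do not reflect the model's actual implementation or hidden size.

```python
def mean_pool(patch_states):
    """Average a list of equal-length vectors element-wise into one vector."""
    if not patch_states:
        raise ValueError("need at least one patch state")
    n = len(patch_states)
    # zip(*...) groups the values of each hidden dimension across all patches.
    return [sum(column) / n for column in zip(*patch_states)]


# Three hypothetical patch hidden states, each with two dimensions.
states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool(states))  # [3.0, 4.0]
```

Every patch contributes equally to the pooled vector, whereas a [CLS]-based classifier reads a single designated token.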

microsoft/resnet-50 (image-classification, $0.0005 / sec)

ResNet-50 model pre-trained on ImageNet-1k at resolution 224x224 for image classification.
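All four models above are listed at $0.0005 / sec. As a rough aid for budgeting, the sketch below multiplies that listed rate by a number of seconds. It assumes that billing scales linearly with inference time and that no minimums or rounding apply, none of which this page states; the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Listed price on this page, in US dollars per second.
PRICE_PER_SECOND = Decimal("0.0005")


def estimate_cost(seconds):
    """Estimated cost in dollars, assuming purely linear per-second billing."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Convert via str so float inputs such as 1.5 do not carry binary rounding noise.
    return PRICE_PER_SECOND * Decimal(str(seconds))


print(estimate_cost(2000))  # 1.0000 (dollars for 2000 seconds of inference)
```

`Decimal` is used instead of `float` so that small per-second amounts add up without binary floating-point error.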
