
microsoft/beit-base-patch16-224-pt22k-ft22k

BEiT is a Vision Transformer (ViT) pre-trained with a self-supervised objective on ImageNet-21k (also called ImageNet-22k), a dataset of 14 million images and 21,841 classes. This checkpoint was then fine-tuned on the same dataset at 224x224 resolution with 16x16 patches, and the model family is reported to perform strongly on image classification benchmarks. BEiT uses relative position embeddings, and it classifies an image by mean-pooling the final hidden states of the patch embeddings.
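To make the mean-pooling step concrete, here is a minimal sketch in plain Python. It assumes the patch count follows from the 224 and 16 in the model name, and it stands in a toy linear head for the real classifier; `mean_pool`, `classify`, and the example weights are hypothetical names and values for illustration, not the model's actual code.

```python
from typing import List, Sequence

IMAGE_SIZE = 224   # from "224" in the model name
PATCH_SIZE = 16    # from "patch16" in the model name
NUM_PATCHES = (IMAGE_SIZE // PATCH_SIZE) ** 2  # 14 * 14 patches per image


def mean_pool(patch_states: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-patch hidden vectors into one image-level vector."""
    if not patch_states:
        raise ValueError("need at least one patch vector")
    dim = len(patch_states[0])
    count = len(patch_states)
    return [sum(vec[i] for vec in patch_states) / count for i in range(dim)]


def classify(pooled: Sequence[float],
             weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> int:
    """Toy linear head (an assumption): return the index of the largest logit."""
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return max(range(len(logits)), key=logits.__getitem__)


# Two fake 2-dimensional patch vectors instead of 196 real ones.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0]])
predicted_class = classify(pooled, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In the real model the pooled vector would have the transformer's hidden size and the head would cover all 21,841 classes; the sketch only shows the order of operations.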


Public
$0.0005/sec

HTTP/cURL API

 

Input fields

image (string)

Image to classify


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
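The two input fields above could be assembled into a request roughly as follows. This is a sketch under several assumptions: the endpoint URL is a placeholder (the page does not give one), the `image` string is assumed to be base64-encoded image bytes, and the JSON body and bearer-token header are guesses at the transport rather than documented behavior. `build_request` is a hypothetical helper; it only constructs the request and does not send it.

```python
import base64
import json
import urllib.request
from typing import Optional

# Placeholder endpoint (assumption): substitute the URL from the provider's API docs.
API_URL = "https://api.example.com/inference/microsoft/beit-base-patch16-224-pt22k-ft22k"


def build_request(image_bytes: bytes,
                  token: str,
                  webhook: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the documented input fields."""
    # "image" is documented as a string; base64 encoding is an assumption here.
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    # "webhook" is optional; when omitted, the output comes back in the response.
    if webhook is not None:
        payload["webhook"] = webhook
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
        method="POST",
    )


request = build_request(b"abc", token="YOUR_TOKEN")
```

Passing the result to `urllib.request.urlopen` would perform the call once the placeholder URL and token are replaced with real values.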



© 2023 Deep Infra. All rights reserved.
