microsoft/beit-base-patch16-224-pt22k-ft22k cover image

microsoft/beit-base-patch16-224-pt22k-ft22k

The BEiT model is a Vision Transformer (ViT) pre-trained on ImageNet-21k, a dataset of 14 million images and 21,841 classes, using a self-supervised approach. The model was fine-tuned on the same dataset and achieved state-of-the-art performance on various image classification benchmarks. The BEiT model uses relative position embeddings and mean-pools the final hidden states of the patch embeddings for classification.

The BEiT model is a Vision Transformer (ViT) pre-trained on ImageNet-21k, a dataset of 14 million images and 21,841 classes, using a self-supervised approach. The model was fine-tuned on the same dataset and achieved state-of-the-art performance on various image classification benchmarks. The BEiT model uses relative position embeddings and mean-pools the final hidden states of the patch embeddings for classification.

Public
$0.0005 / sec
demoapi

9da301148150e37e533abef672062fa49f6bda4f

2023-04-29T01:04:45+00:00