
google/vit-base-patch16-224

The Vision Transformer (ViT) is a transformer encoder model pre-trained on ImageNet-21k and fine-tuned on ImageNet (1,000 classes) at a resolution of 224x224, achieving state-of-the-art results in image classification. The model processes an image as a sequence of fixed-size 16x16 patches and prepends a [CLS] token whose representation is used for classification. For downstream tasks, the authors recommend using versions of the model that have been fine-tuned on the task of interest.

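A minimal usage sketch with the Hugging Face transformers library (the same checkpoint this page exposes); it assumes transformers, torch, Pillow, and requests are installed, and the sample COCO image URL is used purely for illustration:

```python
from transformers import ViTImageProcessor, ViTForImageClassification
from PIL import Image
import requests

# Any RGB image works; this COCO sample is just an illustrative choice.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# The processor resizes to 224x224 and normalizes; the model splits the image
# into 16x16 patches and classifies from the [CLS] token representation.
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Pick the highest-scoring ImageNet class and print its label.
predicted_class = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```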

Public
$0.0005/sec

2ddc9d4e473d7ba52128f0df4723e478fa14fb80

2023-04-29T01:03:41+00:00

