

sentence-transformers/clip-ViT-B-32

$0.005 / 1M tokens

The CLIP model maps text and images to a shared vector space, enabling applications such as image search, zero-shot image classification, and image clustering. Zero-shot ImageNet validation set accuracy scores are reported below, and multilingual versions of the model are available for 50+ languages.
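As a quick illustration of the zero-shot classification use case, the sketch below runs the model locally with the sentence-transformers library; the image path and candidate labels are placeholders, not part of this hosted API.

# Minimal sketch of zero-shot image classification with the
# sentence-transformers library, run locally. The image path and the
# candidate labels are placeholders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Candidate classes phrased as short captions.
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

# Encode the image and the labels into the shared vector space.
img_emb = model.encode(Image.open("example.jpg"))
label_embs = model.encode(labels)

# The label with the highest cosine similarity is the predicted class.
scores = util.cos_sim(img_emb, label_embs)
print(labels[int(scores.argmax())])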


Input

inputs

A list of items to embed; more than one item may be provided.


Settings

ServiceTier

The service tier used for processing the request. When set to 'priority', the request will be processed with higher priority.

Normalize

Whether to normalize the computed embeddings.

Dimensions

The number of dimensions in the embedding. If not provided, the model's default is used. If a value larger than the model's default is provided, the embedding is padded with zeros. (Default: empty; 32 ≤ dimensions ≤ 8192)

Custom Instruction

Custom instruction prepended to each input. If empty, no instruction is used. (Default: empty)
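To show how these settings fit together, here is a hedged request sketch in Python. The endpoint URL and the exact JSON field names (inputs, normalize, dimensions, service_tier, custom_instruction) are assumptions inferred from the parameter names above, not a documented contract; check the provider's API reference before use.

# Hedged sketch of an embeddings request using the settings above.
# The URL and field names below are assumptions, not a documented API.
import requests

API_URL = "https://example.com/v1/inference/sentence-transformers/clip-ViT-B-32"  # placeholder
API_TOKEN = "YOUR_API_TOKEN"  # placeholder

payload = {
    "inputs": ["A dog playing in the snow", "A red sports car"],
    "normalize": True,           # Normalize: return unit-length embeddings
    "dimensions": 512,           # Dimensions: zero-padded if above the model default
    "service_tier": "priority",  # ServiceTier: higher-priority processing
    # "custom_instruction": "",  # Custom Instruction: prepended to each input
}

resp = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
)
resp.raise_for_status()
print(resp.json())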

Output

[
  [
    0,
    0.5,
    1
  ],
  [
    1,
    0.5,
    0
  ]
]
Model Information

clip-ViT-B-32

This is the Image & Text model CLIP, which maps text and images to a shared vector space. For applications of the model, have a look at our documentation: SBERT.net - Image Search
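As a sketch of that image-search use case, the snippet below embeds a small image gallery and a text query with the sentence-transformers library and ranks the images by cosine similarity; the file names and the query string are placeholders.

# Minimal image-search sketch with the sentence-transformers library.
# Image file names and the query string are placeholders.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Embed a small gallery of images.
image_paths = ["beach.jpg", "mountain.jpg", "city.jpg"]
img_embs = model.encode([Image.open(p) for p in image_paths])

# Embed a text query into the same vector space and rank the images.
query_emb = model.encode("a sunset over the ocean")
hits = util.semantic_search(query_emb, img_embs, top_k=3)[0]
for hit in hits:
    print(image_paths[hit["corpus_id"]], round(hit["score"], 3))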

Performance

The following table shows the zero-shot top-1 accuracy on the ImageNet validation set:

Model            Top-1 Accuracy (%)
clip-ViT-B-32    63.3
clip-ViT-B-16    68.1
clip-ViT-L-14    75.4

For a multilingual version of the CLIP model supporting 50+ languages, have a look at clip-ViT-B-32-multilingual-v1.