We use essential cookies to make our site work. With your consent, we may also use non-essential cookies to improve user experience and analyze website traffic…

sentence-transformers/

clip-ViT-B-32-multilingual-v1

This model is a multilingual version of the OpenAI CLIP-ViT-B32 model, which maps text and images to a common dense vector space. It includes a text embedding model that works for 50+ languages and an image encoder from CLIP. The model was trained using Multilingual Knowledge Distillation, where a multilingual DistilBERT model was trained as a student model to align the vector space of the original CLIP image encoder across many languages.

Public
$0.005 / Mtoken
512
sentence-transformers/clip-ViT-B-32-multilingual-v1 cover image

Unlock the most affordable AI hosting

Run models at scale with our fully managed GPU infrastructure, delivering enterprise-grade uptime at the industry's best rates.