sentence-transformers/
$0.005 / 1M tokens
This model is a multilingual version of the OpenAI CLIP-ViT-B32 model, which maps text and images to a common dense vector space. It includes a text embedding model that works for 50+ languages and an image encoder from CLIP. The model was trained using Multilingual Knowledge Distillation, where a multilingual DistilBERT model was trained as a student model to align the vector space of the original CLIP image encoder across many languages.
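The Multilingual Knowledge Distillation objective described above can be sketched numerically: the teacher (the original CLIP text encoder) embeds an English sentence, and the student (multilingual DistilBERT) is trained so that its embeddings of both the English sentence and its translations match the teacher's embedding under an MSE loss. The toy vectors, shapes, and loss below are illustrative assumptions, not the model's actual training code:

```python
import numpy as np

# Toy sketch of Multilingual Knowledge Distillation.
# Random vectors stand in for real encoder outputs; the 512-dim size
# matches CLIP ViT-B/32's text embedding, but everything else is synthetic.
rng = np.random.default_rng(0)
dim = 512

teacher_en = rng.normal(size=dim)                     # teacher embedding of an English sentence
student_en = teacher_en + 0.1 * rng.normal(size=dim)  # student output for the same English sentence
student_de = teacher_en + 0.2 * rng.normal(size=dim)  # student output for its German translation

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean-squared-error distillation loss between two embeddings."""
    return float(np.mean((a - b) ** 2))

# The student is optimized to pull BOTH its English and translated
# embeddings toward the fixed teacher embedding.
loss = mse(teacher_en, student_en) + mse(teacher_en, student_de)
```

Minimizing this loss across many language pairs is what aligns the multilingual text embeddings with the CLIP image encoder's vector space, since the teacher embedding is the one the image encoder was trained against.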
200b64f20b3cef15ade0d31b1392519a46024087
2023-03-03T02:52:46+00:00