openai/clip-vit-base-patch32

The CLIP model was developed by OpenAI to investigate the robustness of computer vision models. It uses a Vision Transformer architecture and was trained on a large dataset of image-caption pairs. The model shows promise in various computer vision tasks but also has limitations, including difficulties with fine-grained classification and potential biases in certain applications.
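For context, the model can also be run locally for zero-shot classification. Below is a minimal sketch using the Hugging Face transformers library (this is separate from the DeepInfra API; the file name and labels are placeholders):

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("my_image.jpg")
labels = ["cat", "dog"]

# Encode the image and candidate labels together; CLIP scores each
# image-text pair, and a softmax over labels yields probabilities.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")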

Public · $0.0005 / sec

HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -F image=@my_image.jpg  \
    -F 'candidate_labels=["cat","dog"]'  \
    'https://api.deepinfra.com/v1/inference/openai/clip-vit-base-patch32'

which will return a response similar to:

{
  "results": [
    {
      "label": "dog",
      "score": 0.9
    },
    {
      "label": "cat",
      "score": 0.1
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}

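The same request can be made from any HTTP library. Here is a minimal Python sketch using requests, mirroring the cURL call above (the field names come from that example, and DEEPINFRA_TOKEN is assumed to be set in the environment):

import os
import requests

resp = requests.post(
    "https://api.deepinfra.com/v1/inference/openai/clip-vit-base-patch32",
    headers={"Authorization": f"bearer {os.environ['DEEPINFRA_TOKEN']}"},
    files={"image": open("my_image.jpg", "rb")},
    data={"candidate_labels": '["cat","dog"]'},
)
resp.raise_for_status()
results = resp.json()["results"]

# Each result is a label/score pair; pick the highest-scoring label.
best = max(results, key=lambda r: r["score"])
print(best["label"], best["score"])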

Input fields

As shown in the example above, the endpoint takes two fields:

image: the image to classify, uploaded as a file

candidate_labels: a JSON array of label strings to score the image against

Input Schema

Output Schema