openai/clip-vit-base-patch32

The CLIP model was developed by OpenAI to investigate robustness and generalization in computer vision. It pairs a Vision Transformer image encoder with a text encoder and was trained on a large dataset of image-caption pairs, which lets it score images against arbitrary text labels in a zero-shot fashion. The model performs well across a range of computer vision tasks but has limitations, including difficulty with fine-grained classification and potential biases in certain applications.
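
For a concrete sense of the task, the same zero-shot classification can also be run locally with the Hugging Face transformers library. This is a minimal sketch following the model's standard usage; the image path and candidate captions are placeholders.

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("my_image.jpg")  # placeholder path
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)

outputs = model(**inputs)
# Image-text similarity scores, softmaxed into label probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
print(probs)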

Public
$0.0005 / sec

HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -F image=@my_image.jpg  \
    -F 'candidate_labels=["cat","dog"]'  \
    'https://api.deepinfra.com/v1/inference/openai/clip-vit-base-patch32'

which will give you back something similar to:

{
  "results": [
    {
      "label": "dog",
      "score": 0.9
    },
    {
      "label": "cat",
      "score": 0.1
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
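
The same call in Python, as a minimal sketch using the requests library; it assumes DEEPINFRA_TOKEN is set in the environment and mirrors the field names from the cURL example above.

import json
import os

import requests

API_URL = "https://api.deepinfra.com/v1/inference/openai/clip-vit-base-patch32"
token = os.environ["DEEPINFRA_TOKEN"]

# Multipart upload: the image file plus a JSON-encoded list of labels,
# matching the -F fields in the cURL example.
with open("my_image.jpg", "rb") as f:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {token}"},
        files={"image": f},
        data={"candidate_labels": json.dumps(["cat", "dog"])},
        timeout=60,
    )

resp.raise_for_status()
# Each result pairs a candidate label with its score, as in the JSON above.
for result in resp.json()["results"]:
    print(f'{result["label"]}: {result["score"]:.2f}')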

Input fields

image (string)

The image to classify.


candidate_labels (array)

The list of candidate labels to choose from.


webhook (string)

The webhook to call when inference is done; by default, the output is returned in the response of your inference request.
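
As an illustration, the webhook target just needs to be an HTTP endpoint that accepts a POST callback. Below is a minimal sketch of a receiver using Python's standard library; the payload is assumed to have the same shape as the inference response shown above, which you should verify before relying on it.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the POSTed callback body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # Payload shape is assumed to mirror the inference response above.
        print(payload.get("results"))
        # Acknowledge receipt so the caller does not retry.
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 8000), WebhookHandler).serve_forever()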
