microsoft/beit-base-patch16-224-pt22k-ft22k

BEiT is a Vision Transformer (ViT)-style model pre-trained with a self-supervised objective on ImageNet-21k (also called ImageNet-22k), a dataset of 14 million images and 21,841 classes, and then fine-tuned on the same dataset. As the model name indicates, it uses 16x16 patches at an input resolution of 224x224. Its authors reported strong results on image classification benchmarks. Unlike the original ViT, BEiT uses relative rather than absolute position embeddings, and it classifies an image by mean-pooling the final hidden states of the patch tokens rather than reading the [CLS] token.
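To make the pooling step concrete, here is a minimal sketch in plain Python. It assumes the final hidden states arrive as a list of token vectors with the [CLS] token at index 0, followed by one vector per patch. The function names are hypothetical, and the sketch omits normalization and the classification head; a real implementation would operate on tensors.

```python
from statistics import fmean


def num_patches(image_size=224, patch_size=16):
    """Number of non-overlapping square patches in a square image."""
    per_side = image_size // patch_size
    return per_side * per_side


def mean_pool_patches(hidden_states):
    """Average the patch token vectors, skipping the [CLS] token.

    hidden_states: list of equal-length vectors; index 0 is assumed
    to be the [CLS] token and is excluded from the average.
    """
    patch_tokens = hidden_states[1:]
    # zip(*rows) iterates column-wise, giving one mean per feature.
    return [fmean(column) for column in zip(*patch_tokens)]


# Toy example: one [CLS] vector and two patch vectors of width 2.
pooled = mean_pool_patches([[100.0, 100.0], [1.0, 2.0], [3.0, 4.0]])
```

For a 224x224 input with 16x16 patches, `num_patches()` gives 14 x 14 = 196 patch tokens, so the pooled vector is the average of 196 vectors.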


$0.0005 / sec


HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN"  \
    -F image=@my_image.jpg  \
    'https://api.deepinfra.com/v1/inference/microsoft/beit-base-patch16-224-pt22k-ft22k'

which will give you back something similar to:

{
  "results": [
    {
      "label": "Maltese dog, Maltese terrier, Maltese",
      "score": 0.9235488176345825
    },
    {
      "label": "Lhasa, Lhasa apso",
      "score": 0.0298430435359478
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
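The same request can be sketched in Python. This is a minimal example, not an official client: it assumes the third-party `requests` package is installed, and the helper names `classify_image` and `top_prediction` are hypothetical. The URL, header, and `image` form field mirror the cURL call above, and the parsing relies only on the `results`, `label`, and `score` keys shown in the sample response.

```python
API_URL = (
    "https://api.deepinfra.com/v1/inference/"
    "microsoft/beit-base-patch16-224-pt22k-ft22k"
)


def classify_image(image_path, token, timeout=60):
    """POST an image as multipart form data and return the parsed JSON."""
    # Imported here so the parsing helper below works without the dependency.
    import requests  # third-party: pip install requests

    with open(image_path, "rb") as image_file:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"bearer {token}"},
            files={"image": image_file},  # same field as `-F image=@...`
            timeout=timeout,
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
    return response.json()


def top_prediction(payload):
    """Return (label, score) for the highest-scoring result, or None."""
    results = payload.get("results", [])
    if not results:
        return None
    best = max(results, key=lambda item: item["score"])
    return best["label"], best["score"]


# Usage (requires a valid token in the DEEPINFRA_TOKEN environment variable):
#   import os
#   payload = classify_image("my_image.jpg", os.environ["DEEPINFRA_TOKEN"])
#   print(top_prediction(payload))
```

Taking the maximum by `score` avoids assuming that the service returns results in sorted order.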

Input fields

`image` (file)

image to classify

`webhook` (string)

The webhook URL to call when inference is done. By default, the output is returned in the response to your inference request.
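If you supply a webhook, you need an endpoint that can receive the callback. The sketch below is a minimal receiver built on Python's standard library. It rests on assumptions this page does not confirm: that the service sends an HTTP POST with a JSON body, and that the body resembles the response shown earlier. The names `WebhookHandler`, `make_server`, and `received` are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # decoded payloads, oldest first


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # ASSUMPTION: the callback is a POST carrying a JSON body.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
            status = 200
        except ValueError:  # covers invalid JSON and invalid UTF-8
            status = 400
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # silence the default per-request logging


def make_server(host="127.0.0.1", port=0):
    """Create the server; port 0 asks the OS for a free port."""
    return HTTPServer((host, port), WebhookHandler)


# Usage: make_server(port=8000).serve_forever()
```

For the service to reach it, the receiver must be exposed at a publicly routable URL, and that URL is what you would pass as `webhook`.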

