
hustvl/yolos-small

YOLOS is a Vision Transformer (ViT) trained for object detection using the DETR loss. Despite its simplicity, a base-sized YOLOS model achieves 42 AP on COCO validation 2017, comparable to DETR and to more complex frameworks such as Faster R-CNN. The model uses a bipartite matching loss to compare predicted classes and bounding boxes against the ground-truth annotations. This small variant, fine-tuned on the COCO 2017 object detection dataset (118k annotated images for training, 5k for validation), achieves 36.1 AP on the validation set.
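
The model can also be run locally with Hugging Face Transformers. A minimal sketch, assuming the `transformers`, `torch`, `Pillow`, and `requests` packages are installed; the image URL is an illustrative COCO sample, not part of this page:

```python
# Hedged sketch: run hustvl/yolos-small for object detection via Transformers.
import requests
import torch
from PIL import Image
from transformers import YolosForObjectDetection, YolosImageProcessor

# An example COCO validation image (illustrative choice).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = YolosImageProcessor.from_pretrained("hustvl/yolos-small")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-small")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw per-query logits and boxes into thresholded detections.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
results = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    print(model.config.id2label[label.item()], round(score.item(), 3), box.tolist())
```

Note that the bipartite matching loss mentioned above applies only at training time; at inference the model simply emits a fixed set of query predictions, which are filtered by the score threshold here.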

Public
$0.0005/sec

HTTP/cURL API

 

Input fields

image (string)

image to detect objects in


webhook (file)

The webhook to call when inference is done; by default you will get the output in the response of your inference request.
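
A minimal cURL sketch of calling the model over the HTTP API. The endpoint path and bearer-token header follow Deep Infra's inference API conventions; the token variable and image filename are placeholders, so check the current API documentation for the exact request format:

```shell
# Hedged sketch: POST an image to the hustvl/yolos-small inference endpoint.
# $DEEPINFRA_TOKEN and example.jpg are placeholders.
curl -X POST \
  -H "Authorization: bearer $DEEPINFRA_TOKEN" \
  -F image=@example.jpg \
  https://api.deepinfra.com/v1/inference/hustvl/yolos-small
```

If the `webhook` field is supplied, the output is delivered to that URL instead of (or in addition to) the synchronous response.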

Input Schema

Output Schema


© 2023 Deep Infra. All rights reserved.
