hustvl/yolos-small

YOLOS is a Vision Transformer (ViT) trained with the DETR loss. Despite its simplicity, a base-sized YOLOS model reaches 42 AP on COCO 2017 validation, comparable to DETR and to more complex frameworks such as Faster R-CNN. Training uses a bipartite matching loss, which pairs each predicted class and bounding box with a ground-truth annotation before the loss is computed. This small-sized model was fine-tuned on the COCO 2017 object detection dataset (118k training and 5k validation annotated images) and achieves 36.1 AP on the validation set.

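To make the bipartite matching idea concrete, here is a minimal Python sketch. It is not the actual YOLOS or DETR implementation. It assumes a simplified cost (negative class probability plus a weighted L1 box distance) and searches all assignments by brute force, whereas DETR-style training uses the Hungarian algorithm and also includes a generalized IoU term in the cost. The function name `match_predictions`, the `box_weight` value, and the toy data are illustrative assumptions.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_predictions(preds, targets, box_weight=5.0):
    """Find the one-to-one assignment of predictions to targets with minimum cost.

    preds:   list of (class_probabilities: dict, box: tuple)
    targets: list of (label: str, box: tuple)
    Returns a list of (target_index, prediction_index) pairs.
    Brute force over all assignments, so only suitable for tiny examples.
    """
    if len(targets) > len(preds):
        raise ValueError("need at least as many predictions as targets")

    best_cost, best_assignment = float("inf"), ()
    for assignment in permutations(range(len(preds)), len(targets)):
        cost = 0.0
        for target_idx, pred_idx in enumerate(assignment):
            probs, pred_box = preds[pred_idx]
            label, target_box = targets[target_idx]
            # Simplified cost: reward high probability for the true class,
            # penalize box coordinates that are far from the ground truth.
            cost += -probs.get(label, 0.0) + box_weight * l1_distance(pred_box, target_box)
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return list(enumerate(best_assignment))


# Toy data (hypothetical): boxes are normalized (cx, cy, w, h) tuples.
preds = [
    ({"cat": 0.9, "remote": 0.05}, (0.5, 0.5, 0.9, 0.9)),
    ({"remote": 0.8, "cat": 0.1}, (0.3, 0.2, 0.3, 0.1)),
    ({"cat": 0.2}, (0.1, 0.1, 0.1, 0.1)),
]
targets = [
    ("remote", (0.3, 0.2, 0.3, 0.1)),
    ("cat", (0.5, 0.5, 0.9, 0.9)),
]

matches = match_predictions(preds, targets)
print(matches)  # [(0, 1), (1, 0)]
```

In this toy case the third prediction is left unmatched; in DETR-style training such unmatched predictions are supervised toward a "no object" class.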

Public
$0.0005 / sec

HTTP/cURL API

You can use cURL or any other HTTP client to run inference:

curl -X POST \
    -H "Authorization: bearer $(deepctl auth token)"  \
    -F image=@my_image.jpg  \
    'https://api.deepinfra.com/v1/inference/hustvl/yolos-small'

The response will look similar to this:

{
  "results": [
    {
      "score": 0.9939407110214233,
      "label": "remote",
      "box": {
        "xmin": 46,
        "ymin": 72,
        "xmax": 181,
        "ymax": 119
      }
    },
    {
      "score": 0.983637809753418,
      "label": "cat",
      "box": {
        "xmin": 12,
        "ymin": 54,
        "xmax": 319,
        "ymax": 470
      }
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
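As a rough illustration of how a client might consume this response, the Python sketch below keeps only detections above a confidence threshold and derives each box's width and height. The helper name `filter_detections` and the threshold values are assumptions made for this example, and the box coordinates are assumed to be pixel offsets in the input image; the sample data is copied from the response shown above.

```python
import json

# "results" portion of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "results": [
    {"score": 0.9939407110214233, "label": "remote",
     "box": {"xmin": 46, "ymin": 72, "xmax": 181, "ymax": 119}},
    {"score": 0.983637809753418, "label": "cat",
     "box": {"xmin": 12, "ymin": 54, "xmax": 319, "ymax": 470}}
  ]
}
"""


def filter_detections(response, min_score=0.9):
    """Return detections with score >= min_score, plus box width and height.

    `response` is the decoded JSON body (a dict). This helper is hypothetical
    and not part of any official client library.
    """
    kept = []
    for detection in response.get("results", []):
        if detection["score"] < min_score:
            continue
        box = detection["box"]
        kept.append({
            "label": detection["label"],
            "score": detection["score"],
            "width": box["xmax"] - box["xmin"],
            "height": box["ymax"] - box["ymin"],
        })
    return kept


response = json.loads(SAMPLE_RESPONSE)
for d in filter_detections(response, min_score=0.99):
    print(f"{d['label']}: {d['score']:.3f} ({d['width']}x{d['height']})")
# prints: remote: 0.994 (135x47)
```

With the stricter 0.99 threshold only the "remote" detection survives; the default of 0.9 would keep both.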

Input fields

image (string)

The image to detect objects in.


webhook (file)

The webhook to call when inference is done. By default, the output is returned in the response to your inference request.
