The YOLOS model is a Vision Transformer (ViT) trained using the DETR loss. Despite its simplicity, this tiny variant achieves an AP of 28.7 on COCO 2017 validation. It was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. The model uses a bipartite matching loss: it is trained with standard cross-entropy for the class predictions and a linear combination of the L1 and generalized IoU losses for the bounding boxes.
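As a rough sketch, the box term of that loss can be written as follows (the lambda weights are DETR-style hyperparameters assumed here for illustration, not values stated on this page):

% lambda weights are assumed DETR-style hyperparameters, not given on this page
\mathcal{L}_{\text{box}}(b, \hat{b}) = \lambda_{\text{L1}} \, \lVert b - \hat{b} \rVert_1 + \lambda_{\text{GIoU}} \, \bigl(1 - \mathrm{GIoU}(b, \hat{b})\bigr)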
You can use cURL or any other HTTP client to run inference:
curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN" \
    -F image=@my_image.jpg \
    'https://api.deepinfra.com/v1/inference/hustvl/yolos-tiny'
which will give you back something similar to:
{
  "results": [
    {
      "score": 0.9939407110214233,
      "label": "remote",
      "box": {
        "xmin": 46,
        "ymin": 72,
        "xmax": 181,
        "ymax": 119
      }
    },
    {
      "score": 0.983637809753418,
      "label": "cat",
      "box": {
        "xmin": 12,
        "ymin": 54,
        "xmax": 319,
        "ymax": 470
      }
    }
  ],
  "request_id": null,
  "inference_status": {
    "status": "unknown",
    "runtime_ms": 0,
    "cost": 0.0,
    "tokens_generated": 0,
    "tokens_input": 0
  }
}
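For example, assuming you have jq installed and have saved the response to a file named response.json (both are illustrative assumptions), you can list each detected label with its confidence score:

# assumes jq is installed and response.json holds the JSON shown above
jq -r '.results[] | "\(.label): \(.score)"' response.json

With the response above, this prints one line per detection, e.g. "remote: 0.9939..." followed by "cat: 0.9836...".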
webhook
The webhook to call when inference is done. By default you will get the output in the response of your inference request.
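For example, assuming the webhook URL can be passed as an additional form field next to the image (the webhook field name and callback URL below are illustrative assumptions; check the API reference for the exact parameter format), the request might look like:

# Note: the webhook form field and callback URL are illustrative, not confirmed by this page
curl -X POST \
    -H "Authorization: bearer $DEEPINFRA_TOKEN" \
    -F image=@my_image.jpg \
    -F webhook=https://example.com/yolos-callback \
    'https://api.deepinfra.com/v1/inference/hustvl/yolos-tiny'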