The YOLOS model, a Vision Transformer (ViT) trained using the DETR loss, achieved an AP of 28.7 on COCO 2017 validation, outperforming other state-of-the-art object detection models while being much simpler. It was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. The model uses a bipartite matching loss and is trained using standard cross-entropy and a linear combination of the L1 and generalized IoU loss.
The YOLOS model, a Vision Transformer (ViT) trained using the DETR loss, achieved an AP of 28.7 on COCO 2017 validation, outperforming other state-of-the-art object detection models while being much simpler. It was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset consisting of 118k/5k annotated images for training/validation respectively. The model uses a bipartite matching loss and is trained using standard cross-entropy and a linear combination of the L1 and generalized IoU loss.
webhook
fileThe webhook to call when inference is done, by default you will get the output in the response of your inference request