
hustvl/yolos-tiny

YOLOS is a Vision Transformer (ViT) trained for object detection using the DETR loss. Despite its simplicity, this tiny-sized variant achieves an AP of 28.7 on COCO 2017 validation, competitive with much more complex detection frameworks. It was pre-trained on ImageNet-1k and fine-tuned on COCO 2017 object detection, a dataset of 118k annotated training images and 5k annotated validation images. Training uses a bipartite matching loss: each prediction is matched one-to-one to a ground-truth annotation (or to "no object"), and the model is optimized with standard cross-entropy for class labels and a linear combination of the L1 and generalized IoU losses for bounding boxes.
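
As a minimal sketch of how this checkpoint can be used for inference with the Hugging Face transformers library (the sample COCO image URL and the 0.9 confidence threshold are illustrative choices, not part of this listing):

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, YolosForObjectDetection

# Illustrative sample image from COCO 2017 validation
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

processor = AutoImageProcessor.from_pretrained("hustvl/yolos-tiny")
model = YolosForObjectDetection.from_pretrained("hustvl/yolos-tiny")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Rescale normalized predictions to the original image size and filter by score
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.9, target_sizes=target_sizes
)[0]

for score, label, box in zip(
    detections["scores"], detections["labels"], detections["boxes"]
):
    name = model.config.id2label[label.item()]
    print(f"{name}: {score:.2f} at {[round(v, 1) for v in box.tolist()]}")
```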

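The bipartite matching step can also be illustrated with a short sketch. This is not the training code behind this model, just a hypothetical reimplementation of the DETR-style matching described above; the cost weights (1, 5, 2) follow DETR's published defaults, and boxes are assumed to be in normalized (cx, cy, w, h) format:

```python
import torch
from scipy.optimize import linear_sum_assignment
from torchvision.ops import box_convert, generalized_box_iou

def hungarian_match(pred_logits, pred_boxes, tgt_labels, tgt_boxes,
                    w_class=1.0, w_l1=5.0, w_giou=2.0):
    """Match N predictions to M ground-truth objects, DETR-style.

    pred_logits: (N, num_classes) raw class scores
    pred_boxes:  (N, 4) normalized (cx, cy, w, h)
    tgt_labels:  (M,) ground-truth class indices
    tgt_boxes:   (M, 4) normalized (cx, cy, w, h)
    """
    prob = pred_logits.softmax(-1)
    cost_class = -prob[:, tgt_labels]                  # (N, M) class cost
    cost_l1 = torch.cdist(pred_boxes, tgt_boxes, p=1)  # (N, M) L1 box cost
    giou = generalized_box_iou(                        # (N, M), higher is better
        box_convert(pred_boxes, "cxcywh", "xyxy"),
        box_convert(tgt_boxes, "cxcywh", "xyxy"),
    )
    cost = w_class * cost_class + w_l1 * cost_l1 - w_giou * giou
    rows, cols = linear_sum_assignment(cost.detach().numpy())
    return rows, cols  # prediction rows[k] is matched to target cols[k]
```

The cross-entropy and L1/GIoU losses are then computed only over these matched pairs, with unmatched predictions assigned to a "no object" class.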

Visibility: Public
Pricing: $0.0005 / sec
Owner: demoapi

Version: 3686e65df0c914833fc8cbeca745a33b374c499b
Updated: 2023-04-29T00:26:21+00:00