hustvl/yolos-small

YOLOS is a Vision Transformer (ViT) trained with the DETR loss. Despite its simplicity, a base-sized YOLOS model reaches 42 AP on COCO 2017 validation, comparable to DETR and to more complex frameworks such as Faster R-CNN. Training uses a bipartite matching loss: predicted classes and bounding boxes are matched one-to-one against the ground-truth annotations before the loss is computed. This small variant was fine-tuned on the COCO 2017 object detection dataset (118k training and 5k validation images) and achieves 36.1 AP on the validation set.
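
The bipartite matching step can be illustrated with a small, self-contained sketch. This is not the YOLOS or DETR training code: the cost function (negative class probability plus a weighted L1 box distance), the `box_weight` value, and all function names are assumptions made for illustration, and the brute-force search stands in for the Hungarian algorithm that practical implementations typically use.

```python
from itertools import permutations


def l1_distance(box_a, box_b):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b))


def match_cost(pred, target, box_weight=5.0):
    """Cost of assigning one prediction to one ground-truth object.

    Lower is better: a high probability for the target's class and a
    nearby box both reduce the cost. box_weight is an illustrative value.
    """
    class_cost = -pred["probs"][target["label"]]
    box_cost = l1_distance(pred["box"], target["box"])
    return class_cost + box_weight * box_cost


def bipartite_match(preds, targets):
    """Return the one-to-one assignment with the lowest total cost.

    Brute force over permutations, so only suitable for a handful of
    objects. Assumes len(preds) >= len(targets).
    Returns (list of (prediction_index, target_index) pairs, total cost).
    """
    best_pairs, best_cost = [], float("inf")
    for pred_indices in permutations(range(len(preds)), len(targets)):
        cost = sum(
            match_cost(preds[p], targets[t]) for t, p in enumerate(pred_indices)
        )
        if cost < best_cost:
            best_cost = cost
            best_pairs = sorted((p, t) for t, p in enumerate(pred_indices))
    return best_pairs, best_cost


# Hypothetical data: three predictions, two ground-truth objects.
# Boxes are (center_x, center_y, width, height) in normalized coordinates.
preds = [
    {"probs": [0.1, 0.9], "box": (0.50, 0.50, 0.20, 0.20)},
    {"probs": [0.8, 0.2], "box": (0.25, 0.25, 0.10, 0.10)},
    {"probs": [0.5, 0.5], "box": (0.90, 0.90, 0.05, 0.05)},
]
targets = [
    {"label": 0, "box": (0.25, 0.25, 0.10, 0.10)},
    {"label": 1, "box": (0.50, 0.50, 0.20, 0.20)},
]

pairs, cost = bipartite_match(preds, targets)
print(pairs)  # [(0, 1), (1, 0)]
```

Here prediction 0 is matched to target 1 and prediction 1 to target 0, while prediction 2 is left unmatched. In DETR-style training, unmatched predictions are supervised toward a "no object" class, and the classification and box losses are computed only over the matched pairs.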

Public
$0.0005/sec

© 2023 Deep Infra. All rights reserved.
