ByteTrack
{Tracking-by-detection}
0) Motivation, Objectives and Related Works:
Motivation:
Objectives:
ByteTrack uses YOLOX, a recent object detector in the YOLO series, to detect persons. The authors trained their YOLOX models not only on the MOT17 [18] or MOT20 [19] training sets but also on large-scale person datasets, including CrowdHuman [20], CityPersons [21], and ETHZ [22], so that the detectors can detect occluded persons.
Another key contribution of this research is BYTE, a simple but effective association algorithm that aims to associate every detection box. Like SORT, BYTE uses a motion model (a Kalman filter) to predict the location of each existing tracklet in the next frame, and it uses an IoU-based distance for association. Unlike SORT, BYTE gives priority to detection boxes with high confidence scores: these are associated with the existing tracklets first, using the Hungarian algorithm. The remaining lower-confidence boxes are then associated in a second step instead of being thrown away. This strategy lets ByteTrack associate occluded objects, which normally receive lower confidence scores, and so raises recall while maintaining association precision.
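The two-stage matching described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, and the threshold values (high_thresh, low_thresh, min_iou) are all assumptions made for the example. It also assumes the Kalman filter has already produced the predicted box for each tracklet, and it uses a greedy IoU matcher as a simple stand-in for the Hungarian algorithm.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_boxes, det_boxes, min_iou):
    """Greedy stand-in for the Hungarian algorithm (assumption for brevity):
    accept track/detection pairs in order of descending IoU."""
    pairs = sorted(
        ((iou(t, d), ti, di)
         for ti, t in track_boxes.items()
         for di, d in det_boxes.items()),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for score, ti, di in pairs:
        if score < min_iou:
            break  # all remaining pairs overlap too little
        if ti in matches or di in used_dets:
            continue
        matches[ti] = di
        used_dets.add(di)
    return matches


def byte_associate(predicted_tracks, detections,
                   high_thresh=0.6, low_thresh=0.1, min_iou=0.3):
    """predicted_tracks: {track_id: box}; detections: list of (box, score).
    All threshold defaults are illustrative assumptions."""
    # Split detections by confidence instead of discarding the low ones.
    high = {i: b for i, (b, s) in enumerate(detections) if s >= high_thresh}
    low = {i: b for i, (b, s) in enumerate(detections)
           if low_thresh <= s < high_thresh}

    # Stage 1: high-confidence boxes are matched against all tracklets.
    first = greedy_match(predicted_tracks, high, min_iou)

    # Stage 2: low-confidence boxes are matched against leftover tracklets.
    leftover = {t: b for t, b in predicted_tracks.items() if t not in first}
    second = greedy_match(leftover, low, min_iou)

    matches = {**first, **second}
    unmatched_tracks = [t for t in predicted_tracks if t not in matches]
    # Only unmatched high-confidence boxes are candidates for new tracklets.
    new_track_candidates = [i for i in high if i not in first.values()]
    return matches, unmatched_tracks, new_track_candidates
```

In this sketch, a tracklet whose object is partly occluded (and therefore detected with a low score) can still be matched in the second stage, while low-confidence boxes that match no tracklet are dropped rather than started as new tracks.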
Notably, ByteTrack achieved state-of-the-art performance without using any appearance model. It omits one because it aims to associate every detection box, including low-confidence boxes, which are usually degraded by severe occlusion or motion blur and therefore yield unreliable appearance features.