Tracktor -
{Tracking-by-detection}
0) Motivation, Objectives and Related works:
Motivation:
Objectives:
Tracktor exploits the regression head of an object detector, such as Faster R-CNN, to perform tracking. In many state-of-the-art object detectors, this regression head is used to refine the location of detected bounding boxes. As shown in Fig. 7, Tracktor simply places the bounding box of a target object from the previous frame t−1 onto the current frame t and extracts the feature maps for that region. These feature maps are then fed into the regression head to refine the object’s location, which becomes the new location of that target object in frame t. In this process, the object’s identity is automatically transferred from frame t−1 to frame t. However, this relies on the assumption that target objects move only slightly between frames, which tends to hold for high-frame-rate sequences.
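The propagation step described above can be sketched in a few lines. This is only a minimal illustration, not the paper's implementation: `propagate_tracks`, `RegressFn`, and `toy_regress` are hypothetical names, and the real regression head (which operates on pooled detector features) is replaced here by an arbitrary callable.

```python
from typing import Callable, Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

# Hypothetical stand-in for the detector's regression head: it takes the
# current frame and a box from frame t-1 and returns the refined box.
RegressFn = Callable[[object, Box], Box]


def propagate_tracks(frame: object, tracks: Dict[int, Box],
                     regress: RegressFn) -> Dict[int, Box]:
    """Move every track from frame t-1 to frame t by box regression."""
    updated: Dict[int, Box] = {}
    for track_id, prev_box in tracks.items():
        # The old box is placed on the new frame and refined there; the
        # identity carries over because the dictionary key stays the same.
        updated[track_id] = regress(frame, prev_box)
    return updated


def toy_regress(frame: object, box: Box) -> Box:
    """Toy regressor for illustration only: shifts the box 1 px to the right."""
    x1, y1, x2, y2 = box
    return (x1 + 1.0, y1, x2 + 1.0, y2)
```

Calling `propagate_tracks(None, {7: (0.0, 0.0, 10.0, 10.0)}, toy_regress)` keeps track ID 7 and only updates its box, which is the sense in which the identity is transferred without any explicit association step.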
To account for new target objects, Tracktor runs the detector on the current frame t and keeps the detection boxes that do not overlap, or overlap only slightly, with existing tracklets. These detection boxes are used to initialize new tracklets.
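A rough sketch of this initialization rule follows, using intersection-over-union (IoU) as the overlap measure. The function names and the `max_iou` threshold of 0.3 are assumptions made for illustration; the notes above do not state which threshold is used.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def init_new_tracks(detections: List[Box], tracks: Dict[int, Box],
                    next_id: int, max_iou: float = 0.3):
    """Start a track for each detection that barely overlaps existing tracks.

    max_iou is an assumed threshold, not a value taken from the notes.
    """
    existing = list(tracks.values())
    result = dict(tracks)  # copy so the caller's dictionary is not mutated
    for det in detections:
        if all(iou(det, box) <= max_iou for box in existing):
            result[next_id] = det
            next_id += 1
    return result, next_id
```

With one existing track at `(0, 0, 10, 10)`, a detection at `(1, 1, 11, 11)` is treated as already tracked (IoU of 81/119), while a detection at `(50, 50, 60, 60)` starts a new tracklet.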
Two extensions to the original, vanilla Tracktor were also presented in the paper. The extended Tracktor is named Tracktor++. The first extension is the use of a motion model to handle low-frame-rate sequences and moving cameras. In these cases, a bounding box from frame t−1 might not overlap with its target object in frame t at all; consequently, the regression head has no cue from which to refine the location correctly. A motion model can help compensate for these errors, leading to more robust tracking. The second extension is to exploit a ReID model to verify whether the target object in frame t−1 really matches its refined location in frame t. If not, the tracklet of that target object is deactivated.
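The two extensions can be illustrated with a small sketch. Both parts rest on assumptions: the motion model is shown as a plain constant-velocity shift applied before regression (the notes do not specify the model), and the ReID check is shown as a cosine-similarity test on appearance embeddings with an assumed threshold `min_similarity`. The names `predict_box` and `same_identity` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def predict_box(prev_box: Box, velocity: Tuple[float, float]) -> Box:
    """Shift the box from frame t-1 by an estimated per-frame velocity.

    Assumed constant-velocity model: the shifted box, rather than the raw
    previous box, would then be handed to the regression head.
    """
    dx, dy = velocity
    x1, y1, x2, y2 = prev_box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def same_identity(emb_prev: Sequence[float], emb_cur: Sequence[float],
                  min_similarity: float = 0.5) -> bool:
    """Compare two ReID embeddings by cosine similarity.

    Returns False when the similarity falls below the assumed threshold,
    in which case the tracklet would be deactivated.
    """
    dot = sum(p * c for p, c in zip(emb_prev, emb_cur))
    norm = math.sqrt(sum(p * p for p in emb_prev)) * math.sqrt(sum(c * c for c in emb_cur))
    if norm == 0.0:
        return False  # a zero vector carries no appearance information
    return dot / norm >= min_similarity
```

In a full tracker, the embeddings would come from a ReID network applied to the image crops at the box in frame t−1 and at the refined box in frame t; here they are just plain sequences of floats.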
Even though Tracktor++ does not require any tracking-specific training, it established itself as a strong baseline for single-camera person tracking at the time it was proposed.