SORT - Simple Online and Realtime Tracking

{Tracking-by-detection}

Motivation, Objective and Related Work

Adopts Faster Region-CNN (FrRCNN), a two-stage object detector, to locate objects in each video frame.
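- Each detected object is represented by a state vector x = [u, v, s, r, u̇, v̇, ṡ]: the center location (u, v), scale s (box area), and aspect ratio r of its bounding box in the current frame t, plus the velocity of the center and the rate of change of the scale. The aspect ratio is treated as constant, so it has no rate term.
- The state evolves under a linear constant velocity model: the state in frame t follows from the state in the previous frame t−1 by adding the velocity terms to the position and scale. Each object's motion is modelled independently of other objects and of camera motion.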
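The state vector and constant velocity model above can be sketched as a small Kalman filter with NumPy. This is a minimal sketch under stated assumptions: boxes arrive as [x1, y1, x2, y2], one time step equals one frame, and the noise covariances P, Q and R are illustrative values rather than tuned settings. The names bbox_to_z, x_to_bbox and ConstantVelocityKF are hypothetical.

```python
import numpy as np


def bbox_to_z(bbox):
    """[x1, y1, x2, y2] -> measurement [u, v, s, r] (center, area, aspect ratio)."""
    x1, y1, x2, y2 = bbox
    w, h = x2 - x1, y2 - y1
    return np.array([x1 + w / 2.0, y1 + h / 2.0, w * h, w / float(h)])


def x_to_bbox(x):
    """State [u, v, s, r, ...] -> [x1, y1, x2, y2]; uses s = w*h and r = w/h."""
    u, v, s, r = x[:4]
    w = np.sqrt(s * r)
    h = s / w
    return np.array([u - w / 2.0, v - h / 2.0, u + w / 2.0, v + h / 2.0])


class ConstantVelocityKF:
    """Kalman filter over x = [u, v, s, r, du, dv, ds] (hypothetical sketch).

    r has no velocity term because the aspect ratio is treated as constant.
    """

    def __init__(self, bbox):
        # Constant velocity transition: position/scale += velocity, one frame per step.
        self.F = np.eye(7)
        self.F[0, 4] = self.F[1, 5] = self.F[2, 6] = 1.0
        self.H = np.eye(4, 7)  # only u, v, s, r are observed
        # Illustrative noise settings (assumed, not tuned):
        self.P = np.diag([10.0] * 4 + [1000.0] * 3)  # velocities start very uncertain
        self.Q = np.eye(7) * 0.01
        self.R = np.eye(4)
        self.x = np.zeros(7)
        self.x[:4] = bbox_to_z(bbox)  # velocities initialised to zero

    def predict(self):
        """Propagate the state from frame t to frame t+1 and return the predicted box."""
        if self.x[2] + self.x[6] <= 0:  # keep the predicted area positive
            self.x[6] = 0.0
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return x_to_bbox(self.x)

    def update(self, bbox):
        """Correct the prediction using an associated detection box as the measurement."""
        z = bbox_to_z(bbox)
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R    # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(7) - K @ self.H) @ self.P
```

In this sketch predict() is called once per frame for every object, and update() only for objects that receive an associated detection; the velocity terms are never measured directly and are inferred from successive position and scale corrections.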
A Kalman filter built on this motion model predicts the state of each object in the next frame t+1.
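- Once the detections in frame t+1 are available, each detection is compared with the predicted bounding box of every existing object (propagated from frame t) to compute their IOU distance, i.e. 1 − IOU, where IOU is the intersection over union of the two boxes.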
- Association is then solved frame by frame with the Hungarian algorithm, which finds the bipartite matching between the new detections in frame t+1 and the existing objects that minimizes the total IOU distance.
- A minimum IOU threshold rejects assignments whose overlap is too small. When a detection is associated with an existing object, that object's state is updated by the Kalman filter, with the detection box as the measurement.
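The association step can be sketched in plain Python. Assumptions: boxes are [x1, y1, x2, y2], the cost is the IOU distance 1 − IOU, and the threshold iou_min=0.3 is an assumed value. hungarian() is a textbook Hungarian (Kuhn-Munkres) implementation with row and column potentials, written out so the example needs no third-party solver. The names iou, hungarian and associate are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns (row, col) pairs; every row gets a distinct column.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            # Extend the alternating tree by the column with the smallest reduced cost.
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new edge has zero reduced cost.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]


def associate(detections, predictions, iou_min=0.3):
    """Match detections in frame t+1 to predicted boxes of existing objects.

    Returns sorted (detection_index, object_index) pairs with IOU >= iou_min.
    """
    if not detections or not predictions:
        return []
    ious = [[iou(d, t) for t in predictions] for d in detections]
    cost = [[1.0 - x for x in row] for row in ious]  # IOU distance
    if len(detections) <= len(predictions):
        pairs = hungarian(cost)
    else:
        # hungarian() needs rows <= columns, so solve the transposed problem.
        transposed = [list(col) for col in zip(*cost)]
        pairs = [(d, t) for t, d in hungarian(transposed)]
    # Reject assignments whose overlap is below the minimum IOU.
    return sorted((d, t) for d, t in pairs if ious[d][t] >= iou_min)
```

The threshold is applied after the optimal matching rather than by editing the cost matrix, so a low-overlap pair can still occupy a slot in the matching but is then dropped from the returned list.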