SORT - Simple Online and Realtime Tracking
{Tracking-by-detection}
SORT adopts Faster R-CNN (FrRCNN), a two-stage object detector, to locate objects in each video frame.
Each detected object is represented by a state vector consisting of the center location, scale, and aspect ratio of its bounding box in the current frame t as well as its velocity and the rate of change in its scale.
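A minimal sketch of how a detector box could be mapped to the observed part of this state, assuming corner-format boxes (x1, y1, x2, y2), scale taken as box area, and aspect ratio taken as width / height. The function names `bbox_to_z` and `z_to_bbox` are illustrative, not taken from these notes.

```python
import math

def bbox_to_z(x1, y1, x2, y2):
    """Convert a corner-format box to (u, v, s, r):
    center x, center y, scale (area), aspect ratio (w / h)."""
    w = x2 - x1
    h = y2 - y1
    return (x1 + w / 2.0, y1 + h / 2.0, w * h, w / h)

def z_to_bbox(u, v, s, r):
    """Inverse mapping: recover (x1, y1, x2, y2) from (u, v, s, r)."""
    w = math.sqrt(s * r)   # since s = w * h and r = w / h, s * r = w^2
    h = s / w
    return (u - w / 2.0, v - h / 2.0, u + w / 2.0, v + h / 2.0)

# Full 7-dimensional state: observed part plus velocities, initialised to zero
# for a new track (an assumption: nothing is known about its motion yet).
z = bbox_to_z(10, 20, 50, 100)
state = list(z) + [0.0, 0.0, 0.0]   # [u, v, s, r, du, dv, ds]
```

Only the first four components are measured directly by the detector; the velocity terms have to be inferred by the filter over successive frames.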
The inter-frame motion of each object is approximated with a linear constant-velocity model, estimated from the current frame t and the previous frame t−1.
A Kalman filter is used as the motion model to predict the state of each object in the next frame t+1.
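The prediction step can be sketched with NumPy as below. This assumes the 7-dimensional state ordering [u, v, s, r, du, dv, ds], a time step of one frame, and a caller-supplied process noise matrix Q; the names `F` and `predict` are illustrative.

```python
import numpy as np

# State transition for a constant-velocity model with a one-frame time step:
# u += du, v += dv, s += ds; the aspect ratio r has no velocity term.
F = np.eye(7)
F[0, 4] = 1.0
F[1, 5] = 1.0
F[2, 6] = 1.0

def predict(x, P, Q):
    """Kalman predict step: propagate the state mean x and covariance P."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

x = np.array([30.0, 60.0, 3200.0, 0.5, 2.0, -1.0, 10.0])
P = np.eye(7)            # placeholder initial uncertainty
Q = np.zeros((7, 7))     # placeholder: no process noise in this toy example
x_next, P_next = predict(x, P, Q)
```

The predicted covariance grows for the components that depend on an uncertain velocity, which is what lets the later update step weigh the new detection against the prediction.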
Once detected objects in frame t+1 are obtained, each is compared with the state predicted for frame t+1 for each existing object to compute their IOU distance (1 − IOU of the two bounding boxes).
Object association is then performed frame-by-frame using the Hungarian algorithm, which aims to minimize the cost of the bipartite matching between newly detected objects in frame t+1 and the existing objects.
A minimum IOU threshold is also used to filter out poor matches. If a detection box is associated with an existing object, its state is then updated using the Kalman filter.
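The association step described above can be sketched with SciPy's `linear_sum_assignment`, which solves the same bipartite assignment problem as the Hungarian algorithm. Boxes are assumed to be in corner format (x1, y1, x2, y2); the function names and the threshold value 0.3 are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(detections, predictions, iou_min=0.3):
    """Match detections to predicted boxes by minimising total IOU distance.
    Returns (matches, unmatched_detections, unmatched_predictions) as indices."""
    if len(detections) == 0 or len(predictions) == 0:
        return [], list(range(len(detections))), list(range(len(predictions)))
    iou_matrix = np.array([[iou(d, p) for p in predictions] for d in detections])
    rows, cols = linear_sum_assignment(1.0 - iou_matrix)  # cost = IOU distance
    # Reject assignments whose overlap is below the minimum IOU threshold.
    matches = [(int(r), int(c)) for r, c in zip(rows, cols)
               if iou_matrix[r, c] >= iou_min]
    matched_d = {r for r, _ in matches}
    matched_p = {c for _, c in matches}
    unmatched_d = [i for i in range(len(detections)) if i not in matched_d]
    unmatched_p = [j for j in range(len(predictions)) if j not in matched_p]
    return matches, unmatched_d, unmatched_p

dets = [(0, 0, 10, 10), (100, 100, 110, 110)]
preds = [(101, 101, 111, 111), (1, 1, 11, 11), (200, 200, 210, 210)]
matches, lost_dets, lost_preds = associate(dets, preds)
```

Matched pairs would then feed the Kalman update for the corresponding track, unmatched detections are candidates for new tracks, and unmatched predictions are tracks that received no detection in this frame.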
Detect: Locate objects in each frame.
Predict: Predict the new positions of existing objects from previous frames.
Associate: Link detected positions with predicted positions to assign IDs.