YOLOX: Exceeding YOLO Series in 2021
(Decoupled Heads; Advanced Label Assignment Strategies; Anchor-free; SimOTA; NMS-free)
Presents several experience-driven improvements to the YOLO series, forming a new high-performance detector: YOLOX.
Switch the YOLO detector to an anchor-free design.
Adopt other advanced detection techniques:
Decoupled head
The leading label assignment strategy, SimOTA
Achieve state-of-the-art results across a wide range of model sizes.
Regression: IoU Loss
Classification / Objectness: BCE Loss
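As a rough illustration of the two loss terms above, here is a minimal pure-Python sketch. It assumes boxes in (x1, y1, x2, y2) corner format and uses the plain 1 - IoU form of the IoU loss; the exact variant used in YOLOX may differ. The function names are hypothetical.

```python
import math

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corners (assumed format)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def iou_loss(pred_box, gt_box):
    """Regression loss in the simple 1 - IoU form (an assumption for this sketch)."""
    return 1.0 - box_iou(pred_box, gt_box)

def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability; used per class and for objectness."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```

A perfectly overlapping box gives an IoU loss of 0, and the loss grows toward 1 as the overlap shrinks.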
Random Horizontal Flip
Color Jitter
Mosaic
Mixup
By using an anchor-free mechanism, the number of hyperparameters can be reduced.
To make YOLO anchor-free, the number of predictions per grid location is reduced from 3 to 1, and each location directly predicts four values: the two offsets from the top-left corner of its grid cell, and the height and width of the predicted box.
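The four predicted values can be turned back into a box in image coordinates roughly as follows. This is a sketch under assumptions: the offsets are measured in units of the grid cell, the width and height are predicted in log space, and everything is scaled by the feature-map stride. The function name and argument names are hypothetical.

```python
import math

def decode_prediction(tx, ty, tw, th, col, row, stride):
    """Decode one anchor-free prediction into (cx, cy, w, h) in pixels.

    tx, ty: predicted offsets from the top-left corner of grid cell (col, row)
    tw, th: predicted log-scale width and height (assumed parameterisation)
    stride: pixels per grid cell at this feature level
    """
    cx = (col + tx) * stride
    cy = (row + ty) * stride
    w = math.exp(tw) * stride
    h = math.exp(th) * stride
    return cx, cy, w, h

# One prediction per cell: cell (col=2, row=3) on a stride-8 feature map.
print(decode_prediction(0.5, 0.5, 0.0, 0.0, col=2, row=3, stride=8))
```

Because there is only one prediction per location, no anchor widths or heights appear anywhere in the decoding.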
To reduce the extreme positive/negative imbalance during training, instead of selecting only one positive sample (the center location) for each object, YOLOX assigns the center 3×3 area as positives.
This strategy is called "center sampling" in FCOS.
The performance of the detector improves after this modification (from 42.9% to 45.0% AP in the paper's ablation).
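A minimal sketch of the center 3×3 rule, assuming the object's center is given in pixels and the feature map has a known stride and size. The function name is hypothetical, and cells falling outside the feature map are simply dropped.

```python
def center_3x3_positives(gt_cx, gt_cy, stride, grid_w, grid_h):
    """Return the (row, col) grid cells treated as positives for one object."""
    col = int(gt_cx // stride)  # cell containing the object center
    row = int(gt_cy // stride)
    cells = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= r < grid_h and 0 <= c < grid_w:  # clip at the border
                cells.append((r, c))
    return cells

# An object centered at (20, 28) on a 10x10, stride-8 feature map.
print(center_3x3_positives(20, 28, stride=8, grid_w=10, grid_h=10))
```

An object in the interior of the feature map gets 9 positive cells instead of 1; an object near a corner gets fewer.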
Label assignment decides which predictions serve as positive and which as negative training samples for each ground-truth (gt) object.
Anchor-based object detectors usually calculate the Intersection-over-Union (IoU) between each gt box and all anchor boxes to decide which anchor boxes are positive samples and which are negative samples.
Anchor-free methods such as FCOS treat locations inside the center or bounding-box region of each gt object as its positives.
These static strategies cannot use all of an object's properties for positive/negative assignment, so dynamic assignment methods have been proposed.
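For contrast with the dynamic methods, the fixed IoU-threshold rule used by anchor-based detectors can be sketched as below. The IoU matrix is assumed to be precomputed, the thresholds 0.5 and 0.4 are illustrative values only, and the function and constant names are hypothetical.

```python
NEGATIVE = -1  # anchor is a background sample
IGNORED = -2   # anchor falls between the two thresholds and is not trained on

def assign_by_iou(iou_matrix, pos_thr=0.5, neg_thr=0.4):
    """Label each anchor from iou_matrix[i][j] = IoU(gt i, anchor j).

    Returns one label per anchor: the matched gt index, NEGATIVE, or IGNORED.
    """
    num_gt = len(iou_matrix)
    num_anchors = len(iou_matrix[0]) if num_gt else 0
    labels = []
    for j in range(num_anchors):
        # Each anchor is compared against its best-overlapping gt box.
        best_gt = max(range(num_gt), key=lambda i: iou_matrix[i][j])
        best_iou = iou_matrix[best_gt][j]
        if best_iou >= pos_thr:
            labels.append(best_gt)
        elif best_iou < neg_thr:
            labels.append(NEGATIVE)
        else:
            labels.append(IGNORED)
    return labels

# Two gt boxes, three anchors.
print(assign_by_iou([[0.7, 0.1, 0.45], [0.2, 0.05, 0.3]]))
```

The rule looks only at box overlap with fixed thresholds; it never consults the current predictions, which is the limitation the dynamic methods address.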
OTA models label assignment as an optimal transport problem and solves it with the Sinkhorn-Knopp iteration to find the best assignment.
However, in the original OTA, the Sinkhorn-Knopp iteration adds about 25% extra training time.
YOLOX simplifies OTA to a dynamic top-k strategy, SimOTA. First, it calculates the pairwise matching degree, expressed as a cost, for each prediction-gt pair. The cost between gt $g_i$ and prediction $p_j$ is: $c_{ij} = L_{ij}^{cls} + \lambda L_{ij}^{reg}$
$\lambda$ is a balancing coefficient.
$L_{ij}^{cls}$ and $L_{ij}^{reg}$ are the classification loss and regression loss between $g_i$ and $p_j$.
For $g_i$, select the top k predictions with the least cost within a fixed center region as its positive samples. Note that k varies across gt objects.
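The dynamic top-k selection can be sketched in pure Python as follows. Several details are assumptions not stated above: k is estimated per object as the truncated sum of its top-q IoUs (at least 1), in the spirit of OTA's dynamic k estimation; a prediction picked by several objects is kept by the one with the lowest cost; and the defaults lam=3.0 and top_q=10 are illustrative. The function name and the nested-list inputs (num_gt x num_pred) are hypothetical.

```python
import math

def simota_assign(cls_loss, reg_loss, ious, in_center, lam=3.0, top_q=10):
    """Return {prediction index: gt index} for the positive predictions."""
    num_gt = len(cls_loss)
    num_pred = len(cls_loss[0]) if num_gt else 0

    # Cost c_ij = L_cls + lam * L_reg; infinite outside the center region.
    cost = [[cls_loss[i][j] + lam * reg_loss[i][j] if in_center[i][j] else math.inf
             for j in range(num_pred)] for i in range(num_gt)]

    assigned = {}  # prediction index -> (gt index, cost)
    for i in range(num_gt):
        candidates = [j for j in range(num_pred) if in_center[i][j]]
        if not candidates:
            continue
        # Dynamic k (assumed rule): truncated sum of the top-q IoUs, minimum 1.
        top_ious = sorted((ious[i][j] for j in candidates), reverse=True)[:top_q]
        k = max(1, int(sum(top_ious)))
        # Keep the k lowest-cost candidates for this gt.
        for j in sorted(candidates, key=lambda j: cost[i][j])[:k]:
            if j not in assigned or cost[i][j] < assigned[j][1]:
                assigned[j] = (i, cost[i][j])
    return {j: gt for j, (gt, _) in assigned.items()}
```

Predictions that do not appear in the returned mapping are treated as negatives. Unlike the full OTA formulation, this needs only a sort per object rather than an iterative solver.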