YOLOX: Exceeding YOLO Series in 2021

(Decoupled Heads; Advanced Label Assignment Strategies; Anchor-free; SimOTA; NMS-free)

Paper: https://arxiv.org/pdf/2107.08430v2.pdf

Code: https://github.com/Megvii-BaseDetection/YOLOX

Motivation, Objectives and Related Work

Motivation

Present a set of experience-backed improvements to the YOLO series, forming a new high-performance detector: YOLOX.

Objectives - YOLOX

1. Switch the YOLO detector to an anchor-free design.
2. Adopt other advanced detection techniques:
  - Decoupled head
  - The leading label assignment strategy, SimOTA.
3. Achieve state-of-the-art results across a wide range of model sizes.

Methods

Improvements

Decoupled Head

1. - Regression: IoU Loss
  - Classification/ Objectness: BCE Loss
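
The two loss types listed above can be illustrated with a minimal pure-Python sketch. It assumes boxes in `(x1, y1, x2, y2)` format and uses the plain `1 - IoU` form of the regression loss; the exact variant used in YOLOX may differ, and the function names are hypothetical.

```python
import math


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred_box, gt_box):
    """Regression loss. Assumption: the simple 1 - IoU form."""
    return 1.0 - iou(pred_box, gt_box)


def bce_loss(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```

In a decoupled head, the regression branch would be trained with something like `iou_loss`, while the classification and objectness branches would each use `bce_loss` on their own outputs.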

Strong Data Augmentation

1. - Random Horizontal Flip
  - Color Jitter
  - Mosaic
  - Mixup
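
As a rough illustration of how two of these augmentations affect detection labels, the sketch below flips a box horizontally and blends two images in Mixup style. It assumes boxes in `(x1, y1, x2, y2)` pixel format and images stored as nested lists of equal size; the fixed blend ratio and the function names are illustrative assumptions, not the settings used in YOLOX.

```python
def hflip_box(box, image_width):
    """Mirror a (x1, y1, x2, y2) box around the vertical image axis."""
    x1, y1, x2, y2 = box
    # The old right edge becomes the new left edge, and vice versa.
    return (image_width - x2, y1, image_width - x1, y2)


def mixup(image_a, image_b, boxes_a, boxes_b, ratio=0.5):
    """Blend two same-sized images pixel-wise and keep the boxes of both.

    Assumption: a fixed blend ratio; images are lists of rows of numbers.
    """
    blended = [
        [ratio * pa + (1.0 - ratio) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    return blended, list(boxes_a) + list(boxes_b)
```

The key point for detection is that geometric augmentations must transform the boxes together with the pixels, and Mixup keeps the ground-truth boxes of both source images.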

Anchor-free

1. - By using an anchor-free mechanism, the number of hyperparameters can be reduced.
  - To make YOLO anchor-free, the number of predictions per grid location was reduced from 3 to 1, and each location directly predicts four values: the two offsets from the top-left corner of the grid cell, and the height and width of the bbox.
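
A minimal sketch of how such a per-location prediction could be decoded into image coordinates is shown below. The stride scaling, the exponential on width and height, and the interpretation of the decoded point as the box center are assumptions for illustration; `decode_prediction` is a hypothetical helper, not a function from the YOLOX code base.

```python
import math


def decode_prediction(grid_x, grid_y, stride, tx, ty, tw, th):
    """Turn one grid location's four raw outputs into a box in pixels.

    grid_x, grid_y: integer index of the grid cell (its top-left corner).
    stride: size of one grid cell in input-image pixels.
    tx, ty: predicted offsets from the cell's top-left corner, in cell units.
    tw, th: predicted log-scale width and height (assumed parameterization).
    """
    cx = (grid_x + tx) * stride
    cy = (grid_y + ty) * stride
    w = math.exp(tw) * stride
    h = math.exp(th) * stride
    return cx, cy, w, h
```

Because there is a single prediction per location and no anchor shapes to scale, no anchor sizes or aspect ratios need to be tuned for the dataset.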

Multi-positive

1. - To reduce the extreme imbalance between positives and negatives during training, instead of selecting only 1 positive sample at the center location of each object, the center 3×3 area is assigned as positives.
  - This strategy is called "center sampling" in FCOS.
  - The performance of the detector improves after this modification.
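
The multi-positive rule above can be sketched as follows. The example assumes a single feature map with a known stride and clips the 3×3 neighbourhood at the map border; `center_3x3_positives` is a hypothetical name used only for illustration.

```python
def center_3x3_positives(gt_cx, gt_cy, stride, grid_w, grid_h):
    """Return the (col, row) grid cells treated as positives for one object.

    The cell containing the ground-truth center and its 8 neighbours are
    selected; cells outside the feature map are dropped.
    """
    col = int(gt_cx // stride)
    row = int(gt_cy // stride)
    cells = []
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if 0 <= c < grid_w and 0 <= r < grid_h:
                cells.append((c, r))
    return cells
```

An object well inside the image gets 9 positive locations instead of 1, while an object whose center falls in a corner cell gets only the 4 cells that exist there.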

SimOTA (Advanced Label Assignment Strategies)

1. - Label assignment decides which training samples are positive and which are negative for each ground-truth object.
    - Anchor-based object detectors often calculate the Intersection over Union (IoU) between each ground-truth box and all anchor boxes to decide which anchor boxes are positive samples and which are negative samples.
    - Anchor-free methods like FCOS treat the center/bbox region of any gt object as the corresponding positives.
  - These strategies cannot leverage all object properties for pos/neg assignment, so some dynamic assignment methods have been proposed.
    - OTA models label assignment as an optimal transport problem and uses the Sinkhorn-Knopp iteration algorithm to find the best assignment.
    - However, in the original OTA, the Sinkhorn-Knopp iteration brings 25% extra training time.
  - YOLOX simplifies this to a dynamic top-k strategy. First, it calculates the pair-wise matching degree for each prediction-gt pair. The cost between gt $g_i$ and prediction $p_j$ is: $c_{ij} = L_{ij}^{cls} + \lambda L_{ij}^{reg}$
    - $\lambda$ is a balancing coefficient.
    - $L_{ij}^{cls}$ and $L_{ij}^{reg}$ are the classification loss and regression loss between gt $g_i$ and prediction $p_j$.
    - For gt $g_i$, select the top $k$ predictions with the lowest cost within a fixed center region as its positive samples. Note that $k$ varies across ground-truth objects.
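
The cost computation and top-k selection described above can be sketched in pure Python. This is an illustration under stated assumptions, not the YOLOX implementation: the per-pair losses, the center-region mask `in_center`, the value of the balancing coefficient, and the per-gt values in `ks` are all supplied as inputs with made-up numbers, because how they are obtained is not specified here. Handling of a prediction selected by more than one gt is also left out.

```python
def cost_matrix(cls_loss, reg_loss, lam):
    """cost[i][j] = cls_loss[i][j] + lam * reg_loss[i][j] for gt i, prediction j."""
    return [
        [c + lam * r for c, r in zip(cls_row, reg_row)]
        for cls_row, reg_row in zip(cls_loss, reg_loss)
    ]


def select_positives(cost, in_center, ks):
    """For each gt i, pick the ks[i] lowest-cost predictions inside its center region."""
    positives = []
    for i, row in enumerate(cost):
        # Only predictions inside the fixed center region are candidates.
        candidates = [j for j in range(len(row)) if in_center[i][j]]
        candidates.sort(key=lambda j: row[j])
        positives.append(candidates[: ks[i]])
    return positives


# Illustrative values only: 2 ground-truth objects, 3 predictions.
cls_loss = [[0.2, 0.9, 0.4], [0.8, 0.1, 0.3]]
reg_loss = [[0.1, 0.5, 0.2], [0.6, 0.2, 0.3]]
in_center = [[True, True, True], [False, True, True]]
ks = [2, 1]  # k differs per ground-truth object

cost = cost_matrix(cls_loss, reg_loss, lam=2.0)
positives = select_positives(cost, in_center, ks)
```

With these inputs, the first object keeps its two cheapest predictions and the second keeps one, so `positives` holds one index list per ground-truth object. Sorting a short candidate list replaces the iterative Sinkhorn-Knopp solver, which is where the training-time saving comes from.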

Experimental Results

References
