[YOLOv5] Glenn, J. Yolov5-6.1— TensorRT, TensorFlow Edge TPU and OpenVINO Export and Inference.

Ultralytics

{CBS, C3, SPPF}

Ref: https://www.mdpi.com/1424-8220/22/15/5817

Code: https://github.com/ultralytics/yolov5

Motivation, Objectives and Related Works

Motivation

YOLO Series.

Objectives

YOLOv5 (v6.0/6.1) is a powerful object detection algorithm developed by Ultralytics.
1. (New) CSP-Darknet53 + (New) CSP-PAN + YOLOv3 Head.
2. Focus layer => 6x6 Conv2d.
3. SPP => SPPF.

Related Works

Figure. YOLOv5l

Model

Architecture

Figure. YOLOv5: Overall Architecture

Data Augmentation - Albumentations

Mosaic Augmentation.
Copy Paste
Random Affine (Rotation, Scale x0.5 - 1.5, Translation, Shear)
MixUp
Augment HSV (Hue, Saturation, Value)
Random Horizontal Flip
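
Every geometric augmentation in this list has to transform the labels together with the pixels. A minimal sketch for the horizontal flip, assuming YOLO-style normalized labels (class, x_center, y_center, width, height) and an image stored as a list of pixel rows; the function name `hflip` and the default probability are illustrative, not taken from the repository:

```python
import random

def hflip(image, labels, p=0.5, rng=random):
    """Flip an image (list of pixel rows) left-right with probability p.

    labels: list of (cls, x_center, y_center, w, h), normalized to [0, 1].
    Only x_center changes under a horizontal flip.
    """
    if rng.random() >= p:
        return image, labels
    flipped_image = [row[::-1] for row in image]
    flipped_labels = [(c, 1.0 - x, y, w, h) for c, x, y, w, h in labels]
    return flipped_image, flipped_labels

img = [[1, 2, 3], [4, 5, 6]]
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]
out_img, out_boxes = hflip(img, boxes, p=1.0)  # p=1.0 forces the flip
print(out_img)    # [[3, 2, 1], [6, 5, 4]]
print(out_boxes)  # [(0, 0.75, 0.5, 0.2, 0.4)]
```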

Mosaic

Copy Paste

Random Affine

Random Horizontal Flip

Augment HSV

MixUp Augmentation

Backbone

New CSPDarknet53.
1. Start with a 6x6 Conv2d (instead of the Focus layer, a space-to-depth operation).
2. Stack multiple CBS (Conv + BatchNorm + SiLU) modules and C3 modules.
3. Connect an SPPF module at the end.
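
To see why a 6x6 convolution can stand in for the Focus layer, it helps to compare output shapes. The sketch below only does shape arithmetic; it assumes a 640x640 RGB input, stride 2 and padding 2 for the 6x6 convolution, and 64 output channels (the YOLOv5l stem width). The helper names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def focus_shape(c, h, w):
    """Focus (space-to-depth): each 2x2 pixel block moves into the channel axis."""
    return 4 * c, h // 2, w // 2

def conv6x6_shape(c_out, h, w):
    """6x6 convolution with stride 2 and padding 2."""
    return c_out, conv_out(h, 6, 2, 2), conv_out(w, 6, 2, 2)

print(focus_shape(3, 640, 640))     # (12, 320, 320)
print(conv6x6_shape(64, 640, 640))  # (64, 320, 320)
```

Both stems halve the spatial resolution; the convolution does it in a single standard operation, which is generally easier to export to inference runtimes than the slicing used by Focus.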

Figure. YOLOv5: Model Architecture

Figure. Parameters of YOLOv5 Backbone.

The CBS module assists the C3 module in feature extraction, while the SPPF module enhances the feature expression ability of the backbone.
SPPF avoids the repeated pooling of SPP (as in SPPNet) by max pooling the previously max-pooled features: chaining 5x5 pools covers the same 9x9 and 13x13 receptive fields that SPP computes with separate, larger kernels.

Figure. Structure of SPPF

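import torch
import torch.nn as nn


class SPPF(nn.Module):
    """Pooling cascade of SPPF (simplified: the 1x1 convolutions before and after are omitted)."""

    def __init__(self):
        super().__init__()
        # 5x5 max pool, stride 1, padding 2: the spatial size is preserved
        self.maxpool = nn.MaxPool2d(5, 1, padding=2)

    def forward(self, x):
        o1 = self.maxpool(x)   # 5x5 receptive field
        o2 = self.maxpool(o1)  # equivalent to a 9x9 pool on x
        o3 = self.maxpool(o2)  # equivalent to a 13x13 pool on x
        # Concatenate along channels: the output has 4x the input channels
        return torch.cat([x, o1, o2, o3], dim=1)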
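
The claim that chained 5x5 pools reproduce the larger SPP kernels can be checked without PyTorch. The following is a 1-D sketch in plain Python, assuming stride 1 and windows clipped at the borders (which behaves like the implicit negative-infinity padding of a max pool); `maxpool1d` and the test signal are illustrative.

```python
def maxpool1d(xs, k):
    """Stride-1 max pool with odd window k; windows are clipped at the borders."""
    r = k // 2
    return [max(xs[max(0, i - r): i + r + 1]) for i in range(len(xs))]

signal = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]

o1 = maxpool1d(signal, 5)  # one 5-wide pool
o2 = maxpool1d(o1, 5)      # two chained 5-wide pools
o3 = maxpool1d(o2, 5)      # three chained 5-wide pools

# Chained small pools match single large pools on the original signal.
print(o2 == maxpool1d(signal, 9))   # True
print(o3 == maxpool1d(signal, 13))  # True
```

The same argument applies per axis in 2-D, which is why SPPF can reuse a single 5x5 pool instead of running 5x5, 9x9 and 13x13 pools in parallel.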

Neck

New CSP-PAN.

Figure. Current Necks. (a) Without Feature Fusion, (b) FPN (+ top-down) and (c) PAN (+ bottom-up).

Figure. New CSP-PAN (Within Dashed Box)

Head

The head adjusts the center coordinates and size of each preset prior anchor to obtain the center coordinates and size of the final prediction box.
The upper-left corner of the feature map is set to (0, 0).
rx and ry are the unadjusted coordinates of the predicted center point (grid cell coordinates).
gx, gy, gw, gh describe the adjusted prediction box (final output coordinates).
pw and ph are the width and height of the prior anchor.
sx and sy are the offsets calculated by the model.
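
The symbols above can be tied together in a small decoding sketch. It assumes the YOLOv5 box equations as given in the Ultralytics documentation; `sw` and `sh` are assumed names for the raw width and height outputs (the text only names sx and sy), and `decode_box` is a hypothetical helper.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_box(sx, sy, sw, sh, rx, ry, pw, ph):
    """Turn raw outputs (s*) into a box (g*) in feature-map units.

    rx, ry: grid cell coordinates; pw, ph: prior anchor width and height.
    """
    gx = 2.0 * sigmoid(sx) - 0.5 + rx    # center offset lies in (-0.5, 1.5)
    gy = 2.0 * sigmoid(sy) - 0.5 + ry
    gw = pw * (2.0 * sigmoid(sw)) ** 2   # anchor scale lies in (0, 4)
    gh = ph * (2.0 * sigmoid(sh)) ** 2
    return gx, gy, gw, gh

# Zero raw outputs place the box at the cell center with the anchor's own size.
print(decode_box(0, 0, 0, 0, rx=7, ry=3, pw=30.0, ph=60.0))  # (7.5, 3.5, 30.0, 60.0)
```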

Build Targets

Match positive samples:
Calculate the width and height ratios between each GT box and each anchor template; an anchor template matches when both ratios stay within a threshold.
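
A minimal sketch of this ratio test, assuming the threshold is the `anchor_t` hyperparameter with a value of 4.0 and using the three default P3 anchors as example templates; the function name `anchor_matches` is illustrative.

```python
def anchor_matches(gt_wh, anchor_wh, anchor_t=4.0):
    """True when the GT box is within a factor of anchor_t of the anchor in both width and height."""
    gw, gh = gt_wh
    aw, ah = anchor_wh
    rw, rh = gw / aw, gh / ah
    # Take the worst ratio in either direction (GT larger or smaller than the anchor).
    worst = max(rw, 1.0 / rw, rh, 1.0 / rh)
    return worst < anchor_t

anchors = [(10, 13), (16, 30), (33, 23)]  # example anchor templates (width, height)
gt = (40, 60)                             # GT box width and height in the same units
matched = [a for a in anchors if anchor_matches(gt, a)]
print(matched)  # [(16, 30), (33, 23)]
```

The first anchor is rejected because the GT box is at least four times wider than it, so regressing from that anchor would require an overly large scale factor.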

Assign the successfully matched Anchor Templates to the corresponding cells:

Because the center point offset range is extended from (0, 1) to (-0.5, 1.5), a GT box can be assigned to more anchors.
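
The extra assignments can be sketched as follows: besides its own cell, a GT center also claims the horizontal and vertical neighbor it lies closest to. This is a simplified reading of the target-building logic, with coordinates in grid units; `assigned_cells` is a hypothetical helper.

```python
def assigned_cells(gx, gy, grid_w, grid_h):
    """Cells that take responsibility for a GT center at (gx, gy), in grid units."""
    cx, cy = int(gx), int(gy)
    cells = [(cx, cy)]                 # the cell containing the center
    fx, fy = gx - cx, gy - cy          # position inside that cell
    if fx < 0.5 and gx > 1:
        cells.append((cx - 1, cy))     # closer to the left edge
    if fy < 0.5 and gy > 1:
        cells.append((cx, cy - 1))     # closer to the top edge
    if fx > 0.5 and gx < grid_w - 1:
        cells.append((cx + 1, cy))     # closer to the right edge
    if fy > 0.5 and gy < grid_h - 1:
        cells.append((cx, cy + 1))     # closer to the bottom edge
    return cells

print(assigned_cells(3.2, 4.7, 20, 20))  # [(3, 4), (2, 4), (3, 5)]
```

Seen from a neighboring cell, the center lies up to half a cell outside that cell's own area, which is exactly the range the widened offset (-0.5, 1.5) can express.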

Variants

There are 5 versions of YOLOv5: YOLOv5x, YOLOv5l, YOLOv5m, YOLOv5s, and YOLOv5n.

There are also 5 larger (P6) versions: YOLOv5x6, YOLOv5l6, YOLOv5m6, YOLOv5s6, and YOLOv5n6.

Loss Function

Loss Categories

- Localization (box regression): CIoU Loss
- Objectness Score: BCE Loss
- Classification: BCE Loss
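
For the box term, the CIoU loss combines overlap, center distance and aspect-ratio consistency. The following is a plain-Python sketch of the standard CIoU definition for a single pair of boxes, assuming (x_center, y_center, width, height) inputs with positive sizes; it is not the batched tensor implementation used in the repository.

```python
import math

def ciou_loss(box1, box2, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x_center, y_center, w, h)."""
    x1, y1, w1, h1 = box1
    x2, y2, w2, h2 = box2
    # Corner coordinates (left, top, right, bottom)
    b1 = (x1 - w1 / 2, y1 - h1 / 2, x1 + w1 / 2, y1 + h1 / 2)
    b2 = (x2 - w2 / 2, y2 - h2 / 2, x2 + w2 / 2, y2 + h2 / 2)
    # Intersection over union
    inter_w = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    inter_h = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = inter_w * inter_h
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # Squared diagonal of the smallest enclosing box
    cw = max(b1[2], b2[2]) - min(b1[0], b2[0])
    ch = max(b1[3], b2[3]) - min(b1[1], b2[1])
    c2 = cw ** 2 + ch ** 2 + eps
    # Squared distance between the two centers
    rho2 = (x2 - x1) ** 2 + (y2 - y1) ** 2
    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)

same = ciou_loss((5, 5, 10, 10), (5, 5, 10, 10))   # close to 0 for identical boxes
apart = ciou_loss((0, 0, 2, 2), (10, 10, 2, 2))    # above 1 for distant, disjoint boxes
```

Unlike plain IoU, the loss keeps growing as disjoint boxes move further apart, so it still provides a useful gradient when there is no overlap.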

Balance Losses

- The objectness losses of the three prediction layers (P3, P4, P5) are weighted differently.
- The balance weights are [4.0, 1.0, 0.4] respectively.
- As explained in the referenced discussion: "The balancing terms were based on the obj losses seen at the 3 output layers. We simply averaged them over a few epochs of early training and set them to their present values. The smaller output layers have more imbalance than the larger object output layers in COCO, and probably in many other datasets as well, but performance will naturally vary by dataset, so I'm not sure if the balancing helps or hurts. This question is interrelated to the number of output layers, and the anchors used." [Ref]
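
As a small illustration of the weighting, the per-layer objectness losses can be combined as a weighted sum. The loss values below are made up for the example, and `balanced_objectness` is a hypothetical helper.

```python
OBJ_BALANCE = (4.0, 1.0, 0.4)  # weights for P3, P4, P5

def balanced_objectness(loss_p3, loss_p4, loss_p5, balance=OBJ_BALANCE):
    """Weighted sum of the objectness losses of the three prediction layers."""
    return sum(w * l for w, l in zip(balance, (loss_p3, loss_p4, loss_p5)))

# Illustrative per-layer losses: the P3 term is amplified, the P5 term is damped.
total = balanced_objectness(0.02, 0.05, 0.10)  # 4.0*0.02 + 1.0*0.05 + 0.4*0.10
```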

Training Strategy

Multi-scale training: The input images are randomly rescaled to between 0.5x and 1.5x of their original size.
AutoAnchor (for training on custom data): Optimizes the prior anchor boxes to match the statistics of the GT boxes in the custom data (k-means + genetic algorithm).
Warmup and cosine LR scheduler: Ramps the learning rate up at the start of training, then decays it along a cosine curve.
EMA (Exponential Moving Average): Averages the parameters over past steps to stabilize the training process and reduce generalization error. From the class docstring:
- Model Exponential Moving Average from https://github.com/rwightman/pytorch-image-models
- Keep a moving average of everything in the model state_dict (parameters and buffers). This is intended to allow functionality like https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage. A smoothed version of the weights is necessary for some training schemes to perform well.
- This class is sensitive to where it is initialized in the sequence of model init, GPU assignment, and distributed training wrappers.
Mixed precision: Performs operations in half-precision format, reducing memory usage and increasing computational speed.
Evolve hyperparameters using a genetic algorithm: Automatically tunes hyperparameters to improve performance.
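
Two of these strategies reduce to short formulas. The sketch below shows a linear warmup, a cosine decay factor, and an EMA update over a plain dict of weights. The constants (`lrf=0.01`, `decay=0.9999`, `tau=2000`) and all names are assumptions chosen for illustration; the actual training script applies the same ideas to optimizer parameter groups and model state dicts.

```python
import math

def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Linear ramp from start_lr to target_lr over warmup_steps."""
    t = min(step / warmup_steps, 1.0)
    return start_lr + t * (target_lr - start_lr)

def cosine_factor(epoch, epochs, lrf=0.01):
    """Multiplier that decays from 1.0 at epoch 0 to lrf at the last epoch."""
    return ((1 - math.cos(epoch * math.pi / epochs)) / 2) * (lrf - 1) + 1

class EMA:
    """Exponential moving average of a dict of scalar weights."""

    def __init__(self, weights, decay=0.9999, tau=2000):
        self.shadow = dict(weights)
        self.decay, self.tau, self.updates = decay, tau, 0

    def update(self, weights):
        self.updates += 1
        # The effective decay ramps up so that early updates track the model closely.
        d = self.decay * (1 - math.exp(-self.updates / self.tau))
        for name, value in weights.items():
            self.shadow[name] = d * self.shadow[name] + (1 - d) * value

ema = EMA({"w": 0.0})
ema.update({"w": 1.0})  # after one step the average is still close to the new weight
```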

Experimental Results

Dataset

MSCOCO.

Metrics

Experimental Results

All YOLOv5 larger models outperform EfficientDet by a large margin.

Key Takeaways

Eliminate Grid Sensitivity

In YOLOv2 and YOLOv3, the formula for calculating the predicted target information is:
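
The formula is reconstructed here from the standard YOLOv2/YOLOv3 definition, written in the notation of the Head section (r: grid cell coordinates, s: raw network outputs, p: prior anchor size, g: final box). The symbols s_w and s_h for the raw width and height outputs are assumed names.

```latex
\begin{aligned}
g_x &= \sigma(s_x) + r_x \\
g_y &= \sigma(s_y) + r_y \\
g_w &= p_w \, e^{s_w} \\
g_h &= p_h \, e^{s_h}
\end{aligned}
```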

In YOLOv5, the formula is:
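
The formula is reconstructed here from the YOLOv5 documentation, written in the notation of the Head section (r: grid cell coordinates, s: raw network outputs, p: prior anchor size, g: final box). The symbols s_w and s_h for the raw width and height outputs are assumed names.

```latex
\begin{aligned}
g_x &= 2\,\sigma(s_x) - 0.5 + r_x \\
g_y &= 2\,\sigma(s_y) - 0.5 + r_y \\
g_w &= p_w \,\bigl(2\,\sigma(s_w)\bigr)^2 \\
g_h &= p_h \,\bigl(2\,\sigma(s_h)\bigr)^2
\end{aligned}
```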

Compare the center point offset before and after scaling.
The center point offset range is extended from (0, 1) to (-0.5, 1.5), so the offset can easily reach 0 or 1, values that a plain sigmoid only approaches asymptotically.

Compare the height and width scaling ratio (relative to the anchor) before and after the adjustment. The original YOLO/Darknet box equations have a serious flaw: width and height are completely unbounded because they are simply out = exp(in). This is dangerous, as it can lead to runaway gradients, instabilities, NaN losses, and ultimately a complete loss of training. See the referenced issue.
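
The difference in ranges is easy to see numerically. A small sketch, assuming the exponential scale of the older equations and the squared, doubled sigmoid used by YOLOv5; the function names are illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def xy_offset_old(s):
    return sigmoid(s)                # stays strictly inside (0, 1)

def xy_offset_v5(s):
    return 2.0 * sigmoid(s) - 0.5    # covers (-0.5, 1.5)

def wh_scale_old(s):
    return math.exp(s)               # unbounded above

def wh_scale_v5(s):
    return (2.0 * sigmoid(s)) ** 2   # bounded by 4

for s in (-10.0, 0.0, 10.0):
    print(s, xy_offset_old(s), xy_offset_v5(s), wh_scale_old(s), wh_scale_v5(s))
```

With a raw output of 10 the exponential scale already exceeds 20000 times the anchor size, while the YOLOv5 scale stays just below 4. An offset of exactly 0 needs only a moderate raw output of -ln 3 in the YOLOv5 form.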

Tips for getting good results:
1. https://docs.ultralytics.com/yolov5/tutorials/tips_for_best_training_results/#model-selection
2. https://karpathy.github.io/2019/04/25/recipe/

ClearML
1. https://docs.ultralytics.com/yolov5/tutorials/clearml_logging_integration/#training-yolov5-with-clearml
2. https://www.youtube.com/watch?v=MX3BrXnaULs
