[YOLO-NAS]
Deci's SuperGradients
{QSP, QCI, Re-Parameterization, 8-Bit Quantization}
Paper:
Code: https://github.com/Deci-AI/super-gradients/blob/master/YOLONAS.md
Object detection has revolutionized how machines perceive and interpret the world around them.
A new YOLO-based architecture can redefine state-of-the-art (SOTA) object detection by addressing the limitations of existing models and incorporating recent advances in deep learning.
YOLO-NAS was designed with exactly this goal: better detection of small objects, improved localization accuracy, and a higher performance-per-compute ratio, making the model practical for real-time edge-device applications.
QSP and QCI blocks combine the advantages of re-parameterization and 8-bit quantization. These blocks allow for minimal accuracy loss during post-training quantization.
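Deci has not published the internals of QSP and QCI, but they follow the RepVGG-style re-parameterization idea: a multi-branch block at training time that collapses into a single convolution for inference. The sketch below illustrates only that general idea in PyTorch; the class name and structure are illustrative, not Deci's code.

import torch
import torch.nn as nn

class RepBlock(nn.Module):
    # Training-time multi-branch block (3x3 conv, 1x1 conv, identity) that
    # re-parameterizes into a single 3x3 conv for inference. Assumes equal
    # in/out channels and stride 1 so the identity branch is valid.
    def __init__(self, ch):
        super().__init__()
        self.conv3, self.bn3 = nn.Conv2d(ch, ch, 3, padding=1, bias=False), nn.BatchNorm2d(ch)
        self.conv1, self.bn1 = nn.Conv2d(ch, ch, 1, bias=False), nn.BatchNorm2d(ch)
        self.bn_id = nn.BatchNorm2d(ch)  # BN-only identity branch

    def forward(self, x):
        return self.bn3(self.conv3(x)) + self.bn1(self.conv1(x)) + self.bn_id(x)

    @torch.no_grad()
    def fuse(self):
        # Fold each conv+BN pair into an equivalent kernel/bias, then sum
        # the three branches into one 3x3 conv (the inference-time form).
        def fold(w, bn):
            std = (bn.running_var + bn.eps).sqrt()
            return (w * (bn.weight / std).reshape(-1, 1, 1, 1),
                    bn.bias - bn.running_mean * bn.weight / std)
        k3, b3 = fold(self.conv3.weight, self.bn3)
        k1, b1 = fold(nn.functional.pad(self.conv1.weight, [1, 1, 1, 1]), self.bn1)  # 1x1 -> 3x3
        ch = k3.shape[0]
        kid = torch.zeros_like(k3)
        idx = torch.arange(ch)
        kid[idx, idx, 1, 1] = 1.0  # identity expressed as a 3x3 kernel
        kid, bid = fold(kid, self.bn_id)
        fused = nn.Conv2d(ch, ch, 3, padding=1)
        fused.weight.copy_(k3 + k1 + kid)
        fused.bias.copy_(b3 + b1 + bid)
        return fused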
Deci's AutoNAC neural architecture search was used to determine the optimal size and structure of each stage, including the block type, the number of blocks, and the number of channels per stage.
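AutoNAC itself is proprietary, but the search dimensions named above can be pictured as a per-stage candidate space. The toy enumeration below is purely illustrative (all values are made up); the real search is hardware-aware and far more efficient than brute force.

from dataclasses import dataclass
from itertools import product

@dataclass
class StageCandidate:
    # One candidate configuration for a single backbone stage, mirroring
    # the search dimensions in the text: block type, depth, and width.
    block_type: str  # e.g. "QSP" or "QCI"
    num_blocks: int
    channels: int

# Brute-force enumeration of one stage's candidates (illustration only).
stage_space = [StageCandidate(bt, n, c)
               for bt, n, c in product(("QSP", "QCI"), (2, 3, 4), (64, 96, 128))]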
A hybrid quantization method selectively quantizes certain parts of the model, reducing information loss and balancing latency and accuracy. Standard quantization affects all model layers and often causes significant accuracy loss; our hybrid method maintains accuracy by quantizing only selected layers while leaving others untouched.
Our layer selection algorithm considers each layer’s impact on accuracy and latency, as well as the effects of switching between 8-bit and 16-bit quantization on overall latency.
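The selection algorithm itself is not public; the sketch below shows one plausible shape of such a procedure: measure each layer's accuracy drop under int8 in isolation, then greedily keep int8 where the latency gain per unit of accuracy loss is best. eval_fn, quantize_layer, and latency_gain are hypothetical placeholders, not SuperGradients APIs.

import copy

def layer_sensitivity(model, layer_names, eval_fn, quantize_layer):
    # Quantize one layer at a time (int8) and record the accuracy drop.
    # eval_fn(model) -> accuracy; quantize_layer(model, name) -> model.
    baseline = eval_fn(model)
    return {name: baseline - eval_fn(quantize_layer(copy.deepcopy(model), name))
            for name in layer_names}

def select_int8_layers(drops, latency_gain, accuracy_budget):
    # Greedy pick: best latency gain per unit of accuracy drop first,
    # stopping once the cumulative drop would exceed the budget.
    order = sorted(drops, key=lambda n: drops[n] / max(latency_gain[n], 1e-9))
    chosen, total = [], 0.0
    for name in order:
        if total + max(drops[name], 0.0) > accuracy_budget:
            continue
        chosen.append(name)
        total += max(drops[name], 0.0)
    return chosen  # remaining layers stay in higher precision (e.g. 16-bit)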
A pre-training regimen that includes automatically labeled data, self-distillation, and large datasets.
The YOLO-NAS architecture is available under an open-source license. Its pre-trained weights are available for non-commercial research use on SuperGradients, Deci's PyTorch-based, open-source computer vision training library.
Figure 3. Efficiency Frontier plot for object detection on the COCO2017 dataset (validation) comparing YOLO-NAS vs other YOLO architectures.
YOLO-NAS's multi-phase training process involves pre-training on Objects365, COCO pseudo-labeled data, Knowledge Distillation (KD), and Distribution Focal Loss (DFL).
The model is pre-trained on Objects365 for only 25-40 epochs (depending on the model variant), since each epoch takes 50-80 minutes on 8 NVIDIA RTX A5000 GPUs.
An accurate model is first trained on COCO and used to pseudo-label the unlabeled COCO images; these pseudo-labeled images are then used to train our model together with the original 118k labeled train images.
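Deci has not released the pseudo-labeling code; a minimal sketch of the idea, assuming a torchvision-style detection interface (boxes/scores/labels per image) and a hypothetical confidence threshold:

import torch

@torch.no_grad()
def build_pseudo_labels(teacher, unlabeled_loader, conf_thresh=0.5):
    # Run the COCO-trained teacher over unlabeled images and keep only
    # confident detections as training targets for the student.
    teacher.eval()
    pseudo = []
    for images, paths in unlabeled_loader:
        for path, pred in zip(paths, teacher(images)):  # assumed output format
            keep = pred["scores"] >= conf_thresh
            pseudo.append({"image": path,
                           "boxes": pred["boxes"][keep],
                           "labels": pred["labels"][keep]})
    return pseudo  # later mixed with the 118k human-labeled images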
The YOLO-NAS architecture also incorporates Knowledge Distillation (KD) and Distribution Focal Loss (DFL) to enhance its training process.
Knowledge Distillation is applied by adding a KD term to the loss function, enabling the student network to mimic the teacher network's logits for both the classification and DFL predictions.
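A minimal sketch of such a KD term, assuming raw logits for the classification head and the DFL (box-distribution) head; the temperature and equal weighting are illustrative choices, not the exact published recipe:

import torch.nn.functional as F

def kd_term(student_cls, student_dfl, teacher_cls, teacher_dfl, T=1.0):
    # Soft-label KL divergence between student and teacher logits, applied
    # to both the classification and the DFL (box-distribution) outputs.
    def soft_kl(s, t):
        return F.kl_div(F.log_softmax(s / T, dim=-1),
                        F.softmax(t / T, dim=-1),
                        reduction="batchmean") * (T * T)
    return soft_kl(student_cls, teacher_cls) + soft_kl(student_dfl, teacher_dfl)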
DFL is employed by learning box regression as a classification task, discretizing box predictions into finite values, and predicting distributions over these values, which are then converted to final predictions through a weighted sum.
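This maps directly onto the Distribution Focal Loss from the Generalized Focal Loss paper. A minimal sketch, assuming reg_max + 1 discrete bins per box side; tensor shapes are illustrative:

import torch
import torch.nn.functional as F

def dfl_decode(box_logits, reg_max=16):
    # box_logits: (..., 4, reg_max + 1) logits over discrete bins per side.
    # Final offset = probability-weighted sum (expectation) over bin values.
    probs = F.softmax(box_logits, dim=-1)
    bins = torch.arange(reg_max + 1, dtype=probs.dtype, device=probs.device)
    return (probs * bins).sum(dim=-1)  # (..., 4) continuous box offsets

def dfl_loss(box_logits, target, reg_max=16):
    # target: (..., 4) continuous offsets in [0, reg_max]. Supervise the two
    # neighboring bins, weighted by proximity to the true offset.
    tl = target.floor().long().clamp(0, reg_max - 1)
    tr = tl + 1
    wl, wr = tr.to(target.dtype) - target, target - tl.to(target.dtype)
    logp = F.log_softmax(box_logits, dim=-1)
    return -(wl * logp.gather(-1, tl.unsqueeze(-1)).squeeze(-1) +
             wr * logp.gather(-1, tr.unsqueeze(-1)).squeeze(-1)).mean()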
Objects365, a comprehensive dataset with 2 million images and 365 categories.
The COCO dataset provides an additional 123k unlabeled images, which are used to generate pseudo-labeled data.
RoboFlow100 dataset (RF100), a collection of 100 datasets from diverse domains, used to demonstrate YOLO-NAS's ability to handle complex object detection tasks.
Install Library:
!pip install super_gradients
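After installation, a pre-trained model can be loaded and run through the SuperGradients API (per the repository's documentation at the time of writing; the image path below is a placeholder):

from super_gradients.training import models

# Load YOLO-NAS (large variant) with COCO pre-trained weights
# (non-commercial research use, as noted above).
model = models.get("yolo_nas_l", pretrained_weights="coco")

# predict() accepts image paths, URLs, or numpy arrays.
model.predict("path/to/image.jpg").show()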