[VAT] Virtual Adversarial Training
Paper: https://arxiv.org/abs/1704.03976
Code:
1) Motivation, Objectives and Related Works:
Motivation:
Objectives:
We propose a new regularization method based on virtual adversarial loss: a new measure of local smoothness of the conditional label distribution given input.
Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation.
Unlike adversarial training, our method defines the adversarial direction without label information and is hence applicable to semi-supervised learning. Because the directions in which we smooth the model are only "virtually" adversarial, we call our method virtual adversarial training (VAT).
The computational cost of VAT is relatively low. For neural networks, the approximated gradient of virtual adversarial loss can be computed with no more than two pairs of forward- and back-propagations.
In our experiments, we applied VAT to supervised and semi-supervised learning tasks on multiple benchmark datasets. With a simple enhancement of the algorithm based on the entropy minimization principle, our VAT achieves state-of-the-art performance for semi-supervised learning tasks on SVHN and CIFAR-10.
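To make the objective above concrete, the formulas below restate the paper's definition of the virtual adversarial loss (local distributional smoothness, LDS) and the full training objective; here D is a divergence (KL in the paper), theta_hat denotes the current parameters treated as constants, epsilon is the perturbation radius, and alpha is the regularization weight.

```latex
\begin{aligned}
r_{\mathrm{vadv}} &= \underset{r;\,\|r\|_2 \le \epsilon}{\arg\max}\; D\!\left[p(y \mid x, \hat{\theta}),\, p(y \mid x + r, \theta)\right],\\
\mathrm{LDS}(x, \theta) &= D\!\left[p(y \mid x, \hat{\theta}),\, p(y \mid x + r_{\mathrm{vadv}}, \theta)\right],\\
\mathcal{L}(\theta) &= \ell(\mathcal{D}_l, \theta) \;+\; \alpha \cdot \frac{1}{N_l + N_{ul}} \sum_{x \in \mathcal{D}_l \cup \mathcal{D}_{ul}} \mathrm{LDS}(x, \theta),
\end{aligned}
```

Because the maximization over r cannot be solved exactly, the paper approximates r_vadv with power iteration, which is why the gradient of the virtual adversarial loss needs no more than two pairs of forward and back propagations per input.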
Related Works:
Contribution:
VAT uses the concept of adversarial attacks for consistency regularization.
The key idea is to generate an adversarial transformation of an image that would change the model's prediction. To do so (see the sketch after these steps):
First, an image is taken and an adversarial variant of it is created such that the KL-divergence between the model outputs for the original image and the adversarial image is maximized.
Then, a labeled/unlabeled image is taken as the first view and its adversarial example generated in the previous step as the second view. The same model is used to predict label distributions for both views, and the KL-divergence between these two predictions is used as the consistency loss. For labeled images, the cross-entropy loss is also calculated. The final loss is a weighted sum of these two terms, with a coefficient α deciding how much the consistency loss contributes to the overall loss.
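Below is a minimal PyTorch sketch of this two-step consistency loss. The hyper-parameters (xi, eps, alpha, number of power-iteration steps) and the helper names are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def _l2_normalize(d):
    # Normalize a perturbation tensor to unit L2 norm per example.
    d_flat = d.view(d.size(0), -1)
    norm = d_flat.norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-8)

def vat_loss(model, x, xi=1e-6, eps=8.0, n_power=1):
    """Virtual adversarial (consistency) loss for a batch x (labeled or unlabeled)."""
    with torch.no_grad():
        # p(y|x, theta_hat): prediction on the clean view, treated as a constant target.
        pred = F.softmax(model(x), dim=1)

    # Step 1: find the adversarial direction that maximizes the KL-divergence,
    # approximated with power iteration (one extra forward/backward pass per step).
    d = _l2_normalize(torch.randn_like(x))
    for _ in range(n_power):
        d.requires_grad_()
        pred_hat = model(x + xi * d)
        adv_kl = F.kl_div(F.log_softmax(pred_hat, dim=1), pred, reduction="batchmean")
        grad = torch.autograd.grad(adv_kl, d)[0]
        d = _l2_normalize(grad.detach())

    # Step 2: KL-divergence between predictions on the clean and adversarial views.
    r_vadv = eps * d
    pred_hat = model(x + r_vadv)
    return F.kl_div(F.log_softmax(pred_hat, dim=1), pred, reduction="batchmean")

# Usage sketch: total loss = cross-entropy on labeled data + alpha * consistency loss.
# loss = F.cross_entropy(model(x_labeled), y_labeled) + alpha * vat_loss(model, x_all)
```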
2) Methodology:
Method 1:
Method 2:
3) Experimental Results:
Experimental Results:
Ablations:
References: