[UDA] Unsupervised Data Augmentation
1) Motivation, Objectives and Related Works:
Motivation:
Semi-supervised learning has recently shown much promise for improving deep learning models when labeled data is scarce.
Common among recent approaches is the use of consistency training on a large amount of unlabeled data to constrain model predictions to be invariant to input noise.
Objectives:
In this work, we present a new perspective on how to effectively noise unlabeled examples and argue that the quality of the noise, specifically noise produced by advanced data augmentation methods, plays a crucial role in semi-supervised learning.
By substituting simple noising operations with advanced data augmentation methods such as RandAugment and back-translation, our method brings substantial improvements across six languages and three vision tasks under the same consistency training framework.
On the IMDb text classification dataset, with only 20 labeled examples, our method achieves an error rate of 4.20%, outperforming the state-of-the-art model trained on 25,000 labeled examples. On CIFAR-10, a standard semi-supervised learning benchmark, our method outperforms all previous approaches and achieves an error rate of 5.43% with only 250 labeled examples. Our method also combines well with transfer learning, e.g., when fine-tuning from BERT, and yields improvements in the high-data regime, such as ImageNet, both when only 10% of the labeled data is used and when the full labeled set is combined with 1.3M extra unlabeled examples.
Related Works:
Contribution:
This method works for both images and text. Here, we describe the method in the context of images.
The key idea is to create an augmented version of an unlabeled image using AutoAugment. The same model then predicts the label distribution of both the original and the augmented image.
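The augmentation step can be sketched as follows. This is a simplified, RandAugment-style stand-in (RandAugment is the method named in the objectives), not AutoAugment, whose learned policy search is beyond a short example. The four operations, the 0 to 10 magnitude scale, and all function names are illustrative assumptions, and the "image" is a toy grayscale grid stored as nested lists.

```python
import random
from typing import List, Optional

# Toy grayscale image: rows of pixel intensities in [0, 255] (assumption for illustration).
Image = List[List[int]]


def identity(img: Image, magnitude: int) -> Image:
    return [row[:] for row in img]


def flip_horizontal(img: Image, magnitude: int) -> Image:
    return [row[::-1] for row in img]


def brightness(img: Image, magnitude: int) -> Image:
    # Brighten every pixel by an amount proportional to the magnitude (0-10 scale),
    # clipping the result to the valid [0, 255] range.
    delta = int(25.5 * magnitude)
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def invert(img: Image, magnitude: int) -> Image:
    return [[255 - p for p in row] for row in img]


# Hypothetical, deliberately tiny operation pool.
OPS = [identity, flip_horizontal, brightness, invert]


def rand_augment(img: Image, n: int = 2, magnitude: int = 5,
                 rng: Optional[random.Random] = None) -> Image:
    """Apply n operations sampled uniformly (with replacement), all at one shared magnitude."""
    if rng is None:
        rng = random.Random()
    out = img
    for op in rng.choices(OPS, k=n):
        out = op(out, magnitude)
    return out
```

For example, `rand_augment(image, n=2, magnitude=9)` returns a new, strongly perturbed copy of `image` that can be fed to the model alongside the original. Sharing a single magnitude across all operations is what keeps the RandAugment-style search space down to the two numbers `n` and `magnitude`.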
The KL divergence between these two predicted distributions is used as the consistency loss.
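A minimal sketch of the consistency term, assuming the model outputs a vector of class logits per image and that the divergence is taken from the prediction on the original image to the prediction on the augmented image, i.e. KL(p_orig || p_aug), so the original-image prediction acts as the target. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def consistency_loss(logits_orig: Sequence[Sequence[float]],
                     logits_aug: Sequence[Sequence[float]]) -> float:
    """Mean KL(p_orig || p_aug) over a batch of unlabeled images.

    In an autograd framework p_orig would normally be treated as a fixed target
    (no gradient flows through it); plain Python has no gradients, so that
    detail is only noted here.
    """
    losses = [kl_divergence(softmax(a), softmax(b)) for a, b in zip(logits_orig, logits_aug)]
    return sum(losses) / len(losses)
```

The loss is zero when the model makes identical predictions for an image and its augmented copy and grows as the two distributions diverge, which is exactly the invariance the method tries to enforce on unlabeled data.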
For labeled images, we compute only the supervised cross-entropy loss and no consistency loss.
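The supervised term can be sketched as a mean cross-entropy computed over the labeled batch only, assuming integer class labels and per-image logits; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # log p_i = z_i - logsumexp(z), with the max subtracted for stability.
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def supervised_loss(logits_batch: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean cross-entropy -log p(y | x) over labeled images; unlabeled images never enter here."""
    losses = [-log_softmax(logits)[y] for logits, y in zip(logits_batch, labels)]
    return sum(losses) / len(losses)
```

With uniform logits over two classes the loss is log 2, and it approaches zero as the model becomes confident in the correct class.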
The final loss is a weighted sum of the two terms: a weight w(t), which may vary with the training step t, controls how much the consistency loss contributes to the overall loss.
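A sketch of the combined objective, total = supervised loss + w(t) * consistency loss, taking the two loss values as already-computed floats. The ramp-up shape exp(-5 (1 - t)^2) is an assumption borrowed from earlier consistency-training work (e.g. the Pi-model and Mean Teacher), not a setting taken from this paper; `max_weight`, `rampup_steps`, and the function names are hypothetical.

```python
import math


def consistency_weight(step: int, max_weight: float = 1.0, rampup_steps: int = 1000) -> float:
    """Hypothetical w(t): rises smoothly from about exp(-5) * max_weight to max_weight.

    Setting rampup_steps <= 0 disables the ramp-up and returns a constant weight.
    """
    if rampup_steps <= 0:
        return max_weight
    t = min(max(step / rampup_steps, 0.0), 1.0)  # training progress clipped to [0, 1]
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


def total_loss(sup_loss: float, cons_loss: float, step: int, **weight_kwargs) -> float:
    """Supervised cross-entropy plus the w(t)-weighted consistency loss."""
    return sup_loss + consistency_weight(step, **weight_kwargs) * cons_loss
```

A ramp-up keeps the consistency term small early in training, when the model's predictions on unlabeled images are still unreliable targets; passing `rampup_steps=0` recovers a plain constant weight if no schedule is wanted.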
2) Methodology:
Method 1:
Method 2:
3) Experimental Results:
Experimental Results:
Ablations:
References: