[MoCov2]
Paper:
Code:
Relative to SimCLR, MoCo v2 both decreases the batch size (from 4096 to 256) and improves performance. Unlike SimCLR, where the top and bottom rows of the diagram represent the same network (parameterized by θ), MoCo splits the single network into an online network (top row), parameterized by θ, and a momentum network (bottom row), parameterized by ξ.
The online network is updated by stochastic gradient descent, while the momentum network is updated based on an exponential moving average of the online network weights.
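The exponential moving average update can be sketched as follows. This is a minimal illustration with hypothetical parameter dictionaries; the momentum coefficient `tau` close to 1 (MoCo uses ~0.999) makes ξ evolve slowly and smoothly relative to θ:

```python
import numpy as np

tau = 0.999  # momentum coefficient (hypothetical value; MoCo uses ~0.999)

# Hypothetical parameters: theta is the online network (updated by SGD),
# xi is the momentum network (updated only by this EMA rule).
theta = {"w": np.array([1.0, 2.0])}
xi = {"w": np.array([0.5, 0.5])}

def ema_update(xi, theta, tau):
    """Per-parameter update: xi <- tau * xi + (1 - tau) * theta."""
    for k in theta:
        xi[k] = tau * xi[k] + (1.0 - tau) * theta[k]
    return xi

xi = ema_update(xi, theta, tau)
```

Because `tau` is close to 1, each step nudges ξ only slightly toward θ, which keeps the keys produced by the momentum network consistent over many mini-batches.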
The momentum network allows MoCo to efficiently use a memory bank of past projections as negative examples for the contrastive loss. This memory bank is what enables the much smaller batch sizes. In our dog image illustration, the positive examples would be crops of the same image of a dog. The negative examples are completely different images that were used in past mini-batches, projections of which are stored in the memory bank.
Unlike SimCLR's projection head, the two-layer MLP used for projection in MoCo v2 does not use batch normalization.
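A minimal sketch of such a projection head, assuming a Linear → ReLU → Linear structure with a 2048-d hidden layer and 128-d output (the weights and dimensions here are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 2048, 2048, 128  # assumed dims for illustration

# Randomly initialized weights standing in for trained parameters.
W1 = rng.standard_normal((d_in, d_hidden)) * 0.01
W2 = rng.standard_normal((d_hidden, d_out)) * 0.01

def projection_head(x):
    """Two-layer MLP: Linear -> ReLU -> Linear, with no batch norm."""
    h = np.maximum(x @ W1, 0.0)  # ReLU; note the absence of normalization
    return h @ W2

z = projection_head(rng.standard_normal((4, d_in)))  # 4 example embeddings
```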