CCT: Compact Convolutional Transformer
Paper: https://arxiv.org/abs/2104.05704
Paper: https://arxiv.org/abs/2104.05704
0) Motivation, Object and Related works:
Motivation:
Objectives:
The CCT tokenizer:
The first recipe introduced by the CCT authors is the tokenizer for processing the images.
In a standard ViT, images are organized into uniform non-overlapping patches. This eliminates the boundary-level information present in between different patches.
This is important for a neural network to effectively exploit the locality information.
The figure below presents an illustration of how images are organized into patches.
References:
https://keras.io/examples/vision/cct/