CCT: Compact Convolutional Transformer

Paper: https://arxiv.org/abs/2104.05704

0) Motivation, Objectives and Related Work:

Motivation:

Objectives:

The CCT tokenizer:

The first component introduced by the CCT authors is the tokenizer used to process images.
In a standard ViT, each image is split into uniform, non-overlapping patches. This discards the boundary-level information that lies between neighbouring patches.
That information matters because a neural network needs it to exploit locality effectively.
The figure below illustrates how an image is split into patches.
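
To make the boundary issue concrete, here is a minimal sketch in plain Python. It is not the CCT tokenizer itself. The helper names `extract_patches` and `contains_pair` and the toy 4x4 image are assumptions made for illustration. The overlapping case (stride smaller than the patch size) is included only as a contrast, in the spirit of a sliding convolution window.

```python
def extract_patches(image, patch, stride):
    """Collect patch x patch windows from a 2D list, stepping by `stride`."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            window = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(window)
    return patches


def contains_pair(patches, a, b):
    """True if some single patch holds both pixel values a and b."""
    for window in patches:
        flat = [value for row in window for value in row]
        if a in flat and b in flat:
            return True
    return False


# Toy 4x4 "image" whose pixel values are their flat indices (hypothetical data).
image = [[r * 4 + c for c in range(4)] for r in range(4)]

# ViT-style split: stride equals patch size, so patches never overlap.
vit_patches = extract_patches(image, patch=2, stride=2)

# Contrast case: stride smaller than patch size, so windows overlap.
overlapping = extract_patches(image, patch=2, stride=1)

# Pixels 1 and 2 are horizontal neighbours on opposite sides of a patch boundary.
print(len(vit_patches), contains_pair(vit_patches, 1, 2))   # 4 False
print(len(overlapping), contains_pair(overlapping, 1, 2))   # 9 True
```

With non-overlapping patches, the adjacent pixels 1 and 2 never appear in the same patch, so their relationship is not visible inside any single token. With overlapping windows, at least one window covers both.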
