GCNet: Global Context Network
0) Motivation, Objectives and Related Works:
Motivation:
Objectives:
To capture long-range dependencies, two types of methods have been developed:
Self-attention mechanisms that model the pairwise relationships between positions (query–key pairs).
Query-independent (i.e., query-free) global context modeling.
NLNet uses a self-attention mechanism to model pixel-pair relationships. However, the attention maps NLNet learns for different query positions turn out to be almost position-independent, so computing a separate map for every position wastes a great deal of computation.
SENet uses the global context to reweight the channels and thus adjust channel dependencies. However, fusing features through weight recalibration does not make full use of the global context.
The GCNet authors experimented with NLNet: they selected 6 images from the COCO dataset and visualized the attention maps for different query points, with the following results:
For different query points, the attention maps are almost identical, which shows that NLNet learns query-independent dependencies. Although the non-local network is designed to compute a query-specific global context, both the visualizations and the experimental statistics show that its global context is nearly the same at every position; in other words, it has learned a position-independent global context.
Based on these findings, the authors set out to eliminate the unnecessary computation and, drawing on the SENet design, propose GCNet, which combines the advantages of both: the global context modeling ability of NLNet and the lightweight design of SENet.
Simplified Non-local module:
The authors simplify the non-local block by computing a single global attention map and sharing it across all positions. The simplified non-local block is defined as:
z_i = x_i + sum_j [ exp(W_k x_j) / sum_m exp(W_k x_m) ] * (W_v x_j)
where x_i and z_i are the input and output features at position i, the sums run over all positions of the feature map, and W_k and W_v are 1x1 convolutions.
To further reduce the computation of the simplified non-local block, W_v is moved outside the attention pooling, expressed as:
z_i = x_i + W_v * sum_j [ exp(W_k x_j) / sum_m exp(W_k x_m) ] * x_j
Since W_v is now applied once to the pooled feature instead of at every position, the FLOPs of this 1x1 convolution drop from O(H*W*C^2) to O(C^2).
Unlike the original non-local block, the second term of the simplified non-local block is not position-dependent; all positions share it. The authors therefore model the global context directly as a weighted average of the features at all positions, and then aggregate this global context feature into the feature at each position.
The simplified non-local block can be abstracted into 3 steps (sketched in code after the list):
Global attention pooling: a 1x1 convolution W_k and a softmax are used to obtain the attention weights, and attention pooling is then applied to obtain the global context feature.
Feature transform: a 1x1 convolution W_v is applied.
Feature aggregation: addition is used to aggregate the global context feature onto the features at every position.
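A minimal PyTorch-style sketch of these three steps, assuming an input of shape [N, C, H, W]; the module and variable names are illustrative, not taken from the official implementation:

```python
import torch
import torch.nn as nn


class SimplifiedNonLocal(nn.Module):
    """Simplified non-local block: one global attention map shared by all positions."""

    def __init__(self, channels):
        super().__init__()
        self.conv_k = nn.Conv2d(channels, 1, kernel_size=1)         # W_k: attention logits
        self.conv_v = nn.Conv2d(channels, channels, kernel_size=1)  # W_v: feature transform
        self.softmax = nn.Softmax(dim=2)

    def forward(self, x):
        n, c, h, w = x.size()
        # 1) Global attention pooling: softmax over all H*W positions, shared by every query.
        attn = self.softmax(self.conv_k(x).view(n, 1, h * w))            # [N, 1, HW]
        context = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))   # [N, C, 1]
        context = context.view(n, c, 1, 1)
        # 2) Feature transform with W_v.
        context = self.conv_v(context)
        # 3) Feature aggregation: broadcast addition to the feature at every position.
        return x + context
```

Because the attention map no longer depends on the query position, the pooling is computed once per image rather than once per position.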
SE module:
As mentioned earlier, the SE block can likewise be abstracted into 3 steps (sketched in code after the list):
Global average pooling is used for context modeling (i.e., the squeeze operation).
A bottleneck transform is used to compute the importance of each channel (the excitation operation).
A rescaling function is used for channel-wise feature recalibration (i.e., element-wise multiplication).
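For comparison, a minimal PyTorch-style sketch of the SE block; the reduction ratio of 16 is the common default rather than a value stated here, and the names are illustrative:

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """SE block: global average pooling + bottleneck excitation + channel rescaling."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # squeeze: global average pooling
        self.fc = nn.Sequential(                   # excitation: bottleneck transform
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.size()
        weights = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * weights                          # recalibration: element-wise multiply
```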
GCNet module:
The authors propose a new global context modeling framework, the global context (GC) block; a network equipped with GC blocks is called GCNet. It can model effective long-range dependencies like the non-local block while saving computation like the SE block.
The 3 steps of GC block are:
Global attention pooling for context modeling.
A bottleneck transform to capture inter-channel dependencies.
Broadcast element-wise addition for feature fusion.
In the simplified non-local block, the transform module (the 1x1 convolution W_v) holds a large number of parameters. To gain the lightweight property of the SE block, this 1x1 convolution is replaced by a bottleneck transform module, which significantly reduces the parameter count from C·C to 2·C·C/r, where r is the bottleneck reduction ratio. Because the two-layer bottleneck transform makes optimization harder, a layer normalization layer is added inside the bottleneck, in front of the ReLU, acting as a regularizer that eases optimization and improves generalization. A sketch of the resulting GC block follows.
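Putting the three steps together, a minimal PyTorch-style sketch of the GC block with the additive fusion described above; the class name, the default reduction ratio of 16, and other details are illustrative assumptions rather than the official code:

```python
import torch
import torch.nn as nn


class GlobalContextBlock(nn.Module):
    """GC block sketch: global attention pooling + bottleneck transform (LN before ReLU) + add."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = channels // reduction
        self.conv_k = nn.Conv2d(channels, 1, kernel_size=1)   # W_k: attention logits
        self.softmax = nn.Softmax(dim=2)
        self.transform = nn.Sequential(                        # bottleneck transform
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),                      # LN in front of ReLU
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        n, c, h, w = x.size()
        # Context modeling: one softmax-normalized attention map shared by all positions.
        attn = self.softmax(self.conv_k(x).view(n, 1, h * w))            # [N, 1, HW]
        context = torch.bmm(x.view(n, c, h * w), attn.transpose(1, 2))   # [N, C, 1]
        context = context.view(n, c, 1, 1)
        # Transform + fusion: broadcast element-wise addition.
        return x + self.transform(context)


# Example usage: insert the block after a 256-channel feature map.
if __name__ == "__main__":
    block = GlobalContextBlock(channels=256)
    y = block(torch.randn(2, 256, 32, 32))   # output shape: [2, 256, 32, 32]
```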