ECCV-2020 Papers Collection in Computer Vision

Conference Link: [link]
Paper Link: [link]
Topics:
1. Recognition, Detection, Segmentation, and Pose Estimation.
2. Semi-Supervised, Unsupervised, Transfer, Representation & Few-Shot Learning.
3. 3D Computer Vision & Robotics.
4. Image and Video Synthesis.
5. Vision and Language.
6. The Rest.

1. Recognition, Detection, Segmentation, and Pose Estimation

- End-to-End Object Detection with Transformers [Paper] [Code]
- MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution [Paper] [Code]
- Gradient Centralization: A New Optimization Technique for Deep Neural Networks [Paper] [Code]
- Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval [Paper]
- Hybrid Models for Open Set Recognition [Paper]
- Conditional Convolutions for Instance Segmentation [Paper]
- Multitask Learning Strengthens Adversarial Robustness [Paper]
- Dynamic Group Convolution for Accelerating Convolutional Neural Networks [Paper]
- Disentangled Non-local Neural Networks [Paper]
- Hard negative examples are hard, but useful [Paper]
- Volumetric Transformer Networks [Paper]
- Faster AutoAugment: Learning Augmentation Strategies Using Backpropagation [Paper]
- A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses [Paper]
- Semantic Flow for Fast and Accurate Scene Parsing [Paper]
- Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation [Paper] [Code]
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [Paper]
- Feature Normalized Knowledge Distillation for Image Classification [Paper] [Code]
- AutoMix: Mixup Networks for Sample Interpolation via Cooperative Barycenter Learning [Paper]
- OnlineAugment: Online Data Augmentation with Less Domain Knowledge [Paper] [Code]
- Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets [Paper] [Code]
- DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning [Paper]
- Estimating People Flows to Better Count Them in Crowded Scenes [Paper]
- SoundSpaces: Audio-Visual Navigation in 3D Environments [Paper]
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [Paper]
- DADA: Differentiable Automatic Data Augmentation [Paper]
- URIE: Universal Image Enhancement for Visual Recognition in the Wild [Paper]
- BorderDet: Border Feature for Dense Object Detection [Paper] [Code]
- TIDE: A General Toolbox for Understanding Errors in Object Detection [Paper] [Code]
- AABO: Adaptive Anchor Box Optimization for Object Detection via Bayesian Sub-sampling [Paper]
- PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments [Paper]
- Learning Object Depth from Camera Motion and Video Object Segmentation [Paper]
- Attentive Normalization [Paper]
- Momentum Batch Normalization for Deep Learning with Small Batch Size [Paper]
- A Simple Way to Make Neural Networks Robust Against Diverse Image Corruptions [Paper]

2. Semi-Supervised, Unsupervised, Transfer, Representation & Few-Shot Learning

- Big Transfer (BiT): General Visual Representation Learning [Paper]
- Learning Visual Representations with Caption Annotations [Paper]
- Memory-augmented Dense Predictive Coding for Video Representation Learning [Paper]
- SCAN: Learning to Classify Images without Labels [Paper]
- GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering [Paper]
- Associative Alignment for Few-shot Image Classification [Paper]
- Domain Adaptation through Task Distillation [Paper]
- Are Labels Necessary for Neural Architecture Search? [Paper]
- The Hessian Penalty: A Weak Prior for Unsupervised Disentanglement [Paper]
- Cross-Domain Cascaded Deep Translation [Paper]
- Self-Challenging Improves Cross-Domain Generalization [Paper]
- Label Propagation with Augmented Anchors: A Simple Semi-Supervised Learning Baseline for Unsupervised Domain Adaptation [Paper]
- Regularization with Latent Space Virtual Adversarial Training [Paper]
- Transporting Labels via Hierarchical Optimal Transport for Semi-Supervised Learning [Paper]
- Negative Margin Matters: Understanding Margin in Few-shot Classification [Paper]
- Rethinking Few-Shot Image Classification: a Good Embedding Is All You Need? [Paper] [Code]
- Prototype Rectification for Few-Shot Learning [Paper]

3. 3D Computer Vision & Robotics

- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [Paper]
- Towards Streaming Perception [Paper]
- Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images [Paper]
- Convolutional Occupancy Networks [Paper]
- Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping [Paper]
- Privacy Preserving Structure-from-Motion [Paper]
- Multiview Detection with Feature Perspective Transformation [Paper]
- Motion Capture from Internet Videos [Paper]
- Atlas: End-to-End 3D Scene Reconstruction from Posed Images [Paper]
- Generative Sparse Detection Networks for 3D Single-shot Object Detection [Paper]
- PointTriNet: Learned Triangulation of 3D Point Sets [Paper]
- Points2Surf: Learning Implicit Surfaces from Point Cloud Patches [Paper]
- Geometric Capsule Autoencoders for 3D Point Clouds [Paper]
- Deep Feedback Inverse Problem Solver [Paper]
- Single View Metrology in the Wild [Paper]
- Shape and Viewpoint without Keypoints [Paper]
- Hierarchical Kinematic Human Mesh Recovery [Paper]
- 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning [Paper]
- Few-Shot Single-View 3D Object Reconstruction with Compositional Priors [Paper]
- NASA: Neural Articulated Shape Approximation [Paper]
- Hand-Transformer: Non-Autoregressive Structured Modeling for 3D Hand Pose Estimation [Paper]
- Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild [Paper]

4. Image and Video Synthesis

- Transforming and Projecting Images into Class-conditional Generative Networks [Paper]
- Contrastive Learning for Unpaired Image-to-Image Translation [Paper]
- Rewriting a Deep Generative Model [Paper]
- Learning Stereo from Single Images [Paper]
- What makes fake images detectable? Understanding properties that generalize [Paper]
- Free View Synthesis [Paper]
- Unselfie: Translating Selfies to Neutral-pose Portraits in the Wild [Paper]
- World-Consistent Video-to-Video Synthesis [Paper]
- RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval [Paper]
- Generating Videos of Zero-Shot Compositions of Actions and Objects [Paper]
- Across Scales & Across Dimensions: Temporal Super-Resolution using Deep Internal Learning [Paper]
- Conditional Entropy Coding for Efficient Video Compression [Paper]
- Semantic View Synthesis [Paper]
- Learning Camera-Aware Noise Models [Paper]
- In-Domain GAN Inversion for Real Image Editing [Paper]

5. Vision and Language

- Connecting Vision and Language with Localized Narratives [Paper]
- UNITER: UNiversal Image-TExt Representation Learning [Paper]
- Learning to Learn Words from Visual Scenes [Paper]
- Contrastive Learning for Weakly Supervised Phrase Grounding [Paper]
- Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments [Paper]
- Adaptive Text Recognition through Visual Matching [Paper]

6. The Rest

- Deep Learning: Applications, Methodology, and Theory
  - A Generic Visualization Approach for Convolutional Neural Networks [Paper]
  - Spike-FlowNet: Event-based Optical Flow Estimation [Paper]
  - A Metric Learning Reality Check [Paper]
  - Learning Predictive Models from Observation and Interaction [Paper]
  - Beyond Fixed Grid: Learning Geometric Image Representation with a Deformable Grid [Paper]
  - Stable Low-rank Tensor Decomposition for Compression of Convolutional Neural Network [Paper]
  - EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning [Paper]
  - Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs [Paper]
  - Event-based Asynchronous Sparse Convolutional Networks [Paper]
- Low-Level Vision, Motion, and Tracking
  - RAFT: Recurrent All-Pairs Field Transforms for Optical Flow [Paper]
  - VisualEchoes: Spatial Image Representation Learning through Echolocation [Paper]
  - Self-Supervised Learning of Audio-Visual Objects from Video [Paper]
  - Tracking Objects as Points
- Face, Gesture, and Body Pose
  - Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach [Paper]
  - Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues [Paper]
  - Lifespan Age Transformation Synthesis [Paper]
  - Monocular Expressive Body Regression through Body-Driven Attention [Paper]
  - DLow: Diversifying Latent Flows for Diverse Human Motion Prediction [Paper]
  - Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars [Paper]
  - Blind Face Restoration via Deep Multi-scale Component Dictionaries [Paper]
- Action Recognition and Understanding
  - RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition [Paper]
  - Self-supervised Video Representation Learning by Pace Prediction [Paper]
  - Aligning Videos in Space and Time [Paper]
  - Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video [Paper]
  - Foley Music: Learning to Generate Music from Videos [Paper]

Reference:

[link]
