Lê Phong Phú
  • About Me!
  • AI Expert Roadmap
    • PyTorch
      • PyTorch Fundamentals
        • 1. Introduction to PyTorch
        • 2. Introduction to Computer Vision with PyTorch
        • 3. Introduction to Natural Language Processing with PyTorch
        • 4. Introduction to Audio Classification with PyTorch
      • Intermediate DL with PyTorch
        • 1_TrainingRobustNN
        • 2_Image&CNN
        • 3_Sequences&RNN
        • 4_Multi-Input&Multi-Output
    • Machine Learning
      • 01_ML_General
      • 02_ML_Supervised Learning
      • 03_ML_Unsupervised Learning
    • Mamba
      • 00_Sequence Modelling, S4 and Mamba
    • Transformers (CV&NLP)
      • NLNet
      • 01_Pure Transformer
        • ViT
        • Segformer
      • 02_Hybrid Transformer
        • DETR
        • Deformable DETR
        • DINO (Detection)
      • 99_Unfilter
        • LG-Transformer
        • Image GPT
        • Points as Queries
        • VST
        • MAXViT
        • ViTMAE-Detect
        • MAGNETO
        • AIT
        • MTV
        • PiT
        • Swin
        • PVTv2
        • PVT
        • FAVOR+
        • T2T-ViT
        • CaiT
        • CCT
        • DeiT
        • SSA
        • SA3D
    • [NLP] Natural Language Processing
      • 01_[LLMs] Large Language Models
      • [MoEs] Mixture of Experts
      • LLM Techniques
      • Attention is All You Need
      • Positional Encoding
      • Tokenization
      • MICLe
    • [CV] Computer Vision
      • MLP-based Classification
        • MLP-Mixer
        • FNet
        • EANet
      • 01_[SL] Supervised Learning
        • 01_Classification
          • Convolution Variants
          • 1x1 Convolution
          • EfficientNetV2
          • ConvNeXtV2
        • 02_Detection
          • ConvMixer
          • SOLO
          • YOLOX
          • YOLOR
          • AugFPN
          • BoT_Cls
          • BoF_OD
          • YOLOv3
          • YOLOv4
          • YOLOv5
          • YOLOv6
          • YOLOv7
          • YOLOv8
          • YOLOv9
          • YOLO-NAS
          • TPH-YOLOv5
          • TPH-YOLOv5++
          • ViTDET
        • 03_Segmentation
          • Object Instance Survey 2022
          • 01_Instance Segmentation
          • 02_Semantic Segmentation
          • 03_Panoptic Segmentation
          • 04_3D Segmentation
          • 05_Unsupervised Segmentation
          • BMask RCNN
          • ISTR
          • Transfuse
        • 04_[IS] Interactive Segmentation
          • Interactive Segmentation Techniques
          • 02_3D Interactive Segmentation
          • 03_Video Object Segmentation
          • SAM
          • HA_SAM
          • CFR-ICL
          • MST
          • ECONet
          • SimpleClick
          • FocusCut
          • f-BRS
          • iSegformer
        • 05_Object Tracking
          • 00_ObjectTracking
          • Sort
          • DeepSort
          • FairMOT
          • ByteTrack
          • StrongSORT
          • Tracktor
          • JDE
          • CenterTrack
          • PermaTrack
          • TransTrack
          • TrackFormer
          • BoT-SORT
        • 06_Face Recognition
        • 07_Image Stitching
        • 08_Image Restoration
        • 06_Refinement
          • BPR
        • 10_Scene Understanding
          • CPNet
        • 11_Human Pose Estimation
          • 3D Human Pose
          • Human Pose
        • 12_[SR] Super Resolution
          • Bicubic++
        • 13_VideoPropagation
        • 14_Image Matting
        • 15_Knowledge Distillation
        • 16_Others
      • 02_[UL] Unsupervised Learning
        • 00_Unsupervised Learning
        • 02_Deep Clustering
          • 00_K_Clusters Decision
          • Deep Cluster
          • Cluster Fit
          • DEC
          • Improving Relational Regularized Autoencoders with Spherical Sliced Fused G
          • Taxonomy
          • DeepDPM
          • BCL
          • VaDE
          • t-SNE
          • Tree-SNE
        • 04_Diffusional Models
      • 03_[SSL] Self-Supervised Learning
        • 00_Self-Supervised Learning
        • 01_Contrastive Learning
          • CPC
          • DIM
          • CMC
          • AMDIM
          • SimCLR
          • MoCo
          • MoCov2
          • YADIM
          • VICReg
          • CSL
          • Towards Domain-Agnostic Contrastive Learning
          • Non-Parametric Instance Discrimination
          • Video Contrastive Learning with Global Context
          • SupCon
          • Barlow Twins
        • 02_Predictive Tasks
        • 03_Bootstrapping
          • BYOL
        • 04_Regularization
        • 05_Masked Image Models
          • Patch Localization
          • MAE
          • SimMIM
          • DINO
        • 06_Pretext Tasks
          • PIRL
        • 07_Clustering-based
          • SwAV
      • 04_Semi-Supervised Learning
        • Fully-/Semi-/Weakly-/ Learning
        • 01_Self-training
          • Pseudo-label
          • Noisy Student
        • 02_Consistency Regularization
          • Temporal Ensembling
          • Mean Teacher
          • VAT
          • UDA
        • 03_Hybrid Methods
          • MixUp
          • MixMatch
          • ReMixMatch
          • FixMatch
          • FixMatch (unmerge)
      • 05_Multi-learning Paradigm
        • 00_Multi-learning
        • 01_Multitask
        • Gradient Surgery
        • EtE Multi-task Learning with Attention
        • MTL for Dense Predictions
        • MTL using Uncertainty
        • Which Task learned together
        • GradNorm
        • OM-Net
        • 06_Multi-task Learning
      • 06_Generative Models
        • 00_Generative Models
        • 01_Autoencoders
          • AE vs Others
          • Sparse AE
          • Denoising AE
          • Contractive AE
          • Variational AE
          • DELG
        • 02_GAN
      • Graph Convolutional Networks
        • 00_Graph Convolutional Networks
      • Neural Radiance Fields (NeRFs)
      • Deep Belief Networks
    • Multimodal Models
    • Bag of Freebies - BOF
      • 01_Augmentation
        • Mosaic
        • Cut Out
        • Mix Up
      • 02_Loss Functions
        • 01_Classification Loss
        • 02_Segmentation Loss
        • 03_Object Detection Loss
        • 04_Self-Supervised Loss
        • 05_Interactive Segmentation Loss
      • 03_Optimizer
      • 04_Normalization
        • 00_Normalization
      • 05_Regularization
      • 06_Label Assignment
        • 00_Label Assignment
        • OTA
        • SimOTA
      • 07_Auxiliary Head
    • Bag of Specials - BoS
      • Feature Pyramid
        • RCNet
      • Receptive Field
      • Attention
        • 00_Attention Modules
        • SENet
        • CBAM
        • DANet
        • SDANet
        • AttaNet
        • HaloNets
        • GCNet
        • DeepSquare
        • LBAM
        • External-Attention
        • PCT
        • Residual Attention
        • DCANet
        • GANet
        • Triplet Attention
        • Lambda Networks
        • ACTION
        • VAN
        • SegNeXt
      • Local-/Global- Features
        • Unifying Nonlocal Blocks for Neural Networks
        • Local Features
        • Global Features
      • Activation Functions
        • SiLU dSiLU
      • Post-Processing
        • Soft-NMS
        • NMW
        • WBF
      • Sliding Window
      • Graph Networks
      • Feature Fusion/Integration
      • Data-Centric
    • Others
      • Selected Top-Conference Papers
        • AAAI2021_Papers
        • CVPR2021_Papers
        • ECCV2020_Papers
        • ICCV2021_Papers
        • ICML2022_Papers
      • Cheat Sheets
        • Pandas
      • Conference Schedule
  • Data Science
    • 03_DS_Discrete Distribution
    • Data Scientist Professional
      • 3. Statistical Experimentation Theory
      • 4. Statistical Experimentation in Python
      • 5. Model development in Python
      • 7. Data Management in SQL
    • Data...
    • ETL
    • Airflow
  • Cloud Computing
    • Azure Data Fundamental
    • Amazon Web Services
      • AWS - Cloud 101
      • AWS - Machine Learning Foundation (Lab)
        • 1. Introduction to MLF
        • 2. AI and ML
        • 3. ML Pipeline
        • 4. ML Tools and Services
        • 5. Wrapping it Up
      • AWS - Cloud Practitioner Essentials
      • AWS - GenAI
    • Google Cloud
    • IBM Watson
  • Big Data
    • PySpark
      • Introduction to PySpark
        • 1. Getting to know PySpark
        • 2. Manipulating Data
        • 3. Getting Started with ML Pipelines
        • 4. Model Tuning and Selection
      • Big Data Fundamentals with PySpark
        • 1. Introduction to BigData Analysis with Spark
        • 2. Programming in PySpark RDD’s
        • 3. PySpark SQL & DataFrames
        • 4. Machine Learning with PySpark MLlib
  • English
    • Reading
    • Listening
    • Speaking
      • Speaking_Part1
        • 1_Speaking Part 1
        • 2_Speaking Part 1
        • 3_Speaking Part 1
        • 4_Speaking Part 1
        • 5_Speaking Part 1
        • 6_Speaking Part 1
        • 7_Speaking Part 1
        • 8_Speaking Part 1
        • 9_Speaking Part 1
        • 10_Speaking Part 1
        • 11_Speaking Part 1
        • 12_Speaking Part 1
        • 13_Speaking Part 1
        • 14_Speaking Part 1
        • 15_Speaking Part 1
        • 16_Speaking Part 1
        • 17_Speaking Part 1
        • 18_Speaking Part 1
        • 19_Speaking Part 1
        • 20_Speaking Part 1
        • 21_Speaking Part 1
        • 22_Speaking Part 1
        • 23_Speaking Part 1
      • Speaking_Part2
        • 1_Speaking Part 2
        • 2_Speaking Part 2
        • 3_Speaking Part 2
        • 4_Speaking Part 2
        • 5_Speaking Part 2
        • 6_Speaking Part 2
        • 7_Speaking Part 2
        • 8_Speaking Part 2
        • 9_Speaking Part 2
        • 10_Speaking Part 2
        • 11_Speaking Part 2
        • 12_Speaking Part 2
        • 13_Speaking Part 2
        • 14_Speaking Part 2
        • 15_Speaking Part 2
        • 16_Speaking Part 2
        • 17_Speaking Part 2
        • 18_Speaking Part 2
        • 19_Speaking Part 2
        • 20_Speaking Part 2
        • People
        • Places
          • Visited House
        • Events
        • Activities
          • Interesting Job
        • Things
      • Speaking_Part3
        • Advertisements
        • Outdoor Activities
        • Navigation and Exploration
        • Fast Food
        • Air Pollution
        • Free Time
        • Interesting Movie
        • Gifts
        • Independence in Children
        • Noisy
        • Complain
        • T-shirts
        • Value of Money
        • Restaurant
        • Global
        • Relaxation
        • Special Places
      • Mixed-Test
        • 01_Mix_Language
    • Writing
      • Writing_Task1
        • Paraphrase
        • Overview Sentence
        • Grammar
        • Charts
          • Line - Average Monthly Temperatures
          • Line - Fuels
          • Line - Birth Rate
          • Line - River Water
          • Line - U.S Energy
          • Line - Areas of Crime
          • Line - Renewable Energy
          • Line - Overseas Visitors
          • Chart - People ate in the UK
          • Chart - Music Event Attendance
          • Chart - Wind Energy
          • Chart - Children Attend Sports in Australia
          • Chart - Weekly Hours in Australia
          • Chart - Films released vs Tickets sold
          • Chart - Average Retirement Age
        • Process
        • Maps
          • Library Ground
        • Table
        • Multiple Graphs
          • Life Expectancy
      • Writing_Task2
        • Opinion Essay
          • Higher Salary
          • Goal of Schools
          • Local History
          • Retirement Age
          • Happy Society
          • Food Necessary
          • Pay for more Art
          • Eradicate Poverty
          • Team Activities
          • Wild Animals and Birds
        • Discussion Essay
          • Sports
          • Make Money
          • Crime punished
          • Equipment for Student
          • Keep a Gun
        • Advantages and Disadvantages Essay
          • Live Away
          • Transform to Farms
        • Problem-Solution Essay
          • Extreme Sports
          • Spend Time Away From Families
      • Complex Sentence
      • If, Wish, Hope
    • Synonym Common Mistakes
    • Phrasal Verbs
    • TOEIC 990
  • Interview
    • Deep Learning Questions
      • C1_Mathematical Foundation
      • C2_Fundamentals of ML
      • C3_Fundamentals of DL
      • C4_Classic Network
      • C5_CNN
      • C6_RNN
      • C7_Target Detection
      • C8_Image Segmentation
      • C9_Reinforcement Learning
      • C10_Migration Learning
      • C13_Optimization Algorithm
      • C14_Super Parameter Adjustment
      • C15_Heterogeneous Computing
    • Data Science Questions
  • Courses (Uni and Mooc)
    • AI Open Courses
    • DS Certificates
    • IBM Gen AI Engineering Professional Certificate
      • 10. Generative AI and LLMs: Architecture and Data Preparation
      • 11. Gen AI Foundational Models for NLP & Language Understanding
      • 12. Gen AI Language Modeling with Transformers
        • Module 1 - Fundamental Concepts of Transformer Architecture
        • Module 2 - Advanced Concepts of Transformer Architecture
      • 13. Generative AI Engineering and Fine-Tuning Transformers
      • 14. Generative AI Advanced Fine-Tuning for LLMs
      • 15. Fundamentals of AI Agents using RAG and Langchain
        • Module 1 - RAG Framework
        • Module 2 - Prompt Engineering and LangChain
      • 16. Project: Generative AI Applications with RAG and LangChain
    • Data Science Foundations: Data Structures and Algorithms Specialization
    • Flask - AI Applications
      • 1. Packaging Concepts
      • 2. Web App Deployment
      • 3. Creating AI Application
        • Sentiment Analysis
        • Emotion Detector
      • Deploy Deep Learning Models using Flask
    • Docker, Kubernetes & OpenShift
      • 1. Containers and Containerization
      • 2. Kubernetes Basics
      • 3. Managing Applications with Kubernetes
      • 4. The Kubernetes Ecosystem
      • 5. Final Assignments
    • Data Structures
      • 1. Introduction to DS&A
    • Algorithms
      • QE - Algorithms
      • Sorting Algorithms
        • Binary Search
        • Insertion Sort
        • Merge Sort
        • Quick sort
        • Heap sort
      • Divide and Conquer
      • Greedy Algorithm
      • Dynamic Programming
    • Operating System
      • QE - Operating System
      • 00_Operating System
    • CS231n Deep Learning for Computer Vision
      • 13. Self-Supervised Learning
    • CS480 Introduction to Machine Learning
      • 19. Attention and Transformer Networks
    • CS330 Multi-task and Meta Learning
      • 1. What is Multi-task Learning
    • Processing the Environment
      • Attention
    • Open VINO
    • Metaverse
      • 00_Metaverse
      • Spark AR
  • Research Projects
    • PPE Detection
      • Few-shot Data Sampling
    • Multiple Object Tracking
      • In-place Augmentation
    • Deep Clustering
      • Metrics
    • Defect Detection
      • 01_Defect_Improvement
      • Dataset: MVTec
      • Mixed supervision for surface-defect detection:
      • Practical Defect Detection
      • (Survey) Fabric Defect Detection
      • (Summary) Fabric Defect Detection
    • Medical Images
      • 01_Lung_Improvement
      • SANet
      • AnaXNet
      • 3D_EtoE Lung Cancer Screening
      • Semantics-enriched Representation
      • Attend And Compare
      • Recent Works
      • Kaggle_Medical Images
  • AI Engineer
  • Financial Investment
    • 01_TPTrading
    • 02_BCTC
    • 03_Demand Side Platform (DSP)
    • 04_Business Models
    • Trading
      • 01_Technical Analysis
      • 02_Mentality
      • 03_Support and Resistance
  • Books
    • AI Books
    • Books
      • Persuasion IQ
      • Communication Skills
      • 48 Hours a Day
      • Maslow's Pyramid
      • MBTI
      • Tư Duy Ngược
    • Audio Books
  • Project Management
    • PM Methods
      • Agile
      • Scrum
      • Kanban
    • Foundations of PM
      • Module 1
      • Module 2
      • Module 3
      • Module 4
    • Project Initiation: Starting a Successful Project
      • Module 1
      • Module 2
      • Module 3
      • Module 4
    • Project Planning: Putting It All Together
      • Module 1
    • Project Execution: Running the Project
    • Agile Project Management
    • Capstone: Applying Project Management in the Real World
  • Administrator

[MAE] Masked Autoencoders Are Scalable Vision Learners


Paper: https://openaccess.thecvf.com/content/CVPR2022/papers/He_Masked_Autoencoders_Are_Scalable_Vision_Learners_CVPR_2022_paper.pdf 

Code: https://github.com/facebookresearch/mae 

Video: https://www.youtube.com/watch?v=Dp6iICL2dVI 

Application: 

  1. ADE20k Semantic segmentation with MAE

  2. Benchmarking Detection Transfer Learning with Vision Transformers

  3. https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb 

Motivation, Objectives and Related Works

Motivation

  • Large amounts of unlabeled data are readily available in NLP. 

  • Autoregressive language modeling in GPT and masked autoencoding in BERT are conceptually simple: they remove a portion of the data and learn to predict the removed content. 

  • These methods make it feasible to train NLP models with over one hundred billion parameters.

Objectives

  • MAE is a scalable self-supervised learner for computer vision that divides the image into patches and performs the task of predicting the masked parts of the image as pre-training. 

  1. An asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens.

  2. Masking a high proportion of the input image, e.g., 75%, generates a nontrivial and meaningful self-supervisory task.

  • Together, these two designs make it possible to train large models efficiently and effectively: they accelerate training (by 3× or more) and improve accuracy. 

Related Works

Masked Language Modeling 

  • Masked language modeling and its autoregressive counterparts are highly successful pre-training methods in NLP. 

  • These methods hold out part of the input sequence and train the model to predict the missing content. They also scale well.

  • Examples: BERT, GPT, etc.

Autoencoding

  • This classical method includes two main parts: 

      1. Encoder (which maps an input to a latent representation). 

      2. Decoder (which rebuilds the input). 

    • Some examples: PCA, K-means, DAE (Denoising AutoEncoder), etc.

Masked Image Encoding methods

  • These learn representations from images corrupted by masking.

Self-Supervised learning - Contrastive Learning.

  • This approach models image similarity and dissimilarity between two or more views. 

  • These methods depend strongly on data augmentation.

Figure. BeIT

Model

The proposed MAE is conceptually simple: an autoencoder that reconstructs the complete image from a partial observation of it. Like classical autoencoders, it has an encoder and a decoder, but unlike them its architecture is asymmetric. The encoder operates only on the visible patches, so the model does not have to process every pixel of the image during encoding.

Idea

  • The input image I ∈ R^(3×H×W) is split into patches (e.g., 16×16) as in ViT, and a high proportion of the patches (e.g., 75%) is masked.

  • The unmasked patches are passed through the encoder Φenc. The encoded patches are then combined with mask tokens that stand in for the masked patches, and the full set is passed through the decoder Φdec.

  • The goal is to restore the original image as closely as possible. Each patch carries out this process, so each patch can be seen as learning semantic information.

  • Hence, this method is termed the “token-level approach”.
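The patch split described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the helper name `patchify` is hypothetical, the image is a single-channel nested list rather than a 3×H×W tensor, and a 4×4 image with 2×2 patches stands in for the real sizes.

```python
def patchify(image, patch):
    """Split an H x W nested list into non-overlapping patch x patch blocks.

    Each patch is returned flattened in row-major order. A single channel is
    used for brevity; a real image would carry 3 channels per pixel.
    """
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "size must divide evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            block = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patches.append(block)
    return patches


# Toy 4x4 "image" whose pixel value encodes its position (row * 4 + col).
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = patchify(image, patch=2)
print(len(patches))  # 4 patches of 4 pixels each
print(patches[0])    # top-left block: [0, 1, 4, 5]
```

Each flattened patch is what later becomes one token, so the number of tokens is (H / patch) × (W / patch).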

Steps

  1. Produce a token for every patch (by linear projection with an added positional embedding).

  2. Shuffle the list of tokens randomly, then remove the last portion of the list according to the masking ratio. This leaves a small subset of tokens and is equivalent to sampling patches without replacement.

  3. After encoding, append a list of mask tokens to the list of encoded patches, then unshuffle the full list (inverting the random shuffle) so that every token is aligned with its target.
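Steps 2 and 3 can be sketched with plain Python lists. This is a minimal illustration, not the reference implementation: the function names `random_masking` and `unshuffle_with_mask_tokens` are hypothetical, strings stand in for token vectors, and the encoder is treated as the identity so the bookkeeping is easy to follow.

```python
import random


def random_masking(tokens, mask_ratio, rng):
    """Step 2: shuffle token indices and keep only the first part."""
    n = len(tokens)
    n_keep = int(n * (1 - mask_ratio))
    shuffle_ids = list(range(n))
    rng.shuffle(shuffle_ids)  # random permutation = sampling without replacement
    visible = [tokens[i] for i in shuffle_ids[:n_keep]]
    # restore_ids[i] is the position of original token i in the shuffled list.
    restore_ids = [0] * n
    for position, original_index in enumerate(shuffle_ids):
        restore_ids[original_index] = position
    return visible, restore_ids


def unshuffle_with_mask_tokens(encoded, restore_ids, mask_token):
    """Step 3: append mask tokens, then invert the shuffle."""
    n = len(restore_ids)
    shuffled_full = encoded + [mask_token] * (n - len(encoded))
    return [shuffled_full[restore_ids[i]] for i in range(n)]


rng = random.Random(0)
tokens = [f"patch{i}" for i in range(16)]
visible, restore_ids = random_masking(tokens, mask_ratio=0.75, rng=rng)
# The encoder would run on `visible` here; it is skipped in this sketch.
full = unshuffle_with_mask_tokens(visible, restore_ids, mask_token="[MASK]")
print(len(visible), full.count("[MASK]"))  # 4 visible tokens, 12 mask tokens
```

After unshuffling, position i of `full` holds either the (encoded) token of patch i or a mask token, which is what lets the decoder output line up with the original patches.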

Architecture

Masking

  • The authors split an image into regular non-overlapping patches, then sample a subset of patches and mask (i.e., remove) the rest. 

  • The sampling strategy is straightforward: patches are sampled randomly without replacement, following a uniform distribution (which avoids a potential center bias).

  • A high masking ratio (75% of patches removed) largely eliminates redundancy, creating a task that cannot be solved simply by extrapolating from visible neighboring patches. 

Encoder

  • ViT (Vision Transformer).

  • The encoder is applied only to the visible, unmasked patches, so a very large encoder can be trained with a fraction of the compute and memory.

  • The encoder embeds patches by a linear projection with an added positional embedding, and then processes the resulting set with a series of Transformer blocks. 
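A rough back-of-the-envelope calculation shows why encoding only the visible patches saves so much. The 224×224 input size is an assumption (a common ViT setting, not stated on this page), `token_counts` is a hypothetical helper, and counting token pairs is only a proxy for self-attention cost, not a full FLOP estimate.

```python
def token_counts(image_size, patch_size, mask_ratio):
    """Return (total tokens, visible tokens) for a square image."""
    per_side = image_size // patch_size
    total = per_side * per_side
    visible = int(total * (1 - mask_ratio))
    return total, visible


# Assumed setting: 224x224 image, 16x16 patches, 75% masking.
total, visible = token_counts(image_size=224, patch_size=16, mask_ratio=0.75)
print(total, visible)  # 196 49

# Self-attention compares every token with every other token, so the number
# of token pairs grows with the square of the sequence length.
pair_reduction = (total * total) / (visible * visible)
print(pair_reduction)  # 16.0
```

Under these assumptions the encoder sees a quarter of the tokens and about one sixteenth of the attention pairs, which is consistent in spirit with the reported training speed-up.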

Decoder

  • The decoder is also built from Transformer blocks, but it is much lighter than the encoder: it requires less than 10% of the encoder's computation per token. 

  • The decoder is used only during pre-training, to perform the image reconstruction task. Only the encoder is kept to produce representations for downstream tasks.

  • Each element of the decoder's output is a vector of pixel values representing a patch.

Reconstruction Target

  • The final layer of the decoder is a linear projection whose number of output channels equals the number of pixel values in a patch. 

Loss Function

  • Mean Squared Error (MSE) between the reconstructed and original images in pixel space, computed only on the masked patches.

  • p is the token index, Ω is the set of masked tokens, I is the input image, and I^ is the reconstructed image.
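With the symbols above, the loss can be read as an average of squared pixel error over the masked set Ω only, roughly (1/|Ω|) Σ_{p∈Ω} ‖I_p − I^_p‖² up to a per-patch normalization constant. The sketch below illustrates that reading; the function name `masked_mse` is hypothetical and the toy patches have two pixels each.

```python
def masked_mse(original, reconstructed, masked_ids):
    """Mean squared error averaged over the masked patches only.

    original, reconstructed: lists of flattened patches (lists of floats).
    masked_ids: indices p in the masked set (Omega).
    """
    total = 0.0
    for p in masked_ids:
        squared = [(a - b) ** 2 for a, b in zip(original[p], reconstructed[p])]
        total += sum(squared) / len(squared)  # mean over the pixels of patch p
    return total / len(masked_ids)            # mean over the masked patches


original = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
reconstructed = [[9.0, 9.0], [1.0, 3.0], [2.0, 2.0]]

# Patch 0 is visible, so its large error is ignored by the loss.
loss = masked_mse(original, reconstructed, masked_ids=[1, 2])
print(loss)  # 1.0
```

Visible patches contribute nothing to the loss, so the model is rewarded only for what it infers, not for copying its input.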

Experimental Results

Dataset


Metrics


Experimental Results


Ablations

  • First, consider the image reconstruction task (the pre-training task). The results shown are reconstructions of images from the ImageNet validation set. The image is reconstructed plausibly even though 80% of it is masked.

  • Next, consider the effect of the mask ratio. The figure below plots mask ratio against accuracy. A high mask ratio gives better results on the downstream image classification task; the paper reports that a ratio of about 75% works well for both linear probing and fine-tuning.

  • Next, consider the downstream tasks. The first is image classification, where MAE gives excellent results compared with other self-supervised learning methods that use ViT.

  • Finally, there are object detection and semantic segmentation. Here too, MAE pre-training outperforms existing self-supervised learning methods and supervised pre-training.


Key Takeaways

References

  • https://huggingface.co/docs/transformers/model_doc/vit_mae 

  • https://keras.io/examples/vision/masked_image_modeling/ 

  • https://medium.com/mlearning-ai/paper-summary-masked-autoencoders-are-scalable-vision-learners-2dea8cdb1884 


About Me:

  • Phone: +84 946 937 937 (Phu)

  • Email: [email protected]  or  [email protected] 

  • Facebook: https://www.facebook.com/phu210.vn/

  • Page: https://www.lephongphu.works/home-page 

"I have gathered information from various sources on the internet, and it is now available for your perusal.

Thank you so much for coming here!"
