Bag of words.
Word Embeddings.
RNN-based models.
LSTM-based models.
Bi-directional LSTM.
Attention-based models.
Transformers.
Timeline of natural language processing models
Figure. The Transformer (Muppet) family | Source: PLM Papers
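To make the two ends of this timeline concrete, the sketch below contrasts a bag-of-words count vector with scaled dot-product attention, the core operation inside the Transformer models listed next. It is a minimal illustration in plain NumPy with a made-up toy corpus and embedding size, not code taken from any of the papers referenced below.

```python
# Illustrative sketch only: contrasts the two ends of the timeline above.
# The corpus, embedding size, and projections are toy values for demonstration.
import numpy as np

corpus = ["the cat sat", "the dog sat", "the cat ran"]

# 1) Bag of words: each document becomes a vector of word counts,
#    with no notion of word order or similarity between words.
vocab = sorted({w for doc in corpus for w in doc.split()})

def bag_of_words(doc):
    counts = np.zeros(len(vocab), dtype=int)
    for w in doc.split():
        counts[vocab.index(w)] += 1
    return counts

bow = np.stack([bag_of_words(doc) for doc in corpus])
print(vocab)   # ['cat', 'dog', 'ran', 'sat', 'the']
print(bow)     # one count vector per document

# 2) Scaled dot-product attention: every token embedding is updated as a
#    weighted average of all token embeddings, with weights computed from
#    the tokens themselves -- the core operation inside a Transformer layer.
rng = np.random.default_rng(0)
d = 8                                              # toy embedding size
emb = {w: rng.normal(size=d) for w in vocab}
x = np.stack([emb[w] for w in "the cat sat".split()])      # (seq_len, d)

Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))   # toy projections
Q, K, V = x @ Wq, x @ Wk, x @ Wv
scores = Q @ K.T / np.sqrt(d)                              # (seq_len, seq_len)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
attended = weights @ V                 # context-aware token representations
print(weights.round(2))                # attention matrix; rows sum to 1
```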
[Skip-Thought] Ryan Kiros, et al. “Skip-Thought Vectors”
[Word2Vec] Tomas Mikolov, et al. “Efficient Estimation of Word Representations in Vector Space”
[DeepMoji] Bjarke Felbo, et al. “Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm”
[ALBERT] Zhenzhong Lan, et al. “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”
[BERT] Jacob Devlin, et al. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
[DistilBERT]
[DeBERTaV3]
[ELECTRA]
[RoBERTa] Yinhan Liu, et al. “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
[BioGPT]
[CodeGen]
[LLaMA]
[GPT] Alec Radford, et al. “Improving Language Understanding by Generative Pre-Training”
[GPT-2]
[GPT-J]
[GPT-Neo]
[GPT-NeoX]
[NeMo Megatron-GPT]
[OPT]
[BLOOM]
[GLM]
[YaLM]
[T5]
[FLAN-T5]
[CodeT5]
[BART] Mike Lewis, et al. “BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension”
[PEGASUS]
[mT5]
[UL2]
[FLAN-UL2]
[EdgeFormer]
[RoBERTa]
[XLNet]
[Longformer]
[Reformer]
[T5]