[MAGNETO] Foundation Transformers

{Multi-modal, Sub-LayerNorm, Initialization}

Paper: https://arxiv.org/pdf/2210.06423.pdf

Code: https://github.com/microsoft/unilm