1 Comment

Good write-up, thanks for sharing. I would add a bit of nuance here: these are different applications of Transformers rather than different types of Transformers. An encoder such as BERT is an auto-encoder, while a Transformer decoder is auto-regressive (BERT itself is encoder-only). Auto-encoders map their input to latent representations (generally embeddings); auto-regressive decoders generate sequences of words and sentences.
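
To make the distinction concrete, here is a minimal sketch in plain Python, assuming the usual attention-mask view of the difference: an encoder lets every position see the whole input, while an auto-regressive decoder only sees earlier positions and emits one token at a time. The names `attention_mask`, `generate`, and `toy_next_token` are hypothetical, and `toy_next_token` is only a stand-in for a trained model.

```python
from typing import Callable, List


def attention_mask(seq_len: int, causal: bool) -> List[List[bool]]:
    """mask[i][j] is True when position i is allowed to attend to position j."""
    # Encoder (auto-encoding): every position sees the full input.
    # Decoder (auto-regressive): position i sees only positions j <= i.
    return [[(j <= i) if causal else True for j in range(seq_len)]
            for i in range(seq_len)]


def generate(prompt: List[str],
             next_token: Callable[[List[str]], str],
             steps: int) -> List[str]:
    """Auto-regressive loop: each new token is predicted from the prefix so far."""
    tokens = list(prompt)
    for _ in range(steps):
        tokens.append(next_token(tokens))
    return tokens


def toy_next_token(prefix: List[str]) -> str:
    """Hypothetical stand-in for a trained decoder; returns a placeholder token."""
    return f"tok{len(prefix)}"


encoder_mask = attention_mask(3, causal=False)  # all True: bidirectional context
decoder_mask = attention_mask(3, causal=True)   # lower-triangular: left context only
generated = generate(["a", "b"], toy_next_token, steps=2)
```

With the encoder mask, the output for each position can depend on the whole sentence, which is what makes it useful as an embedding. With the decoder mask, the same stack can be run in a loop to produce text one token at a time.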
