Building a Decoder-Only Transformer Model for Text Generation

This post is divided into five parts; they are:

• From a Full Transformer to a Decoder-Only Model
• Building a Decoder-Only Model
• Data Preparation for Self-Supervised Learning
• Training the Model
• Extensions

The transformer model originated as a sequence-to-sequence (seq2seq) model: an encoder converts the input sequence into a sequence of context vectors, which a decoder then uses to generate a new sequence. A concrete sketch of this encoder-decoder structure follows below.
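To make the seq2seq structure concrete before we strip it down to a decoder-only model, here is a minimal sketch using PyTorch's built-in nn.Transformer module (the specific dimensions and tensor shapes below are illustrative choices, not values from this post):

```python
import torch
import torch.nn as nn

# A full encoder-decoder transformer, as in the original seq2seq design.
# nn.Transformer bundles an encoder, which reads the source sequence, and a
# decoder, which generates the target sequence conditioned on the encoder output.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6)

src = torch.rand(10, 32, 512)   # (source length, batch size, d_model)
tgt = torch.rand(20, 32, 512)   # (target length, batch size, d_model)

out = model(src, tgt)           # decoder output, one vector per target position
print(out.shape)                # torch.Size([20, 32, 512])
```

A decoder-only model, the subject of this post, drops the encoder half entirely: the model attends only to earlier positions of a single sequence and predicts the next token, which is all that text generation requires.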