Categories: AI/ML Research

Building a Transformer Model for Language Translation

This post is divided into six parts; they are:

• Why Transformer is Better than Seq2Seq
• Data Preparation and Tokenization
• Design of a Transformer Model
• Building the Transformer Model
• Causal Mask and Padding Mask
• Training and Evaluation

Traditional seq2seq models built on recurrent neural networks have two main limitations:

• Sequential processing prevents parallelization: each hidden state depends on the previous one, so the time steps must be computed one after another.
• Limited ability to capture long-term dependencies, since the hidden state is overwritten each time an element is processed.

The Transformer architecture, introduced in the 2017 paper “Attention Is All You Need”, overcomes these limitations by replacing recurrence with attention, which processes all positions in parallel and lets each position attend directly to every other position.
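As a rough illustration of the two limitations above, here is a minimal plain-Python sketch that contrasts a toy recurrence with self-attention. It is a simplified illustration and not the post's model. It assumes no learned query/key/value projections and a single head, and the function names `rnn_states` and `self_attention` are hypothetical. The `causal` flag previews the idea behind a causal mask: each position is prevented from attending to later positions.

```python
import math


def rnn_states(x, w=0.5):
    # Toy recurrence (hypothetical): h_t = tanh(w * h_{t-1} + x_t), elementwise.
    # h_t cannot be computed until h_{t-1} is known, so steps run one at a time,
    # and information from early tokens survives only through repeated overwrites.
    h = [0.0] * len(x[0])
    states = []
    for xt in x:
        h = [math.tanh(w * hi + xi) for hi, xi in zip(h, xt)]
        states.append(h)
    return states


def softmax(scores):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    # Simplified scaled dot-product self-attention with Q = K = V = x
    # (assumption: no learned projections, single head).
    # Each output row depends only on the inputs, never on other outputs,
    # so every row could be computed in parallel.
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(n):
            if causal and j > i:
                # Causal mask: position i may not look at future positions.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(x[i], x[j]))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        # Weighted sum of all (unmasked) value vectors: a direct path to any position.
        out.append([sum(wj * x[j][k] for j, wj in enumerate(weights)) for k in range(d)])
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(rnn_states(x))
print(self_attention(x))
print(self_attention(x, causal=True))
```

With `causal=True`, the first position can only attend to itself, so its output equals its input, and changing a later token leaves earlier outputs unchanged. Without the mask, the first position mixes in information from the last token in a single step instead of waiting for it to pass through a chain of hidden-state updates.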
