Categories: AI/ML Research

Linear Layers and Activation Functions in Transformer Models

This post is divided into three parts; they are: • Why Linear Layers and Activations are Needed in Transformers • Typical Design of the Feed-Forward Network • Variations of the Activation Functions The attention layer is the core function of a transformer model.

Using Activation Functions in Deep Learning Models

A deep learning model in its simplest form are layers of perceptrons connected in tandem. Without any activation functions, they are just matrix multiplications with limited power, regardless how many of them. Activation is the magic why neural network can be an approximation to a wide variety of non-linear function.…

February 15, 2023

In "AI/ML Research"

Skip Connections in Transformer Models

This post is divided into three parts; they are: • Why Skip Connections are Needed in Transformers • Implementation of Skip Connections in Transformer Models • Pre-norm vs Post-norm Transformer Architectures Transformer models, like other deep learning models, stack many layers on top of each other.

July 5, 2025

In "AI/ML Research"

LayerNorm and RMS Norm in Transformer Models

This post is divided into five parts; they are: • Why Normalization is Needed in Transformers • LayerNorm and Its Implementation • Adaptive LayerNorm • RMS Norm and Its Implementation • Using PyTorch's Built-in Normalization Normalization layers improve model quality in deep learning.

July 1, 2025

In "AI/ML Research"

AI Generated Robotic Content