Interpolation in Positional Encodings and Using YaRN for Larger Context Window

This post is divided into three parts; they are: • Interpolation and Extrapolation in Sinusoidal Encodings and RoPE • Interpolation in Learned Encodings • YaRN for Larger Context Window Sinusoidal encodings excel at extrapolation due to their use of continuous functions: $$ begin{aligned} PE(p, 2i) &= sinleft(frac{p}{10000^{2i/d}}right) \ PE(p, 2i+1) &= cosleft(frac{p}{10000^{2i/d}}right) end{aligned} $$ You …

Positional Encodings in Transformer Models

This post is divided into five parts; they are: • Understanding Positional Encodings • Sinusoidal Positional Encodings • Learned Positional Encodings • Rotary Positional Encodings (RoPE) • Relative Positional Encodings Consider these two sentences: “The fox jumps over the dog” and “The dog jumps over the fox”.