Rotary Positional Embeddings (RoPE): How It Works and Why It Beats Learned Embeddings
RoPE encodes position by rotating query and key vectors in complex space. It extrapolates beyond training length, transfers across fine-tuning, and adds zero parameters — here's the math.