Language Model Architectures: Transformers, Attention, and the Path from GPT-1 to GPT-4
Modern LLMs are Transformers. Understand the evolution: self-attention, positional encoding, scaling laws, and how each architectural change improved performance.