# Chapter 3: Coding Attention Mechanisms
## Main Chapter Code
- [01_main-chapter-code](01_main-chapter-code) contains the main chapter code.
## Bonus Materials
- [02_bonus_efficient-multihead-attention](02_bonus_efficient-multihead-attention) implements and compares several implementation variants of multi-head attention (one common variant is sketched after this list)
- [03_understanding-buffers](03_understanding-buffers) explains the idea behind PyTorch buffers, which are used to implement the causal attention mechanism in Chapter 3 (see the second sketch below)
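
To give a flavor of what such multi-head attention variants look like, here is a minimal sketch of one common formulation, where the heads are split out by reshaping a single projection per tensor instead of looping over separate single-head modules. The class and parameter names (`MultiHeadAttention`, `d_in`, `d_out`, `num_heads`) are illustrative assumptions, not the bonus folder's exact code; the causal mask is omitted here and shown in the buffer sketch below.

```python
import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    """Illustrative variant: fused per-tensor projections, heads split by reshaping."""

    def __init__(self, d_in, d_out, num_heads, dropout=0.0):
        super().__init__()
        assert d_out % num_heads == 0, "d_out must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = d_out // num_heads
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.out_proj = nn.Linear(d_out, d_out)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        b, num_tokens, _ = x.shape
        # Project once per tensor, then split the last dimension into
        # (num_heads, head_dim) and move the head axis forward.
        q = self.W_query(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.W_key(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.W_value(x).view(b, num_tokens, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention, computed for all heads in parallel
        scores = q @ k.transpose(2, 3) / self.head_dim**0.5
        weights = self.dropout(torch.softmax(scores, dim=-1))
        # Merge the heads back into a single output dimension
        context = (weights @ v).transpose(1, 2).reshape(b, num_tokens, -1)
        return self.out_proj(context)
```

The main design trade-off such variants explore is between readability (stacking independent single-head modules) and efficiency (batching all heads through one matrix multiplication, as above).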
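
As a rough illustration of the buffer idea, the sketch below registers an upper-triangular causal mask via `register_buffer`, so the mask moves with the module across devices (e.g., `.to("cuda")`) and is saved in the `state_dict` without being treated as a trainable parameter. Again, the class and argument names are illustrative assumptions rather than the chapter's exact code.

```python
import torch
import torch.nn as nn


class CausalAttention(nn.Module):
    """Illustrative single-head causal attention using a registered mask buffer."""

    def __init__(self, d_in, d_out, context_length, dropout=0.0):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        self.dropout = nn.Dropout(dropout)
        # A buffer is non-trainable module state: it follows device moves and
        # is serialized with the model, but receives no gradient updates.
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1),
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        q, k, v = self.W_query(x), self.W_key(x), self.W_value(x)
        scores = q @ k.transpose(1, 2)
        # Set attention scores for future positions to -inf so the softmax
        # assigns them zero weight (causal masking)
        scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], -torch.inf)
        weights = self.dropout(torch.softmax(scores / k.shape[-1]**0.5, dim=-1))
        return weights @ v
```

Without `register_buffer`, the mask would either need to be recreated on every forward pass or moved to the right device manually whenever the model is.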
In the video below, I provide a code-along session that covers some of this chapter's contents as supplementary material.