More Efficient Multi-Head Attention Implementations
mha-implementations.ipynb contains and compares several implementations of multi-head attention.
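For orientation, the computation that every such implementation has to reproduce can be written as a small, dependency-free reference. This is only a sketch under stated assumptions, not one of the notebook's implementations: it is non-causal, handles a single unbatched sequence, omits bias terms and dropout, and the names `matmul`, `softmax`, and `multi_head_attention` are hypothetical helpers introduced here for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Non-causal multi-head self-attention for one sequence (illustrative only).

    x has shape (seq_len, d_model); w_q, w_k, w_v, w_o have shape (d_model, d_model).
    """
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    head_dim = d_model // num_heads

    # Project the inputs to queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    head_outputs = []
    for h in range(num_heads):
        # Each head works on its own contiguous slice of the feature axis.
        cols = slice(h * head_dim, (h + 1) * head_dim)
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]

        # Scaled dot-product attention: softmax(Q K^T / sqrt(head_dim)) V.
        k_t = [list(col) for col in zip(*k_h)]
        scores = matmul(q_h, k_t)
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))

    # Concatenate the heads along the feature axis, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


# Usage: 3 tokens, d_model = 4, 2 heads, identity projections.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
tokens = [[1.0, 0.0, 2.0, 0.0], [0.0, 1.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
result = multi_head_attention(tokens, identity, identity, identity, identity, num_heads=2)
```

A plain-Python loop like this is far too slow for real workloads. It is useful only as a correctness baseline against which faster variants can be checked.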
Summary
The figures below summarize the performance benchmarks (lower is better).
Forward pass only
Forward and backward pass