Supplementary code for the Build a Large Language Model From Scratch book by Sebastian Raschka

Code repository: https://github.com/rasbt/LLMs-from-scratch

Chapter 6 Exercise solutions

Exercise 6.1: Increasing the context length

We can pad the inputs to the maximum number of tokens the model supports by setting max_length to 1024:

max_length = 1024

train_dataset = SpamDataset(base_path / "train.csv", max_length=max_length, tokenizer=tokenizer)
val_dataset = SpamDataset(base_path / "validation.csv", max_length=max_length, tokenizer=tokenizer)
test_dataset = SpamDataset(base_path / "test.csv", max_length=max_length, tokenizer=tokenizer)

or, equivalently, we can define max_length via:

max_length = model.pos_emb.weight.shape[0]

or

max_length = BASE_CONFIG["context_length"]
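
Both expressions yield the same value, because the positional embedding layer holds exactly one row per supported input position. A quick sanity check (a sketch, assuming the model and BASE_CONFIG objects from the chapter):

assert model.pos_emb.weight.shape[0] == BASE_CONFIG["context_length"] == 1024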

For convenience, you can run this experiment via

python additional-experiments.py --context_length "model_context_length"

using the code in the ../02_bonus_additional-experiments folder, which results in a substantially worse test accuracy of 78.33% (versus the 95.67% in the main chapter).
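
As an aside, the padding itself happens inside the dataset class. Below is a minimal sketch of that step, assuming the GPT-2 tokenizer from tiktoken and its "<|endoftext|>" token (ID 50256) as the padding token, as in the main chapter:

import tiktoken
import torch

tokenizer = tiktoken.get_encoding("gpt2")
pad_token_id = 50256  # token ID of "<|endoftext|>" in the GPT-2 vocabulary

def pad_or_truncate(text, max_length):
    # Encode the text, truncate it to max_length tokens,
    # and right-pad shorter sequences with the padding token
    token_ids = tokenizer.encode(text)[:max_length]
    token_ids += [pad_token_id] * (max_length - len(token_ids))
    return torch.tensor(token_ids)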

Exercise 6.2: Finetuning the whole model

Instead of finetuning just the final transformer block, we can finetune the entire model by removing the following lines from the code:

for param in model.parameters():
    param.requires_grad = False
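
For context, the main chapter code first freezes everything with the loop above and then selectively makes the final transformer block, the final LayerNorm, and the new classification head trainable again. Here is a sketch of that setup, assuming the attribute names of the chapter's GPTModel (trf_blocks, final_norm, out_head):

import torch

# Freeze all parameters first ...
for param in model.parameters():
    param.requires_grad = False

# ... then replace the output head (new modules are trainable by default)
num_classes = 2
model.out_head = torch.nn.Linear(BASE_CONFIG["emb_dim"], num_classes)

# ... and unfreeze the last transformer block and the final LayerNorm
for param in model.trf_blocks[-1].parameters():
    param.requires_grad = True
for param in model.final_norm.parameters():
    param.requires_grad = True

Removing the freezing loop keeps requires_grad=True (the default) for all parameters, so the optimizer updates the entire model.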

For convenience, you can run this experiment via

python additional-experiments.py --trainable_layers all

using the code in the ../02_bonus_additional-experiments folder, which results in a test accuracy of 96.67%, a 1-percentage-point improvement over the 95.67% in the main chapter.

Exercise 6.3: Finetuning the first versus last token

Rather than finetuning the model based on the last output token, we can finetune it based on the first output token by changing

model(input_batch)[:, -1, :]

to

model(input_batch)[:, 0, :]

everywhere in the code.
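
Concretely, the loss computation then looks as follows (a sketch based on the chapter's calc_loss_batch function):

import torch

def calc_loss_batch(input_batch, target_batch, model, device):
    input_batch = input_batch.to(device)
    target_batch = target_batch.to(device)
    logits = model(input_batch)[:, 0, :]  # logits of the first output token
    return torch.nn.functional.cross_entropy(logits, target_batch)

The drop in accuracy reported below is plausible because, with causal attention, the first token can only attend to itself, whereas the last token can attend to the entire input sequence.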

For convenience, you can run this experiment via

python additional-experiments.py --trainable_token first

using the code in the ../02_bonus_additional-experiments folder, which results in a substantially worse test accuracy of 75.00% (versus the 95.67% in the main chapter).