Supplementary code for the Build a Large Language Model From Scratch book by Sebastian Raschka. Code repository: https://github.com/rasbt/LLMs-from-scratch
Chapter 6 Exercise solutions#
Exercise 6.1: Increasing the context length#
We can pad the inputs to the maximum number of tokens the model supports by setting max_length to 1024:
max_length = 1024
train_dataset = SpamDataset(base_path / "train.csv", max_length=max_length, tokenizer=tokenizer)
val_dataset = SpamDataset(base_path / "validation.csv", max_length=max_length, tokenizer=tokenizer)
test_dataset = SpamDataset(base_path / "test.csv", max_length=max_length, tokenizer=tokenizer)
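For reference, here is a minimal sketch of the padding behavior this relies on, assuming the SpamDataset implementation from the main chapter, which truncates each encoded text to max_length and right-pads shorter sequences with the GPT-2 <|endoftext|> token (ID 50256); the pad_or_truncate helper below is only for illustration and not part of the book's code:

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
pad_token_id = 50256  # GPT-2's <|endoftext|> token, used as the padding token in the chapter


def pad_or_truncate(text, max_length):
    # Illustrative helper: truncate to max_length, then right-pad with the padding token
    token_ids = tokenizer.encode(text)[:max_length]
    token_ids += [pad_token_id] * (max_length - len(token_ids))
    return token_ids


print(len(pad_or_truncate("You are a winner, you have been specially selected.", max_length=1024)))
# 1024
```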
Instead of hardcoding 1024, we can equivalently define max_length via:
max_length = model.pos_emb.weight.shape[0]
or
max_length = BASE_CONFIG["context_length"]
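As a quick sanity check (assuming the model and BASE_CONFIG objects from the main chapter, where the GPT-2 124M model has a context length of 1024), both expressions should agree:

```python
# The number of rows in the positional embedding matrix equals the supported context length
assert model.pos_emb.weight.shape[0] == BASE_CONFIG["context_length"] == 1024
```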
For convenience, you can run this experiment via
python additional-experiments.py --context_length "model_context_length"
using the code in the ../02_bonus_additional-experiments folder, which results in a substantially worse test accuracy of 78.33% (versus the 95.67% in the main chapter).
Exercise 6.2: Finetuning the whole model#
Instead of finetuning just the final transformer block, we can finetune the entire model by removing the following lines from the code:
for param in model.parameters():
param.requires_grad = False
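To double-check which parameters will actually be updated, a small helper (a sketch, not part of the book's code) that counts trainable parameters can be useful; with the freezing lines removed, it should report all model parameters plus the classification head:

```python
def count_trainable_params(model):
    # Sum the number of elements over all parameters that still require gradients
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


print(f"Trainable parameters: {count_trainable_params(model):,}")
```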
For convenience, you can run this experiment via
python additional-experiments.py --trainable_layers all
using the code in the ../02_bonus_additional-experiments folder, which improves the test accuracy by about 1 percentage point, to 96.67% (versus the 95.67% in the main chapter).
Exercise 6.3: Finetuning the first versus last token#
Rather than finetuning the model based on the last output token, we can finetune it based on the first output token by changing
model(input_batch)[:, -1, :]
to
model(input_batch)[:, 0, :]
everywhere in the code.
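For instance, the calc_loss_batch function from the main chapter would then look roughly as follows (a sketch; only the indexing in the logits line changes):

```python
import torch


def calc_loss_batch(input_batch, target_batch, model, device):
    input_batch = input_batch.to(device)
    target_batch = target_batch.to(device)
    logits = model(input_batch)[:, 0, :]  # logits of the first output token instead of the last
    loss = torch.nn.functional.cross_entropy(logits, target_batch)
    return loss
```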
For convenience, you can run this experiment via
python additional-experiments.py --trainable_token first
using the code in the ../02_bonus_additional-experiments folder, which results in a substantially worse test accuracy of 75.00% (versus the 95.67% in the main chapter). This drop is expected: due to the causal attention mask, the first token cannot attend to any of the later tokens, so its output embedding carries no information about the rest of the text message.