LLMs from Scratch
Build a Large Language Model (From Scratch)
Setup
Optional Setup Instructions
Chapters
Chapter 1: Understanding Large Language Models
Chapter 2: Working with Text Data
The Main Data Loading Pipeline Summarized
Chapter 2 Exercise solutions
Comparing Various Byte Pair Encoding (BPE) Implementations
Understanding the Difference Between Embedding Layers and Linear Layers
Data sampling with a sliding window with number data
Byte Pair Encoding (BPE) Tokenizer From Scratch
Chapter 3: Coding Attention Mechanisms
Multi-head Attention Plus Data Loading
Chapter 3 Exercise solutions
More Efficient Multi-Head Attention Implementations
Comparing Efficient Multi-Head Attention Implementations
Understanding PyTorch Buffers
Chapter 4: Implementing a GPT Model from Scratch to Generate Text
Chapter 4 Exercise solutions
FLOPS Analysis
Chapter 5: Pretraining on Unlabeled Data
Chapter 5 Exercise solutions
Alternative Approaches to Loading Pretrained Weights
Bonus Code for Chapter 5
Pretraining GPT on the Project Gutenberg Dataset
Adding Bells and Whistles to the Training Loop
Optimizing Hyperparameters for Pretraining
Building a User Interface to Interact With the Pretrained LLM
Converting GPT to Llama
Converting a From-Scratch GPT Architecture to Llama 2
Converting Llama 2 to Llama 3.2 From Scratch
Llama 3.2 From Scratch (A Standalone Notebook)
Memory-efficient Model Weight Loading
Extending the Tiktoken BPE Tokenizer with New Tokens
PyTorch Performance Tips for Faster LLM Training
Chapter 6: Finetuning for Classification
Load And Use Finetuned Model
Chapter 6 Exercise solutions
Additional Classification Finetuning Experiments
Additional Experiments Classifying the Sentiment of 50k IMDB Movie Reviews
Scikit-learn Logistic Regression Model
Building a User Interface to Interact With the GPT-based Spam Classifier
Chapter 7: Finetuning to Follow Instructions
Load And Use Finetuned Model
Chapter 7 Exercise solutions
Create “Passive Voice” Entries for an Instruction Dataset
Evaluating Instruction Responses Locally Using a Llama 3 Model Via Ollama
Evaluating Instruction Responses Using the OpenAI API
Score Correlation Analysis
Generating A Preference Dataset With Llama 3.1 70B And Ollama
Direct Preference Optimization (DPO) for LLM Alignment (From Scratch)
Generating Datasets for Instruction Finetuning
Generating An Instruction Dataset via Llama 3 and Ollama
Improving Instruction-Data Via Reflection-Tuning Using GPT-4
Building a User Interface to Interact With the Instruction Finetuned GPT Model
Appendices
Appendix A: Introduction to PyTorch
Appendix A: Introduction to PyTorch (Part 1)
Appendix A: Introduction to PyTorch (Part 2)
Exercise A.1
Python and Environment Setup Recommendations
Appendix D: Adding Bells and Whistles to the Training Loop
Appendix E: Parameter-efficient Finetuning with LoRA