
Writing an LLM from scratch, part 22 -- finally training our LLM!

Giles Thomas
October 16, 2025 at 01:42 AM

Key Takeaways

  • The author completed Chapter 5 of Sebastian Raschka's LLM book, finding the implementation phase exciting despite initial difficulty with cross-entropy loss (a rough loss sketch follows this list).
  • A small model trained on 20,000 characters produced surprisingly coherent text, which improved dramatically after loading pre-trained GPT-2 weights.
  • The author recommends typing all code manually but warns that replicating exact random outputs from the book is difficult due to unseeded randomness in auxiliary functions.
  • For validation, the author suggests focusing on the general trend of decreasing training loss rather than exact numerical matches.
  • The chapter moves into practical implementation details, including the use of optimizers beyond basic stochastic gradient descent.
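
The loss the chapter builds around is ordinary next-token cross-entropy, with perplexity as its exponential. As a rough illustration only, not the book's code, a minimal PyTorch sketch using made-up shapes and random tensors might look like this:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: 2 sequences of 4 tokens, GPT-2's 50,257-token vocabulary.
batch_size, seq_len, vocab_size = 2, 4, 50257

# Random stand-ins for model logits and the shifted target token IDs.
logits = torch.randn(batch_size, seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# cross_entropy expects (N, C) logits and (N,) class indices, so flatten the
# batch and sequence dimensions together.
loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

# Perplexity is simply the exponential of the cross-entropy loss.
perplexity = torch.exp(loss)
print(loss.item(), perplexity.item())
```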

The author wraps up their notes on Chapter 5 of Sebastian Raschka's "Build a Large Language Model (from Scratch)", finding the concepts of cross-entropy loss and perplexity hard going but the subsequent coding highly rewarding once the model begins to generate text. After training a small model on a tiny dataset drawn from Edith Wharton's "The Verdict", the author observed surprisingly coherent output, and loading the 124M-parameter GPT-2 weights improved the generated text dramatically, suggesting the implementation works. The author strongly advises readers to type in and run all the code themselves, but cautions that matching the book's exact random outputs is difficult because various helper functions introduce hidden randomness that is hard to sequence identically. Nevertheless, as long as the training loss decreases steadily and the validation loss plateaus as shown in the book, the implementation can be considered successful. The chapter closes with a brief look at optimizers beyond simple SGD.
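
On the reproducibility and optimizer points, the usual pattern is to seed PyTorch's global RNG up front and to swap plain SGD for an optimizer such as AdamW. The sketch below is only an illustration of that pattern: the toy stand-in model, hyperparameters, and random data are invented here, not the book's GPT model or its loader over "The Verdict".

```python
import torch
import torch.nn.functional as F

# Seeding the global RNG makes weight init and shuffling repeatable, though, as
# the post notes, unseeded draws inside helper functions can still prevent an
# exact match with the book's printed outputs.
torch.manual_seed(123)

# Toy stand-in for the GPT model and data; the hyperparameters are made up.
vocab_size = 50257
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
inputs = torch.randint(0, vocab_size, (2, 4))   # (batch, seq) of token IDs
targets = torch.randint(0, vocab_size, (2, 4))  # shifted-by-one targets

# AdamW: adaptive per-parameter learning rates plus decoupled weight decay,
# a common step up from plain stochastic gradient descent.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4, weight_decay=0.1)

for step in range(10):
    optimizer.zero_grad()
    logits = model(inputs)  # (batch, seq, vocab)
    loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
    loss.backward()
    optimizer.step()
    print(f"step {step}: training loss {loss.item():.3f}")
```

Even with the seed set, differences in how many random draws happen before and during training can still shift the exact numbers, which is why the author suggests checking the overall loss trend rather than chasing an exact numerical match.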
