Here's part 1 [1]. Since his archive is organized by date, it's a bit easier to guesstimate which part was written in which month.
[1] https://www.gilesthomas.com/2024/12/llm-from-scratch-1
It's interesting: 22 parts in under a year. Seems like a fun, up-to-date project. Karpathy did something very similar with nanochat (following nanogpt).
The cost comparison between local RTX 3090 and cloud A100 clusters is useful, but I wonder if the author accounted for hidden overhead—like data transfer time for large datasets or the time spent debugging CUDA compatibility issues on local hardware.
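For what it's worth, that overhead is easy to fold into a back-of-envelope comparison. A minimal sketch, where every figure is a made-up placeholder rather than anything from the article:

```python
# All numbers below are hypothetical placeholders, not the author's figures.
CLOUD_RATE_USD_HR = 2.00   # assumed A100 on-demand price
LOCAL_POWER_USD_HR = 0.12  # assumed RTX 3090 electricity cost at the wall

train_hrs_cloud = 4.0      # A100s finish faster...
transfer_hrs = 1.5         # ...but you spend time shipping the dataset up first
train_hrs_local = 10.0     # the 3090 is slower per step
debug_hrs_local = 3.0      # CUDA/driver debugging time on local hardware

cloud_usd = CLOUD_RATE_USD_HR * (train_hrs_cloud + transfer_hrs)
local_usd = LOCAL_POWER_USD_HR * (train_hrs_local + debug_hrs_local)

print(f"cloud: ${cloud_usd:.2f} over {train_hrs_cloud + transfer_hrs:.1f} h")
print(f"local: ${local_usd:.2f} over {train_hrs_local + debug_hrs_local:.1f} h "
      "(hardware purchase amortized separately)")
```

The point being that once transfer and debugging hours enter the formula, the ranking can flip either way depending on the actual numbers.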
It's based on a book: https://www.manning.com/books/build-a-large-language-model-f... Is it a good book?
I had done a little DL (with Keras) before this. I'm currently on the attention chapter. The book gives you the code, but I feel there's very little in the way of building intuition. Thankfully, there are tons of videos online to help with that.
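For anyone else at that chapter, the core mechanism fits in a few lines. A minimal sketch of scaled dot-product self-attention, assuming PyTorch (which the book uses); the book's version adds trainable query/key/value projection matrices on top of this:

```python
import torch

# Scaled dot-product attention: each token's output is a weighted average
# of all value vectors, with weights from query-key similarity.
def attention(q, k, v):
    d_k = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d_k**0.5  # (seq, seq) similarity matrix
    weights = torch.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ v

x = torch.randn(5, 8)     # 5 tokens, 8-dim embeddings
out = attention(x, x, x)  # self-attention: q = k = v = x
print(out.shape)          # torch.Size([5, 8])
```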
I think it's a great guide: an extended tutorial, if you will (at least up to this point in my reading). Having the code right in front of you also helps a lot. For example, I was under the impression that embedding vectors were static, like in word2vec. It turns out they're learnable parameters too. I wouldn't have been able to tell for sure without the code in front of me.
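That's quick to verify in isolation. A minimal sketch, assuming PyTorch's nn.Embedding (which is what the book's code uses):

```python
import torch

# An embedding layer is just a trainable lookup table: rows start random
# and get updated by gradient descent, unlike static word2vec vectors.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=4)
print(emb.weight.requires_grad)  # True -> the vectors are learnable parameters

token_ids = torch.tensor([1, 2, 3])
loss = emb(token_ids).sum()      # toy "loss" touching three embedding rows
loss.backward()
print(emb.weight.grad[1:4])      # non-zero gradients flow into exactly those rows
```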
Nice, this is a recipe for making an evil AI which will destroy humanity.