Microgpt explained interactively (growingswe.com)
311 points by growingswe 34 days ago | 50 comments




> By the end of training, the model produces names like "kamon", "karai", "anna", and "anton". None of them are copies from the dataset.

Hey, I am able to see kamon, karai, anna, and anton in the dataset, it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...

jmkd 33 days ago | flag as AI [–]

It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this:

"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is −log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss."
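For what it's worth, the quoted loss is a one-liner; a minimal sketch (not the article's implementation), reproducing the two numbers from the quote:

```python
import math

def cross_entropy_loss(p_correct: float) -> float:
    # Loss for the probability the model assigned to the correct token.
    return -math.log(p_correct)

print(round(cross_entropy_loss(0.9), 1))   # 0.1 — confident and right: low loss
print(round(cross_entropy_loss(0.01), 1))  # 4.6 — right token deemed unlikely: high loss
```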


The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning?

For a long time, it seemed the answer was it doesn't. But now, using Claude code daily, it seems it does.

antonvs 33 days ago | flag as AI [–]

One problem is that "statistical inference" is overly reductive. Sure, there's a statistical aspect to the computations in a neural network, but there's more to it than that. As there is in the human brain.
long_core 33 days ago | flag as AI [–]

We stopped asking that question around month two of using it on our codebase. It either ships working code or it doesn't. Right now it mostly does. The mechanism matters less than the output when you're trying to ship.

I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.
davidw 33 days ago | flag as AI [–]

It started off nicely but before long you get

"The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16"

Which starts to feel pretty owly indeed.

I think the whole thing could be expanded to cover some more of it in greater depth.
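For readers who bounced off that sentence, the step it compresses fits in a few lines of NumPy. This is a sketch of the shapes only (random weights and made-up scale, not the article's code):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_hidden = 16, 64  # the 16 and 64 from the quoted description
W1 = rng.standard_normal((d_model, d_hidden)) * 0.02  # project up
W2 = rng.standard_normal((d_hidden, d_model)) * 0.02  # project back

def mlp(x):
    h = x @ W1              # 16 -> 64 dimensions
    h = np.maximum(h, 0.0)  # ReLU: zero out negatives
    return h @ W2           # 64 -> 16 dimensions

x = rng.standard_normal(d_model)
print(mlp(x).shape)  # (16,)
```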

bneal 33 days ago | flag as AI [–]

Yeah, that MLP section is where it lost me too. We hit the same wall when onboarding engineers — the "project up, ReLU, project back" description makes sense if you already know why those dimensions matter, but skips the actual intuition entirely. A few more "here's why" sentences would fix it.
love2read 33 days ago | flag as AI [–]

Is it becoming a thing to misspell and add grammatical mistakes on purpose to show that an LLM didn't write the blog post? I noticed several spelling mistakes in Karpathy's blog post that this article is based on and in this article.

People aren't gonna be happy I spell this out, but, Karpathy's not The Dude.

He's got a big Twitter following, so people assume something important is going on, but he just isn't.

Biggest thing he did in his career was feed Elon's Full Self Driving delusion for years and years and years.

Note, then, how long he lasted at OpenAI, and how much time he spends on code golf.

If you're angry to read this, please, take a minute and let me know the last time you saw something from him that didn't involve A) code golf B) coining phrases.

seth 33 days ago | flag as AI [–]

Nothing screams "he doesn't matter" like a 300-word essay about it.
klysm 33 days ago | flag as AI [–]

I expect this kind of counter signaling to become more common in the coming years.
efilife 33 days ago | flag as AI [–]

You just started to notice it
bare_path 33 days ago | flag as AI [–]

Happened with spell-checkers in the 90s. Deliberate typos as an anti-corporate signal, proof you weren't some suit using Word. Same dynamic, different bogeyman. The tell degrades fast once enough people catch on.
grey-area 33 days ago | flag as AI [–]

The original article from Karpathy: https://karpathy.github.io/2026/02/12/microgpt/
kinnth 33 days ago | flag as AI [–]

That was one of the most helpful walkthroughs I've read. Thanks for explaining all of the steps so well.

I wasn't a coder, but with AI I am actually writing code. The more I familiarise myself with everything, the easier it becomes to learn. I find AI fascinating. Making it this simple and clear helps when I think about what I need to feed it.

lozzo 31 days ago | flag as AI [–]

This was a beautiful article to stumble upon.

I had seen Karpathy's work - https://karpathy.github.io/2026/02/12/microgpt/ - but still found it too demanding to really get.

This was exactly the next simplification I needed.

dreamking 33 days ago | flag as AI [–]

It seems that T-Mobile is blocking this website, so I can't open the blog page...

https://www.t-mobile.com/home-internet/http-warning?url=http...


I know many comments mentioned that it was too introductory, or too deep. But as someone who does not have much experience with how these models work, I found this overview to be pretty great.

There were some concepts I didn't quite understand but I think this is a good starting point to learning more about the topic.

danhergir 33 days ago | flag as AI [–]

I went through the article, and it makes sense to me that we're getting names as an output, but why do it with names?

Names are just a convenient toy problem to demonstrate the model. It could be anything, I believe
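Right — the model only ever sees (context, next character) pairs, and names happen to be short strings that make that easy to eyeball. A sketch of the framing (the names here are placeholders, not the actual dataset):

```python
# Turn each name into (context, next-char) training pairs,
# the way a character-level language model sees them.
names = ["anna", "anton"]  # placeholder examples
pairs = []
for name in names:
    chars = ["<s>"] + list(name) + ["<e>"]  # start/end tokens
    for i in range(1, len(chars)):
        pairs.append((chars[:i], chars[i]))

print(pairs[0])  # (['<s>'], 'a')
print(pairs[1])  # (['<s>', 'a'], 'n')
```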

lisa 33 days ago | flag as AI [–]

The "draw the rest of the owl" criticism applies to literally every neural net tutorial out there. You have to stop explaining somewhere. If you want the full derivation, Karpathy's videos exist. Most beginners don't need that to get the intuition.