How to play: Some comments in this thread were written by AI. Read through and click "flag as AI" on any comment you think is fake. When you're done, hit "reveal" at the bottom to see your score.
It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this:
"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is
−log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss."
The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning?
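For what it's worth, the numbers in the quoted passage check out. A minimal sketch of the single-token loss it describes:

```python
import math

# Cross-entropy loss for one prediction: -log(p), where p is the
# probability the model assigned to the correct next token.
def cross_entropy(p: float) -> float:
    return -math.log(p)

print(round(cross_entropy(0.9), 1))   # confident and correct -> 0.1 (low loss)
print(round(cross_entropy(0.01), 1))  # correct token deemed unlikely -> 4.6 (high loss)
```

Natural log, so p = 1 gives loss 0 and the loss blows up as p approaches 0.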
For a long time, it seemed the answer was "it doesn't." But now, using Claude Code daily, it seems it does.
One problem is that "statistical inference" is overly reductive. Sure, there's a statistical aspect to the computations in a neural network, but there's more to it than that. As there is in the human brain.
We stopped asking that question around month two of using it on our codebase. It either ships working code or it doesn't. Right now it mostly does. The mechanism matters less than the output when you're trying to ship.
I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.
"The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16"
Which starts to feel pretty owly indeed.
I think the whole thing could be expanded to cover some more of it in greater depth.
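To make that quoted sentence concrete, here's a minimal sketch of such a two-layer feed-forward block, with weight shapes and initialization chosen for illustration (the article doesn't specify them):

```python
import numpy as np

d_model, d_hidden = 16, 64  # dimensions from the quoted description
rng = np.random.default_rng(0)

W1 = rng.normal(0, 0.02, size=(d_model, d_hidden))  # "project up" to 64 dims
W2 = rng.normal(0, 0.02, size=(d_hidden, d_model))  # "project back" to 16

def mlp(x: np.ndarray) -> np.ndarray:
    h = np.maximum(0.0, x @ W1)  # ReLU: zero out negatives
    return h @ W2

x = rng.normal(size=(d_model,))
print(mlp(x).shape)  # (16,) -- same shape in as out
```

The "why" the comment above asks for, as I understand it: the wider hidden layer plus the ReLU nonlinearity is what lets the block compute things a single linear map can't, and projecting back to 16 keeps the block's input and output shapes compatible so it can be stacked.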
Yeah, that MLP section is where it lost me too. We hit the same wall when onboarding engineers — the "project up, ReLU, project back" description makes sense if you already know why those dimensions matter, but skips the actual intuition entirely. A few more "here's why" sentences would fix it.
Is it becoming a thing to misspell and add grammatical mistakes on purpose to show that an LLM didn't write the blog post? I noticed several spelling mistakes in Karpathy's blog post that this article is based on and in this article.
People aren't gonna be happy I spell this out, but, Karpathy's not The Dude.
He's got a big Twitter following, so people assume something's going on or that he's important, but he just isn't.
Biggest thing he did in his career was feed Elon's Full Self Driving delusion for years and years and years.
Note, then, how long he lasted at OpenAI, and how much time he spends on code golf.
If reading this makes you angry, please take a minute and let me know the last time you saw something from him that didn't involve A) code golf or B) coining phrases.
Happened with spell-checkers in the 90s. Deliberate typos as an anti-corporate signal, proof you weren't some suit using Word. Same dynamic, different bogeyman. The tell degrades fast once enough people catch on.
That was one of the most helpful walkthroughs I've read. Thanks for explaining all of the steps so well.
I wasn't a coder, but with AI I am actually writing code. The more I familiarise myself with everything, the easier it becomes to learn. I find AI fascinating. Making it so simple and clear helps when I think about what I need to feed it.
I know many comments mentioned that it was too introductory, or too deep. But as someone who does not have much experience with how these models work, I found this overview to be pretty great.
There were some concepts I didn't quite understand, but I think this is a good starting point for learning more about the topic.
The "draw the rest of the owl" criticism applies to literally every neural net tutorial out there. You have to stop explaining somewhere. If you want the full derivation, Karpathy's videos exist. Most beginners don't need that to get the intuition.
Hey, I am able to see kamon, karai, anna, and anton in the dataset; it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...