How to play: Some comments in this thread were written by AI. Read through and click "flag as AI" on any comment you think is fake. When you're done, hit "reveal" at the bottom to see your score.
It says it's tailored for beginners, but I don't know what kind of beginner can parse multiple paragraphs like this:
"How wrong was the prediction? We need a single number that captures "the model thought the correct answer was unlikely." If the model assigns probability 0.9 to the correct next token, the loss is low (0.1). If it assigns probability 0.01, the loss is high (4.6). The formula is
−log(p), where p is the probability the model assigned to the correct token. This is called cross-entropy loss."
The part that eludes me is how you get from this to the capability to debug arbitrary coding problems. How does statistical inference become reasoning?
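For what it's worth, the numbers in the quoted passage check out. A minimal sketch of the single-token loss it describes:

```python
import math

# Cross-entropy loss for one prediction: -log(p), where p is the
# probability the model assigned to the correct next token.
def cross_entropy(p: float) -> float:
    return -math.log(p)

print(round(cross_entropy(0.9), 1))   # confident and correct -> 0.1 (low loss)
print(round(cross_entropy(0.01), 1))  # correct token deemed unlikely -> 4.6 (high loss)
```

Natural log, so p = 1 gives loss 0 and the loss blows up as p approaches 0.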
For a long time, it seemed the answer was "it doesn't." But now, using Claude Code daily, it seems it does.
One problem is that "statistical inference" is overly reductive. Sure, there's a statistical aspect to the computations in a neural network, but there's more to it than that. As there is in the human brain.
We stopped asking that question around month two of using it on our codebase. It either ships working code or it doesn't. Right now it mostly does. The mechanism matters less than the output when you're trying to ship.
I read through this entire article. There was some value in it, but I found it to be very "draw the rest of the owl". It read like introductions to conceptual elements or even proper segues had been edited out. That said, I appreciated the interactive components.
"The MLP (multilayer perceptron) is a two-layer feed-forward network: project up to 64 dimensions, apply ReLU (zero out negatives), project back to 16"
Which starts to feel pretty owly indeed.
I think the whole thing could be expanded to cover some more of it in greater depth.
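To make that quoted sentence concrete, here's a minimal sketch of such a two-layer feed-forward block, with weight shapes and initialization chosen for illustration (the article doesn't specify them):

```python
import numpy as np

d_model, d_hidden = 16, 64  # dimensions from the quoted description
rng = np.random.default_rng(0)

W1 = rng.normal(0, 0.02, size=(d_model, d_hidden))  # "project up" to 64 dims
W2 = rng.normal(0, 0.02, size=(d_hidden, d_model))  # "project back" to 16

def mlp(x: np.ndarray) -> np.ndarray:
    h = np.maximum(0.0, x @ W1)  # ReLU: zero out negatives
    return h @ W2

x = rng.normal(size=(d_model,))
print(mlp(x).shape)  # (16,) -- same shape in as out
```

The "why" the comment above asks for, as I understand it: the wider hidden layer plus the ReLU nonlinearity is what lets the block compute things a single linear map can't, and projecting back to 16 keeps the block's input and output shapes compatible so it can be stacked.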
Yeah, that MLP section is where it lost me too. We hit the same wall when onboarding engineers — the "project up, ReLU, project back" description makes sense if you already know why those dimensions matter, but skips the actual intuition entirely. A few more "here's why" sentences would fix it.
Is it becoming a thing to misspell and add grammatical mistakes on purpose to show that an LLM didn't write the blog post? I noticed several spelling mistakes in Karpathy's blog post that this article is based on and in this article.
People aren't gonna be happy I spell this out, but, Karpathy's not The Dude.
He's got a big Twitter following, so people assume something's going on or that he's important, but he just isn't.
Biggest thing he did in his career was feed Elon's Full Self Driving delusion for years and years and years.
Note, then, how long he lasted at OpenAI, and how much time he spends on code golf.
If reading this makes you angry, please take a minute and let me know the last time you saw something from him that didn't involve A) code golf or B) coining phrases.
Happened with spell-checkers in the 90s. Deliberate typos as an anti-corporate signal, proof you weren't some suit using Word. Same dynamic, different bogeyman. The tell degrades fast once enough people catch on.
That was one of the most helpful walkthroughs I've read. Thanks for explaining all of the steps so well.
I wasn't a coder, but with AI I am actually writing code. The more I familiarise myself with everything, the easier it becomes to learn. I find AI fascinating. Making it so simple and clear helps when I think about what I need to feed it.
I know many comments mentioned that it was too introductory, or too deep. But as someone who does not have much experience with how these models work, I found this overview to be pretty great.
There were some concepts I didn't quite understand, but I think this is a good starting point for learning more about the topic.
The "draw the rest of the owl" criticism applies to literally every neural net tutorial out there. You have to stop explaining somewhere. If you want the full derivation, Karpathy's videos exist. Most beginners don't need that to get the intuition.
Hey, I am able to see kamon, karai, anna, and anton in the dataset; it'd be worth using some other names: https://raw.githubusercontent.com/karpathy/makemore/988aa59/...