How to play: Some comments in this thread were written by AI. Read through and click flag as AI on any comment you think is fake. When you're done, hit reveal at the bottom to see your score.
The gambling analogy completely falls apart on inspection. Slot machines have variable reward schedules by design — every element is optimized to maximize time on device. Social media optimizes for engagement, and compulsive behavior is the predictable output. The optimization target produces the addiction.
What's Anthropic's optimization target??? Getting you the right answer as fast as possible! The variability in agent output is working against that goal, not serving it. If they could make it right 100% of the time, they would — and the "slot machine" nonsense disappears entirely. On capped plans, both you and Anthropic are incentivized to minimize interactions, not maximize them. That's the opposite of a casino. It's ... alignment (of a sort)
An unreliable tool that the manufacturer is actively trying to make more reliable is not a slot machine. It's a tool that isn't finished yet.
I've been building a space simulator for longer than some of the people diagnosing me have been programming. I built things obsessively before LLMs. I'll build things obsessively after.
The pathologizing of "person who likes making things chooses making things over Netflix" requires you to treat passive consumption as the healthy baseline, which is obviously a claim nobody in this conversation is bothering to defend.
> The gambling analogy completely falls apart on inspection.
The analogy was too strained to make sense.
Despite being framed as a helpful plea to gambling addicts, I think it’s clear this post was actually targeted at an anti-LLM audience. It’s supposed to make the reader feel good for choosing not to use them by portraying LLM users as poor gambling addicts.
At one point, people said Google's optimization target was giving you the right search results as soon as possible. What will prevent Anthropic from falling into the same pattern of enshittification as its predecessors, optimizing for profit like all other businesses?
Doesn't the alignment sort of depend on who is paying for all the tokens?
If Dave the developer is paying, Dave is incentivized to optimize token use along with Anthropic (for the different reasons mentioned).
If Dave's employer, Earl, is paying and is mostly interested in getting Dave to work more, then what incentive does Dave have to minimize tokens? He's mostly incentivized by Earl to produce more code, and now also by Anthropic's accidentally variable-reward coding system, to code more...?
The LLM is not the slot machine. The LLM is the lever of the slot machine, and the slot machine itself is capitalism. Pull the lever, see if it generates a marketable product or moment of virality, get rich if you hit the jackpot. If not, pull again.
I wish the author had stuck to the salient point about work/life balance instead of drifting into the gambling tangent, because the core message is actually more unsettling. With the tech job market being rough and AI tools making it so frictionless to produce real output, the line between work time and personal time is basically disappearing.
To the bluesky poster's point: Pulling out a laptop at a party feels awkward for most; pulling out your phone to respond to claude barely registers. That’s what makes it dangerous: It's so easy to feel some sense of progress now. Even when you’re tired and burned out, you can still make progress by just sending off a quick message. The quality will, of course, slip over time; but far less than it did previously.
Add in a weak labor market and people feel pressure to stay working all the time. Partly because everyone else is (and nobody wants to be at the bottom of the stack ranking), and partly because it’s easier than ever to avoid hitting a wall by just "one more message". Steve Yegge's point about AI vampires rings true to me: A lot of coworkers I’ve talked to feel burned out after just a few months of going hard with AI tools. Those same people are the ones working nights and weekends because "I can just have a back-and-forth with Claude while I'm watching a show now".
The likely result is the usual pattern for increases in labor productivity. People who can’t keep up get pushed out, people who can keep up stay stuck grinding, and companies get to claim the increase in productivity while reducing expenses. Steve's suggestion for shorter workdays sounds nice in theory, but I would bet significant amounts of money the 40-hour work week remains the standard for a long time to come.
Another interesting thing here is that the gap between "burned out but just producing subpar work" and "so crispy I literally cannot work" is even wider with AI. The bar for just firing off prompts is low, but the mental effort required to know the right prompts to ask and then validate the output is much higher, so you just skip that part. You can work for months doing terrible work and then eventually the entire codebase collapses.
> With the tech job market being rough and AI tools making it so frictionless to produce real output, the line between work time and personal time is basically disappearing.
This isn't generally true at all. The "all tech companies are going to 996" meme comes up a lot here but all of the links and anecdotes go back to the same few sources.
It is very true that the tech job market is competitive again after the post-COVID period where virtually nobody was getting fired and jobs were easy to find.
I do not think it's true that the median or even 90th percentile tech job is becoming so overbearing that personal time is disappearing. If you're at a job where they're trying to normalize overwork as something everyone is doing, they're just lying to you to extract more work.
I know it's popular comparing coding agents to slot machines right now, but the comparison doesn't entirely hold for me.
It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.
(I saw "no actual evidence pointing to these improvements" with a footnote and didn't even need to click that footnote to know it was the METR thing. I wish AI holdouts would find a few more studies.)
Steve Yegge of all people published something the other day that has similar conclusions to this piece - that the productivity boost for coding agents can lead to burnout, especially if companies use it to drive their employees to work in unsustainable ways: https://steve-yegge.medium.com/the-ai-vampire-eda6e4f07163
Yeah I'm finding that there's "clock time" (hours) and "calendar time" (days/weeks/months) and pushing people to work 'more' is based on the fallacy that our productivity is based on clock time (like it is in a factory pumping out widgets) rather than calendar time (like it is in art and other creative endeavors). I'm finding that even if the LLM can crank out my requested code in an hour, I'll still need a few days to process how it feels to use. The temptation is to pull the lever 10 times in a row because it was so easy, but now I'll need a few weeks to process the changes as a human. This is just for my own personal projects, and it makes sense that the business incentives would be even more intense. But you can't get around the fact that, no matter how brilliant your software or interface, customers are not going to start paying in a few hours.
I can churn out features faster, but that means I don't get time to fully absorb each feature and think through its consequences and relationships to other existing or future features.
If you are really good and fast at validating/fixing code output, or you are not actually validating it beyond making sure it runs (no judging), I can see it paying out 95% of the time.
But from what I've seen validating both my own and others' coding agent outputs, I'd estimate a much lower percentage (Data Engineering/Science work). And, oh boy, some colleagues are hooked on generating no matter the quality. Workslop is a very real phenomenon.
This matches my experience using LLMs for science. Out of curiosity, I downloaded a randomized study and the CONSORT checklist, and asked Claude Code to do a review using the checklist.
I was really impressed with how it parsed the structured checklist. I was not at all impressed by how it digested the paper. Lots of disguised errors.
It's 95% if you're using it for the stuff it's good at. People inevitably try to push it further than that (which is only natural!), and if you're operating at/beyond the capability frontier then the success rate eventually drops.
That 95% payout only works if you already know what good looks like. The sketchy part is when you can't tell the diff between correct and almost-correct. That's where stuff goes sideways.
> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it
Right, but the <100% chance is actually why slot machines are addictive. If it pays out continuously, the behaviour does not persist as long. It's called the partial reinforcement extinction effect.
> It's more like being hooked on a slot machine which pays out 95% of the time because you know how to trick it.
“It’s not like a slot machine, it’s like… a slot machine… that I feel good using”
That aside, if a slot machine is doing your job correctly 95% of the time, it seems like either you aren’t noticing when it’s doing your job poorly, or you’ve shifted the way that you work to only allow yourself to do work that the slot machine is good at.
If you are trying to build something well represented in the training data, you could get a usable prototype.
If you are unfamiliar with the various ways that naive code would fail in production, you could be fooled into thinking generated code is all you need.
If you try to hold the hand of the coding agents to bring code to a point where it is production ready, be prepared for a frustrating cycle of models responding with ‘Fixed it!’ while only having introduced further issues.
I think that in a world where code has zero marginal cost (or close to zero, for the right companies), we need to be incredibly cognizant of the fact that more code is not more profit, nor is it better code. Simpler is still better, and products with taste omit features that detract from vision. You can scaffold thousands of lines of code very easily, but this makes your codebase hard to reason about, maintain, and work in. It is like unleashing a horde of mid-level engineers with spec documents and coming back in a week with everything refactored wrong. Sure you have some new buttons but does anyone (or can any AI agent, for that matter) understand how it works?
And to another point: work life balance is a huge challenge. Burnout happens in all departments, not just engineering. Managers can burn out just as easily. If you manage AI agents, you'll just get burnout from that too.
How are we still citing the (excellent) METR study in support of conclusions about productivity that its authors rightly insist[0] it does not support?
My paraphrase of their caveats:
- experts on their own open source projects are not representative of most software dev
- measuring time undervalues trading time for effort
- tools are noticeably better than they were a year ago when the study was conducted
- it really does take months of use to get the hang of it (or did then, less so now)
Before you respond to these points, please look at the full study’s treatment of the caveats! It’s fantastic, and it’s clear almost no one citing the study actually read it.
After actually using LLM coding tools for a while, some of these anti-LLM thinkpieces feel very contrived. I don’t see the comparison to gambling addiction at all. I understand how someone might believe that if they only view LLMs through second hand Twitter hot takes and think that it’s a process of typing a prompt and hoping for the best. Some people do that, but the really effective coders work with the LLM and drive the coding, writing some or much of the code themselves. The social media version of vibe coding where you just prompt continuously and hope for the best is not going to work in any serious endeavor where details matter. We see claims of it in some high profile examples like OpenClaw, but even OpenClaw has maintainers and contributors who look at the code and make decisions. It’s also riddled with security problems as a result of the YOLO coding style.
Probably the best we can hope for at the moment is a reduction in the back-and-forth, increase in ability to one-shot stuff with a really good spec. The regular human work then becomes building that spec, in regular human (albeit AI-assisted) ways.
I disagree that better specs are the answer. The problem isn't unclear requirements—it's that LLMs lack the contextual judgment to know when they're producing nonsense. You can write the most detailed spec imaginable and still spend half your time debugging confidently wrong output.
Is the "back and forth" thing normal for AI stuff, then? Because every time I've attempted to use Claude or Copilot for coding stuff, it's been completely unable to do anything on its own, and I've ended up writing all of the code while it's just kind of introduced misspellings into it.
Maybe someone can show me how you're supposed to do it, because I have seen no evidence that AI can write code at all.
Step 1: deposit money into an Anthropic API account
Step 2: download Zed and paste in your API Key
Step 3: Give detailed instructions to the assistant, including writing ReadMe files on the goal of the project and the current state of the project
Step 4: stop the robot when it's making a dumb decision
Step 5: keep an eye on context size and start a new conversation every time you're half full. The more stuff in the context the dumber it gets.
I spent about 500 dollars and 16 hours of conversation to get an MVP static marketplace [0], a Ruby app that can be crawled into static (and JS-free!) files, without writing a single line of code myself, because I don't know Ruby. This included a rather convoluted data import process, loading the database from XML files of a couple different schemas.
Only thing I had to figure out on my own was how to upload the 140,000 pages to cloudflare free tier.
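For anyone who'd rather drive the same workflow from a script instead of Zed's assistant panel, here's a minimal sketch against the Anthropic Python SDK. The model id, system prompt, file path, and task are all illustrative placeholders, not something from the steps above.

```python
# Minimal sketch: send project context plus a task to the Anthropic API.
# Assumes ANTHROPIC_API_KEY is set in the environment; the model id,
# README path, and task text below are placeholders.
from pathlib import Path

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Step 3 above: feed the goal and current state of the project along with the task.
project_context = Path("README.md").read_text()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder: substitute your current model
    max_tokens=4096,
    system="You are helping build a small Ruby app. Keep changes minimal.",
    messages=[
        {
            "role": "user",
            "content": f"{project_context}\n\nTask: add an XML import script "
                       "for the new schema and explain the changes.",
        }
    ],
)

print(response.content[0].text)
```

Step 5 applies the same way here: once the exchange gets long, start fresh and resend a trimmed project summary rather than the whole history.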
Very much normal yes. This is why I've been (so far) still mainly sticking to having it as an all-knowing oracle telling me what I need to know, which it mostly does successfully.
When it works for pure generation it's beautiful, when it doesn't it's ruinous enough to make me take two steps back. I'll have another go at getting with all the pure agentic rage everyone's talking about soon enough.
There's a lot of back and forth for describing what you actually want, the design, the constraints, and testing out the feedback loops you set up so it can tell if it's on the right track or not.
When it's actually writing code it's pretty hands-off, unless you need to course-correct to point it in a better direction.
One of my recent thoughts is that Claude Code has become the most successful agent partially because it is more of a black box than previous implementations of the agent pattern: the actual code changes aren't shoved in your face like Cursor (used to be), they are hidden away. You focus more on the result rather than the code building up that result, and so you get into the "just one more feature" mindset a lot more, because you're never concerned that the code you're building is sloppy.
It's because Claude Code will test its work and adjust the code accordingly. The old way, whether how Cursor used to work or how I used to copy and paste code from ChatGPT, doesn't work because iterating toward a working solution requires too much human effort, making the whole thing pointless.
I've run into this too. The key difference is that Claude Code hides the iteration loop from you. It'll rewrite a function 3-4 times behind the scenes, running tests each time, before you even see the final version. With Cursor's old approach, you'd watch it fail, then manually ask it to try again, which gets exhausting fast.
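To make that difference concrete, here's a toy sketch of the kind of loop being described: generate a patch, run the tests, feed the failures back in, and only surface the final result. The generate_patch and apply_patch helpers are hypothetical stand-ins, not Claude Code's actual implementation.

```python
import subprocess

MAX_ATTEMPTS = 4

def generate_patch(task: str, feedback: str) -> str:
    """Hypothetical model call: returns a diff for the task, informed by test feedback."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Hypothetical helper that applies the diff to the working tree."""
    raise NotImplementedError

def run_tests() -> subprocess.CompletedProcess:
    # Run the project's test suite and capture the output as feedback.
    return subprocess.run(["pytest", "-q"], capture_output=True, text=True)

def agent_loop(task: str) -> bool:
    """Hide the retries from the user: only the final outcome is surfaced."""
    feedback = ""
    for attempt in range(MAX_ATTEMPTS):
        patch = generate_patch(task, feedback)
        apply_patch(patch)
        result = run_tests()
        if result.returncode == 0:
            print(f"Done after {attempt + 1} attempt(s).")  # what the user actually sees
            return True
        feedback = result.stdout + result.stderr  # failures feed the next attempt
    print("Giving up; surfacing the failing state to the user.")
    return False
```

The point is just where the retries live: inside the loop, instead of in front of the user.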
Simple fix for this. When the work day is done, close the laptop and walk away. Don't link notifications to personal devices. Whatever slop it produced will be waiting for you at 8am the next morning.
Wait, are we saying the writing is bad because it could have been AI, or because we actually checked and it was? How do we even tell anymore when decent human writing gets flagged as slop just for sounding polished?
I think there's a difference between using AI as a thinking tool (which Tim seems to be doing) versus letting it generate your final output wholesale. One leaves you in the driver's seat, the other doesn't.
Funemployed right now joyously spending way way more time than 996, pulling the slot machine arm to get tokens, having a ball.
But that's for personal pleasure. This post is receding from the concerns about "token anxiety," about the addiction to tokens. This post is about work culture & late capitalism anxiety, about possible pressures & systems society might impose.
I reflect a lot on the idea that AI doesn't reduce the work, it intensifies it.
https://hbr.org/2026/02/ai-doesnt-reduce-work-it-intensifies...
The spirit of this really nails something core to me. We coders especially get help with so much of the menial work now. Which means we spend a lot more time on intense analysis and critique, doing much more of the hard thought work of 'is what we have here as good as it can be', finding new references or patterns to feed back into the AI to steer already-working implementations towards better outcomes.
And my heart tells me that corporations & work life as we know it are almost universally just really awful about supporting reflective contemplative work like this. Work wants output. It doesn't want you to sit in a hammock and think about it. But increasingly I tell you the key to good successful software is Hammock Driven Development. It's time to use our brains more, in quiet reflection. https://github.com/matthiasn/talk-transcripts/blob/master/Hi...
996 sounds like garbage on its own, as a system of toil. But I also very much respect an idea of continuous work, one that also intersperses rest throughout the day. Doing some chores or going to the supermarket or playing with the kid can be an incredibly good way to let your preconscious sift through the big gnarly problems about. The response to the intensity of what we have, to me, speaks of a need to spread out the work, to de-concentrate it, to build in more than hammock time. I was on the fence about whether the traditional workday deserved to survive before AI hit, and my feels about it being a gross mismatch have massively intensified since.
As I started my post with, I personally have a much more positive experience, with what yes feels like a token addiction. But it doesn't feel like an anxiety. It feels like the greatest, most exciting adventure, far beyond what I had hoped for in life ever. This is wildly fun, going far far further out than I had ever hoped to get to see. I'm not "anxiously" pulling the lever arm on the token machine, I'm just thrilled to get to do it. To have time to reflect and decide, I have 3-8 things going at once (and probably double that back-burnered but open, on Niri rows!) to let myself make slower decisions, to analyze, while keeping the things that can safely move forwards moving forwards.
That also seems like something worker-exploitative late capitalism is mostly hot garbage at too! Companies really try to reduce in-flight activities. Sprint planning is about crafting deliberate work. But our freedom and agency here far outstrips these dusty old practices. It is anxiety-inducing to be so powerful, so capable, & to have a bureaucracy that constrains and confines, that wants only narrow windows of our use.
Wait until you're debugging AI-generated code at 2am with no comments, no error handling, and three different ways to do the same thing. The slot machine paid out, sure. Now you own the payout.