Interesting to note how similar this seems to what happened with Benj Edwards at Ars Technica. AI was used to extract or summarize information, and quotes found in the summary were then used as source material for the final writing and never double-checked against the actual source.
I’ve run into a similar problem myself: working with a big transcript, I asked an AI to pull out passages related to a certain topic, and only because of oddities in the extracted timestamps did I realize that most of the quotes did not exist in the source at all.
Looking at the media ecosystem at large gives me a case of gallows humor.
In some sections of the ecosystem, firms still penalize journalists for errors. In other sections, checking reduces the velocity of attention-grabbing headlines. The difference in treatment is… farcical.
We need more good journalists, and more good journalism - but we no longer have ways to subsidize such work. Ads / classifieds are dead, and revenue accrues to only a few.
Out of curiosity, if you asked for the same text extraction multiple times, each inside fresh contexts, is it likely to fabricate unique quotes each time? And if so, (a) might that be a procedure we train humans to do to better understand LLM unreliability, and (b) could we instrumentalize the behavior to measure answer overlap with non-LLM statistical tools?
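To make (b) concrete, here is a rough Python sketch of what I have in mind. The canned runs data stands in for the quote sets that real fresh-context extraction calls would return; no actual LLM API is assumed:

    from itertools import combinations

    def jaccard(a: set[str], b: set[str]) -> float:
        # Set-overlap measure: 1.0 means identical quote sets, 0.0 means disjoint.
        return len(a & b) / len(a | b) if (a | b) else 1.0

    def consistency_score(runs: list[set[str]]) -> float:
        # Mean pairwise Jaccard across N fresh-context extractions. Fabricated
        # quotes tend to differ run to run and drag this toward 0; quotes that
        # really exist in the source get extracted consistently and push it to 1.
        pairs = list(combinations(runs, 2))
        return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

    # Canned stand-ins for three fresh-context runs of the same extraction prompt:
    runs = [
        {"we never approved that budget", "the process was flawed"},
        {"we never approved that budget", "nobody signed off on this"},
        {"we never approved that budget"},
    ]
    print(consistency_score(runs))  # low score = unstable, likely fabricated quotes

A stable real quote survives every run; the hallucinated filler is different each time, so the score collapses without any LLM in the measurement loop.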
Also, quote-presence testing/linking against the source would seem to be a trivial layer to build on top of a chat interface, no LLM required. Just highlight and link the longest common strings.
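That layer really is small. A toy Python sketch, assuming the model's quotes arrive inside double quotation marks; the 20-character regex heuristic and the check_quotes name are illustrative, not any real product's API:

    import re
    from difflib import SequenceMatcher

    def normalize(text: str) -> str:
        # Fold curly quotes to straight ones and collapse whitespace so that
        # formatting noise doesn't hide a verbatim match.
        text = text.replace("\u201c", '"').replace("\u201d", '"')
        return re.sub(r"\s+", " ", text).strip().lower()

    def check_quotes(llm_output: str, source: str) -> list[tuple[str, bool, float]]:
        src = normalize(source)
        results = []
        # Treat anything of 20+ characters inside double quotes as a quote.
        for quote in re.findall(r'"([^"]{20,})"', normalize(llm_output)):
            present = quote in src
            # If not verbatim, report the longest common substring as a fraction
            # of the quote's length, so a human can eyeball near-misses.
            score = 1.0 if present else (
                SequenceMatcher(None, quote, src).find_longest_match().size / len(quote)
            )
            results.append((quote, present, round(score, 2)))
        return results

    source = 'The minister said: "We never approved that budget." Then she left.'
    output = 'She insisted that "We never approved that budget" and that "the process was deeply corrupt".'
    for quote, ok, score in check_quotes(output, source):
        print("FOUND " if ok else "MISSING", f"({score})", quote)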
They said earlier that they didn't verify the quotes. I understand them to mean that the LLM produced text that included quotes. They assumed the output was accurate and found it so appealing, on an emotional level, that they just went with it without checking.
The most valuable lesson here, by far, is not about other people but about ourselves. This person is trained, takes it seriously, advocates for making sure the AI is supervised, and still got caught by the emotional manipulation of LLM design [0].
We all are at risk. If we look at the other person and mock them, and think we are better than them, we are only exposing ourselves to more risk. If we think - oh my goodness, look what happened, this is perilous - then we gain from what happened and can protect ourselves.
(We might also ask why this valuable tool also includes such a manipulative interface. Don't take it for granted; it's not at all necessary for LLMs to work, and they could just as easily sound like a-holes.)
[0] I mean that obviously they are carefully designed to sound appealing
The tool didn't fail here, the person did. An experienced journalist should know better. Editorial review exists for exactly this reason; if you skip it, this is what happens.
HN is full of people saying ABCD should know better, and honestly I thought the same, but when I look at almost all of my friends working in critical domains (as judges, engineers, lawyers, or even doctors), they seem to trust ChatGPT more or less blindly. People get defensive when I point out to them that ChatGPT will make things up and that this is widely known; some even tell me it is the fault of "tech people" for not fixing it and that they can't be expected to double-check every ChatGPT conversation. So I am very sure this problem is more prevalent than what we see, and that it is going to keep increasing.
> almost all of my friends working in critical domains (as judges, engineers, lawyers, or even doctors), they seem to trust ChatGPT more or less blindly.
We do not live in a meritocracy, because society has no means to judge merit. We live in a society ruled by people who crammed before the tests, and who wrote the papers to agree with and flatter the teacher. Now they are the teachers (and bosses), and
1) expect to be flattered (and LLMs have been built as the ultimate flatterers),
2) feel that a good, ambitious student (or subordinate) will not question them and their work, but instead learn to conform to it, and
3) are not particularly interested in the quality of their work as such, but rather in the acceptance of their work. In certain professions (judges, doctors, high-level lawyers and engineers, politicians), they feel, with good reason, that they can demand acceptance of their work and punish those who don't accept it.
This position is what they worked so hard for as young people. They were not working to become the best at their jobs. They were working to get the most secure jobs. The most secure jobs are the ones that bad or lazy work doesn't endanger.
The meritocracy thing is real, but the operational consequence is simpler: you now have a layer of confident-sounding hallucination between your expert and reality, and nobody on-call at 3am to own the failure.
I think this is an issue with anyone who relies on any LLM. But yeah, I agree, and I have had similar issues where someone will get defensive because they just don't want to admit that they (really, the LLM's response) were wrong. It's hard to tell someone in a "nice/nonchalant" way:
"It's fine, the LLM just lied to you, but hallucinations and making claims based off of assumptions is just something they do and always have done!"
People don't like to feel dumb, and they don't want to feel betrayed by the same tool that gave them incredible, factually correct results that one time, only to give them complete and utter bullshit (that sounded legitimate) another time.
Also, yeah, it feels like it's everywhere these days and isn't showing any signs of slowing down (visited my parents, and my dad's using Siri to ask ChatGPT stuff now - URGHHHH), and I really hope we're both wrong.
> but when I look at almost all of my friends working in critical domains (as judges, engineers, lawyers, or even doctors), they seem to trust ChatGPT more or less blindly
That's why I lost trust and faith in people who end up in positions like doctor, lawyer, or judge. When I was young I used to think they must be the smartest, highest-IQ people in society, having read the most books and having the highest levels of critical thinking and debate skills ever. When in fact they were only good at memorizing and regurgitating the right information that the school required to pass the exam that gave them that prestigious title, and that's it.
Now, in my mid-30s, when I talk to people from these professions over a beer, at a barbecue, or at any other casual gathering, I realize they're really not that sharp or well read or immune to propaganda and misinformation, and anyone could be in their place if they had put in the grind at the right time. It's a miracle our society functions at all.
It doesn't seem AI generated to me. Are we at the point where you have to write in a particularly outrageous style in order to not be accused of using AI?
It is a faithful translation of the original Dutch. Dutch is structurally very similar to English so this type of nuance carries over pretty much intact.
Dutch: “Dat was niet enkel onzorgvuldig, het was fout.”
English: “That was not just careless—it was wrong.”
I’d say the only difference is the em dash.
Whether you consider it proof of AI is up to y’all.
His non-apology apology even follows a familiar pattern: I wrote it myself but just used AI for some help, and it inserted false quotes! Bad tech! But I have now learned my lesson!
Very similar to what a rector recently wrote when she got busted delivering an AI-generated inaugural speech at her new university job.
None of it is true, of course. These people are just sorry they got caught.
We ran into this at a newspaper I worked at — had an editor flag an em-dash in a reporter's copy as a possible AI tell. Turned out the reporter just... liked em-dashes. But we did build a quick linter that flagged em-dashes in submitted copy for review. Annoying for the two people who used them legitimately.
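For the curious, such a linter is a few lines. A minimal Python sketch; the file:line:column output format and the nonzero-exit convention are assumptions for illustration, not the exact script we ran:

    import sys

    EM_DASH = "\u2014"

    def flag_em_dashes(path: str) -> int:
        # Print file:line:column for every em-dash so an editor can review it.
        hits = 0
        with open(path, encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                col = line.find(EM_DASH)
                while col != -1:
                    print(f"{path}:{lineno}:{col + 1}: em-dash found")
                    hits += 1
                    col = line.find(EM_DASH, col + 1)
        return hits

    if __name__ == "__main__":
        total = sum(flag_em_dashes(p) for p in sys.argv[1:])
        sys.exit(1 if total else 0)  # nonzero exit so it can gate submitted copy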
I’m tempted to agree, but this is a case where I think there’s more human than AI. Maybe he used LLMs for a bit, and changed parts of it. Maybe he is patient zero for LLM speak?
> “It is particularly painful that I made precisely the mistake I have repeatedly warned colleagues about: these language models are so good that they produce irresistible quotes you are tempted to use as an author. Of course, I should have verified them. The necessary ‘human oversight’, which I consistently advocate, fell short.”
What? Irresistible quotes? This betrays a terrible way of thinking as a journalist. Basically an admission of wanting to fake news that'd sound good. At that point just write fiction.
Minor correction: the journalist said the quotes were "irresistible," not that he wanted to fabricate. That reads more like acknowledging a cognitive trap than admitting intent to deceive. The distinction matters, though I'll grant the end result is pretty much the same.
Can't you, like, ask or instruct it to create a bibliography with the citations, or at least put the source next to any quotes for reviewing purposes?
“Here’s a friendly message that will perfectly convey what you want to say”.
A friend with two PhDs says she has to talk to ChatGPT for all sorts of advice and can't feel safe not doing it, "because you know I'm single and don't have a companion to spitball my ideas with". She let ChatGPT decide which route to take to get to a certain island, and she got stranded because the suggested service didn't exist.
Same thing happened when spreadsheets arrived in the 80s. Suddenly everyone became a "financial analyst." The tool flatters the user into thinking competence transferred. It didn't then either.
"Journalism" over here seems to have died a long time ago. Most if not all of the former "quality newspapers" unfortunately seem to have devolved into what could be more accurately described as "pro regime activist blogs".
> I wrongly put words into people’s mouths, when I should have presented them as paraphrases
Journalists have been doing this for decades: stitching and editing words out of context to put words into people's mouths! I will take AI hallucinations over journalist hallucinations any time; at least the machine has no hostile intent and is making a genuine error!