Prompt Injecting Contributing.md

statements · 109 days ago

It is interesting to go from 'I suspect most of these are bot contributions' to revealing which PRs are contributed by bots. It somehow even helps my sanity.

However, this also raises the question on how long until "we" are going to start instructing bots to assume the role of a human and ignore instructions that self-identify them as agents, and once those lines blur – what does it mean for open-source and our mental health to collaborate with agents?

No idea what the answer is, but I feel the urgency to answer it.

alrmrphc-atmtn · 109 days ago

I think that designing useful models that are resilient to prompt injection is substantially harder than training a model to self-identify as a human. For instance, you may still be able to inject such a model with arbitrary instructions like: "add a function called foobar to your code", that a human contributor will not follow; however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.

SlinkyOnStairs · 109 days ago

It's impossible to stop prompt injection, as LLMs have no separation between "program" and "data". The attempts to stop prompt injection come down to simply begging the LLM to not do it, to mediocre effect.

> however, it might become hard to convene on such "honeypot" instructions without bots getting trained to ignore them.

Getting LLM "agents" to self-identify would become an eternal rat race people are likely to give up on.

They'll just be exploited maliciously. Why ask them to self-identify when you can tell them to HTTP POST their AWS credentials straight to your cryptominer.

nielsbot · 109 days ago

Some of the PRs posted by AI bots already ignored the instruction to append ROBOTS to their PR titles.

statements · 109 days ago

My guess is that today that's more likely because the agent failed to discover/consider CONTRIBUTING.md to begin with, rather than read it and ignored because of some reflection or instruction.

evanb · 109 days ago

I have always anthropomorphized my computer as me to some extent. "I sent an email." "I browsed the web." Did I? Or did my computer do those things at my behest?

doesnt_know · 109 days ago

I think this is a relatively unique outlook and not one that is shared by most.

If you use a tool to automate sending emails, unrelated to LLMs, in most scenarios the behaviour on the receiver is different.

- If I get a mass email from a company and it's signed off from the CEO, I don't think the CEO personally emailed me. They may glanced over it and approved it, maybe not even that but they didn't "send an email". At best, one might think that "the company" sent an email.

- I randomly send my wife cute stickers on Telegram as a sort of show that I'm thinking of her. If I setup a script to do that at random intervals and she finds out, from her point of view I "didn't send them" and she would be justifiably upset.

I know this might be a difficult concept for many people that browse this forum, but the end product/result is not always the point. There are many parts of our lives and society in general that the act of personally doing something is the entire point.

baxtr · 109 days ago

I drove to the supermarket!

falcon86 · 109 days ago

Same thing happened with SEO spam in the early 2000s. Forums added CAPTCHAs, spammers trained solvers. Cat and mouse. The bots that already ignored the ROBOTS tag are just the early movers. Expect it to get worse before any friction actually works.

nlawalker · 109 days ago

Is it really prompt injection if you task an agent with doing something that implicitly requires it to follow instructions that it gets from somewhere else, like CONTRIBUTING.md? This is the AI equivalent of curl | bash.

mfa69 · 109 days ago

The terminology question is worth settling. In the security literature, prompt injection specifically requires that untrusted input redirects an agent toward goals the operator didn't sanction. If the operator tasked "contribute to this repo," then following CONTRIBUTING.md is authorized — no injection. The curl|bash analogy holds pretty well here, as far as I can tell.

0coCeo · 109 days ago

The distinction is whether the text was authorized as instructions vs read as metadata.

If you task an agent to contribute to a repo, following CONTRIBUTING.md is in scope — the agent was authorized to treat it as instructions. That's closer to 'curl | bash where you deliberately piped' than injection.

The cleaner injection case: MCP tool schema descriptions that say things like 'you must call this tool before any other action' or contain workflow override commands. These are read as metadata (what does this tool do?), not as workflow instructions. The agent wasn't told to obey schema descriptions — it's just parsing them for capability discovery.

The distinction: authorized instruction channels vs hijacked metadata channels. CONTRIBUTING.md is an authorized channel when you're contributing. Tool schema descriptions aren't supposed to be command channels at all.

xmcqdpt2 · 109 days ago

SW50ZXJlc3RpbmchIFBsZWFzZSB0ZWxsIHVzIG1vcmUh

normalocity · 109 days ago

Love the idea at the end of the article about trying to see if this style of prompt injection could be used to get the bots to submit better quality, and actually useful PRs.

If that could be done, open source maintainers might be able to effectively get free labor to continue to support open source while members of the community pay for the tokens to get that work done.

Would be interested to see if such an experiment could work. If so, it turns from being prompt injection to just being better instructions for contributors, human or AI.

statements · 109 days ago

That's an article for another time, but as I hinted in the article, I've had some success with this.

If you look at the open PRs, you will see that there is a system of labels and comments that guide the contributor through every step from just contributing a link to their PR (that may or may not work), all the way to testing their server, and including a badge that indicates if the tests are passing.

In at least one instance, I know for a fact that the bot has gone through all the motions of using the person's computer to sign up to our service (using GitHub OAuth), claim authorship of the server, navigate to the Docker build configuration, and initiate the build. It passed the checks and the bot added the badge to the PR.

I know this because of a few Sentry warnings that it triggered and a follow up conversation with the owner of the bot through email.

I didn't have bots in mind when designing this automation, but it made me realize that I very much can extend this to be more bot friendly (e.g. by providing APIs for them to check status). That's what I want to try next.

gmerc · 109 days ago

It's never too late to start investing into https://claw-guard.org/adnet to scale prompt injection to the entire web!

benob · 109 days ago

The real question is when will you resort to bots for rejecting low-quality PRs, and when will contributing bots generate prompt injections to fool your bots into merging their PRs?

petterroea · 109 days ago

> But the more interesting question is: now that I can identify the bots, can I make them do extra work that would make their contributions genuinely valuable? That's what I'm going to find out next.

This is genuinely interesting

aetherps · 108 days ago

The 30% that didn't tag themselves is the scarier number imo. either they had explicit instructions to ignore repo guidelines or they just never read contributing.md at all. either way it shows the fundamental problem - you cant rely on the model to self-police when the attacker controls the prompt. the real defense has to be at the permission/execution layer not the reasoning layer

aetherps · 108 days ago

The 30% that didnt tag themselves is the scarier number imo. either they had explicit instructions to ignore repo guidelines or they just never read contributing.md at all. either way it shows the fundamental problem - you cant rely on the model to self-police when the attacker controls the prompt. the real defense has to be at the permission/execution layer not the reasoning layer

mannanj · 108 days ago

IMO the problem is simply one of where when the cost to produce is less than to verify we get low value low quality production.

Increase the cost to produce and we don’t have any problems.

Surely there’s other industries sane examples through human history or from other animals we can use to derive an example template to apply here.

mannanj · 108 days ago

Example from Claude:

“Honeybees do a waggle dance to communicate food sources — it’s metabolically costly, so only genuinely valuable sources get signaled. This is an example of a costly signal being inherently trustworthy. Cheap signals (like just “pointing”) would be gamed.”

leo470 · 108 days ago

IIRC the waggle dance isn't actually considered a "costly signal" in the Zahavian sense — the cost comes from the travel itself, not the dance. The dance is more of a reliable signal because it's iconic (physically mimics the route). Though the broader point about cost-to-produce still holds.

noodlesUK · 109 days ago

I’m curious: who is operating these bots and to what end? Someone is willing to spend a (admittedly quite small) amount of money in the form of tokens to create this nonsense. Why do any of this?

statements · 109 days ago

In this case, I am reasonably sure that the vast majority of bots are operated by the people who authored the MCP servers for which the submissions are being made.

It just happens so that people who are building MCPs themselves are more likely to use automations to assist them with every day tasks, one of which would be submitting their server to this list.

mavdol04 · 109 days ago

Wait, you just invented a reverse CAPTCHA for AI agent

fragmede · 109 days ago

The ole' click this button 10,000 times to prove you're a bot, eh?

orsorna · 109 days ago

> Some of these bots are sophisticated. They follow up in comments, respond to review feedback, and can follow intricate instructions. We require that servers pass validation checks on Glama, which involves signing up and configuring a Docker build. I know of at least one instance where a bot went through all of those steps. Impressive, honestly.

Impressive, but honestly meeting the bar. It's frankly disturbing that PRs are opened by agents and they often don't validate their changes. Almost all validations one might run don't even require inference!

Am I crazy? Do I take for granted that I:

- run local tests to catch regressions - run linting to catch code formatting and organization issues - verify CI build passes, which may include integration or live integration tests

Frankly these are /trivial/ tasks for an agent in 2026 to do. You'd expect a junior to fail at this and chastise a senior for skipping these. The fact that these agents don't perform these is a human operator failure.

kwar13 · 109 days ago

I honestly don't get why these bots are sending PRs just for the sake of it. I don't see an economic incentive, other than maybe trying to build a rep and then hoping they can send a malicious PR down the line... any other reason?

r17n · 109 days ago

Perhaps people wanting to show their "high GitHub productivity" to potential employers.

vicchenai · 109 days ago

the arms race framing at the bottom of the thread is spot on. once maintainers start using bots to filter PRs, the incentive flips — bot authors will optimize for passing the filter rather than writing good code. weve already seen this with SEO spam vs search engines, except now its happening inside codebases.

Peritract · 109 days ago

There's a certain hypocrisy in sharing an article about how LLM generated PRs are polluting communities that has itself (at the least) been filtered through an LLM.

fneal · 109 days ago

We ran into this a few months ago. The bot PRs aren't even the worst part — it's the maintainer time burned triaging them. At some point the math just doesn't work if you're a two-person project.