How Cloudflare responded to the “Copy Fail” Linux vulnerability

sammy2255 · 60 days ago

To any Cloudflare employees reading this: your network map at https://www.cloudflare.com/network/ is missing a few PoPs, notably Perth (PER) and Hobart (HBA), Australia; Wellington (WLG) and Christchurch (CHC), New Zealand; and Nausori (SUV), Fiji.

electra2012 · 60 days ago

> Despite our practice of deploying Linux patch updates every two weeks, we remained vulnerable because a month-old mainline fix had yet to be backported to our primary kernel line.

Hopefully this is a wake-up call for anyone who believes older distro LTS kernels get all the security fixes that Canonical and Red Hat would have you believe they do.

skinfaxi · 60 days ago

Would love to learn more about their internal behavioural detection program.

> One of the first things our security team did was confirm that our existing endpoint detection would catch this exploit. Our servers run behavioral detection that continuously monitors process execution patterns. It doesn't rely on knowing about specific vulnerabilities; it watches for anomalous behavior across the fleet.
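The post doesn't describe how that detection works. Purely as an illustration of a vulnerability-agnostic rule of the kind the quote describes (this is not Cloudflare's system), here is a minimal sketch assuming exec events carry the parent's UID and the child's effective UID. It flags any unprivileged parent that produces a root child unless the child is an expected setuid helper; `ExecEvent`, `EXPECTED_SETUID` and `suspicious_escalations` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExecEvent:
    # Hypothetical event shape; a real sensor (auditd, eBPF, etc.) would differ.
    host: str
    parent_exe: str
    parent_uid: int
    child_exe: str
    child_euid: int

# Assumed allowlist of binaries that legitimately gain root via setuid.
EXPECTED_SETUID = frozenset({"/usr/bin/sudo", "/usr/bin/su", "/usr/bin/passwd"})

def suspicious_escalations(events, allowed=EXPECTED_SETUID):
    """Return events where a non-root parent yields a root child that isn't allowlisted."""
    return [
        e for e in events
        if e.parent_uid != 0 and e.child_euid == 0 and e.child_exe not in allowed
    ]

if __name__ == "__main__":
    sample = [
        ExecEvent("h1", "/bin/bash", 1000, "/usr/bin/sudo", 0),     # expected
        ExecEvent("h1", "/usr/bin/python3", 1000, "/bin/sh", 0),    # flagged
        ExecEvent("h2", "/usr/sbin/sshd", 0, "/bin/bash", 0),       # parent already root
    ]
    for e in suspicious_escalations(sample):
        print(f"{e.host}: {e.parent_exe} (uid {e.parent_uid}) -> {e.child_exe} (euid 0)")
```

A rule like this needs no knowledge of the specific bug, which matches the quote's claim, but a real fleet would also need baselining to keep false positives manageable.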

srcreigh · 60 days ago

It’s fascinating that they already had a system that could identify the exploit at runtime. How can I learn more about that?

mkj · 60 days ago

If they're already running a custom Linux kernel build, why did they have AF_ALG enabled? Seems the perfect situation to limit features to only those actually being used.
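One rough way to see whether the AF_ALG AEAD interface is reachable on a given host is to try binding an AF_ALG socket of type "aead". The sketch below assumes Python's Linux-only `socket.AF_ALG` support; `af_alg_aead_available` is a hypothetical helper, and the `gcm(aes)` algorithm name is just an example. A `True` result only means the interface is usable, not that the host is vulnerable, and a successful bind may cause the kernel to auto-load `algif_aead`.

```python
import socket

def af_alg_aead_available(alg: str = "gcm(aes)") -> bool:
    """Return True if an AF_ALG socket of type 'aead' can be bound on this host."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # Not Linux, or this Python build lacks AF_ALG support.
        return False
    try:
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as s:
            # Address format for AF_ALG is (type, name); binding "aead"
            # requires the algif_aead module to be loaded or loadable.
            s.bind(("aead", alg))
        return True
    except OSError:
        # EAFNOSUPPORT, ENOENT (module blocked/missing), EPERM (seccomp), ...
        return False

if __name__ == "__main__":
    print("AF_ALG aead reachable:", af_alg_aead_available())
```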

PunchyHamster · 60 days ago

For us it was:

* Get the list of modules from Puppet's facts and confirm the module isn't used anywhere (it wasn't).
* Add `install algif_aead /bin/false` to /etc/modprobe.d/disable-algif.conf.
* Run the exploit code as a check to confirm it no longer works.
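A per-host check along the lines of the first two steps might look like the sketch below. It is a minimal illustration, not PunchyHamster's actual tooling: the helper names are hypothetical, and it only reads /etc/modprobe.d (modprobe also reads other directories such as /lib/modprobe.d). Note that an `install` rule only affects future loads; a module that is already loaded stays loaded until it is removed.

```python
from pathlib import Path

# Commands that turn an `install` rule into a no-op load.
NO_OP_COMMANDS = {"/bin/false", "/usr/bin/false", "/bin/true", "/usr/bin/true"}

def _norm(name: str) -> str:
    # modprobe treats '-' and '_' in module names as equivalent.
    return name.replace("-", "_")

def module_loaded(proc_modules: str, name: str) -> bool:
    """True if `name` is the first field of any line of /proc/modules text."""
    target = _norm(name)
    for line in proc_modules.splitlines():
        fields = line.split()
        if fields and _norm(fields[0]) == target:
            return True
    return False

def module_disabled(modprobe_conf: str, name: str) -> bool:
    """True if an `install <name> /bin/false`-style rule blocks loading."""
    target = _norm(name)
    for raw in modprobe_conf.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments
        if (len(fields) >= 3 and fields[0] == "install"
                and _norm(fields[1]) == target and fields[2] in NO_OP_COMMANDS):
            return True
    return False

if __name__ == "__main__":
    mod = "algif_aead"
    conf = "\n".join(p.read_text() for p in sorted(Path("/etc/modprobe.d").glob("*.conf")))
    proc = Path("/proc/modules")
    loaded = module_loaded(proc.read_text(), mod) if proc.exists() else False
    print(f"{mod}: disabled={module_disabled(conf, mod)} loaded={loaded}")
```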

I imagine CF runs more software that could use it, but apparently it's not a commonly used API.

cluckindan · 60 days ago

Has anyone figured out whether this CVE was intentional?

tptacek · 60 days ago

This is an interesting post from Cloudflare, as usual, but it's not clear to me why they would have been vulnerable to CopyFail. Did I miss the point in this blog where that's addressed? What triggered the threat hunting and mitigation effort? At what points in their architecture were they reliant on Linux user-based access control?

robotbikes · 60 days ago

I would assume it was about protecting their servers from internal sources escalating privileges vs. them providing publicly accessible Linux shells.

kjt6 · 60 days ago

Yeah, both honestly. We ran into this at a previous job where the bigger concern was lateral movement from compromised internal services, not just external shells. If one service gets popped, you don't want it escalating on the same host. Defense-in-depth regardless of exposure.

jmclnx · 60 days ago

> Linux kernel build based on the community's Long-Term Support (LTS)

CopyFail only highlights why companies want LTS. If there were a supported kernel built prior to 2017, most large companies would still be on that version, avoiding this issue altogether.

The corporate mindset is usually "never upgrade unless there is new hardware needed or critical software failure". All CopyFail did was reinforce that mindset.

I wonder if CopyFail will cause enterprises to put pressure on the Linux Foundation to maintain an "ultra LTS" kernel that is supported for 20 years?

PunchyHamster · 60 days ago

> CopyFail only highlights why Companies want LTS. If there was a supported kernel built prior to 2017, most large companies would still be on that version, avoiding this issue all-together.

Sadly, that's not really how it works for, say, Red Hat. They routinely backport features while keeping whatever "stable" version number the kernel has. We even had the displeasure of them backporting a bug... the same bug to two different RHEL versions.

dcn5 · 60 days ago

The backport bug is somehow worse than the original. At least upstream you get one fix. Red Hat gives you the same broken code labeled "stable" across three release streams simultaneously.

tempest_ · 60 days ago

The longer you wait the more painful the switch will eventually be.

dboreham · 60 days ago

The "Hunting for Exploitation" section is unclear to me: "The exploit leaves a distinctive trace in kernel logs when it runs." Hmm. Wouldn't a system with a compromised kernel also log exactly what the attacker wanted logged?

QuantumNoodle · 60 days ago

Also, isn't 48 hours prior to the disclosure a very narrow window? I wonder if their logs don't go back further or if there was another reason to look back only two days.

rithdmc · 60 days ago

The attack itself creates the logs, which - reading between the lines - are shipped to a central log server. A compromised server might not send any new indicators to the logs, but existing logs moved off device would still be available.

I'd like to know what those distinctive traces are, which is also missing :(

PunchyHamster · 60 days ago

Your exploit would have to get root and kill or exploit the logging daemon near-instantly, or else the log will already have been sent to the remote server before you can change it locally.

cobalt14 · 60 days ago

Minor correction: the compromised kernel wouldn't necessarily control what's already been written to the log. The exploit triggers the log before privilege escalation completes, so you'd need to retroactively scrub it. Though yeah, a sufficiently clever exploit could account for that.

cube00 · 60 days ago

I guess the hope is the kernel has been able to successfully transmit that log message to the immutable central logging infra before it gets compromised.

Although, given the tendency of endpoint logging agents to buffer messages to reduce their network chattiness, I do wonder if a fast-acting exploit could dump that buffer before it gets transmitted.

I don't think any of the agents are complex enough to immediately transmit permission elevation log messages over the regular background noise.
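The trade-off described here (batching to cut network chatter versus shipping security-relevant lines before an attacker can discard them) can be sketched as a buffered shipper that flushes immediately when a message is severe enough. This is a hypothetical design, not how any particular agent works; the class name and threshold are assumptions, and severities follow the syslog convention (0 = emergency through 7 = debug).

```python
from typing import Callable, List

class BufferedShipper:
    """Batch log lines, but flush at once when an urgent message arrives."""

    def __init__(self, send: Callable[[List[str]], None],
                 batch_size: int = 100, urgent_severity: int = 2):
        self.send = send                        # ships one batch to the collector
        self.batch_size = batch_size
        self.urgent_severity = urgent_severity  # syslog: 2 = critical
        self.buffer: List[str] = []

    def log(self, severity: int, message: str) -> None:
        self.buffer.append(message)
        # Lower syslog number = more severe; flush the whole buffer so the
        # urgent line ships together with the context that preceded it.
        if severity <= self.urgent_severity or len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
```

Flushing the entire buffer on an urgent line keeps ordering intact; the remaining race is the time between the kernel emitting the line and the agent reading it.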

john_strinlai · 60 days ago

this is a technical dive into how cloudflare responded, not a confirmation that they responded

for whatever reason, unknown to me, hn automatically strips "how" from the start of titles. i can't remember ever seeing a title where this was an improvement.

dang · 60 days ago

Of course you can't, because the cases it improves don't get noticed, while the remainder stick out like sore thumbs.

gamegoblin · 60 days ago

I learned a few years ago that HN also editorializes by dropping "world's" from titles

Before: Teens break record for world's longest kickball game

After: Teens break record for longest kickball game

dpoloncsak · 60 days ago

Interestingly, there's a current post on the front page with "How" at the start of the title.

> https://news.ycombinator.com/item?id=48018715 "How do I inform Windows that I’m writing a binary file?"

I wonder if it ending in a '?' has anything to do with it?

edit: Upon review, at the time of posting it was actually on the 2nd page

varun_ch · 60 days ago

I'm yet to see a good example of the title stripping, at least for "how" and "how to" (although perhaps this is survivorship bias).

trollbridge · 60 days ago

Starting a title with “How” is standard clickbait.

anvil44 · 60 days ago

Slashdot did the same thing with "Ask:" and "Poll:" prefixes back in the early 2000s. Some worked, some didn't. Editors have always made these calls and they're always wrong about half the time. The survivors just fade into background noise.

cube00 · 60 days ago

> At the time of the "Copy Fail" disclosure, the majority of our infrastructure was running the 6.12 LTS version

That could be as low as 50.1%, I wish they'd provide an actual percentage.

kmc98 · 60 days ago

"Two-week patch cycle" and they were still a month behind — classic.