Railway (PaaS) global outage (status.railway.com)
86 points by TealMyEal 6 days ago | 68 comments




Hello! Railway founder here

We'll have a post mortem for this one as we always write post mortems for anything that affects users

Our initial investigation reveals this affects <3% of instances

Apologies from myself + the Team. Any amount of downtime is completely unacceptable

You may monitor this incident here: https://status.railway.com/cmli5y9xt056zsdts5ngslbmp


Second complete outage on railway in 2 months for us (there was also a total outage on December 16th), and many issues with stuck builds and other minor issues in the months before that.

Looking to move. It's a bit of a hassle to set up Coolify and Hetzner, but I have lost all trust.


That's a big yikes just after promoting themselves in the Jmail thread yesterday https://news.ycombinator.com/item?id=46966562

Of course every service will have outages, it's just funny to see it so soon after saying:

> We're nuts for studying failure at the company [...]

(albeit a different 'failure' context)

tonyhb 6 days ago

IDK, it looks like servers were up, connectivity worked well, and some builds were failing. Wouldn't call that a big issue, and the same thing was happening with Vercel due to their git clones etc. yesterday too.

IOW, doesn't look as bad as the title suggests?


Affected by the outage since about 6:15 AM PT this morning. We're still down as of 9:00 AM PT.

Our existing containers were in a failure state and are now in a partial failure state. Containers are running, but the underlying storage/database is offline.

Many questions on their forum describe situations similar to ours. People are wondering whether they should restart their containers to get things working again, whether doing anything at all risks losing data, or whether to just give everything more time.

I'm glad Railway updated their status page, but more details need to be posted so everyone knows what to do now.

Everyone has outages, it's the way of life and technology. Communication with your customers always makes it less painful and people remember good communication and not the outage. Railway, let's start hearing more communication. Forum is having problems as well. Thanks.


(Angelo from Railway here)

Heard. To be transparent, the delay in acknowledging is usually us trying to determine and correlate the issue. We have a post mortem going out, but we note that the first report was in our system 10 minutes before it was acknowledged; during that time the platform team was trying to see which layer the impact was at.

That said, this is probably the support team's #1 concern: we want the delta between a customer reporting an outage and us detecting it to be as small as possible. The way it usually works is that platform alarms and pages fire first, and then the platform engineer pages a support engineer to run communications.

Usually the priority is to have the platform engineer focus on triaging the issue and then offload the workload to our support team so that we can accurately state what is going on. We have a new comms clustering system rolling out so that if we get 5 reports with similar content, it pages the support team as well. (We will roll this out after we have communicated with affected customers.)
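A rough sketch of the "5 similar reports pages support" rule, assuming incoming reports are grouped by simple word-overlap (Jaccard) similarity. Railway hasn't described how its clustering actually works, so the `ReportClusterer` class, the 0.5 similarity cutoff, and the helper names are hypothetical; only the threshold of 5 comes from the comment above.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude normalization for comparing reports."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class ReportClusterer:
    """Hypothetical: page support once `threshold` similar reports have arrived."""

    def __init__(self, threshold: int = 5, similarity: float = 0.5):
        self.threshold = threshold
        self.similarity = similarity
        # Each cluster keeps the words of its first report, a count, and a paged flag.
        self.clusters: list[dict] = []

    def add(self, report: str) -> bool:
        """Record a report; return True only the first time its cluster hits the threshold."""
        words = tokens(report)
        for cluster in self.clusters:
            if jaccard(words, cluster["seed"]) >= self.similarity:
                cluster["count"] += 1
                break
        else:
            # No similar cluster yet: start a new one seeded by this report.
            cluster = {"seed": words, "count": 1, "paged": False}
            self.clusters.append(cluster)
        if cluster["count"] >= self.threshold and not cluster["paged"]:
            cluster["paged"] = True
            return True
        return False
```

A real system would probably want better text similarity and a time window so old reports age out; the sketch only shows the "N similar reports trigger one page" behavior.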


Thanks for the reply. Understood.

In situations like this, please dedicate at least one team member to respond as quickly as possible to the Railway Help Station posts. That's where your customers are going for communication and support.

milo249 6 days ago

I disagree. Your comms strategy is sound—acknowledging an outage before you understand root cause just creates more noise. Customers don't need instant acknowledgment, they need accurate information. The 10-minute delta is fine if it means your team actually knows what's broken.

drift14 6 days ago

We did the same thing—waited, watched containers half-work, finally restarted anyway. Lost maybe 20 minutes of writes but got back up. The hardest part was just not knowing if restarting made it worse or if waiting was burning more time.


Jokes about train lines aside, I think Railway fits right into the spot that Heroku left.

They have a nice UI and support deploying any kind of backend app as long as it can be built into a Docker container, while many PaaS offerings out there seem to prioritize frontend-only apps.

And they have a free plan, so people can quickly deploy a POC before deciding whether it's worth moving forward.

Does anyone know of any other PaaS that comes with a low-cost starter plan like this (aside from paying for a VPS)?

czhu12 6 days ago

Been building an open source version of Railway at https://canine.sh. It offers all the same features without the risk of vendor lock-in or price gouging.
Onavo 6 days ago

The docs seem to be nonexistent. Is the canine YAML documented?

You want docs like this:

https://coolify.io/docs/applications/ci-cd/github/setup-app

https://coolify.io/docs/applications/build-packs/dockerfile

https://coolify.io/docs/applications/build-packs/overview

Plenty of screenshots and exact step by step instructions. Throwing an "example git repo" with no documentation won't get you any users.

Put yourself in the shoes of a Heroku/Vercel user. DevOps is usually Somebody Else's Problem. They are not going to spend hours debugging Kubernetes, so if you want to sell them a PaaS built on Kubernetes, it has to be foolproof. Coolify is an excellent example: the underlying engineering is average at best (from a pure engineering point of view, it's a very heavy app that suffers from frequent memory leaks, and they have a new v5 rewrite, but it's been stuck for 2 years), but the UI/UX has been polished very well.

czhu12 6 days ago

Yeah, still working through documentation. The goal isn't so much to replace Coolify. It was mostly born out of my last startup, which ran a $20M business with 15 engineers, about 300-1,000 QPS at peak, and fairly complex query patterns.

I think the single VPS model is just too hard to get working right at that scale.

I think Northflank / enterprise applications would be a better comparison for what canine is trying to do, rather than Coolify / indie hackers. The goal is not to take away Kubernetes, but to simplify it massively for 90% of use cases while still giving access to the full k8s API for more advanced features.

asilva 6 days ago

I've been running a similar setup with docker-compose for years and the main pain point isn't the deployment config, it's all the operational stuff around it. Log aggregation, zero-downtime deploys, secret rotation, metrics. Those are what you're actually paying Railway for, not just container orchestration.

imiric 6 days ago

> Computing is getting cheaper

Heh.

Looks like a great product, although maybe mention some honest reasons to not use it, instead of the passive-aggressive marketing ones.


Render.com has a similar value proposition. I've used them and am pretty happy. Railway seems to have more bundled observability built in, which I'd like in Render.

Yes, have you seen miget.com by any chance? You can start with the free tier, and can have a backend with a database for free (256Mi plan). If you need more, just upgrade. They redefined cloud billing. Worth checking.
mstank 6 days ago

VPS + Dokploy gives you just as much functionality with an additional performance boost. Hostinger has great prices and a one-click setup. Good for dozens of small projects.
dabbz 6 days ago

+1 for Dokploy. It's very flexible and lets me set up my sites how I need, especially when it comes to how I set up a static landing page, then /app goes to the React app, /auth goes to a separate auth service, etc.
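Dokploy handles this through its own UI and reverse proxy, so the following is not Dokploy's config format. It's a minimal Python sketch of the longest-prefix path routing described above, with hypothetical upstream hostnames and ports.

```python
# Hypothetical upstreams: landing page at "/", React app at "/app", auth at "/auth".
ROUTES = {
    "/": "http://landing-page:8080",
    "/app": "http://react-app:3000",
    "/auth": "http://auth-service:4000",
}


def route(path: str, routes: dict[str, str] = ROUTES) -> str:
    """Return the upstream for `path`, choosing the longest matching prefix."""
    best = None
    for prefix in routes:
        # Match on whole path segments so "/application" does not hit "/app".
        matches = prefix == "/" or path == prefix or path.startswith(prefix + "/")
        if matches and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        raise LookupError(f"no route for {path!r}")
    return routes[best]
```

A reverse proxy (nginx or Traefik, for example) makes the same decision and then forwards the request; this sketch only shows the routing choice, including why segment-aware matching matters.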

I use https://github.com/coollabsio/coolify on a VPS for this.

Context: This is Railway the PaaS company, not your daily commute vehicle (which is good in general but still bad for many users, like me).
ratorx 6 days ago

A global train outage would be quite a spectacle, is that even possible?

Yeah, I probs should have made that clear

Oof, off topic, but the trains were out of service here for my commute last night, so from the headline I thought that somehow all trains everywhere had just stopped working. Glad to see it's just some SaaS product that's down.

Indeed! Remote bricking of trains is perhaps a thing: https://www.thedrive.com/news/hackers-beat-anti-repair-softw...

I could be wrong, but I think that article is about thieves disabling anti-theft GPS systems, not remote bricking. The trains still ran, they just couldn't be tracked after being stolen.

esseph 6 days ago

It's also mandated by Congress in the US, it's called PTC. (Remote control)
axel498 6 days ago

PTC (Positive Train Control) is primarily a safety system for preventing collisions and overspeed derailments—it's not really "remote control" in the sense of centralized operation. The manufacturer lockout described was a completely separate mechanism, though both rely on GPS positioning.


This wasn't PTC. It was repair lockouts instituted by the manufacturer of the trains based on a GPS geofencing beacon.

PaaS

This is great: not 10 minutes before this outage, I presented Railway as a viable option for small-scale hosting of prototypes and non-critical apps, as an alternative to the cloud giants.
ezekg 6 days ago

It always happens that way. I guarantee some people migrated from Heroku to Railway and bragged about future stability to the team, only to experience this.

Yeah 100%

This won't change my decision, but it is still impeccable timing


Multiple services are receiving SIGTERM or shutdown signals. See dozens of support messages here: https://station.railway.com/questions/services-down-799f7bc1

Here's a sample log entry:

> 2026-02-11T14:35:11.916787622Z [err] 2026/02/11 14:35:03 [notice] 1#1: signal 15 (SIGTERM) received, exiting

I've had about one third of my Railway services affected. I had no notification from Railway, and logging in showed each affected service as 'Online', even though it had been shut down.

I'm pretty annoyed. I am hosting some key sites on Railway. This is not their first outage recently, and one a couple of months ago hit just as I was about to demo the live product to our company owner.
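One way to catch the "dashboard says Online but the service is down" case is an independent uptime check that runs outside the platform. A minimal sketch using only the Python standard library; the probe logic is an assumption (any 2xx/3xx counts as healthy) that you'd tune per service, and any URLs you pass in are your own.

```python
import http.client
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` responds with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are OSError subclasses;
        # HTTPException covers malformed responses.
        return False


def down_services(urls: list[str], probe: Callable[[str], bool] = http_probe) -> list[str]:
    """Probe each URL from outside the platform and return the ones that fail."""
    return [url for url in urls if not probe(url)]
```

Running `down_services([...])` from a cron job on a different provider and alerting on a non-empty result avoids relying on the platform's own status reporting. The `probe` parameter exists so the check can be tested without network access.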


Hey there Dave, Angelo from Railway here-

First off, super duper sorry. It's sometimes a good/bad thing if I can remember someone's handle... and I specifically remember the support thread where we did have an outage before your demo :| The number one goal for us is to deliver a great product. Number two is that we should never embarrass a user, and outages do exactly that.

We just wrapped up the post mortem, which will be published soon; it explains why the dashboard was reporting the state of the application incorrectly. We'd also be more than happy to credit you for the impact and keep your business. That said, I totally understand if two outages are way too much impact for your services.


Thanks Angelo. I actually have three accounts on Railway (personal, and two business). The one you remember is on my personal account (it's still there), where I was hoping to show why Railway was so good and why we should get a Pro account. We have not yet done so.

I'm not an expert Railway user, but I have used it since close to the beginning and have thought of myself as a fan of the product. Sometimes it's the people who cheerlead who can be most annoyed when they're let down.

I'm grateful for the reply here, by you and by other Railway staff. Although a tad unhappy, I hope I've remained courteous and clear in all communication, here and in support forums, and I also want to commend you and other Railway staff on your communication. It's been clear and open. I replied to another person saying it was a good way to rebuild trust, and it is -- I have moved from the 'annoyed and looking for alternatives' stage to the 'maybe this can be moved past' stage. In your internal learnings re your retrospective / post-mortem, add something about public communication here as a positive thing, please. I do genuinely think you've all done a good job there.


weak post mortem: https://blog.railway.com/p/incident-report-february-11-2026

Repeating “~3% impacted” three times? Damage control. Got wrecked. DB SIGTERM’d, app dead for hours, before they even posted a status update. 3% is 100% outage when it’s your stuff: broken dashboards and zero warning.

zachrip 6 days ago

I actually think the title is misleading. I'm not sure actual existing deployments are affected? Seemingly just new ones are not working?

Railway founder here. <3%.

That said, we treat this extremely seriously!

Any downtime is unacceptable and we'll have a post mortem up in the next couple hours


It seemed to have been all deployments that had a browser-facing interface. I'd say some Cloudflare DNS config mess-up.

We weren’t affected, but as a startup I’ll take a minor outage over getting stonewalled by GCP/Azure/AWS any day. Railway has consistently been responsive and actually understands the problem you’re describing. With the big three, unless you’re spending serious money or paying for premium support, you often just get links to docs instead of real help.

If you don't pay for support, why complain if you don't get it?

I will charge you $100 to answer that question
HaZeust 6 days ago

Wasn't half the HN crowd repping this place yesterday when the Vercel CEO offered to pay for Jmail? Rough lol

Is anyone expected to know what railway.com is?
et-al 6 days ago

I thought this was about a global outage regarding actual trains, but it looks like Railway is a Heroku replacement.
lysace 6 days ago

Any news is good marketing if you're unknown.

Does anyone know if Railway operates its own cloud or if it's running off AWS/GCP/Azure/etc...
jsheard 6 days ago

They were originally a GCP wrapper but they started colo'ing their own racks about a year ago.

https://blog.railway.com/p/data-center-build-part-one


I think it has its own "metal" services they're migrating customers to. AFAIK they used GCP for "legacy" cloud services.
spollo 6 days ago

All of these services are bundling the underlying AWS/GCP/etc resources in an easier to use package.
searls 6 days ago

lol, just yesterday a friend asked me if he should move his business to Railway from Heroku. Welp.
zachrip 6 days ago

I've been using railway a while now, and I've basically never paid them but I would. It's even better than heroku. Super easy to use.

This is why we kept a Linode box around. PaaS is great until it isn't, and you need something boring that just runs. Learned that lesson in 2011 with Heroku's first big outage.


What is this 'railway'?

I am assuming that a domain like railway.com should be about trains.

Why does every tech company have to name itself after a one-word .com website when what it does is vague and unrelated to the name?

Does every tech company think they are Apple and have to register every word in the dictionary and redefine it as a technology company?

Really bad name for a company.


dispel the hate from your heart

> Does every tech company think they are Apple and have to register every word in the dictionary and redefine it as a technology company?

Netflix?

vimda 6 days ago

You don't think "railway" at least conjures ideas about the company? It's not some random word. Not every company needs to be "helps you ship software quickly inc"
Liftyee 6 days ago

Possible empirical justification: Non-tech and more "typical" orgs (train companies...) don't spend lots of money on slick-sounding one-word .com domains.
blibble 6 days ago

could be worse

could be called "entire" (https://entire.io/)

lbrito 6 days ago

A lot of companies have been doing that for a long time

Lotus

Jaguar

Caterpillar

Shell

its a human thing


Shell was originally very literal though. They sold seashells.

> The "Shell" Transport and Trading Company (the quotation marks were part of the legal name) was a British company, founded in 1897 by Marcus Samuel, 1st Viscount Bearsted, and his brother Samuel Samuel. Their father had owned an antique company in Houndsditch, London, which expanded in 1833 to import and sell seashells, after which the company "Shell" took its name.

https://en.wikipedia.org/wiki/Shell_plc

omar542 6 days ago

We had the same issue with our deployment setup last year. The trick was setting up health checks with longer grace periods and implementing circuit breakers in our code. Also helped to have a fallback static page ready to swap in during platform issues.
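A minimal sketch of the circuit-breaker-plus-fallback idea above, assuming a synchronous call and a fallback that returns a static page. The class name, thresholds, and injectable clock are illustrative choices, not details of the commenter's setup.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """After `max_failures` consecutive errors, skip the real call and serve the
    fallback until `reset_after` seconds pass, then allow one trial call."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: fail fast without touching the backend
            # Cooldown elapsed: half-open, so let this one call through as a trial.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The injectable `clock` is only there so the state transitions can be tested without sleeping; by default it uses `time.monotonic`, which isn't affected by wall-clock adjustments.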