Saturday 4 October 2014

A lesson learned the very hard way

Two nights ago, I took a quick look at a website I run with a few friends. It’s a sort of book recommendation site, where you describe some problem you’re facing in your life, and we recommend a book to help you through it. It’s fun to try to find just the right book for someone else, and it really makes you consider what you keep on your shelves.

But alas, it wasn’t responding well—the images were all fouled up, and when I tried to open up a particular article, the content was replaced by the text "GEISHA format" over and over again. So now I’m worried. Back to the homepage, and the entire thing—markup and everything—has been replaced by this text.

First things first: has anyone else ever heard of this attack? I can’t find a thing about it on Google, other than five or six other sites that were hit by it when Googlebot indexed them, and one of them at least a year ago.

So anyway, I tried to SSH in, with no response. Pop onto my service provider to access the console (much as I wish I had the machine colocated, or even physically present in my home, I just can’t afford the hardware and the bandwidth fees), and that isn’t looking good, either.

All right, restart the server.

Now HTTP has gone completely nonresponsive. And when I access the console, it’s booted into initramfs instead of a normal Linux login. This thing is hosed. So I click the “Rescue Mode” button on my control panel, but it just falls into an error state. I can’t even rescue the thing. At this point, I’m assuming I’ve been shellshocked.

Very well. Open a ticket with support, describing my symptoms, asking if there’s any hope of getting my data back. I’m assuming, at this point, the filesystem’s been shredded. But late the next morning, I hear back. They’re able to access Rescue Mode, but the filesystem can’t fsck properly. Not feeling especially hopeful, I switch on Rescue Mode and log in.

And everything’s there. My Maildirs, my Subversion repositories, and all the sites I was hosting. Holy shit!

I promptly copied all that important stuff down to my personal computer, over the course of a few hours, and allowed Rescue Mode to end, and the machine to restart into its broken state. All right, I think, this is my cosmic punishment for not upgrading the machine from Ubuntu Hardy LTS, and not keeping the security packages up to date. Reinstall a new OS, with the latest version of Ubuntu they offer, and keep the bastard thing up to date.

Except that it doesn’t quite work that well. On trying to rebuild the new OS image… it goes into a error state again.

Well and truly hosed.

I spun up a new machine, in a new DC, and I’m in the process of reinstalling all the software packages and restoring the databases. Subversion’s staying out; this is definitely the straw that broke the camel’s back in terms of moving my personal projects to Git. Mail comes last, because setting up mail is such a pain in the ass.

And monitoring this time! And backups! Oh my God.

Let this be a lesson: if you aren’t monitoring it, you aren’t running a service. Keep backups (and, if you have the infrastructure, periodically try to refresh from them). And keep your servers up-to-date. Especially security updates!

And, might I add, many many thanks to the Rackspace customer support team. They really saved my bacon here.

No comments:

Post a Comment