Wednesday, February 11, 2009

How to REALLY back up like a pro

Over at his blog, Jay Lake has written an entry on how to do backups like a pro. I cannot whitewash this: it is sheer poppycock. What Jay describes is absolutely the safest way to back up your files... circa 1990. The IT industry solved this problem years ago. It's over, done. And the software you need is in the "cheap," "free," or "already paying for it" category.

That's an audacious claim, so let me start by debunking the notion that this baroque sequence of steps is actually safe. There's one fundamental problem with it: it relies on humans not making errors. There are a lot of steps in there. And I don't know about other people, but after I've just spent a couple of hours pounding out a few thousand words, I'm not at my most mentally keen. Should you really bank the safety of your work on executing all those steps correctly, every single time, under those circumstances?

In a word, no. Going through these motions is like believing that the TSA guys who make us take off our shoes and limit our liquids to 3.4 oz containers (so we can't make a very BIG bomb?) are actually protecting us from terrorists. They counter what Bruce Schneier calls "movie-plot threats," threats that seem large but in fact are not very likely, while against the real dangers they protect us little, or not at all.

An example, you say? Well, has any of this ever happened to you?
  • Saved a backup file with the wrong filename, so you couldn't find it later, or accidentally overwrote a version you wanted to keep?
  • Had your email account hacked? For example, by a spammer whose activity gets your Gmail account permanently shut down within a matter of hours?
  • Sent your backup DVDs to the wrong relative, or had the right relative misfile them, so they couldn't be found when you needed them?
Et cetera, et cetera. "Impossible!" you say, "because I really CARE about my data." Consider this article from 2007, which cites a researcher who discovered that human error is the most common cause of security breaches.

"So, Mr. Smarty," you say, "You are not really being part of the solution here." Fair enough. Safely backing up your stuff requires two systems to cooperate:
  • Revision control (also called version control or source control), and
  • Off-site backups.
Revision control software was designed for tracking changes to program source code, and no reasonable development shop works without it these days. It works like this: when you make changes to a file, you push those changes over to a revision control server (this is called "committing" the file), which remembers ABSOLUTELY EVERYTHING YOU HAVE EVER DONE TO THAT FILE. In the blink of an eye, you can revert to an older version without destroying the changes you've made since, because they are all still stored in your revision control system (RCS).
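To make the "remembers everything" idea concrete, here is a toy model in Python. It's a sketch under loudly stated assumptions, not how Subversion or Bazaar actually work inside: it tracks a single file in memory, and every name in it (`ToyRepository`, `commit`, `checkout`, `revert_to`, `log`) is made up for illustration. The point it shows is that reverting adds a new revision instead of erasing the later ones.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One committed snapshot of a file, with its log message and time."""
    number: int
    text: str
    message: str
    timestamp: datetime


@dataclass
class ToyRepository:
    """Hypothetical in-memory model of revision control for a single file.

    Real systems such as Subversion or Bazaar track whole directory trees and
    store differences on disk; this sketch only shows the core idea:
    history is append-only, so nothing you commit is ever lost.
    """
    revisions: list[Revision] = field(default_factory=list)

    def commit(self, text: str, message: str) -> int:
        """Record a new snapshot and return its revision number (1, 2, 3...)."""
        number = len(self.revisions) + 1
        stamp = datetime.now(timezone.utc)
        self.revisions.append(Revision(number, text, message, stamp))
        return number

    def checkout(self, number: int) -> str:
        """Return the file's contents as of the given revision."""
        if not 1 <= number <= len(self.revisions):
            raise ValueError(f"no revision {number}")
        return self.revisions[number - 1].text

    def revert_to(self, number: int) -> int:
        """Restore an old version by committing it again as a NEW revision.

        The revisions in between stay in the history, so even the revert
        can be undone later.
        """
        return self.commit(self.checkout(number), f"Reverted to revision {number}")

    def log(self) -> list[str]:
        """One line per revision, newest first, like a commit log."""
        return [f"r{r.number}: {r.message}" for r in reversed(self.revisions)]


if __name__ == "__main__":
    repo = ToyRepository()
    repo.commit("Jack drove west.", "Road Trip: first draft")
    repo.commit("Jill drove west.", "Road Trip: Changed protag from a man to a woman")
    repo.revert_to(1)
    print(repo.checkout(3))  # Jack drove west.
    print(repo.checkout(2))  # Jill drove west.  (still there after the revert)
    for line in repo.log():
        print(line)
```

Notice that the log messages do the job that filenames like "RoadTrip_v132.doc" try to do by hand, and the history only ever grows.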

Couple this with off-site backups. The best way to achieve this is to run your RCS on your web hosting account. When you save your changes, you tell the RCS to push them to your repository on your ISP's servers. (Yes, there are some risks in doing this. There is no such thing as "no risk," only "manageable risk," and it's wise to have someone who knows about such things advise you when you first set up your RCS, to mitigate those risks.) At any point in the future, you can restore every single file you've ever stored there, to any version you've ever committed.

Meanwhile, your ISP employs guys who get paid to do nothing but think about how to keep data from being lost. They use RAID arrays, which keep systems running when a drive dies. They do regular tape backups. Some of THEM do multisite backups, automatically mirroring your data to another node in their network to protect against catastrophic failure.

If you can't see how that's better than gmailing yourself all your files, I have failed at this argument.


Other benefits
Not only is this a solid backup strategy that requires minimal manual intervention, there are several side benefits it gives you for free.
  • If you are working on a project collaboratively, how does your collaborator know they have the most current revision of the file? With revision control, you push changes up to your revision server, give your partner access, and let them pull the most current changes using the same software. Unlike email, there are no stale attachments to mix up: as soon as you commit, your partner can pull the latest version. You can even lock your files to show that you are working on them, so neither of you stomps on the other's changes.
  • Every check-in to revision control allows you to add a comment. So when you are looking for a particular past revision of a file, you can read "Road Trip: Changed protag from a man to a woman" instead of digging through a bunch of files called "RoadTrip_v132.doc," "RoadTrip_v133.doc" and so on. Additionally, the system automatically tracks commit times and revision numbers, so even if you don't add comments, it's no harder than looking through a pile of hand-versioned files.
  • If you work on multiple machines, like I do, it's a snap to keep them in sync: on your desktop, push changes up to the server; on your laptop, pull them down.
  • If you must have multiple backup sites, revision control makes it painless to keep them in sync, as well. You can even configure one revision control server to automatically push changes over to another. This takes a little black magic, but plenty of people will gladly help you set it up for little or no money, including me.
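Pushing from the desktop, pulling on the laptop, and mirroring one server to another all boil down to the same operation: copy over whatever revisions the other side doesn't have yet. Here is a hypothetical Python sketch of that, assuming a simple linear history and no merging. `Replica` and `sync` are invented names, and real tools like Subversion or Bazaar handle conflicts and storage far more carefully.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical copy of one repository's history on one machine.

    Each commit is a (message, text) pair. This toy assumes a single linear
    line of development and does NOT attempt merges.
    """
    name: str
    history: list[tuple[str, str]] = field(default_factory=list)

    def commit(self, message: str, text: str) -> None:
        """Record a new commit locally."""
        self.history.append((message, text))

    def head(self) -> str:
        """Latest committed text, or "" if nothing has been committed yet."""
        return self.history[-1][1] if self.history else ""


def sync(source: Replica, target: Replica) -> int:
    """Copy the commits `source` has and `target` lacks; return how many.

    This is a "fast-forward": `target`'s history must be a prefix of
    `source`'s. If both sides have new commits, a real system would make you
    update and merge first; this sketch just raises an error.
    """
    shared = len(target.history)
    if source.history[:shared] != target.history:
        raise ValueError(f"{target.name} has commits that {source.name} lacks; merge first")
    missing = source.history[shared:]
    target.history.extend(missing)
    return len(missing)


if __name__ == "__main__":
    desktop, server, laptop, mirror = (
        Replica(n) for n in ("desktop", "server", "laptop", "mirror")
    )
    desktop.commit("Road Trip: chapter 1", "It was a dark and stormy night.")
    sync(desktop, server)   # on the desktop: push
    sync(server, laptop)    # on the laptop: pull
    sync(server, mirror)    # server-to-server mirroring
    print(laptop.head())    # It was a dark and stormy night.
```

Push and pull are the same `sync` call pointed in opposite directions; the one safety rule is never to overwrite commits the target has that the source doesn't.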
All right, enough of my ranting. If I have even cracked your resolve on this, I encourage you not to take my word for it, but to do more research. Talk to your programmer friends. Google some of the terms I've thrown around in this post. Go look at the web sites of some of the systems I'm talking about. The one I personally recommend for people getting started with RCS is Subversion. It is free, it's widely adopted throughout the open-source community (lots of people to answer your questions), there are a number of easy-to-use clients for it (such as TortoiseSVN), and it's pretty easy to set up. (In fact, some ISPs that cater to developers, such as Joyent, the one I use, actually have a control panel that greatly simplifies the process.) I used to use Subversion myself, but if you're feeling ambitious, you might have a look at Bazaar, which is the RCS I use nowadays. (Word of caution: it's a more complex piece of software, so if it feels like too much, don't let that sour you on the whole RCS strategy.)

Lastly, if you're interested in hearing more about this, please comment. I will be happy to reply privately or answer people's questions here.

TimK
Saving the world from arcane backup strategies, one writer at a time.


Saturday, July 21, 2007

Deathly Hallows & Me: It's Like I'm Some Kind of Frickin Genius

I wanted the final Harry Potter book; wanted it now, stamping my little foot exactly like Veruca Salt. But standing in line for three hours at midnight, surrounded by the berobed, bespectacled and be-wanded? Not so much. My dignity was at stake. I'm umpty-ump years old, and those people are nerds.

(For a movie? Sure, that's an inherently social experience, and standing in line only adds to it. But standing in line to get a book that I'm then going to take home and read by myself is just moronic.)

Normally, this wouldn't be a problem. I would just go the following morning when the bookstores reopened. I knew they'd have ordered eleventy-hojillion of the things, so availability wouldn't be a concern. However, this wasn't really an option. On release morning, we were leaving the house at 7 AM to go to SeaWorld for the day, followed by a Cub Scout overnight in the park. I resigned myself to doing the midnight thing after all; no way was I going to wait until Sunday afternoon and let some jackass ruin it for me.

Then I had an idea. It remained my plan up until July 19th: go to Wal-Mart. Surely there wouldn't be long lines there -- nobody's that dumb! (Except me, I guess.) Only there was this niggling thought in the back of my head -- too obvious. Sure enough, after the fact, I found out that (BIG SPOILER ALERT) it would have been a bad idea.

So I concocted a new plan. I learned that my local Randall's (a Safeway-owned grocery chain) would have copies of the book on launch day, and would open at 6. Ah-ha! I would get up early, go to Randall's, buy the book and some Dramamine, and read it in the car on the way to SeaWorld. Brilliant! I would get the book a mere couple of hours later than the schmucks who waited in line.

The alarm went off at 6 AM. My wife, already awake (prepping for SeaWorld, remember), said, "It's 6:00." Redundancy: that's the key to a successful plan. I leapt out of bed. I am not a morning person at all, but if a new Harry Potter book came out every day, I'd never be late to work.

Drove down the street (more or less) to the Randall's (drive drive drive), pulled into the parking lot and what did I see? About 5 cars. Yes! Brilliant! I jumped out of my car and ran to the door, and as I entered the store I saw that it was practically vacant. Brilliant! I crossed to the books area, up front near the register, and what did I see?

Not one copy of the book.

I was thrown by this, but not all the way off the horse. OK, it was early, and I could see they were stocking stuff; maybe they just hadn't cracked open their inventory yet. I decided to give them a few minutes to get to it before I started kicking someone's ass . . . maybe my own. I went to grab the Dramamine. Inevitably it takes me ten minutes to find whatever I'm looking for in that place anyway.

Next stumbling block: no Dramamine. No motion sickness meds of any kind whatsoever. And I need it, too. When I was a kid, I could read Dune (the old paperback edition with about a 30-degree bend to the spine and little tiny cramped type) with my head bouncing off the metal frame of the school bus window. Nowadays? Read one article in the paper and bleaugh!

I was frustrated, as you might imagine. But undaunted! I had just made up my mind to go talk to someone about where the hell my book was when I spotted generic Safeway motion sickness stuff. Score! Then I had another genius insight: the door I came in was the side door, not the main one. If they were going to have a display of the book, they might have put it at the other end of the store.

Indeed, as I ran down there, I saw they had about 20 copies of the book in the middle of -- get this -- a giant castle made of Coke. As if I was going to look at the book and go "That's right, reading is thirsty work. I better grab a case of Coke Zero." I picked up my copy of the book, skipped the Coke, checked out and headed home.

Total round-trip time: 30 minutes. I would call the plan a success. I read roughly the first 150 pages on the trip to SeaWorld, grabbed a few pages here and there during downtime in the overnight program, and finished reading it by booklight at 1:30 the following morning, tucked away in my sleeping bag in the shark tank at SeaWorld.
