How to end a relationship (a cautionary tale about making backups)
Olga has finished her (second) MSc thesis and sent it to her committee. Of course that’s cause for celebration, and I’m very proud of her. But I wanted to talk about what happened the day she submitted the final copy, which can be summed up thusly: I deleted everything she had written.
How I managed this appalling cockup is simply told: I meant to issue a `cp` command to copy everything from my laptop (where I was doing final typographic work) to the USB stick she kept her master copy on, but instead I typed `rm` (remove), deleting both my local copy and the USB master in one fell swoop.
Oops.
Now this should have been embarrassing but painless: restore the latest backup and carry on. And yes, we were taking backups, but no, they didn’t save the day. Figuring out why has convinced me that having a half-arsed backup strategy is actually more dangerous than having none at all, because it gives you a false sense of security which can make it much easier to make stupid mistakes.
Here’s what we had for a backup system:
- the entire thesis in a mercurial repository on my laptop, so that any ghastly errors in the typographic adjustments I was making could easily be rolled back;
- the USB copy of everything, which also contained the mercurial repository; and
- a copy (likewise of everything) on Olga’s laptop.
Each of these failed us for different reasons.
- The mercurial repository was just a local repository, in a `.hg/` directory beside the files. My `rm` removed it.
- The USB copy, likewise, was destroyed (as was everything else on the stick: `cp -r my_dir /media/my_stick/` is very convenient, but the `rm`-variant of the same is utterly merciless; see the sketch just after this list).
- The copy on Olga’s laptop was several days out of date, and those were the days in which we had done the (rather grueling) last proofreading and typographic adjustments (annoying things like producing ten extra words to move a paragraph break down a line to allow a figure to appear on the page facing where it was discussed).
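To be concrete about how merciless: the intended command and the one I actually typed differ by two characters. This is a reconstruction rather than the literal command line, and the directory names are only illustrative:

```
# What I meant to type: copy the thesis onto the USB stick.
cp -r my_dir /media/my_stick/

# What I actually typed: recursively delete my_dir AND the
# contents of the stick, in one go.
rm -r my_dir /media/my_stick/
```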
By blind luck, we didn’t have to redo that work: I had been editing the files in emacs, and the buffers were still open. By yet more blind luck, that evening’s session (although short) had involved every file that we had changed since that earlier version on Olga’s laptop (as far as we can tell). So all the files that we had no up-to-date backup of could be resaved from emacs, leading directly (I suspect) to the continuing success of our relationship.
What have I learned from this? Firstly, that the established wisdom about keeping backups in triplicate, at least one off-site, is not just about protecting yourself from hardware failures, fires, theft, and similar uncommon catastrophes. It’s also about maintaining a safety barrier between two copies, so that a (much more common) typo or momentary foolishness that destroys one cannot, at the same time, affect the other. If you can easily affect two copies of your work with one commandline instruction, they count as one copy for backup purposes: neither is backing up the other. With one command I deleted my local copy of the thesis, the local mercurial repository, and copy-and-mercurial-repository on the USB stick: four copies of the text, but (according to this rule) no backups at all.
Secondly, your backup should not be involved in your daily workflow, except purely as a backup. The way you access it should be standardised and scripted and as impossible to mess up as you can manage. Copying files from place to place is error-prone. Issuing `hg commit` then `hg push` is better. Having a background process automate the whole backup system is better still.
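As a rough sketch of what I mean (the repository path and the remote name `backup-remote` are placeholders, and a real version should cope with "nothing changed" more gracefully):

```
#!/bin/sh
# backup-thesis.sh: commit whatever has changed and push it off this machine.
cd "$HOME/thesis" || exit 1
hg addremove                                      # pick up new and deleted files
hg commit -m "backup $(date '+%Y-%m-%d %H:%M')"   # harmlessly fails if nothing changed
hg push backup-remote                             # a path alias defined in .hg/hgrc
```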
Finally (and again this is established wisdom), a backup you don’t take is no backup at all. This is another argument for automating the whole process, although I’m halfway willing to use mercurial and careful discipline to get the same effect.
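The "automate the whole process" part can be as simple as a cron entry pointing at a script like the one above (the path is wherever you keep it):

```
# crontab -e: run the backup script once an hour, on the hour.
0 * * * * $HOME/bin/backup-thesis.sh
```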
Comments
Also, seriously, alias `rm` to `rm -i`.
The problem with Dropbox is that changes are immediately synched. That's hell for a LaTeX project.
I'm not sure about `rm -i` (from the manpage, `rm -I` seems slightly saner: prompt for recursive removals or removals of more than three files). My immediate reaction to that prompt is to ctrl-C and rerun the `rm` with `-f` (no patience, this boy). Strengthening that habit might be even more dangerous than not having the prompt...
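(For reference, the two variants under discussion amount to something like this in a shell startup file:)

```
# Prompt before every single removal:
alias rm='rm -i'
# Or the gentler version: prompt only for recursive removals
# or when removing more than three files (GNU rm):
alias rm='rm -I'
```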
My bottom-line is: commandline work is occasionally going to screw up. Have a backup scheme that protects you when it does, rather than wasting effort moving the 0.1% chance of screwups to 0.01% (which just delays the inevitable).
What is it in particular that makes LaTeX so hellish with Dropbox? Temp files?
Back in the day I always used `rm -i`. I would then type it and, if there were many files, cancel and do as you propose. Only it gives you a moment to reflect, such as: why is `mv` asking for a prompt? Worked for me.
Anyway, on reflection it's interesting that `rm` doesn't implement a recycle-bin strategy like Windows/Mac*. There's fairly conclusive evidence that people make errors deleting things all the time; it's a shame that, as you point out, the command line offers such room for error.
*Obviously only for interactive shells.
Indeed, the LaTeX/Dropbox problem is temporary files. There are heaps of them, and lots of handy packages add more. Running LaTeX regenerates them, and they often contain timestamps. Even if they don't, generating my diss took upwards of a minute and Olga's thesis took nearly ten minutes (due to some pdf includes that didn't go smoothly), which is probably long enough for Dropbox to latch onto an incomplete version of at least some of them.
As for commandline recycle bin: yes. Should exist. (There are heaps of aliased-shell-script implementations if you google.)
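For the curious, the core of most of those scripts is something along these lines (the `~/.trash` location and the lack of collision handling are my simplifications):

```
# A very naive commandline recycle bin: move things into ~/.trash
# instead of deleting them. Real versions timestamp the moved files
# so repeated "deletions" of the same name don't clobber each other.
trash () {
    mkdir -p "$HOME/.trash" && mv -- "$@" "$HOME/.trash/"
}
```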
What a lovely horror story.
Seriously though, why wasn't your hg repo pushed to Bitbucket? It's free, isn't it?
Because I am lazy. Which is another argument for automating as much of this as possible, including the "getting-it-started" part.
Consider putting the whole folder in Dropbox?