This is the second in a three-part series on technology in dissertation and thesis writing.
As a reminder, there are three critical things that I think should be done when writing your thesis or dissertation:
- Develop a robust backup routine, and then make it even more redundant.
- Learn how to use the styles feature in your preferred word processor.
- Start using a reference or citation manager.
Reasons to back up
Most people, it seems, know that backing up important computer data is a good idea. But too many of us don’t take the threat of data loss as seriously as we should. Think you’ll never lose your dissertation? It happens. A lot.
Early in the process of writing my master’s thesis, I suffered some data corruption and was forced to restore from my backup. Unfortunately, I discovered that my backup routine had copied that corruption into every one of my backups, so I had no clean copy to fall back on. This caused me to rethink my approach to backups.
Document level precautions
To start, there are a couple of things you can do at the document level to reduce the risk of losing your work; I call these “document level precautions.” These precautions, while important, are not backups, and will do little to protect you in the event of hardware failure.
Track Changes in Word
Depending on the word processor you are using, you may have the option of turning on a versioning or track changes feature. In Microsoft Word, the Track Changes feature allows you to see changes made to your document and to roll them back more easily. This can be helpful if you’ve completely rewritten a section and then decide you want it back the way it was.
A word of caution, however: Microsoft Word has been known to corrupt documents, particularly large ones, when Track Changes is turned on.
Each day as I start working on a research project, I open the main document in Word, immediately choose “Save As” from the File menu, and update the file name with the current date. After I have saved the new file, I work only from that file. If I make a mistake, corrupt the document, or anything else goes wrong, I’ve only done it to today’s document, and I can always revert to yesterday’s. I NEVER edit the previous day’s document.
I followed the same rule with my SPSS data, syntax, and output files, and moved older copies of these files into an archive folder that I could go back to and reference at any time. One of my basic assumptions with this practice was that the files weren’t particularly big and that I had plenty of free space on my hard drive. Keep an eye on the size of your data files, though: depending on your sample size and number of variables, all of these daily copies can quickly grow out of control.
The first level of backups consists of those made locally, meaning the backup exists in the same general vicinity as your original data.
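If you want to take the human error out of this routine, it can be scripted. The Python sketch below is a hypothetical helper, not a feature of Word or SPSS: it assumes files are named with a trailing `_YYYY-MM-DD` date (for example `thesis_2017-04-12.docx`), makes today’s copy without ever overwriting an existing one, moves every older dated copy into an archive folder, and reports how large that archive has grown.

```python
import re
import shutil
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical naming convention: <name>_<YYYY-MM-DD><extension>
DATED = re.compile(r"^(?P<base>.+)_(?P<day>\d{4}-\d{2}-\d{2})$")


def base_name(stem: str) -> str:
    """Strip a trailing _YYYY-MM-DD from a file stem, if there is one."""
    m = DATED.match(stem)
    return m["base"] if m else stem


def dated_copy(latest: Path, today: Optional[date] = None) -> Path:
    """Copy `latest` to a new file stamped with today's date; never modify the source."""
    today = today or date.today()
    target = latest.with_name(f"{base_name(latest.stem)}_{today.isoformat()}{latest.suffix}")
    if not target.exists():  # never overwrite a copy that already exists
        shutil.copy2(latest, target)  # copy2 also preserves timestamps
    return target


def archive_older_copies(folder: Path, archive: Path) -> list[Path]:
    """Keep only the newest dated copy of each file in `folder`; move the rest to `archive`."""
    archive.mkdir(parents=True, exist_ok=True)
    groups: dict[tuple[str, str], list[Path]] = {}
    for f in folder.iterdir():
        if f.is_file() and DATED.match(f.stem):
            groups.setdefault((base_name(f.stem), f.suffix), []).append(f)
    moved = []
    for files in groups.values():
        files.sort(key=lambda p: p.stem)  # ISO dates sort correctly as plain text
        for old in files[:-1]:  # everything except the newest copy
            dest = archive / old.name
            shutil.move(str(old), str(dest))
            moved.append(dest)
    return moved


def size_mb(folder: Path) -> float:
    """Total size of the files under `folder`, in megabytes, to watch for runaway growth."""
    return sum(p.stat().st_size for p in folder.rglob("*") if p.is_file()) / 1_000_000


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        work = Path(d)
        first = work / "thesis_2017-04-11.docx"
        first.write_bytes(b"chapter 1 draft")
        print(dated_copy(first, date(2017, 4, 12)).name)  # thesis_2017-04-12.docx
        archive_older_copies(work, work / "archive")
        print(sorted(p.name for p in work.glob("*.docx")))  # ['thesis_2017-04-12.docx']
        print(f"archive: {size_mb(work / 'archive'):.6f} MB")
```

You would run `dated_copy` at the start of each session and then open the file it returns; `archive_older_copies` and `size_mb` can run whenever you want to tidy up and check how much space the archive is using.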
I wrote both my master’s thesis and dissertation on a MacBook Pro, so my baseline backup consists of incremental backups to an external hard drive using Apple’s Time Machine feature. Time Machine works by scanning your computer every hour and saving any changes made to files. The software keeps these hourly backups for 24 hours, then keeps a daily backup for the previous month, and weekly backups before that, as far back as the space on your external drive allows. Newer versions of macOS also save local snapshots to the internal drive of a MacBook or MacBook Pro while it is not connected to the Time Machine drive.
One of the advantages of this type of backup is that once it is set up, very little maintenance is needed. It just works. Also, because it only copies files that have changed, there is very little to copy after the initial backup.
A big risk with this type of backup is that your original and backup are stored next to each other. In case of a fire, your computer and your backups would both be destroyed. Time Machine also allows you to save to a networked drive, but without some tinkering that drive must be on your local network, which means it is likely exposed to the same fire, flood, or theft as your computer.
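The thinning schedule described above is easier to picture as a rule. The Python sketch below approximates it: keep every backup from the last 24 hours, the newest backup of each day for the past month, and the newest backup of each week beyond that. It illustrates the idea only and is not Apple’s implementation; treating a “month” as 30 days and grouping weeks by ISO week number are assumptions.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots: list[datetime], now: datetime) -> list[datetime]:
    """Approximate a Time Machine-style thinning rule (a sketch, not Apple's code)."""
    keep = []
    seen_days = set()
    seen_weeks = set()
    for snap in sorted(snapshots, reverse=True):  # newest first
        age = now - snap
        if age <= timedelta(hours=24):
            keep.append(snap)  # hourly: keep everything from the last day
        elif age <= timedelta(days=30):  # assumption: a "month" is 30 days
            if snap.date() not in seen_days:  # daily: newest backup of each day
                seen_days.add(snap.date())
                keep.append(snap)
        else:
            week = snap.isocalendar()[:2]  # (ISO year, ISO week number)
            if week not in seen_weeks:  # weekly: newest backup of each week
                seen_weeks.add(week)
                keep.append(snap)
    return sorted(keep)


if __name__ == "__main__":
    now = datetime(2017, 4, 12, 12, 0)
    hourly = [now - timedelta(hours=h) for h in range(90 * 24)]  # 90 days of hourly backups
    kept = snapshots_to_keep(hourly, now)
    print(f"{len(hourly)} backups taken, {len(kept)} kept")
```

This is why a Time Machine drive can hold a long history without filling up quickly: older backups are thinned out rather than kept hour by hour.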
Although I’ve talked only about Time Machine, there are similar options for Windows and Linux.
Although this is no longer true, early versions of Time Machine could not boot your computer. This meant that in the event of a catastrophic drive failure, you would have to reinstall the operating system and then restore your data from the backup, wasting a lot of time while massive amounts of data were copied back to your computer. When writing, there is often a looming deadline, and no time to wait hours to be able to work again. To address this issue, I started keeping a bootable clone of my computer. A bootable clone is exactly what it sounds like: a hard drive that is an exact copy of the internal drive and is capable of booting the computer. In the event of a catastrophic drive (or computer) failure, you can plug in the clone, boot up, and have everything exactly as it was.
To accomplish this, I bought a Seagate Backup Plus Portable Drive with the same capacity as my internal hard drive. There are several options for creating bootable clones, including Carbon Copy Cloner, SuperDuper, and ChronoSync (there is a great article over at Macworld reviewing each of these). I’ve tried all of these at various times, but I ultimately used Synk Pro, a fantastic and robust syncing application for macOS (it appears that Synk is no longer available), because it best fit my needs; as you’ll see further on, I also used it for other parts of my backup scheme. Anytime I was working on my project, I would plug in the clone, and Synk would start doing its work in the background. Once I was done working and the cloning process had completed, I ejected the drive and locked it in a fireproof safe.
Fortunately I never needed to use the clone, but it was there if I had needed it.
One of the risks with all of the backup methods discussed so far is that they all rely on a physical hard drive that is stored onsite with the original. In the event of a fire, flood, or theft, chances are that the backup would be damaged or taken as well. Online backups provide security against those risks.
There were two different ways that I used online tools to back up my data:
Microsoft’s OneDrive is a cloud-based document storage service that syncs with your local hard drive. I actually worked in this folder, so each time I saved my dissertation, it was seamlessly uploaded to Microsoft’s servers. Then, if my laptop was ever stolen or destroyed, I could log in to OneDrive and re-download the files, or sync the dissertation folder to my new computer.
Dropbox is another cloud-based storage service. I used the free version of Dropbox, which provides limited storage space, to keep a synced copy of my dissertation folder. This was another place where I used Synk Pro. In the event of a OneDrive failure or unexpected downtime, I could log in to Dropbox and download or sync a current version of my dissertation. I never actually worked on my documents directly from Dropbox, to prevent any accidental corruption of the data stored there.
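The one-way sync into the Dropbox folder can be approximated in a few lines. The Python sketch below is a hypothetical stand-in for what a sync tool like Synk Pro does, not its actual behavior or any Dropbox API: it copies only files that are new, or whose size or modification time has changed, which is also why incremental backups have so little to copy after the first run. The folder names are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def mirror_changed(source: Path, target: Path) -> list[Path]:
    """One-way sync: copy new or changed files from `source` into `target`.

    A file counts as changed when its size or whole-second modification time differs.
    Files deleted from `source` are deliberately left alone in `target`.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        s = src.stat()
        if dst.exists():
            d = dst.stat()
            if d.st_size == s.st_size and int(d.st_mtime) == int(s.st_mtime):
                continue  # unchanged since the last run, nothing to copy
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 carries the mtime across, so the next run skips it
        copied.append(dst)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        work, dropbox = Path(d, "Dissertation"), Path(d, "Dropbox", "Dissertation")
        work.mkdir()
        (work / "dissertation.docx").write_bytes(b"draft")
        print(len(mirror_changed(work, dropbox)))  # 1: first run copies everything
        print(len(mirror_changed(work, dropbox)))  # 0: nothing changed since
```

Keep in mind that a mirror faithfully copies whatever is in the source, corrupted files included, which is how the corruption described at the start of this post spread across every backup. That is why the dated copies and archives still matter.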
Failsafe (or paranoid) steps
By the time I was several years into my dissertation, I started to worry more (too much?) about what it would mean if I lost my data. Although I had implemented all of the measures above, I still felt like I needed a failsafe to ensure that there was no way that I could lose my work. So I had a couple more tools up my sleeve.
Noodlesoft’s Hazel is a tool that watches folders on your Mac and performs custom actions based on rules. I used Hazel to watch my dissertation folder and, once a week, archive the entire folder and copy the resulting zip file to my personal server and OneDrive. Hazel also watched the archive folder, and when any archive got over 180 days old, it would move it to a folder for me to review and delete if I needed the space.
As I hit milestones in the process, I would also email a copy of the dissertation document itself (not all the supporting data files) to both my spouse and a trusted friend. This way, critical copies of the document existed in unrelated email accounts as well as in my own Sent folder.
Finally, when I flew from Washington to Florida to defend my dissertation, I also copied my entire dissertation folder to two thumb drives: one that I carried and one that my wife carried, just in case anything happened that I hadn’t foreseen.
It is a good idea to verify the data on your backups regularly, including booting from your clone to make sure it actually works. A backup does you no good if you can’t recover the data when you need it.
Ultimately, your backup strategy is a balancing act between laissez-faire and paranoia. No matter how thoroughly you’ve planned your backups, there will always be potential weak spots, and it is possible to spend too much time and money creating too many copies of your data. If you take the time to plan your protocol before you start, you’ll be prepared in the event of a disaster and won’t lose as much time as you would otherwise.
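Hazel is configured through its own rules interface, so there is no Hazel code to show. As a rough stand-in, here is a Python sketch of the same two rules: zip the whole dissertation folder and copy the zip to other locations, then move any archive older than 180 days into a review folder. All paths and function names are hypothetical, the “server” is assumed to be a mounted folder, and the sketch does not schedule itself; you would still need to run it weekly with something like cron or launchd.

```python
import shutil
import tempfile
import time
from datetime import date
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86_400


def weekly_archive(project: Path, archive_dir: Path, copies: list[Path], today: date) -> Path:
    """Zip `project` into `archive_dir`, then copy the zip to every folder in `copies`.

    `archive_dir` must live outside `project`, or the zip would try to include itself.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    base = archive_dir / f"{project.name}_{today.isoformat()}"
    zip_path = Path(shutil.make_archive(str(base), "zip", root_dir=project))
    for folder in copies:
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(zip_path, folder / zip_path.name)
    return zip_path


def flag_old_archives(archive_dir: Path, review_dir: Path,
                      max_age_days: int = 180, now: Optional[float] = None) -> list[Path]:
    """Move zips older than `max_age_days` (by modification time) into `review_dir`."""
    now = time.time() if now is None else now
    review_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for zip_path in sorted(archive_dir.glob("*.zip")):
        age_days = (now - zip_path.stat().st_mtime) / SECONDS_PER_DAY
        if age_days > max_age_days:
            target = review_dir / zip_path.name
            shutil.move(str(zip_path), str(target))  # review, then delete by hand
            moved.append(target)
    return moved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        project = root / "Dissertation"
        project.mkdir()
        (project / "dissertation.docx").write_bytes(b"draft")
        z = weekly_archive(project, root / "Archives", [root / "OneDrive"], date.today())
        print(z.name, flag_old_archives(root / "Archives", root / "Review"))
```

Like Hazel’s rule, the age check only moves old archives aside; deleting them stays a deliberate, manual decision.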
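Part of that verification can be scripted. The Python sketch below compares SHA-256 checksums of every file in your working folder against the copy on a backup (the function names and folder paths are hypothetical). It only confirms that the backup matches the original: a file that was already corrupted before it was backed up, as in the story at the start of this post, will match perfectly, and files you changed since the last backup will show up as differing. It also says nothing about whether a clone will boot, so it complements rather than replaces an actual restore test.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large data files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_backup(original: Path, backup: Path) -> list[str]:
    """List files that are missing from, or differ in, `backup` compared with `original`."""
    problems = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        copy = backup / rel
        if not copy.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original, backup = Path(d, "Dissertation"), Path(d, "Backup")
        original.mkdir()
        backup.mkdir()
        (original / "dissertation.docx").write_bytes(b"final draft")
        print(verify_backup(original, backup))  # ['missing: dissertation.docx']
```

An empty list means every file in the working folder has an identical copy in the backup.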
How did you manage backups for a big project? Comment below.
Cite this article as:
Robert Allred, "Backups," in Robert P. Allred, PhD, April 12, 2017, http://doctorallred.com/2017/04/backups/.
Allred, R.P. (April 12, 2017). Backups [Weblog post]. Robert P. Allred, PhD. Retrieved June 24, 2017 from http://doctorallred.com/2017/04/backups/