Help me implement version control and offsite backups for my thesis
September 8, 2011 2:06 PM
I'm just starting my undergraduate thesis, and would like a backup method a bit more reliable than my standard 'ThesisV3.tex' + emailing it to myself method. I've never worked with version control before, though I'm sure I can figure that out. I am, however, worried about *where* to have my offsite backup sync to.
I'm just starting my undergraduate thesis in chemistry and would like to keep a current backup of all my files as I go. Some form of version control seems obvious, but I'm not sure how I would go about setting this up, and where I would back up my files to.
Alright, here are my requirements:
-Has to handle all types of data: I will have tex files, which are plain text, and would like change tracking in those. I'll also have PDF files of papers, image files, and folders of NMR data. I need these all to be preserved, including directory structure and everything. I'd like it to keep versions of the images and binary blobs as well, so that I don't have to change the filename in the LaTeX file every time I make a new version of the spectrum or diagram, I can just save over and recompile, but still be able to go back to the old version if I want to for some reason.
-I need to be able to work on all my files while offline, then sync them back. It is not uncommon for me to work when I don't have internet, so I don't want to be locked out of my files.
-Related: if I get booted off the server during a synchronization, I don't want everything screwed up. Uni wireless kicks me off every 3 hours. I really don't want to lose everything due to it kicking me off halfway through a sync.
-I'm not sure where/how to put the remote copy. I've got a friend who has some webspace with Powweb.com; is there any way I can set up a version control system in that? If not, are there cheap/free services for this?
-My laptop is Windows 7, though I've got experience in *nix environments (probably obvious from the fact I'm using LaTeX). Her webspace is a Unix environment of some sort. I have no problem with installing Cygwin or MSYS or whatever.
If there isn't an easy way to set this up without paying $$$ for server space somewhere I guess I could use her webspace with an FTP client and my traditional naming system or something.
Thank you for your help,
--Canageek
Best answer: git, and your own git server or github (micro plan, perhaps).
I set this up for a researcher, for his tex files, pdf, data, images, source code, etc.
I set up a git server on one of my domains, but github might be less trouble for you.
Works fine for the researcher in Windows XP and 7.
posted by the Real Dan at 2:14 PM on September 8, 2011 [6 favorites]
Dropbox can do most of what you need really easily. It will automatically keep your local copy in sync (the whole directory structure) with its servers, and you can even access it from multiple machines as well as through a web interface. Dropbox will also let you access previous versions of a file.
It won't, however, let you revert the entire directory back to a previous version. For that you will need a real version control system. One option (if you're willing to invest a little time to learn) is to put it up on github. You only need to learn 4-5 git commands to get this working. This will give you a remote backup as well as full version control.
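As a rough illustration of how short that command list is, here is a minimal Python sketch that drives git through `subprocess`. It assumes git is installed and on the PATH; the helper names (`git`, `snapshot`, `publish`), the author identity, and the remote URL you would pass in are placeholders invented for this example, not part of git or GitHub.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical identity so the sketch runs without any global git configuration.
IDENTITY = ["-c", "user.name=Thesis Author", "-c", "user.email=author@example.com"]


def git(folder: Path, *args: str) -> str:
    """Run one git command inside `folder` and return what it printed."""
    result = subprocess.run(
        ["git", *IDENTITY, *args],
        cwd=folder, check=True, capture_output=True, text=True,
    )
    return result.stdout


def snapshot(folder: Path, message: str) -> None:
    """Record the current state of every file in `folder` as one commit."""
    if not (folder / ".git").exists():
        git(folder, "init")               # 1. once per project: create the repository
    git(folder, "add", "-A")              # 2. stage new, changed, and deleted files
    git(folder, "commit", "-m", message)  # 3. save the snapshot locally (works offline)


def publish(folder: Path, remote_url: str) -> None:
    """Upload local commits; `remote_url` is whatever address your host gives you."""
    git(folder, "remote", "add", "origin", remote_url)  # 4. once per project
    git(folder, "push", "-u", "origin", "HEAD")         # 5. whenever you are online


if __name__ == "__main__" and shutil.which("git"):
    # Throwaway folder standing in for a real thesis directory.
    thesis = Path(tempfile.mkdtemp())
    (thesis / "thesis.tex").write_text("\\documentclass{article}\n")
    snapshot(thesis, "Start thesis")
    print(git(thesis, "log", "--oneline"))
```

Note that `git commit` exits with an error when nothing has changed, so `snapshot` raises `CalledProcessError` in that case; for a hand-run script that is usually harmless.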
posted by Idle Curiosity at 2:14 PM on September 8, 2011
Seconding dropbox. I wish it had been around when I was writing my thesis!
posted by unlaced at 2:14 PM on September 8, 2011 [1 favorite]
I also use a date in file names, but I use a prefix (YYYYMMDDfilename) for more convenient sorting. It's quick and dirty and doesn't require me to learn anything. For backups, I use a chain of external hard drives that are all paired (backing up all data twice in case one drive in the chain fails), but I don't recommend it for other people. I like the dropbox idea.
posted by yeolcoatl at 2:15 PM on September 8, 2011
Do you have access to any university computing resources?
At my school, I have access to a group of Linux servers, and it's really easy to set up git/svn repositories on them. This is the standard way for students in my department to collaborate with each other and their professors on papers. (And it sounds like you really want to use a true version control system, rather than Dropbox.)
posted by Metasyntactic at 2:16 PM on September 8, 2011
I run git locally on my computer, and git's repository is in my dropbox folder. Works like a charm, fits all your requirements except tracking changes in binary data (pdf, images, etc.). I don't think any source control system can handle anything other than text.
posted by tempythethird at 2:44 PM on September 8, 2011 [2 favorites]
tempythethird's suggestion was exactly what I was coming here to suggest. You make little changes and working commits in a local folder, then push to your dropbox folder. If you go to another computer, you install dropbox, git pull from dropbox to another local folder, then push when done.
GitHub is probably more user-friendly though, so if you don't mind coughing up the small amount of cash, that's likely the way to go. Probably easier to solicit feedback too, since it's online.
Also of note: all version control systems can handle binary data, in fact. Most don't have very useful diffs, but that's really a tooling problem. For images, e.g., my SVN client just shows the two revisions side by side, but I've used clients that try (with varying success) to do a visual "diff," drawing attention to the changed areas. In the end it won't really matter, though.
As for Windows Git clients, I use and like Git Extensions.
posted by Jacen Solo at 3:00 PM on September 8, 2011 [1 favorite]
Just a word about the value of using a proper version control system with a large LaTeX project... You will inevitably do something to mysteriously fuck up a tex file so that the project no longer compiles. At that point you can ask version control, "What did I change since yesterday?"
I don't think any source control system can handle anything other than text.
Depends on what you mean by "handle". You can certainly check in binary files and roll back to previous versions. But "merge" won't work, and "diff" will just tell you whether two versions of a file are identical.
I've never used any of the many free version control hosting services, but if you have ssh access to any Unix box that has subversion installed, you can use the svn+ssh repository access method to access a remote repository. No special server setup needed.
posted by qxntpqbbbqxl at 3:09 PM on September 8, 2011
Best answer: Whoa. Git and dropbox? I suppose that's not bad. Feels weird, though. Especially if you have dropbox doing its version tracking stuff, too.
I came in to suggest github with their small private plan.
Works like a charm, fits all your requirements except tracking changes in binary data (pdf, images, etc.). I don't think any source control system can handle anything other than text.
git tracks binary objects. And it does it just fine. If I check out an old version, I get my old binary files too.
What git won't do, and what basically no program can do, is human-useful binary diffs. So, you can tell what files have changed, but git won't tell you what inside those files changed. Not git's fault, though. How's it supposed to know what the fuck a ".fraggle" file is?
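To illustrate the distinction, a minimal Python sketch: text gets a line-by-line diff, anything else only gets a changed/unchanged verdict. The function name `describe_change` and the "is it valid UTF-8?" test are assumptions made for this example; git uses its own heuristic to decide what counts as binary.

```python
import difflib
import hashlib


def describe_change(old: bytes, new: bytes) -> str:
    """Line diff for UTF-8 text; only a same/different verdict for other data."""
    try:
        old_text = old.decode("utf-8")
        new_text = new.decode("utf-8")
    except UnicodeDecodeError:
        # Not text: all we can usefully say is whether the contents match.
        same = hashlib.sha1(old).digest() == hashlib.sha1(new).digest()
        return "binary files are identical" if same else "binary files differ"
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        "old", "new", lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # A changed line in a .tex file shows up as -old / +new lines.
    print(describe_change(b"\\section{Intro}\nDraft\n", b"\\section{Intro}\nFinal\n"))
    # Two made-up byte strings standing in for versions of an image.
    print(describe_change(b"\xff\xd8\x00", b"\xff\xd8\x01"))
```

Either way both versions stay stored in the history, so the old spectrum or diagram can still be checked out; only the explanation of what changed is missing.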
posted by Netzapper at 3:11 PM on September 8, 2011 [2 favorites]
Seconding Dropbox and Github.
Dropbox is a little easier to just get going with, and it's automatic too.
GitHub is very sweet for having a little control and oversight over your edits of the text-based documents, such as the TeX stuff. You can have a private repository, and it doesn't cost much.
I'd just enable your thesis directory as a Dropbox folder right away. I don't think it'd even cost anything. You can have it autosync down to another computer if you like; you'll never be locked out of your files.
Then have a quick look at Github to determine if it adds anything for you.
posted by krilli at 3:13 PM on September 8, 2011
Slightly different from Git is Mercurial. Very similar (distributed source control) but (subjectively) easier to use, with a friendlier user community. Also there is Bitbucket, which is the Mercurial version of GitHub and has a free entry account.
So for (paranoid?) me, my setup is TortoiseHg (a frontend for Mercurial) using a repository sitting in Dropbox, occasionally pushing the changes to Bitbucket.
posted by Folk at 3:14 PM on September 8, 2011 [1 favorite]
Best answer: (One of Git's advantages is that you have the entire history with you, offline, on any computer you clone the Git repository to. So you can instantly rewind and shuffle revisions even when offline. I don't think Dropbox allows you to do this.)
posted by krilli at 3:14 PM on September 8, 2011 [2 favorites]
Best answer: I've written lots of tex documents lately with git, and been pretty happy with it. You might also take a look at ScribTeX, which is a Google-Docs-like web-based editor for LaTeX documents. I've used it when I have co-authors who aren't comfortable with git or tex. It's backed by git, so if you want to use git you can, but you can also just pretend it's not there and use it like Google Docs. But it's all git underneath and you can do basically anything you want to with the git interface in terms of rolling back / comparing versions / whatever. Memail if you have specific questions about it; I wrote a whole paper with that workflow and didn't have any major issues.
But when it's just me, I use github. If you take this route, you should know about git diff --color-words. It's not as good as track changes, but is a big help in managing versions of non-code documents.
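`--color-words` is an option of `git diff` that highlights changed words rather than whole lines, which suits prose where one paragraph is often one long line. The sketch below imitates the idea in plain Python, marking removals as [-...-] and additions as {+...+}; the `word_diff` function is made up for illustration and is not how git implements it.

```python
import difflib


def word_diff(old: str, new: str) -> str:
    """Compare two passages word by word and mark what was removed or added."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.extend(old_words[i1:i2])
            continue
        if i1 != i2:  # words present only in the old version
            pieces.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if j1 != j2:  # words present only in the new version
            pieces.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(pieces)


if __name__ == "__main__":
    print(word_diff("the yield was 40 percent", "the yield was 45 percent"))
    # prints: the yield was [-40-] {+45+} percent
```

A line-based diff of the same edit would flag the whole sentence as changed, which is why word-level output is easier to read for thesis text.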
posted by heresiarch at 3:22 PM on September 8, 2011 [3 favorites]
Response by poster: I want to avoid Dropbox after hearing about them giving police access to people's files WITHOUT a warrant, despite having said they would *only* give other people access with a warrant. It just gives me bad vibes about the company as a whole. If they break their word about one thing, they could very well break it about other things.
What Krilli says sounds good to me, exactly like what I want.
I wish I had a unix account somewhere, but my uni only gives us *15 MB* of email space with a web interface that dates back to the 90s. Good luck convincing them to give us anything fancy like SSH accounts to a server. Some classes have that kinda stuff, and there are individual machines around campus you can ssh to, but nothing I could use for this.
posted by Canageek at 3:45 PM on September 8, 2011
github+crashplan or jungledisk would be my recommendation
posted by iamabot at 4:04 PM on September 8, 2011 [1 favorite]
Best answer: Your concerns about Dropbox aside, I do not recommend letting Dropbox synchronize a Git repository. I've had issues with Dropbox mangling line-endings and other such weirdness, even when I was only using a single computer and Dropbox as a lightweight backup tool.
I wish I had a unix account somewhere
The AWS Free Usage Tier is an option. It's basically one year of free VPS service. Its capabilities reflect its price tag, but for hosting some Git repositories, it should be more than adequate.
Alternatively, you could use Mercurial and BitBucket. They're similar to Git and GitHub, respectively, though I think Mercurial is noticeably easier to use. BitBucket provides an unlimited number of free private repositories (in contrast to GitHub).
posted by ddbeck at 4:16 PM on September 8, 2011 [1 favorite]
Response by poster: Mercurial seems interesting and I'm reading about it: Does it store offline copies in the same way that Git does? i.e. If I lose internet will I still be able to work on all my files?
posted by Canageek at 4:38 PM on September 8, 2011
Mercurial is a distributed version control system and, while I've never used it (I'm a git person), it *should* be able to work offline.
posted by the dief at 5:36 PM on September 8, 2011
Ubuntu One is but one of many open-source Dropbox alternatives.
If you use git, consider using flashbake to handle automatic check-ins.
posted by LogicalDash at 6:27 PM on September 8, 2011
git is perfectly happy working offline all the time. If you don't want to set up a git server, you could just put the "source tree" in a local folder, then keep said folder backed up by other means, like rsync.
posted by LogicalDash at 6:36 PM on September 8, 2011
Best answer: One point which I don't think has been made explicitly above (though it's implicit all over the place): version control and backup are essentially orthogonal. Deal with them separately.
For example: Run a version control system on your personal computer. Edit your files, commit them, revert them, go to town. (No need for the network.) Then, when you want to back up, tar up your version control repository, encrypt it, and upload the encrypted file somewhere. (And/or copy it to a USB key and put that USB key someplace different than your computer.) (Of course you need some off-site backup of your encryption keys.) (And of course to save space and upload time, maybe you want to tar up the diffs only, or an rsync batch file, or something.)
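A minimal sketch of the "tar it up" step, assuming Python is available on the laptop. The function name `archive_repository` and the folder names are hypothetical; encryption and upload are deliberately left out and would be done with whatever separate tool you trust before the file leaves your machine.

```python
import tarfile
import tempfile
import time
from pathlib import Path


def archive_repository(repo: Path, dest_dir: Path) -> Path:
    """Pack `repo` (working files plus its hidden history folder) into one .tar.gz."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")  # sortable timestamp in the file name
    archive = dest_dir / f"{repo.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the repo folder
        tar.add(repo, arcname=repo.name)
    return archive


if __name__ == "__main__":
    # Throwaway folders standing in for a real thesis repository.
    root = Path(tempfile.mkdtemp())
    repo = root / "thesis"
    repo.mkdir()
    (repo / "thesis.tex").write_text("\\documentclass{article}\n")
    print(archive_repository(repo, root / "backups"))
```

Because the archive is written locally first, an interrupted upload only costs you that one transfer; the repository and the archive on the laptop are untouched.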
posted by stebulus at 7:55 PM on September 8, 2011 [3 favorites]
Best answer: Mercurial seems interesting and I'm reading about it: Does it store offline copies in the same way that Git does? i.e. If I lose internet will I still be able to work on all my files?
Yep. I use Mercurial and Git on a regular basis. They're nearly equivalent in terms of what they're capable of (and it's possible to losslessly convert repositories between the two). The differences between them are pretty minor (Git is reportedly faster and handles files larger than 2GB, Hg has a friendlier CLI and better Windows support).
posted by ddbeck at 8:57 PM on September 8, 2011 [1 favorite]
I don't know how the Git tools are on Windows, but I know that for Subversion, TortoiseSVN on Windows was pretty sweet when I last used it a couple of years ago.
If you use Subversion, just set it up with a local filesystem repository (that way you have your full version history on your laptop) and use something like Crashplan or rsync run from command scheduler or something to back it up offsite.
Or, do the same with Git. Paying for git hosting is unnecessary.
posted by Good Brain at 11:03 PM on September 8, 2011
Best answer: Try this and see if you like Mercurial (Hg) or not: HgInit, by Joel Spolsky. You are doing mostly 'straight-line' versioning, and Hg/Bitbucket will be splendid for that.
I happen to prefer Git for arcane, pedantic reasons* (and GitHub to Bitbucket), but for what you use, they would both be splendid, and I would probably recommend Hg for simplicity. While you're at it, use the bug tracker at each site as your punch list for what's 'left to do' for free :)
Being able to 'diff' with previous versions will change your life!
re: ddbeck, to echo: in either, your copy is always a 'full copy', and either repo (yours or the online one) is 'authoritative' only because you claim it to be so.
* arcane reasons: I happen to do a lot of private history rewriting, easier in git, and I don't like how Hg treats branches as 'full copies', though this is changing. Bitbucket is also quite a bit 'rougher' in UI. Git is also *much more popular*.
posted by gregglind at 9:33 AM on September 9, 2011 [1 favorite]
This thread is closed to new comments.