To git or not to git, that is the question.
May 4, 2011 1:16 AM

I need help understanding git. Or, deciding which version control system would be best for our needs.

My company is developing standards for document versioning (mostly Word files for now), and it's leading to a lot of manual effort (and mistakes) to keep the versioning consistent. We name things like "XYZ - v1.doc" and then increment the version as we make "major" changes (more than a few words). We also use Track Changes in Word. A complicating factor is that my team is mostly working off servers in Europe, but we're in the U.S. Adding and making changes to files and filenames is a slow and tedious process on these European servers: it takes 60-90 seconds minimum to copy a file to our desktops, and renaming files on the network is also a PITA due to the lag. Often, documents open before they have finished copying from the network to our desktops. Unfortunately, better performance is probably not an option at this point (we have to contend with some rather stodgy network admins).

So, I suggested (naively, as I don't know anything much about git) that we investigate a solution such as git. Perhaps there is even a better solution than git. We're all fairly techy people (we do some coding as part of our job), so we could take on a fairly techy solution, though obviously maximum ease-of-use is preferred.

So, my questions are:
1) What's the best way to get up to speed about git (preferably from a not terribly technical perspective, though we will obviously go to that level of detail if we want to actually install/configure/use)? This is so I can make a basic case with my boss for using git.

2) Does git require its own server? If so, this might not help us with the lag issue as we'd need to host it on the European server (or U.S., but that'd make things difficult for the folks in Europe, and we're the smaller team).

3) How does git handle versioning? Can we be specific about how we name our files? Or would this not even be necessary (some kind of background version saving (excuse the extremely non-technical vocab))?

4) What about check-in/check-out (file locking)? Can we work concurrently and merge changes (Word doesn't seem to do this terribly well, repeating words when we merge changes...suggestions welcome on how to avoid that)?

If there are solutions other than git we should take a look at given our needs, please feel free to mention them here! We'd like to keep things on our own servers, but if the case was strong enough, we could consider a secure web-based doc repository at a reasonable cost.

Finally: For the short term - I've been considering writing my own scripts to copy files to/from my desktop to a network folder (probably using AutoHotkey), so if anyone has any code/ideas for such, that would be a great help.
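One piece such a script could automate is the "XYZ - v1.doc" naming scheme. Below is a minimal sketch in Python rather than AutoHotkey. It assumes every versioned file follows exactly the "name - vN.ext" pattern. The helper names next_version_name and latest_version are hypothetical and do not come from any existing tool.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed naming pattern, based on the "XYZ - v1.doc" example: "<base> - v<N><.ext>".
VERSION_RE = re.compile(r"^(?P<base>.+) - v(?P<num>\d+)(?P<ext>\.[^.]+)$")


def next_version_name(filename: str) -> str:
    """Return the filename with its version number incremented by one."""
    match = VERSION_RE.match(filename)
    if match is None:
        # No version suffix yet: start the file at v1.
        path = Path(filename)
        return f"{path.stem} - v1{path.suffix}"
    num = int(match.group("num")) + 1
    return f"{match.group('base')} - v{num}{match.group('ext')}"


def latest_version(names: Iterable[str], base: str) -> Optional[str]:
    """Pick the highest-numbered "<base> - vN" file from a list of filenames."""
    best, best_num = None, -1
    for name in names:
        match = VERSION_RE.match(name)
        if match and match.group("base") == base:
            num = int(match.group("num"))
            # Compare numerically, so v10 beats v9 (a plain string sort would not).
            if num > best_num:
                best, best_num = name, num
    return best
```

The version number is compared as an integer on purpose. Sorting the names alphabetically would put "XYZ - v10.doc" before "XYZ - v9.doc".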

Thanks, Mefites!
posted by xiaolongbao to Technology (21 answers total) 6 users marked this as a favorite
 
I think git is complete overkill for what you need, plus you aren't going to be able to use most of its more powerful (and useful) features because your documents are effectively opaque binary files as far as git is concerned: git is designed to work with plain text documents of one form or another.

To answer your questions:

1) http://book.git-scm.com/index.html
2) No, although you can have one if you want one.
3) Each version is identified by a SHA1 hash that covers the SHA1 hashes of all the files in the stored directory tree at the time of checkin, together with the hashes of the parent checkin(s), so the full history can be read back by following those links.
4) git doesn't lock files. Git will merge changes in text documents, but it doesn't know how to merge changes in Word documents (there may be a tool you can use to do this, but I'm not aware of one), so it won't be able to help you here. Here be dragons.
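The sketch below illustrates the hashing described in point 3. git_blob_hash reproduces how git hashes a single file's contents: it computes SHA1 over the header "blob <size>" plus a NUL byte, followed by the raw bytes, which matches git hash-object when no line-ending filters apply. toy_version_id is a hypothetical simplification and not git's real commit format. It only shows why a version id changes whenever any file or any parent changes.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Hash raw file bytes the way git hashes a blob object."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def toy_version_id(file_hashes: dict, parent_ids: list) -> str:
    """Illustration only (NOT git's commit format).

    Combines the per-file hashes of a snapshot with the ids of its parent
    versions, so the id depends on the whole tree and the whole history.
    """
    h = hashlib.sha1()
    for path in sorted(file_hashes):  # sort so the result is order-independent
        h.update(f"{path} {file_hashes[path]}\n".encode("utf-8"))
    for parent in parent_ids:
        h.update(f"parent {parent}\n".encode("utf-8"))
    return h.hexdigest()
```

For example, git_blob_hash(b"") gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is git's well-known id for an empty file. For a Word document, any edit changes the bytes, so the file gets a completely new hash. That is the "opaque binary" problem described above.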

Git's Windows integration is not the best, perhaps unsurprisingly given its origins. Of the open source version control systems, I think Mercurial (which uses a similar hashing scheme to git) is probably the better bet if you need good Windows support.

It does sound like some kind of revision control scheme would help your company, but if you need file locking and decent windows integration then you may be better off with something else.
posted by pharm at 2:07 AM on May 4, 2011


I agree that git really would be a lot more hassle than it's worth for your use case. It won't be able to diff or merge binary blobs, so essentially you would just be using it as a fancy object store. I also disagree about not needing a server. You can certainly use git locally without a server, but if you want to share your files with other people you need some kind of network protocol. You could in theory do this just by putting the git repo on an SMB/CIFS shared drive, but given your bandwidth and latency issues that would be insane. The typical ways to share git repositories are over ssh or the git protocol, and since it sounds like Windows machines are involved, that probably means the latter. Running an ssh daemon on Windows is really easy and straightforward with Cygwin (there's even a nice script that sets everything up for you), but it still seems to scare people off. You can also serve repositories read-only over HTTP.

"I've been considering writing my own scripts to copy files to/from my desktop to a network folder (probably using AutoHotkey), so if anyone has any code/ideas for such, that would be a great help."

Assuming these are on an SMB/CIFS share, you don't need any external software: you can just run copy \\servername\sharename\dirname\filename.ext c:\local\path\ from a command prompt or batch file.
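If you want something slightly smarter than a bare copy, here is a minimal Python sketch of the same round trip. The share and local paths are the placeholders from the copy command above. fetch and push are hypothetical helper names. One assumption is worth stating: push writes to a temporary ".partial" name first and renames only after the slow copy finishes, which should keep people from opening a half-copied file on the share.

```python
import shutil
from pathlib import Path

# Placeholder locations taken from the example command: substitute your own.
SHARE_DIR = Path(r"\\servername\sharename\dirname")
LOCAL_DIR = Path(r"C:\local\path")


def fetch(filename: str, share_dir: Path = SHARE_DIR, local_dir: Path = LOCAL_DIR) -> Path:
    """Copy a file from the network share to the local folder; return the local path."""
    local_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the modification time, which helps when comparing copies later.
    return Path(shutil.copy2(share_dir / filename, local_dir / filename))


def push(filename: str, share_dir: Path = SHARE_DIR, local_dir: Path = LOCAL_DIR) -> Path:
    """Copy the local file back to the share under its real name."""
    src = local_dir / filename
    tmp = share_dir / (filename + ".partial")  # hypothetical temporary suffix
    dest = share_dir / filename
    shutil.copy2(src, tmp)  # the slow part happens under the temporary name
    tmp.replace(dest)       # then swap it into place, overwriting the old copy
    return dest
```

Typical use would be fetch("XYZ - v1.doc"), edit the local copy in Word, then push("XYZ - v1.doc").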
posted by Rhomboid at 2:19 AM on May 4, 2011


If you can work in text-based markup, like HTML or LaTeX, then Git or other version control systems can work really well for you in this situation.
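The reason text markup works so much better is that a version control system can report and merge changes line by line. The short Python sketch below uses the standard library's difflib.ndiff to show this. The two HTML revisions in the usage note are made-up examples. Editing one sentence shows up as exactly one removed line and one added line, whereas a changed .doc file is just "different bytes".

```python
import difflib


def changed_lines(old_text: str, new_text: str) -> list:
    """Return the removed ("- ") and added ("+ ") lines between two text revisions."""
    result = []
    for line in difflib.ndiff(old_text.splitlines(), new_text.splitlines()):
        # ndiff also emits unchanged ("  ") lines and hint ("? ") lines; skip those.
        if line.startswith(("- ", "+ ")):
            result.append(line)
    return result
```

For instance, comparing "<p>The project starts in June.</p>" with "<p>The project starts in July.</p>" reports just those two lines. Any unchanged paragraphs around them are left out.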

Also, Google Docs is pretty good for collaborative document editing.
posted by krilli at 2:28 AM on May 4, 2011 [1 favorite]


Git doesn't do locking checkin/checkout; instead it assumes that any conflicting changes will be merged. While it only ships with tools for doing this to text files, you can plug in other merge tools using the merge.tool config item.

And apparently you can tell Word to do some diff/merge thing it has built in: TortoiseSVN specifies diff-doc.js as the differ for doc/docx files, though I have no idea how useful it is in the real world. If you configured git with merge.tool pointing at that script, you might (or might not) get something useful.
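A related mechanism is worth knowing about. merge.tool is used by the interactive git mergetool command. For automatic merges, git also lets you register a custom merge driver. You define it in .git/config with a line such as driver = python trivial_merge.py %O %A %B, and you assign it to a file type in .gitattributes with a line such as *.doc merge=<driver name>. Git passes the base, ours, and theirs file paths. The driver is expected to leave the result in the %A file and to exit non-zero on a conflict. Below is a minimal Python sketch of that protocol. The script name trivial_merge.py is hypothetical. The driver deliberately does not combine edits inside a Word file. It only accepts the case where one side changed the file, and a real Word-aware merger would replace trivial_merge.

```python
import sys
from pathlib import Path


def trivial_merge(base: bytes, ours: bytes, theirs: bytes):
    """Three-way merge that succeeds only when at most one side changed the file.

    Returns (merged_bytes, clean). When both sides changed the file, it keeps
    ours and reports a conflict, leaving a human to reconcile it in Word.
    """
    if ours == theirs or theirs == base:
        return ours, True     # identical edits, or only our side changed it
    if ours == base:
        return theirs, True   # only their side changed it
    return ours, False        # both changed it: conflict


def main(argv):
    # argv[1:4] are the paths git substitutes for %O (base), %A (ours), %B (theirs).
    base_path, ours_path, theirs_path = (Path(p) for p in argv[1:4])
    merged, clean = trivial_merge(
        base_path.read_bytes(), ours_path.read_bytes(), theirs_path.read_bytes()
    )
    ours_path.write_bytes(merged)  # git reads the merge result back from %A
    return 0 if clean else 1


# Only act when invoked with the three paths, as git does for a merge driver.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```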
posted by russm at 2:58 AM on May 4, 2011


I think that script just invokes Office to show the difference between files. It can't handle merging, russm.
posted by pharm at 3:33 AM on May 4, 2011


Everything above is correct. Git doesn't really meet your desired usage without a lot of 'maybe if you do/use X' to get it to do things like merging Word documents, or giving you a simple workflow without pushing and pulling and branches.

You probably want Subversion. It will have the same 'no merge for binary Word blob documents' problem, but it gives you a simple centralized server and the ability to 'check out with a lock' a document (and I think it still attempts a binary diff when transferring changes). This is what I remember from various Subversion vs Git wars: for things like movies, media and other binary blobs that are not mergeable and really can only be edited by one person at a time... Subversion wins.

There would be a server, then each 'client' (US/EU) would have a checked out copy of the tree and workflow goes basically like this:
svn update    # pull in any new changes from the server that you don't have
# your copy of 'stuff' now matches the server.
svn lock a_file -m "editing"    # tell the server not to accept anyone else's changes to this file for now
# edit the file
svn commit a_file -m "describe the change"    # push your changes to the server; this also releases your lock
(I haven't had to use the locking commands myself. The relevant ones are svn lock and svn unlock, and svn commit releases your locks by default unless you pass --no-unlock.)
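To make the lock-then-commit workflow concrete, here is a toy Python model of the rules as described in this answer. The server holds at most one lock per file and refuses changes to a locked file from anyone but the lock holder. A commit releases the committer's lock, as svn commit does by default. LockTable is a hypothetical illustration and not how Subversion is actually implemented.

```python
class LockTable:
    """Toy model of Subversion-style file locking (illustration only)."""

    def __init__(self):
        self.locks = {}    # path -> user currently holding the lock
        self.revision = 0  # repository-wide revision counter

    def _check(self, path, user):
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is locked by {holder}")

    def lock(self, path, user):
        """Take the lock, failing if someone else already holds it."""
        self._check(path, user)
        self.locks[path] = user

    def commit(self, path, user):
        """Accept a change unless another user holds the lock; release the lock."""
        self._check(path, user)
        self.revision += 1
        self.locks.pop(path, None)  # committing releases the lock
        return self.revision
```

In this model, if the U.S. team locks a file, the European team's lock and commit attempts both fail until the U.S. commit goes through.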

Subversion will try its best to send the minimal amount of data across the network: it will at least attempt to find the smallest binary patch between the version you originally checked out and the one you are committing, and only send that data to the server.
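The idea of a binary delta can be sketched in a few lines of Python. The new file is described as "copy bytes i..j from the old file" instructions plus literal new bytes, and only the literal bytes carry real payload. This uses the standard library's difflib.SequenceMatcher as a stand-in. Subversion's actual delta format (svndiff) and algorithm are different, and SequenceMatcher would be far too slow for large files. make_delta, apply_delta and literal_size are hypothetical names.

```python
from difflib import SequenceMatcher


def make_delta(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal inserted bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))       # the server already has these bytes
        elif j2 > j1:
            ops.append(("insert", new[j1:j2]))  # 'replace'/'insert': ship new bytes
        # 'delete': nothing to send, those old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += old[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


def literal_size(ops: list) -> int:
    """Number of new bytes that actually have to cross the network."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")
```

For a large document where only one paragraph changed, literal_size is roughly the size of that paragraph rather than the size of the whole file.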

Subversion supposedly has very good Windows support, at least on the client side of things (TortoiseSVN), and in some cases can be set up as a plain WebDAV type of system.

For your actual questions: Pro Git is a good place to start for the Git mechanics. You don't need to change filenames to track versions. Git doesn't do locking and can't really merge your Word doc changes well (blobs don't merge well in any VCS).

In either Git or Subversion, the change information is stored in a commit. Git identifies commits by SHA1 hashes; Subversion uses a single repository-wide integer that increases with every commit, so the revision numbers of any one file go up in jumps. You would use the appropriate VCS info command to find the revision information: say, 'svn log a_file' to get the history of changes and 'svn info a_file' to find the latest revision number, change date, change author, etc. The various Windows GUIs probably make this very easy, but I've never used them.
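The "in jumps" behaviour follows from the counter being repository-wide, as this tiny Python model shows. Each commit bumps one shared number, so a file's history skips the revisions in which only other files changed. file_revisions is a hypothetical helper, and the commit list in the usage note is invented.

```python
def file_revisions(commits: list) -> dict:
    """Map each file to the repository revisions that changed it.

    `commits` is a list of sets of paths, one set per commit, in order.
    Revisions are numbered from 1 with a single shared counter.
    """
    history = {}
    for revision, paths in enumerate(commits, start=1):
        for path in sorted(paths):
            history.setdefault(path, []).append(revision)
    return history
```

With commits [{"a.doc"}, {"b.doc"}, {"b.doc"}, {"a.doc", "b.doc"}], a.doc ends up with revisions 1 and 4, while b.doc has 2, 3 and 4.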
posted by zengargoyle at 4:08 AM on May 4, 2011 [2 favorites]


Some of the guys that came from big companies that I used to work with used to bitch and moan that we didn't have ClearCase, which can apparently handle Word docs pretty well. We had SVN, which isn't great for binary blobs.
posted by Joe Chip at 4:18 AM on May 4, 2011


To add to the chorus, version control systems (VCS) are for versioning text files. Word docs are binary files. You're looking for a document management system (DMS). I believe SharePoint is the Microsoft solution to the problem, but I have no experience of it.
posted by Leon at 4:21 AM on May 4, 2011


SharePoint is vile, but it does handle locks and checkouts reasonably well.
posted by scruss at 4:40 AM on May 4, 2011


You may want to try Text flow from Nordic River; we've had a lot of success at our company with it. It allows you to merge changes in Word in a better way than Word does.

It also integrates with Google App Engine.

As others have suggested you may want to see if Google Docs can do some of what you want.
posted by sien at 4:48 AM on May 4, 2011 [1 favorite]


I'd agree that you want something more akin to document management systems than version control systems. SharePoint is the big name, but Alfresco is a very solid open source replacement. You may still have some issues with responsiveness over that kind of distance, but being open source, you could certainly try it at a low cost.
posted by advicepig at 6:30 AM on May 4, 2011


Would it be possible to copy all the data from Europe to one desktop en masse, then use that desktop as a "server" to share with the peers in your office? Then several times a day, systematically write the locally cached copies from the desktop back to Europe. Seems like just getting the data closer and under more local control would open a lot of options. As long as you're each spokes from a distant hub, collaboration is going to be tedious.
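The "several times a day, write the locally cached copies back to Europe" step could be a small scheduled script. Below is a one-way Python sketch of that step. It copies a file when it is missing at the destination or has a newer modification time there. sync_newer is a hypothetical name. The sketch has a real limitation: it does not detect two people editing the same file on both sides, and the newer copy simply wins.

```python
import shutil
from pathlib import Path


def sync_newer(src_dir: Path, dest_dir: Path) -> list:
    """Copy files from src_dir to dest_dir if missing there or newer in src_dir.

    Returns the destination paths that were written.
    """
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dest = dest_dir / src.relative_to(src_dir)
        if not dest.exists() or src.stat().st_mtime > dest.stat().st_mtime:
            dest.parent.mkdir(parents=True, exist_ok=True)
            # copy2 preserves the mtime, so an unchanged file is not recopied next run.
            shutil.copy2(src, dest)
            copied.append(dest)
    return copied
```

Run in the other direction (server to local cache), the same function would refresh the hub desktop from Europe.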
posted by bendybendy at 6:55 AM on May 4, 2011


SharePoint is in many, many ways meant for this.

Depending on your integration needs, it may or may not be a free download from MSFT. It's been ~4 years since I did anything with SharePoint, and I did not use/look at the Office integration, but I saw that it was a powerful feature.
posted by k5.user at 7:10 AM on May 4, 2011


I agree with Leon (and others) above that this is less of a VCS/DVCS problem and more of a document management system (DMS) or enterprise content management (ECM) problem. An alternative to SharePoint that I can vouch for is Jive SBS (formerly Clearspace).
posted by togdon at 7:23 AM on May 4, 2011


Jive SBS
I swear I got that right the first time...
posted by togdon at 7:25 AM on May 4, 2011


Yeah, SharePoint in general is a big pain, but it does handle versioning and collaboration on Office documents very well. If you have Office 2010 it even supports simultaneous editing; depending on your use case that may justify the whole thing.
posted by phoenixy at 7:43 AM on May 4, 2011


Adding to the chorus of "SharePoint is kind of a pain, but this is what it's for."
posted by capnsue at 8:24 AM on May 4, 2011


You could also use Assembla for this, which has a nice plugin module to share and keep synced various media files, such as Word docs and images. It is basically a very-easy-to-use web-based VCS. Take a look, it might be what you need.

I use it for clients to make sure we all know what is the latest version. Nothing is worse than realizing you've been working off an old version of the statement of work.
posted by Invoke at 8:59 AM on May 4, 2011


"Some of the guys that came from big companies that I used to work with used to bitch and moan that we didn't have ClearCase, which can apparently handle Word docs pretty well. We had SVN, which isn't great for binary blobs."

Indeed, CC does make this easier. The downside is that it usually requires more administration than git or SVN and it has a reputation for being slow and expensive (both of which were true when I was using it).
posted by Mad_Carew at 9:03 AM on May 4, 2011


Office 365 from Microsoft would handle your versioning issues without you needing to support your own SharePoint environment.
posted by blue_beetle at 10:23 AM on May 4, 2011


Response by poster: Thanks y'all, for your extremely detailed and helpful responses. This has given us a lot to chew on. We haven't settled on anything yet, but I'll come back and mark the best answer(s) when we do (though they are all great answers, frankly).
posted by xiaolongbao at 12:12 PM on July 10, 2011

