Can a Git Branch Contain Only a Sub-Set of the Repository?
September 24, 2013 8:21 PM

I'm new to Git but as I understand it, a branch is typically for working with a copy of all the files. Would it work for what I have in mind, if the branch only contains a sub-set of those files?

Hello,

I hope this group is a good place to ask questions like this. I'm new to version control, but I'm quickly finding Git to be a great way to manage my collection of writings. I do have a question, though. I have, for example, a repository that contains 200 .txt files. Each one is the text of a poem. Now, I'd like to set aside 20 of those files, for example, to create a manuscript. While I'm working with those 20 files and thinking of them as a group, I may make changes to them. I would like, of course, for those changes to be reflected in my "master" repository of all 200 files. Is there a way to manage such a thing with a Git branch? As I understand it, a branch is typically for working with a copy of all the files. Would it work for what I have in mind, if the branch only contains a sub-set of those files?

Thanks for any advice you may have,

Dylan
posted by dylan_k to Computers & Internet (9 answers total) 4 users marked this as a favorite
 
Is there any particular reason the branch would need to not include all the files?
posted by entropic at 8:45 PM on September 24, 2013


If you're concerned about storage or some other performance overhead, don't be. While it is the case that git keeps track internally of every version of every file in the repository, when you create a branch, git doesn't create new internal copies of any of the files in your repository until you change something and commit that change, and then it will only store new copies for the files that you did change.
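
A quick way to see this for yourself, as a rough sketch: the scratch directory, the poem file names, and the "Example Poet" identity below are made up for illustration. It counts the loose objects in a throwaway repository before and after creating a branch.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity, needed to commit
git config user.email "poet@example.com"

for n in 1 2 3; do echo "poem $n" > "poem$n.txt"; done
git add .
git commit -q -m "add three poems"

before=$(git count-objects)              # loose objects stored so far
git branch manuscript                    # create the branch without switching to it
after=$(git count-objects)

echo "before branching: $before"
echo "after branching:  $after"
git rev-parse master manuscript          # both names resolve to the same commit
```

The two counts should match, because the new branch is only a second name for a commit that already exists.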

Otherwise, I'm not sure what the purpose of this would be. As far as it goes, you can just create a branch, modify only those twenty files, and everything will proceed as normal. When you've reached a stable point, you can check out your master branch and merge in the changes from your manuscript branch. As long as there are no conflicting changes between the manuscript and master branches, this will also go perfectly smoothly.

Given all that, the strictly literal answer to your question is that you can create a branch, check out that branch (not sure if you're aware of the convenience method of creating a branch and checking it out all in one command with git checkout -b <branch_name>, but it's quite useful), and then use git rm <file> to get rid of the 180 files that you don't care about in the manuscript branch. This is tedious, and if you try to merge your changes from the manuscript branch back into your master branch it will mean that your master branch will have all of those files removed, too. Obviously they're still there if you check out an earlier commit, but it'll still be a pain. I recommend just creating a branch like usual and just making the modifications that you need to, without worrying about the files that you don't intend to change.
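
To make the pitfall concrete, here is a small sketch in a throwaway repository. The directory, the three poem files, and the identity are invented for the example. It removes one file on a manuscript branch, edits another, and merges back into master.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"

for n in 1 2 3; do echo "poem $n" > "poem$n.txt"; done
git add .
git commit -q -m "add three poems"

git checkout -q -b manuscript            # create the branch and switch to it
git rm -q poem3.txt                      # drop a poem that is not in the manuscript
git commit -q -m "drop poem3 from the manuscript"
echo "a revised line" >> poem1.txt
git commit -q -am "revise poem1"

git checkout -q master
git merge -q manuscript                  # brings over the edit AND the deletion
ls                                       # poem3.txt is no longer listed
```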
posted by invitapriore at 9:30 PM on September 24, 2013 [3 favorites]


Assuming it's for clarity of humans rather than computers, a git sparse checkout might do the thing you're looking for - this guide should be able to help you get started
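
The linked guide is not reproduced here. As a rough sketch only, and assuming the long-standing core.sparseCheckout setting behaves as described in the git read-tree documentation, a sparse checkout of two out of three made-up poem files might look like this. Newer Git releases (2.25 and later) also ship a dedicated git sparse-checkout command, which is not used below.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"

for n in 1 2 3; do echo "poem $n" > "poem$n.txt"; done
git add .
git commit -q -m "add three poems"

git config core.sparseCheckout true      # turn sparse checkout on for this repository
mkdir -p .git/info
printf '/poem1.txt\n/poem2.txt\n' > .git/info/sparse-checkout  # paths to keep on disk
git read-tree -mu HEAD                   # re-apply HEAD to the working directory

ls                                       # only the listed poems remain on disk
git ls-files                             # the repository still tracks all three
```

Nothing is deleted from the branch this way; the other files are simply left out of the working directory.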
posted by jaymzjulian at 9:30 PM on September 24, 2013 [2 favorites]


When you merge a branch into master, master picks up every change made on that branch since the two diverged, and deletions count as changes. That's a problem for you, because when you merge the subset branch into master, the files you removed from the subset will be deleted in master too.

The solution is to convince the master branch that it's up-to-date with respect to the subset branch, even though they're not in fact identical. You do that by performing one merge from subset to master, but then modifying the merge so it doesn't actually delete your files. Now master knows that you want to ignore the deleting bit, and so subsequent merges will work fine.

Here's an example of how to do this. Let me know if there are parts that aren't clear enough.
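
The linked example is not reproduced in this copy of the thread. The following is only a sketch of one way the two steps described above could be carried out, and not necessarily the same as the original example. The file names, branch contents, and identity are assumptions made for illustration.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"

for n in 1 2 3; do echo "poem $n" > "poem$n.txt"; done
git add .
git commit -q -m "add three poems"

git checkout -q -b subset
git rm -q poem3.txt
git commit -q -m "subset: drop poem3"

# First merge: stop before committing, then undo the deletion by hand.
git checkout -q master
git merge -q --no-ff --no-commit subset
git checkout HEAD -- poem3.txt           # restore the file from master's last commit
git commit -q -m "merge subset, keeping poem3 on master"

# Later edits on subset now merge without deleting anything on master.
git checkout -q subset
echo "a revised line" >> poem1.txt
git commit -q -am "subset: revise poem1"
git checkout -q master
git merge -q -m "merge subset edits" subset
ls                                       # poem3.txt is still there
```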
posted by vasi at 10:00 PM on September 24, 2013 [1 favorite]


Another option would be to git mv the files to another directory, work on them, and then move them back and/or delete them. This would clutter your version history, though: Git doesn't directly track file renames (it will show them as such in the log, but only because it notices that one file disappeared and another file with the same contents was created). I probably wouldn't recommend doing it this way.
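
A small sketch of what that rename detection looks like in practice. The manuscript/ directory, the file name, and the identity are made up. A plain log for the new path stops at the move, while --follow asks Git to trace the file back through it.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"

echo "first draft" > poem1.txt
git add poem1.txt
git commit -q -m "add poem1"

mkdir manuscript                         # git mv needs the target directory to exist
git mv poem1.txt manuscript/poem1.txt
git commit -q -m "move poem1 into manuscript/"

git log --oneline -- manuscript/poem1.txt            # shows only the move commit
git log --oneline --follow -- manuscript/poem1.txt   # shows the move and the original add
```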

Other than that, yeah, what invitapriore said. If organization is your goal, just create a sub-branch for each task and merge them into the main branch when you're done. If you've deleted the files in the sub-branch (for example, if you've combined them into a separate manuscript and you're done with them), they will be deleted in the main branch when you merge. For each branch, Git can tell you which files you've changed in case you've lost track: git diff --name-only master..my-branch
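
Here is that command in a throwaway repository, as a minimal sketch. The poem files and identity are invented, and my-branch is the placeholder name from the command above.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"

for n in 1 2 3; do echo "poem $n" > "poem$n.txt"; done
git add .
git commit -q -m "add three poems"

git checkout -q -b my-branch
echo "a revised line" >> poem2.txt
git commit -q -am "revise poem2"

git checkout -q master
git diff --name-only master..my-branch   # prints poem2.txt
```

If master has moved on since the branch was created, the three-dot form (master...my-branch) compares against the point where the two diverged, which is usually closer to "what did I change on this branch".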

(re: vasi's answer -- I'd been wondering how to do that myself!)
posted by neckro23 at 10:13 PM on September 24, 2013


It sounds like you really might want to fork a new repository, not branch the existing one. You'd remove files from the fork.

It is possible to push/sync changes from the master/"parent" to the new "child" fork, but extra steps are needed to go the other direction, so this reduces the likelihood of messing up the master.
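
As a rough local sketch of that setup, a plain git clone stands in for the fork here. The parent and child directory names, the poem files, and the identity are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)                        # hypothetical scratch directory
cd "$work"

git init -q parent
cd parent
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"
for n in 1 2 3; do echo "poem $n" > "poem$n.txt"; done
git add .
git commit -q -m "add three poems"

cd "$work"
git clone -q parent child                # the "fork"; parent becomes the remote origin
cd child
git config user.name "Example Poet"
git config user.email "poet@example.com"
git rm -q poem3.txt                      # trim the fork down to the manuscript
git commit -q -m "child: keep only the manuscript poems"

cd "$work/parent"
echo "a revised line" >> poem1.txt       # meanwhile, the parent changes a kept file
git commit -q -am "parent: revise poem1"

cd "$work/child"
git fetch -q origin                      # sync parent -> child
git merge -q -m "sync from parent" origin/master
ls                                       # poem1.txt (revised) and poem2.txt
```

One caveat: if the parent later edits a file that the child has deleted, the merge stops with a modify/delete conflict that has to be resolved by hand.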
posted by Blazecock Pileon at 10:21 PM on September 24, 2013


Best answer: vasi is basically right, but there's an easier way to do it using git's merge strategies.

# (on master)
git checkout -b subset
git rm file1.txt file2.txt # (remove the files you don't want on this branch)
git commit -m 'removed some stuff'
git checkout master # (go back to master)
git merge --strategy ours subset # (record a merge from the subset branch, but make no actual changes to master)
git checkout subset
# (edit file3.txt)
git add file3.txt
git commit -m 'edited file3'
git checkout master # (back to master again)
git merge subset # (will merge the change to file3.txt but still not the deletions)
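
For anyone who wants to try the recipe before touching real work, here is the same sequence as a self-contained sketch in a scratch repository. The directory, the contents of the three files, the commit messages passed with -m, and the identity are made up; the -m flags are there only so that no editor opens.

```shell
set -e
repo=$(mktemp -d)                        # hypothetical scratch directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Example Poet"      # placeholder identity
git config user.email "poet@example.com"

for n in 1 2 3; do echo "text $n" > "file$n.txt"; done
git add .
git commit -q -m "add three files"

git checkout -q -b subset
git rm -q file1.txt file2.txt
git commit -q -m 'removed some stuff'

git checkout -q master
git merge -q --strategy ours -m 'record subset merge, change nothing' subset

git checkout -q subset
echo "an edit" >> file3.txt
git add file3.txt
git commit -q -m 'edited file3'

git checkout -q master
git merge -q -m 'merge subset edits' subset
ls                                       # file1.txt, file2.txt and file3.txt are all present
```

Note that the ours strategy (-s ours / --strategy ours) is a different thing from the similarly named strategy option -X ours, which only settles conflicting hunks in favor of the current branch.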
posted by russm at 1:27 AM on September 25, 2013 [4 favorites]


Thanks russm, that does make it easier! I never knew about "--strategy ours". There's always something you've yet to discover in git-land!
posted by vasi at 11:22 AM on September 25, 2013


Response by poster: I think russm's answer sounds like the best one, and I'll try it out tonight to be sure. Can anybody suggest a good source for learning more about merge strategies? I'm not sure I totally understand the concept.
posted by dylan_k at 2:13 PM on September 30, 2013

