End Point

Welcome to End Point's blog

Ongoing observations by End Point people.

Git Submodules: What is the Ideal Workflow?

Last week, I asked some coworkers at End Point about the normal workflow for using git submodules. Brian responded, and the discussion turned into an overview of git submodules. I reorganized the content into a FAQ format:

How do you get started with git submodules?

Use git submodule add to add a new submodule. For example, you would issue the commands:

git submodule add git://github.com/stephskardal/extension1.git extension
git submodule init

Then you would git add extension (the path of the submodule installation) together with the .gitmodules file, and git commit.
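The steps above can be tried without touching a real project. The following is a minimal sketch, assuming a throwaway directory and two hypothetical local repositories (extension1 and project) that stand in for the GitHub repository and your own super project. The demo identity and the protocol.file.allow setting are only there so the sketch runs on a clean machine and with a local path as the submodule URL.

```shell
set -eu

# Placeholder identity (an assumption) so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the extension1 repository.
git init -q extension1
(cd extension1 && echo "v1" > README && git add README && git commit -q -m "Initial commit")

# Hypothetical super project.
git init -q project
cd project

# protocol.file.allow is only needed because this sketch uses a local path as the URL.
git -c protocol.file.allow=always submodule --quiet add "$work/extension1" extension

# Show what is staged: expect both the submodule path and .gitmodules.
git status --short

git commit -q -m "Add extension submodule"
git ls-files
```

On the git versions I have checked, git submodule add stages both entries for you, so the commit picks up .gitmodules as well as the extension path.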

What does the initial setup of a submodule look like?

The super project repo stores a .gitmodules file. A sample:

[submodule "extension1"]
        path = extension
        url = git://github.com/stephskardal/extension1.git
[submodule "extension2"]
        path = extension_two
        url = git://github.com/stephskardal/extension2.git
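Because .gitmodules uses the same syntax as other git configuration files, git config can read it directly. Here is a small sketch that recreates the sample above in a temporary directory (the directory is an assumption for illustration) and queries it:

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Recreate the sample .gitmodules file shown above.
cat > .gitmodules <<'EOF'
[submodule "extension1"]
        path = extension
        url = git://github.com/stephskardal/extension1.git
[submodule "extension2"]
        path = extension_two
        url = git://github.com/stephskardal/extension2.git
EOF

# List the path recorded for every submodule, one "key value" pair per line.
paths=$(git config -f .gitmodules --get-regexp '^submodule\..*\.path$')
echo "$paths"

# Look up a single value by its full key.
url=$(git config -f .gitmodules --get submodule.extension2.url)
echo "$url"
```

Note that the name in the section header (extension1) and the path (extension) are separate values, which is why the sample can use a different directory name than the repository name.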

When you have submodules in a project, do you have to separately clone them from the master project, or does the initial checkout take care of that recursively for you?

A plain clone does not fetch the submodules for you. Generally, you will issue the commands below after you clone a super project repository. These commands will "install" the submodule under the main repository.

git submodule init
git submodule update
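To see what these two commands do, here is a rough sketch using hypothetical local repositories in a temporary directory. The repository names, the demo identity, and the protocol.file.allow setting (needed only for local-path URLs) are assumptions made for the example.

```shell
set -eu

# Placeholder identity (an assumption) so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Setup: a hypothetical extension and a super project that uses it.
git init -q extension1
(cd extension1 && echo "v1" > README && git add README && git commit -q -m "Initial commit")
git init -q project
(cd project &&
  git -c protocol.file.allow=always submodule --quiet add "$work/extension1" extension &&
  git commit -q -m "Add extension submodule")

# A plain clone creates the submodule directory but leaves it empty.
git clone -q project checkout
cd checkout
before=$(ls -A extension | wc -l | tr -d ' ')

# init copies the submodule URL into .git/config;
# update clones the submodule and checks out the recorded commit.
git submodule --quiet init
git -c protocol.file.allow=always submodule --quiet update
after=$(ls -A extension | wc -l | tr -d ' ')

echo "entries in extension before: $before, after: $after"
```

The two steps can also be combined as git submodule update --init.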

How do you update a git submodule repository?

Given an existing git project in the "project" directory, and a git submodule extension1 in the extension directory:

First, a status check on the main project:

~/project> git status
# On branch master
nothing to commit (working directory clean)

Next, a status check on the git submodule:

~/project> cd extension/
~/project/extension> git status
# Not currently on any branch.
nothing to commit (working directory clean)

Next, an update of the extension:

~/project/extension> git fetch
remote: Counting objects: 30, done.
remote: Compressing objects: 100% (18/18), done.
remote: Total 19 (delta 9), reused 0 (delta 0)
Unpacking objects: 100% (19/19), done.
From git://github.com/stephskardal/extension1
   0f0b76b..9cbb6bd  master     -> origin/master

~/project/extension> git checkout master
Previous HEAD position was 0f0b76b... Added before_filter to base controller.
Switched to branch "master"
Your branch is behind 'origin/master' by 5 commits, and can be fast-forwarded.

~/project/extension> git merge origin/master
Updating f95a2d5..9cbb6bd
Fast forward
 extension.rb |   10 +
 README       |   36 +
 TODO         |   11 +-
...

~/project/extension> git status
# On branch master
nothing to commit (working directory clean)

Next, back to the main project:

~/project/extension> cd ..
~/project> git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   extension
#
no changes added to commit (use "git add" and/or "git commit -a")

Now, a commit in the main project to record the submodule's new position. Brian has made it a convention to manually include SUBMODULE UPDATE: extension_name in the commit message to inform other developers that a submodule update is required.

~/project> git add extension
~/project> git commit
[master eba52d5] SUBMODULE UPDATE: extension
 1 files changed, 1 insertions(+), 1 deletions(-)
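The whole update walkthrough can be condensed into a script. This is a sketch under a few assumptions: the repositories are hypothetical local stand-ins created in a temporary directory, the upstream branch is forced to be named master to match the transcript, and the demo identity and protocol.file.allow setting exist only so the example runs on a clean machine.

```shell
set -eu

# Placeholder identity (an assumption) so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Setup: hypothetical extension (on a branch named master) and super project.
git init -q extension1
(cd extension1 && git symbolic-ref HEAD refs/heads/master &&
  echo "v1" > README && git add README && git commit -q -m "Initial commit")
git init -q project
(cd project &&
  git -c protocol.file.allow=always submodule --quiet add "$work/extension1" extension &&
  git commit -q -m "Add extension submodule")

# Upstream moves ahead by one commit.
(cd extension1 && echo "v2" > README && git commit -q -am "Update README")

# Inside the submodule: fetch, switch to master, fast-forward to upstream.
cd project/extension
git fetch -q
git checkout -q master
git merge -q origin/master

# Back in the super project: record the submodule's new commit.
cd ..
git add extension
git commit -q -m "SUBMODULE UPDATE: extension"

recorded=$(git rev-parse HEAD:extension)
upstream=$(git -C "$work/extension1" rev-parse HEAD)
echo "recorded: $recorded"
echo "upstream: $upstream"
```

After the final commit, the commit recorded in the super project and the tip of the upstream repository should be the same SHA1.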

What does git store internally to track the submodule? The HEAD position? That would seem to be the minimal information needed to tie the specific submodule-tracked version with the version used in the superproject.

It stores a specific commit SHA1, so even if the submodule's HEAD moves, the super project's "reference" doesn't. This is why updating to the upstream version must be followed by a commit in the super project: that commit "pins" the super project to the same submodule commit across repos. You'll see in the example above that the submodule project was in a detached HEAD state (not on a branch), so tracking HEAD doesn't really make sense.

It is critical that the super project repo store an exact position for the submodule; otherwise, you would not be able to associate your own code with a particular version of a submodule, or ensure that a given submodule is at the same position across repos. For instance, if you updated to an upgraded version of a submodule and committed it, not realizing that it broke your own code, you could check out a previous spot in the repository where the code worked with the submodule.
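The rollback scenario can be sketched as follows. As before, the repositories are hypothetical local stand-ins in a temporary directory, and the branch name master, the demo identity, and the protocol.file.allow setting are assumptions made so the example is self-contained.

```shell
set -eu

# Placeholder identity (an assumption) so commits work without any git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"

# Setup: hypothetical extension (branch master) pinned at "v1" in the super project.
git init -q extension1
(cd extension1 && git symbolic-ref HEAD refs/heads/master &&
  echo "v1" > README && git add README && git commit -q -m "Initial commit")
git init -q project
cd project
git -c protocol.file.allow=always submodule --quiet add "$work/extension1" extension
git commit -q -m "Add extension submodule"
old=$(git rev-parse HEAD:extension)

# Upgrade: upstream gains a commit, the submodule moves to it, and the pin is committed.
(cd "$work/extension1" && echo "v2" > README && git commit -q -am "Update README")
(cd extension && git fetch -q && git checkout -q master && git merge -q origin/master)
git add extension
git commit -q -m "SUBMODULE UPDATE: extension"

# Roll back: check out the super project commit from before the upgrade,
# then move the submodule to the commit recorded there.
git checkout -q HEAD~1
git -c protocol.file.allow=always submodule --quiet update
now=$(git -C extension rev-parse HEAD)

echo "pinned before upgrade: $old"
echo "submodule is now at:   $now"
cat extension/README
```

Checking out the older super project commit does not move the submodule by itself; the git submodule update step is what brings the submodule's working tree back in line with the recorded SHA1.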

Hopefully, this discussion on git submodules begins to show how powerful git and submodules can be for making it easy for non-core developers to start sharing their code on an open source project.

Thanks to Brian Miller and David Christensen for contributing the content for this post! I reference this article in my article on Software Development with Spree - I've found it very useful to use git submodules to install several Spree extensions on recent projects. The Spree extension community has a few valuable extensions that introduce features such as product reviews, FAQ, blog organization, static pages, and multi-domain setup.

7 comments:

Adam Vollrath said...

The 1.7.1 Release Notes describe a few improvements to submodule handling:

* "git diff --submodule" notices and describes dirty submodules.

* "git status" notices and describes dirty submodules.

petterb said...

Great post!

Thanks for making this!

LeenaOfDalaran said...

This comment has been removed by the author.

Matt Petrovic said...

When you add a submodule, you need to git add the .gitmodules file as well. That's where all the information about submodules is stored, and if you don't commit it, any repositories pulling from yours won't know that it's a submodule.

Thanks for the otherwise helpful post, though. Really helped me in figuring out how to work with git's submodules.

hodaddy said...

I have projects that share a common library as a submodule. When I make any changes to a project, I always create a git branch to do my work in. After testing and given another set of eyes, I merge the branch back into master which is always ready to deploy. Sometimes I need to make changes to the submodule library as well. Would the best practice be to first branch the main project and then go into the submodule and branch it also? I realize then I would need to test the modified submodule branch in the other projects where used.

Brian J. Miller said...

hodaddy...

I don't think there is necessarily a "best practice" here. If you are comfortable working with git's branches, then that same workflow will continue to work at the submodule level. After doing a submodule update, the submodule itself is in a detached HEAD state (IOW, not on any branch) at a specific commit recorded in the superproject, so the branch that the commit was developed on has no bearing as long as the commit is in the repo somewhere. The only gotcha to be aware of is that you'll need to make sure to check out an appropriate branch before doing further development, but that is always the case regardless of whether you are working on "master" or some other branch.

rofro said...

Now there is an easier way to clone a repo:

git clone --recursive git://github.com/foo/bar.git

If your submodule was added in a branch be sure to include it in your clone command...

git clone -b <branch> --recursive <repository>



http://stackoverflow.com/questions/3796927/how-to-git-clone-including-submodules