Thursday, June 20, 2013

Fun with various workflows (1)

Even though Git is distributed, you can still use it for projects that employ the centralized workflow, where there is a single central shared repository. Everybody pulls from it to obtain everybody else's work, and after integrating his own work with others' work, everybody pushes into it so that everybody else can enjoy the fruit of his work.

In the simplest workflow, you can start by cloning from the central repository:
  $ git clone our.site.xz:/pub/repo/project.git myproject
and the myproject directory becomes your working area, where you will have the standard configuration, perhaps not very different from this:
  [remote "origin"]
    url = our.site.xz:/pub/repo/project.git
    fetch = +refs/heads/*:refs/remotes/origin/*
  [branch "master"]
    remote = origin
    merge = refs/heads/master
and your "master" branch, which was copied from the "master" branch of the central shared repository, is ready for you to build your work on it.
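
If you want to see those settings for yourself, here is a minimal sketch that you can run in a scratch directory. It assumes a throwaway local repository, central.git, standing in for our.site.xz:/pub/repo/project.git; the "seed" directory, the Example identity and the README file are made up purely for illustration.

```shell
set -e
# Throwaway identity so that commits work even where Git is not configured.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Build a stand-in for the central repository with one commit on 'master'.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master   # name the unborn branch 'master'
echo hello >README
git add README
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

# Clone it the way the article does, then inspect what "git clone" left behind.
git clone -q central.git myproject
cd myproject
git config --get remote.origin.url      # location of the stand-in central repository
git config --get remote.origin.fetch    # +refs/heads/*:refs/remotes/origin/*
git config --get branch.master.remote   # origin
git config --get branch.master.merge    # refs/heads/master
```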

If you run "git pull --rebase" (without any other argument), the configuration above, left for you by "git clone", tells Git that you want to obtain the latest work from the central shared repository and rebase your own work on top of its master branch.
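
The effect can be sketched with two clones of a throwaway central repository. Everything named here (central.git, the "alice" and "bob" clones, the file names) is hypothetical and exists only to illustrate the point; after the pull, Bob's commit should sit directly on top of the central master, with no merge commit.

```shell
set -e
# Throwaway identity so that commits and rebases work without global config.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with one commit on 'master'.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo hello >README
git add README
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

git clone -q central.git alice
git clone -q central.git bob

# Alice publishes a commit to the central master.
cd alice
echo alice >alice.txt
git add alice.txt
git commit -q -m "Alice's work"
git push -q origin master
cd ..

# Bob commits locally, then integrates with the central master.
cd bob
echo bob >bob.txt
git add bob.txt
git commit -q -m "Bob's work"
git pull -q --rebase      # uses branch.master.remote and branch.master.merge

# History is linear: Bob's work, then Alice's work, then the initial commit.
git log --format=%s
```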

If you say "git push" (without any other argument), the current default mode of pushing, called 'matching', is to look at your local branches and at the branches in the repository you are pushing to, and to update the branches whose names match. In this "simplest" case, you only have the 'master' branch, and the central repository does have its 'master' branch, so you will update its 'master' branch with the work you did on your 'master' branch.

In Git 2.0, this default mode will change to 'simple', which will push only the current branch to the branch at the central repository you integrate with, but only when they have the same name (so the example of working on 'master' and pushing it back to 'master' will still work).
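
You do not have to wait to see how 'simple' behaves. The sketch below sets it explicitly in a throwaway clone; the central.git repository, the file names and the "my-feature" branch are hypothetical stand-ins. It assumes a Git recent enough to know the 'simple' mode.

```shell
set -e
# Throwaway identity so that commits work even where Git is not configured.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with one commit on 'master'.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo hello >README
git add README
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

git clone -q central.git myproject
cd myproject
git config push.default simple

# On 'master' the names match, so a bare "git push" updates the central master.
echo one >one.txt
git add one.txt
git commit -q -m "Work on master"
git push -q

# On a differently named branch that integrates with origin/master,
# 'simple' declines to push rather than guess.
git checkout -q -b my-feature -t origin/master
echo two >two.txt
git add two.txt
git commit -q -m "Work on my-feature"
if git push -q 2>/dev/null; then
    echo "pushed"
else
    echo "refused"
fi
```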

If your project employs the centralized workflow, after learning Git enough to be comfortable with it, you may want to do
  $ git config push.default upstream
to choose to always update the branch at the central repository you integrate with, even if the branch names are different.  Note that you can do this (or use 'simple' instead of 'upstream'), and indeed you are encouraged to do so, without waiting for Git 2.0.

That will allow you to work on different things on different branches, e.g.
  $ git checkout -b my-feature -t origin/master
  $ git push
The "checkout" command creates a new "my-feature" branch that is set to integrate with the master branch from your central repository. When using the upstream mode, the "git push" that follows will push "my-feature" back to update the "master" branch over there.
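
To convince yourself, you could try the following in a scratch directory. It is only a sketch: central.git is a hypothetical local stand-in for the central repository, and the file name and commit message are made up.

```shell
set -e
# Throwaway identity so that commits work even where Git is not configured.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with one commit on 'master'.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo hello >README
git add README
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

git clone -q central.git myproject
cd myproject
git config push.default upstream

# Work on a local branch that integrates with the central master.
git checkout -q -b my-feature -t origin/master
echo feature >feature.txt
git add feature.txt
git commit -q -m "Add feature"
git push -q               # updates 'master' at the central repository

# The central repository still has only one branch.
git --git-dir="$work/central.git" for-each-ref --format='%(refname)' refs/heads
# refs/heads/master
```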

An interesting thing to notice is that in the centralized workflow, because there is no central project maintainer (aka integrator), everybody is responsible for integrating his own work to advance the mainline of the project. The job of integration is indeed distributed when you use the centralized workflow. It is a bit funny when you think about it.

But that is exactly why the upstream mode makes sense. In order to fully appreciate it, you need to realize what it means to have forked the "my-feature" branch out of the "master" branch of the central shared repository.

The purpose of the master branch at the shared central repository is to advance the state of the project in general, but the purpose of your local branch, my-feature, is a lot more specialized. It may be to fix this small bug, or add that neat feature. You would only be working on a small part of the project while on that branch.

But because you are the one who plays the top-level integrator role when you run "git pull --rebase" just before you "git push", when that "git pull --rebase" finishes, the tip of your my-feature branch is no longer about your small fix or neat feature. It temporarily becomes about advancing the state of the overall project. And that is the reason you would "git push" it to update the master branch, not the "my-feature" branch, at the central repository.

Of course, if you want to publish it as "my-feature", perhaps because you want to show it to others before really updating the shared master branch, you can explicitly say:
  $ git push origin my-feature
Pushing my-feature that was forked from and still integrates with their master is not usually what you want to do every time in the centralized workflow, though. In fact, it is often the case that administrators of a project with the centralized workflow frown upon people making random branches at their shared central repository willy-nilly (exactly because the central shared repository is a common resource and a feature branch like "my-feature" is often not of general interest).
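
For comparison, this is roughly what the explicit form does, again sketched against a hypothetical throwaway central.git with made-up file names: the central repository gains a "my-feature" branch, and its master stays where it was.

```shell
set -e
# Throwaway identity so that commits work even where Git is not configured.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# Stand-in for the central repository, with one commit on 'master'.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo hello >README
git add README
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed central.git

git clone -q central.git myproject
cd myproject
git checkout -q -b my-feature -t origin/master
echo feature >feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Naming the branch explicitly publishes it under its own name.
git push -q origin my-feature

git --git-dir="$work/central.git" for-each-ref --format='%(refname)' refs/heads
# refs/heads/master
# refs/heads/my-feature
```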

Common things require less typing; uncommon things are possible, but you need to tell Git explicitly to do them.

The Git core itself is very much agnostic to what workflow you use, and you can also use it for projects that use the "I publish my work to my public repository, others interested in my work can pull my work from there, and there is an integrator who pulls and consolidates good work from others and publishes the aggregated whole" distributed workflow. That will be a topic for a separate post.