Wednesday, September 18, 2013

Fun with first parent history

If your history is cleanly maintained, the output from "git log --first-parent" will consist only of merges of completed topics and trivially correct updates made directly on top of the mainline. It will give you a bird's-eye view that shows what features and fixes were made during a given period without going into too much detail. A history in which each merge brings the work done for a specific topic (theme, purpose, objective; use whatever word you prefer) into it means that whoever made these merges is the integrator, the keeper of the main history. The first-parent view of the history is useful only when the keeper of the main history takes good care of the main history.

People who use the central repository workflow, where there is a single repository for everybody to fetch from and push to, complain that the "git pull" they do merges the history taken from the central repository into their own development history, so the merge is made in the wrong direction. They often wish for an option to flip the order of parents around for this reason, but they do not realize that a first-parent-clean history needs a lot more than that.

When you are using the "central shared repository" workflow, if you had and used such an option to flip the heads of a merge to record what you have done so far as a side branch of what everybody else did, the first-parent view would make a bit more sense than what you currently get. For example, if you worked on a specific topic that required six individual commits to complete since you forked from the mainline, your history in your repository and the project's main history in the central repository may look like this:

     x---x---x---x---x---x     Your history
    /
---X---o---o---o---o---o       Project's history

If you try to "git push" at this point, it will stop you, lest you lose these commits represented with o by overwriting the history. Git will tell you to first integrate the project's history with yours with "git pull", but if you actually pull to merge, the commits x will form the first-parent chain of the resulting merge, and the sequence of commits (most likely, merges of topics unrelated to each other) o will appear as its side branch:

     x---x---x---x---x---x---M     Your history
    /                       /
---X---o---o---o---o---o----       Project's history

This is bad, and "flip the order of parents" may help to produce a history of this shape instead:

     x---x---x---x---x---x     Your history
    /                     \
---X---o---o---o---o---o---M   Project's history
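
To see what the two shapes mean for the first-parent view, here is a minimal sketch that rebuilds both in a throwaway repository. Everything in it is an assumption made for illustration: the branch names 'project', 'mine', 'pulled' and 'flipped' are made up, empty commits stand in for real work, and the "flipped" merge is produced simply by merging from the project's side rather than by any parent-flipping option.

```shell
#!/bin/sh
# Sketch only: all branch names and the identity below are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/project   # 'project' stands in for the central history

git commit -q --allow-empty -m "X"          # the fork point
git branch mine
for i in 1 2 3 4 5; do git commit -q --allow-empty -m "o$i"; done

git checkout -q mine
for i in 1 2 3 4 5 6; do git commit -q --allow-empty -m "x$i"; done

# Second picture: you merge the project's history into yours (what "git pull" does).
git checkout -q -b pulled mine
git merge -q -m "M" project
pulled_view=$(git log --first-parent --format=%s pulled)

# Third picture: the same two histories, merged from the project's side.
git checkout -q -b flipped project
git merge -q -m "M" mine
flipped_view=$(git log --first-parent --format=%s flipped)

echo "pulled:  $(echo $pulled_view)"    # pulled:  M x6 x5 x4 x3 x2 x1 X
echo "flipped: $(echo $flipped_view)"   # flipped: M o5 o4 o3 o2 o1 X
```

In the "pulled" case the first-parent chain is your six commits and the project's work disappears from the view; in the "flipped" case the mainline stays on the first-parent chain and your topic shows up as a single merge.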

However, there is another half of the problem that is not solved by such an option. People, especially those who work with the centralized workflow, tend to pull too often, just to catch up. Even with such a "flip the order of parents" option, what they would end up with in reality would often look more like this:

     x---x   x---x---x   x     Your history
    /     \ /         \ / \
---X---o---M---o---o---M---M   Project's history

The result fragments an otherwise logical and clean "single strand of pearls to fully address the issue, consisting of 6 commits" into three separate and seemingly unrelated pieces. Imagine that other people are working the same way, and the commits marked with o are merges of side branches with which they add their half-finished work to the main history, similar to what happened in the illustration above. You would get this history:

     x---x   x---x---x   x     Your history
    /     \ /         \ / \
---X---M---M---M---M---M---M   Project's history
      / \     / \ /
  ---y   y---y   y             Your colleague's history

Now, in "git log --first-parent" of the project's mainline history, there is nothing that links these six commits marked with x together and differentiates them from commits marked with y, and there is nothing that groups these M (merges) that pull in your disjoint steps to achieve a single goal and separates them from other merges. Unless people stop doing so many "pull"s that are used only to "catch up", even with the "flip the parents of a merge" option, you will not get a history that yields a good first-parent view.

As I wrote in an earlier entry (Fun with various workflows), when you "pull" and then "push" to the central repository, you are playing the role of the integrator, the keeper of the main history, and you are responsible for taking good care of it yourself. If you make a 2+3+1=6 mess as depicted in the last illustration above, you are failing to do so. People who later read "git log --first-parent" would not be able to see that these six commits you did were to achieve a single coherent goal and should be read together to understand it.

One obvious way to solve it is to use a topic branch workflow: you do a "git pull" from the shared repository only while you are on your 'master', which stays free of your 'x's until that 6-commit series is complete and ready. Then you locally merge that topic branch to your 'master' and push it back for everybody to see, which will give you the third picture in this message.
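
The topic branch workflow just described could look roughly like the sketch below. It is not a recipe from the project: the local bare repository "central.git" stands in for the shared repository, the clone names 'you' and 'colleague' and the branch name 'topic' are hypothetical, and the "--ff-only" and "--no-ff" options are choices made here (the former keeps the catch-up pull from ever creating a merge, the latter records a merge even when nobody else has pushed in the meantime).

```shell
#!/bin/sh
# Sketch only: repository paths, branch names and the identity are hypothetical.
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com

work=$(mktemp -d)
cd "$work"

# A local bare repository plays the role of the shared central repository.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "X"
cd "$work"
git clone -q --bare seed central.git
git clone -q "$work/central.git" you
git clone -q "$work/central.git" colleague

# Somebody else advances the project's history.
cd "$work/colleague"
git commit -q --allow-empty -m "o1"
git commit -q --allow-empty -m "o2"
git push -q origin master

# Your six commits live on a topic branch; 'master' stays free of them.
cd "$work/you"
git checkout -q -b topic
for i in 1 2 3 4 5 6; do git commit -q --allow-empty -m "x$i"; done

# Catch up only on 'master'; with none of your commits there, this fast-forwards.
git checkout -q master
git pull -q --ff-only origin master

# The topic is complete: merge it as one unit and publish the result.
git merge -q --no-ff -m "Merge topic" topic
git push -q origin master

view=$(git log --first-parent --format=%s master)
echo "$view"
```

The first-parent view printed at the end lists the merge of the topic followed by o2, o1 and X; the six x commits are reachable only through the merge's second parent.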

Incidentally, by doing so, you do not need the "flip the order of parents" option, either.