Wednesday, February 29, 2012

An Experimental Git Bug Tracker

Historically we in the Git development community never felt a need for a formal bug/issue tracker for various reasons, among which the most cited one is that a bug tracker will soon become a mere nuisance because it is bound to be left uncurated, accumulating duplicate, invalid, or already fixed issues.

Compared to the downside of a typical bug tracker, the current system of sending bug reports via e-mail to the mailing list has worked fairly well for us. If the issue is invalid (e.g. a user error), responses come in the form of education, from which other people on the mailing list can also benefit (as opposed to being buried in an entry marked as INVALID in the bug tracker nobody reads because it is full of cruft). If the issue is real and important (e.g. affects many people), the discussion thread becomes large and will inevitably get the attention of the developers, and the interactive back and forth that is necessary in order to ask for more details and to discuss the best solution can be done on the mailing list, i.e. in the same communication channel the developers who can fix the issue are already on. Even if the issue is real, if it is not important enough for the original reporter (or other users) to nag the list with a simple "Has anything happened to this issue?" message, it can be safely forgotten and everybody can move on.

An old collection of messages from the kernel mailing list also supports this view.

But the thing is, nobody has seriously tried the alternative of actually setting up, maintaining and curating a bug/issue tracker to be used for Git development. A bug tracker has never failed for us, because we haven't even tried one. From time to time, somebody new to the community comes and asks "do we have a bug tracker?" Every time the question is asked, the answer is the same: "Nobody has volunteered to actually do the work, and that is why we don't have one. Are you volunteering?"

So far.

Today, we had another round of the same discussion, with the same answer. Then a fellow by the name of Andrew Ardill stepped up and offered to set up a JIRA instance here. It is not (at least not yet) the official tracker for Git development, but it may become one some day, if it is maintained and curated properly.

Hurray!

Andrew says that you need to register an account and ask him for approval before you can edit the contents of the issue tracker, but for browse-only access you shouldn't even have to register.

Right now, the system seems to be empty and it is very understandable, at least to me. It was set up only hours ago, and besides, there is no bug in Git ;-)

Nah, the last part I am only kidding.

Let's see how this experiment goes. We wouldn't know how effective this thing will be until we try it for at least a few months.

And thanks again, Andrew!


Sunday, February 26, 2012

Git 1.7.8.5

While busily preparing for the next release Git 1.7.10, and preparing users for upcoming new features with a few blog posts, I didn't forget people who are for whatever reason stuck in the past.

Fixes to a couple of old problems are applied to an old maintenance track and 1.7.8.5 is out.

The same fixes also appear in the 'maint' and 'master' branches, so if you are running Git from either 'maint' (which is slightly newer than 1.7.9.2 and will become 1.7.9.3) or 'master' (which will become 1.7.10 someday), you do not have to pay too much attention to this maintenance update ;-)

Saturday, February 25, 2012

Updates to "git merge" in upcoming 1.7.10

The previous entry about a potentially incompatible update to the behavior of the git merge command may have been unnecessarily alarming for casual readers and it deserves a bit of clarification. The update is designed primarily to target interactive use, and not to negatively affect typical uses of the command in scripts.

Let's look at a few example use cases of the git merge command in end user scripts, and how well these scripts will work with the upcoming version.

(1) Your script may be designed to be run by your end users (e.g. it may be an implementation of the git my-merge command) to implement extra checks that are specific to your project or corporate environment before invoking the actual merge, perhaps like this:

#!/bin/sh
# Make sure you do not have uncommitted changes
test 0 = $( ( 
  git diff --name-only
  git diff --cached --name-only
 ) | wc -l ) &&
# Perhaps other project specific checks here...
# ... and finally
git merge "$@"

There is no need to update the above script in preparation for 1.7.10. The script is directly facing the end user, and with 1.7.10, the user can take advantage of the updated behavior that automatically opens the editor to make it easy for him to explain the merge. Also note that in a user-facing script like this, the underlying git merge command is typically passed the command line arguments given by the end user to the script (notice the use of "$@" in the above example). The end user can pass the --no-edit option to the script if he wants to silently conclude the merge.


(2) You may be responsible for the overall QA after receiving various topics from your contributors and co-workers, and may be using a script like the following to regularly merge many topic branches into an integration branch for testing:


#!/bin/sh
if ! test -f topics
then
  # Prepare topics to be integrated just once
  list-topics >topics &&
  # And checkout the tip of the master to integrate them all
  git checkout -B test master || exit
fi
while read topic
do
  git merge "$topic" || exit
done <topics
# All done
rm -f topics

In such a case, you do not have to explain each and every such merge to rebuild the test integration branch, and you do not want the change in 1.7.10 to affect the above script. 


And we didn't want to break such a script, either. Notice that the git merge command in this example is not facing the end user; its standard input is connected to the file that lists the topics, from which the surrounding while loop is reading, and not to the user's terminal. Because of this, the command will not automatically open an editor to ask your user to explain the merge, and there is no need to worry about the compatibility.
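To see why the loop shields git merge from the terminal, here is a minimal sketch. The helper name stdin_kind and the branch names are made up for illustration, and a temporary file stands in for the topics file. Every command run inside the loop body inherits the redirected standard input, so a check with test -t 0 reports that it is not a terminal.

```shell
#!/bin/sh
# Hypothetical helper: report what standard input is connected to.
stdin_kind () {
  if test -t 0
  then
    echo "terminal"
  else
    echo "not a terminal"
  fi
}

# Stand-in for the "topics" file; the branch names are made up.
topics=$(mktemp) || exit
printf '%s\n' topic-a topic-b >"$topics"

# Commands in the loop body inherit the redirection on "done",
# just like "git merge" does in the script above.
while read topic
do
  echo "$topic: stdin is $(stdin_kind)"
done <"$topics"
rm -f "$topics"
```

Run from a terminal, this should still report "not a terminal" for each topic, because the redirection, not the way the script was started, decides what the loop body sees.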


(3) Your project may run an integration test every time its members update the master branch of the central shared repository, and as a part of the integration test, it may merge the updated master to the test repository that has mock data for testing. Such a script may look like this:

#!/bin/sh
# Called from the post-update hook of the central repository
unset GIT_DIR
cd /srv/test.repo &&
git pull /srv/central.repo master || {
  logger "The updated master does not cleanly merge -- no automated test done"
  git reset --hard
  exit 1
}
make test || {
  logger "The updated master does not pass test"
  git reset --hard
  exit 1
}


If you run such a script interactively, the git pull command will invoke the underlying git merge command interactively, which in turn will open the editor and ask you to explain the merge. But if it is run from a post-update hook, the standard input is not likely to be connected to any interactive terminal (this is also true if such an automated merge is done as part of continuous integration testing). Hence, in practice, 1.7.10 will not change the behaviour of the git merge command used in such a script, and there is no need to worry about the compatibility.

(4) If your script uses git merge, is run interactively by the end user, and it keeps the standard input to the git merge command connected to the terminal, but for whatever reason you do not need your users to explain the merge, you do need to worry about the compatibility. You could add the --no-edit option to the git merge command invocations in your script as necessary, but there is a quicker way:

#!/bin/sh
GIT_MERGE_AUTOEDIT=no
export GIT_MERGE_AUTOEDIT
# whatever your script did originally
...
git merge foo
git merge bar
...

You can set GIT_MERGE_AUTOEDIT environment variable to no at the beginning of your script, and all the git merge command invocations will work as if they were given the --no-edit option.
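A slightly gentler variation, sketched here as one possibility rather than a recommendation from the Git project, is to set the variable only when the person running the script has not already set it. The helper merge_all is hypothetical, and foo and bar are the same placeholder branches as above.

```shell
#!/bin/sh
# Default to the old, silent behaviour, but keep any non-empty value
# that the caller has already put in the environment.
: "${GIT_MERGE_AUTOEDIT:=no}"
export GIT_MERGE_AUTOEDIT

# Hypothetical helper: merge each named branch in turn and stop at
# the first merge that fails.
merge_all () {
  for branch
  do
    git merge "$branch" || return
  done
}

# Usage (foo and bar are placeholder branch names):
#   merge_all foo bar
```

The ":=" expansion assigns only when the variable is unset or empty, so the script keeps working unattended by default while leaving the final say to whoever invokes it.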

Hopefully it is now clearer that there is not much to fear in the update to the git merge command planned in the upcoming 1.7.10 release.

Happy Gitting!

Thursday, February 23, 2012

Anticipating Git 1.7.10

According to the Git Calendar, we still have a few more weeks until the feature freeze for the next release Git 1.7.10, in which there will be one backward-incompatible improvement that potentially can bite people who used the git merge command in their scripts.

[Update: the above phrasing is a bit too alarming; here is a clarification you may want to read after finishing this entry]

Following the advice given in an article at LWN.net by Jake Edge:
Most free software projects discuss planned changes well in advance of their implementation and give users lots of opportunities to try out early versions. But engaging the project is best done with well-reasoned, specific descriptions of problems, missing features, and so on—not endless streams of "Project XYZ sucks!" messages to mailing lists or comment threads.
let's describe what we have decided, why, and how users can use the upcoming release in their work.

Traditionally, when the git merge command attempted to merge two or more histories, it prepared a canned commit log message based on what is being merged, and recorded the result in a merge commit without any user intervention, if the automatic merge logic successfully computed the resulting tree without conflict. When the automatic logic did not manage to come up with a clean merge result, it gave up, leaving the conflicted state in the index and in the working tree files, and asked the user to resolve them and run the git commit command to record the outcome.

Most merges do cleanly resolve, and this behavior resulted in people making their merges too easily and lightly, even when the reason why the merge was made in the first place should be explained. Nobody explained why the merge was made in a merge commit, because doing so after git merge had already made the commit required going back and running git commit --amend.

Recently in a discussion on the Git mailing list, Linus admitted (and I agreed) that this was one of the design mistakes we made early in the history of Git. And in 1.7.10 and later, the git merge command that is run in an interactive session (i.e. both its standard input and its standard output connected to a terminal) will open an editor before creating a commit to record the merge result, to give the user a chance to explain the merge, just like the git commit command the user runs after resolving a conflicted merge already does.
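If you want a script to find out in advance whether it is running in what the paragraph above calls an interactive session, a rough shell approximation of the same condition looks like this. It is only a sketch: the function name session_is_interactive is made up, and this is not how Git itself implements the check.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when both standard input and
# standard output are connected to a terminal.
session_is_interactive () {
  test -t 0 && test -t 1
}

if session_is_interactive
then
  echo "interactive: expect git merge to open an editor"
else
  echo "not interactive: expect git merge to use the canned message"
fi
```

Redirecting either stream, e.g. running the script with its input taken from /dev/null or its output piped to another command, is enough to make the check fail.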

There are two recommendations we give to end users to adjust to this behaviour change.

(1) When using git merge interactively, there are two cases:
  • Merging updated upstream into your work-in-progress topic without having a good reason is generally considered a bad practice. Such a merge in the wrong direction should be done only when it is absolutely necessary, e.g. your work-in-progress needs to take advantage of recent advancement made on the upstream.  Otherwise, your topic branch will stop being about any particular topic but just a garbage heap that absorbs commits from many sources, both from you to work on a specific goal, and also from the upstream that contains work by others made for random other unfocused purposes. So if you are merging from upstream into your topic branch, you can use the command without any option to take advantage of the new behavior, let it open an editor for you, and justify the merge. You no longer have to amend the commit after you made the merge to explain it.
  • When you are merging your topic into your own integration (or testing) branch, on the other hand, such a merge is often self-explanatory.  The topic is fully cooked, and ready to be pushed out, and that is the reason why you are merging. In such a case, it will be sufficient to run the command with the --no-edit option and accept the canned commit log message without editing.
(2) If you have a script that runs git merge, and you left its standard input and output connected to the terminal, it may start to ask whoever is running the script to edit and explain the merge. This may or may not be a desirable thing depending on the reason why the script is making a merge.  Often, scripts are used to merge many branches into a temporary testing branch in bulk, and want to run unattended if they merge cleanly. In such a case, you do not want to change the old behavior of the script. You do not have to add --no-edit to all your invocations of the git merge command to update such a script. Instead, you can export GIT_MERGE_AUTOEDIT=no at the beginning of your script, and the git merge command will silently make commits when there is no conflict.

Please try out the updated git merge before the 1.7.10 release is tagged from the master branch, and adjust your scripts and work habits as needed. Also, please suggest improvements to our development mailing list, git@vger.kernel.org, regarding:
  • the documentation,
  • error, advice and general messages output from the program,
  • logic to detect the interactiveness.
One thing that is not open to discussion is that the merge command will launch the editor when interactive by default. This will not change, because, as Linus says, the default matters, and the behavior we had so far was a bad default.

Finally, spread the word so that other users can start adjusting their use of the command early.

[EDIT]

Linus has this to say in his re-share:

This change hopefully makes people write merge messages to explain their merges, and maybe even decide not to merge at all when it's not necessary. 
I've been using that git feature for the last few weeks now, and it has resulted in my merges from submaintainers having various notes in them (well, at least if the submainter gave me any). So I'm trying to lead by example.
But if you don't like explaining your merges, this might be annoying. Of course, if you don't explain your merges, you are annoying, so it all evens out in the end. "Karmic balance", so to say.
Note that this new feature will not be in 1.7.9.3 release.

Tuesday, February 14, 2012

Git 1.7.9.1

I just tagged and pushed out the first maintenance release for 1.7.9, which was released earlier.

It also contains fixes to older bugs and misfeatures, but this release, from my point of view, is primarily to fix user experience kinks in new features introduced in the 1.7.9 release, namely:

  • Typo in "git branch --edit-description my-tpoic" was not diagnosed.
  • "git merge --no-edit $tag" failed to honor the --no-edit option.
  • "git merge --ff-only $tag" failed because it cannot record the required mergetag without creating a merge, but this is such a common operation for a branch that is used only to follow the upstream that it was changed to allow fast-forwarding without recording the mergetag.
  • When asking for a tag to be pulled, "request-pull" did not show the name of the tag prefixed with "tags/", which would have helped older clients.

Hopefully with this maintenance release, there no longer is a reason (or excuse) for users to stay at a version older than 1.7.9 release.

Sunday, February 12, 2012

The Double Helix by James D. Watson


This is a classic.

The scientific race between James D. Watson (the author) and Francis Crick at Cambridge vs Linus Pauling at Cal Tech to solve the structure of DNA is vividly described. The book was written not as an objective science history, but as a record of what the author thought, felt and experienced in the midst of that race, and begins its preface with this:
Here I relate my version of how the structure of DNA was discovered.
Because the author was in the centre of this adventure, there is no other way for him to tell his story than as a personal recollection. Nobody can be an objective third-person observer and reporter of important events around himself that changed the world. And because the book is written from that perspective, the author's adrenaline rush during the fierce competition feels even more real to the readers.

After examining the draft of a paper sent to Peter Pauling (Linus's son, who was then at Cambridge) from their competitor, Linus Pauling, in which Linus described his solution to the puzzle of DNA structure, the Cambridge group concludes that Linus's solution cannot possibly be correct, and congratulates itself on not having lost the race yet. Then:
On our way to Soho for supper I returned to the problem of Linus, emphasizing that smiling too long over his mistake might be fatal. The position would be far safer if Pauling had been merely wrong instead of looking like a fool. Soon, if not already, he would be at it day and night.
When I read this passage, it somehow reminded me of the excitement and tense sense of competition I felt during the early days of Git development. Of course, I was competing with the other Linus (Torvalds, who is known for his Linux operating system, originally wrote Git and was actively developing it with many other brilliant software developers in collaboration) back then.

When there was an issue to solve, everybody rushed to present his own bright idea, and it was a race to show a clean, clever and useful solution to improve the system. When other guys went in a wrong direction and wasted their time, you had more time to polish your work and beat them to your better solution.

I do not think that the similarity between the way the scientific race and the open source race work stops there. Even though the participants all want the glory of being the first to reach the right solution, at the highest level, everybody is working collectively towards the same goal, be it the advancement of their scientific field, or the improved user experience of their software. The subtle balance between competition and collaboration is the same in both endeavours.

The book depicts Maurice Wilkins of King's College, the other scientist who shared the Nobel with Watson and Crick, as somebody who had access to good X-ray crystallography data that eventually helped the discovery by Watson and Crick, but didn't solve the puzzle himself even though he was an expert in the field of DNA research.

Given that Watson's book is written from his own perspective, I suspect that the aptly titled book The Third Man of the Double Helix: An Autobiography by Wilkins himself is a must-read for anybody who reads this book to see both sides of the coin. It is already on my "To Read" list.

The book was a very satisfying read and I really enjoyed it. It was given by a happy Git user Ben (thanks!) as a present to me the other day, picked from my Amazon wishlist.