Wednesday, August 31, 2011

How to inject a malicious commit to a Git repository (or not)

[Note: there are follow-up articles here and there]

[Note: some site seems to have misreported that I outlined how one can forge a history stored in Git here, but the point of this article is how impractical and unrealistic it is for anybody to do so without letting other people take notice.]

Suppose you momentarily gained write access to other people's public repositories at a large distribution point, such as kernel.org. What damage could you inflict on their projects if you wanted to?

You could create a malicious commit on top of the tip of the "master" branch of the linux.git repository of Linus Torvalds. Nobody prevents you from pretending that you are Linus:

$ GIT_AUTHOR_NAME="Linus Torvalds" \
  GIT_COMMITTER_NAME="Linus Torvalds" \
  GIT_AUTHOR_EMAIL=torvalds@linux-foundation.org \
  GIT_COMMITTER_EMAIL=torvalds@linux-foundation.org \
  git commit -s
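
The author and committer fields are plain metadata that Git records without verifying them. The following is a minimal sketch in a throwaway repository; the "scratch" directory, the name, and the e-mail address are all made up for illustration, and it assumes a Git recent enough to support "git -C".

```shell
# Work in a temporary directory so no real repository is touched.
work=$(mktemp -d) && cd "$work" || exit 1
git init -q scratch

# Made-up identity: Git stores whatever the environment claims.
GIT_AUTHOR_NAME="Somebody Else" \
GIT_COMMITTER_NAME="Somebody Else" \
GIT_AUTHOR_EMAIL=somebody@example.invalid \
GIT_COMMITTER_EMAIL=somebody@example.invalid \
  git -C scratch commit -q --allow-empty -m "pretend commit"

# Read back the author and committer recorded in the commit object.
recorded=$(git -C scratch log -1 --format='%an <%ae> / %cn <%ce>')
echo "$recorded"
```

The recorded identity is simply the one supplied in the environment, which is why the name on a commit by itself proves nothing about who made it.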


Your English may be good enough to fool readers into believing that the log message came from Linus himself. Perhaps you did this around August 12th, when the tip of Linus's true "master" branch was commit M, and X is the malicious commit you created on top of it. The resulting history may look like this:

--M
   \
    X

If an unsuspecting victim pulls regularly from Linus's repository, he may run a git pull before your malicious commit is discovered in a security audit. And he may have already built his derivative product on this malicious version of the kernel.

Is this a big "Oops"? We'll see what happens to this unsuspecting victim later.

When Linus tries to upload his updated work, however, the history on his development machine (which is not the distribution point to which you managed to add your malicious commit) does not have your commit X. In Git terms, the history you tweaked and the history Linus has have now diverged:


--M---o---o---o---o---o---o---o---L
   \
    X

where M is the original tip of the "master" branch at the public repository, X is the malicious commit you created and updated the "master" branch to point at, and L is the tip of the history Linus is about to upload. We say "L does not fast-forward to X", as X is not part of L (time flows from left to right).

What happens now is that the "git push" Linus runs to upload to his public repository notices that updating the "master" branch there with the tip of his history would lose commit X you created, and refuses to do so. It does not notice that the commit about to be lost is a malicious one, nor that it was not made by Linus, but it does not have to notice either for this protection to work. Linus would definitely notice that something fishy is going on, because as his next step he would need to do something he usually never does to push his changes.
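
The refusal can be reproduced in a sandbox. This is only a sketch under stated assumptions: "public.git", "owner" and "intruder" are hypothetical stand-ins for the public repository, Linus's development machine, and whoever gained write access, and the identity is a placeholder.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
export LC_ALL=C   # keep Git's messages in English
# Placeholder identity so commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

# The public repository, with "master" as its default branch.
git init -q --bare public.git
git -C public.git symbolic-ref HEAD refs/heads/master

# The owner publishes commit M from the development machine.
git init -q owner
git -C owner symbolic-ref HEAD refs/heads/master
git -C owner commit -q --allow-empty -m "M"
git -C owner push -q ../public.git master

# Somebody else adds commit X directly on top of the public tip.
git clone -q public.git intruder
git -C intruder commit -q --allow-empty -m "X"
git -C intruder push -q origin master

# The owner, unaware of X, builds L on top of M and tries to push.
git -C owner commit -q --allow-empty -m "L"
if git -C owner push ../public.git master 2>"$work/push.err"; then
    push_result=accepted
else
    push_result=rejected
fi
public_tip=$(git -C public.git log -1 --format=%s master)
echo "push was $push_result; public tip is still $public_tip"
```

The push fails without "--force", and the public "master" still points at X, so the divergence is in plain sight for the person pushing.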

If this were a shared repository setting, Linus might say "Ah, somebody else beat me to it", run "git pull" to merge the work by other people who share the same public repository (i.e. you) into his tree, creating a merge commit Y, and then push the result again:


--M---o---o---o---o---o---o---o---L
   \                               \
    X-------------------------------Y


In the end, your malicious commit X could end up in the resulting history this way, provided that he does such a merge and does not inspect the merge Y.

But Linus (or any kernel people with publishing repositories at kernel.org in general) does not work using a shared repository with other people to begin with. The repository at kernel.org is his publishing repository and his alone, so you cannot sneak your malicious commit into his history through this avenue.

Linus could choose to be careless and force his push, without bothering to investigate why his push does not fast-forward (in real life, this is not going to happen, but for the sake of mental exercise, imagine that he chose to be careless and let's see what happens). This will eliminate your malicious commit from his public repository. If he did so, the repository would look like this:


--M---o---o---o---o---o---o---o---L

Your malicious commit X would not have any effect on people who pulled from Linus's public repository after this happens, but what about the unsuspecting victim who pulled X before Linus forced this push? Is he contaminated with your malicious commit, never to notice it?

Remember, as far as he is concerned, the tip of Linus's history he pulled earlier, which is kept in his origin/master remote tracking branch, was X, and it is now being updated to L, which does not fast-forward. His "git pull" (actually it is "git fetch" that is invoked as part of "pull") will notice this and report:

From git://git.kernel.org/.../torvalds/linux.git/
 + 9d901d9...ad4d968 master     -> origin/master  (forced update)


Notice "forced update"? The unsuspecting victim can notice that the side branch leading to X is no longer part of Linus's history.
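
The same report can be seen in a sandbox. As before, this is a sketch with hypothetical repositories ("public.git", "owner", "intruder", "victim") and a placeholder identity, not a transcript from kernel.org.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
export LC_ALL=C   # keep Git's messages in English
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q --bare public.git
git -C public.git symbolic-ref HEAD refs/heads/master

# The owner publishes M.
git init -q owner
git -C owner symbolic-ref HEAD refs/heads/master
git -C owner commit -q --allow-empty -m "M"
git -C owner push -q ../public.git master

# The intruder places X on the public "master".
git clone -q public.git intruder
git -C intruder commit -q --allow-empty -m "X"
git -C intruder push -q origin master

# The victim clones while X is the public tip.
git clone -q public.git victim

# The careless owner forces L over X.
git -C owner commit -q --allow-empty -m "L"
git -C owner push -q --force ../public.git master

# The victim's next fetch reports the rewind on stderr.
git -C victim fetch origin 2>"$work/fetch.log"
cat "$work/fetch.log"
```

With the default refspec that "git clone" writes, the fetch succeeds, but the line for "master" is marked with '+' and "(forced update)".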

One security tip I would offer here is this. If you know that your upstream (in this illustration, Linus) never rewinds his history, you can tweak your .git/config file (open it with your favorite $EDITOR; it is a simple text file and is designed to be editable by hand) and drop the '+' sign from the "fetch" line. Find a line that looks like this:

[remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*


And edit it to make it look like this:

[remote "origin"]
        fetch = refs/heads/*:refs/remotes/origin/*


This will make your "git pull" (again, it is actually the "git fetch" invoked from that command) fail when the upstream has rewound the history. You will see the command fail like so when you pull from Linus:

From git://git.kernel.org/.../torvalds/linux.git/
 ! [rejected]        master     -> origin/master  (non-fast-forward)
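
The effect of dropping the '+' can be tried out safely in a sandbox. This sketch uses "git config" instead of an editor to rewrite the "fetch" line; the repositories and the identity are hypothetical, as in any throwaway setup.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
export LC_ALL=C   # keep Git's messages in English
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q --bare public.git
git -C public.git symbolic-ref HEAD refs/heads/master

# The upstream publishes M, then X on top of it.
git init -q owner
git -C owner symbolic-ref HEAD refs/heads/master
git -C owner commit -q --allow-empty -m "M"
git -C owner push -q ../public.git master
git -C owner commit -q --allow-empty -m "X"
git -C owner push -q ../public.git master

# A downstream clone that insists on fast-forward updates only.
git clone -q public.git victim
git -C victim config remote.origin.fetch 'refs/heads/*:refs/remotes/origin/*'

# The upstream rewinds: X is replaced by L, built directly on M.
git -C owner reset -q --hard HEAD^
git -C owner commit -q --allow-empty -m "L"
git -C owner push -q --force ../public.git master

# The fetch now refuses to update origin/master.
git -C victim fetch origin 2>"$work/fetch.log"
fetch_status=$?
cat "$work/fetch.log"
echo "fetch exit status: $fetch_status"
```

The remote tracking branch in the downstream clone keeps pointing at X, so the rewind cannot slip by unnoticed.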


We might want to revisit the default settings "git clone" leaves in your new repository and drop the '+' (which means "allow non-fast-forward") to make it harder for upstreams to rewind their branches, but that will have to be discussed on the Git mailing list (git@vger.kernel.org), not in this blog post. There is a reason we didn't make it default to insist on fast-forwardness.

By the way, it does not make an iota of difference to the above story if you rewrote the commits that lead to M (i.e. the old tip of the "master" branch of Linus's history) using "rebase" or "commit --amend". The only difference is that such a change will move the fork point of the diverged histories from M (in the above story) further back to a different commit that is older than M in the ancestry chain. The history Linus will try to push to his public repository L will not fast-forward to the commit you place at the tip of the "master" branch that contains your malicious version, and that is the only thing that matters.
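
Fast-forwardness boils down to ancestry: updating a branch from an old tip to a new tip is a fast-forward only when the old tip is an ancestor of the new one. The sketch below checks this with "git merge-base --is-ancestor", which assumes a Git newer than the one current when this was written; the repository, the "forged" branch, and the is_fast_forward helper are hypothetical names used only for illustration.

```shell
work=$(mktemp -d) && cd "$work" || exit 1
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

git init -q repo
git -C repo symbolic-ref HEAD refs/heads/master
git -C repo commit -q --allow-empty -m "M"
m=$(git -C repo rev-parse HEAD)

# X forks off from M on a side branch.
git -C repo checkout -q -b forged
git -C repo commit -q --allow-empty -m "X"
x=$(git -C repo rev-parse HEAD)

# L is built on M, without X.
git -C repo checkout -q master
git -C repo commit -q --allow-empty -m "L"
l=$(git -C repo rev-parse HEAD)

# Hypothetical helper: succeeds when moving a ref from $1 to $2
# would be a fast-forward, i.e. $1 is an ancestor of $2.
is_fast_forward () {
    git -C "$work/repo" merge-base --is-ancestor "$1" "$2"
}

if is_fast_forward "$m" "$l"; then m_to_l=fast-forward; else m_to_l=non-fast-forward; fi
if is_fast_forward "$x" "$l"; then x_to_l=fast-forward; else x_to_l=non-fast-forward; fi
echo "M -> L: $m_to_l"
echo "X -> L: $x_to_l"
```

Moving the fork point further back than M changes nothing in this check: as long as the tip somebody placed in the public repository is not an ancestor of L, the update is a non-fast-forward.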


Wednesday, August 24, 2011

1.7.6.1 is out

Git 1.7.6.1 is out with 88 small fixes from 29 people.

Git v1.7.6.1 Release Notes
==========================

Fixes since v1.7.6
------------------

 * Various codepaths that invoked zlib deflate/inflate assumed that these
   functions can compress or uncompress more than 4GB data in one call on
   platforms with 64-bit long, which has been corrected.

 * "git unexecutable" reported that "unexecutable" was not found, even
   though the actual error was that "unexecutable" was found but did
   not have a proper she-bang line to be executed.

 * Error exits from $PAGER were silently ignored.

 * "git checkout -b <branch>" was confused when attempting to create a
   branch whose name ends with "-g" followed by hexadecimal digits,
   and refused to work.

 * "git checkout -b <branch>" sometimes wrote a bogus reflog entry,
   causing later "git checkout -" to fail.

 * "git diff --cc" learned to correctly ignore binary files.

 * "git diff -c/--cc" mishandled a deletion that resolves a conflict, and
   looked in the working tree instead.

 * "git fast-export" forgot to quote pathnames with unsafe characters
   in its output.

 * "git fetch" over smart-http transport used to abort when the
   repository was updated between the initial connection and the
   subsequent object transfer.

 * "git fetch" did not recurse into submodules in subdirectories.

 * "git ls-tree" did not error out when asked to show a corrupt tree.

 * "git pull" without any argument left an extra whitespace after the
   command name in its reflog.

 * "git push --quiet" was not really quiet.

 * "git rebase -i -p" incorrectly dropped commits from side branches.

 * "git reset [<commit>] paths..." did not reset the index entry correctly
   for unmerged paths.

 * "git submodule add" did not allow a relative repository path when
   the superproject did not have any default remote url.

 * "git submodule foreach" failed to correctly give the standard input to
   the user-supplied command it invoked.

 * submodules that the user has never shown interest in by running
   "git submodule init" were incorrectly marked as interesting by "git
   submodule sync".

 * "git submodule update --quiet" was not really quiet.

 * "git tag -l <glob>..." did not take multiple glob patterns from the
   command line.

Wednesday, August 17, 2011

Didn't I already say I am no longer a youngster?

Now that the k.org machine(s) seem to be getting updated, it is time for me to update a set of VMs I keep to build-test Git and cut RPM packages for their use.

Prepared an empty VM and installed FC14 (last time I somehow got an impression that they only use odd-numbered releases at k.org, so I had a spare FC15 prepared and have been practicing RPM generation on it, although I never deployed the packages anywhere).

  • Chose "Software Development" target (earlier in the day I tried "Minimum" but I had too much trouble configuring it);
  • Used fixed network configuration - made it available at boot and for everybody to prevent network manager from getting in the way;
  • Added myself as a user, with UID/GID that match what I use on the main machine;
  • Added entries for /home and /git NFS mountpoints in /etc/fstab, like so:
    mothership:/home /home nfs defaults,noatime 0 0
    mothership:/git /git nfs defaults,noatime 0 0
  • Disabled SELinux by editing /etc/sysconfig/selinux and saying SELINUX=disabled there;
  • Disabled X by editing /etc/inittab and saying id:3:initdefault: there.
That got me a basic working environment. It seems that it went a lot smoother than FC15 which I wrote about earlier. Then added:
  • screen
  • redhat-lsb (needed for /usr/bin/lsb_release)
The build wants to use ccache and wants to put its temporary files in $HOME! Sheesh - caching over NFS? With this:

 $ yum remove ccache.i686

regular build starts working. Documentation build needs a few more packages:
  • asciidoc
  • xmlto
I choose not to install perl-SVN-Simple package so that I don't have to spend time running the git-svn tests on this VM.