Nikolaus Rath's Website

Mercurial for Git Users (and vice versa)

Until a few weeks ago, the only version control system that I used regularly (and was reasonably proficient in) was Mercurial. This changed when I took over maintainership of libfuse and sshfs at the end of 2014. Both are maintained in Git and have plenty of forks, so converting the repositories to Mercurial would have been silly. For a while, I tried to use the Hg-Git extension that lets you access a Git repository with Mercurial. However, I found that not to be working very well for more complex use-cases. So I finally bit the bullet and learned to use Git. Here is what I learned.

(To make search engines happy, the title of this article should really have been Mercurial for Git users and Git for Mercurial users. Unfortunately that does not sound particularly witty, so I am mentioning Git for Mercurial users a few times in this paragraph instead. Hopefully, this will make this post show up in searches for both phrases.)

Fundamental representation

Both Git and Mercurial maintain a directed, acyclic graph of commits (the "commit DAG"). For Mercurial, this is the fundamental representation of the repository. For Git, there is a lower-level layer (called the "plumbing layer", more on that below). However, on the layer of the DAG the two systems are quite similar. A commit consists of a description of changes made to files in the repository and some associated metadata (like author, date, or a message describing the changes). Every node in the commit DAG represents a commit and has a unique id (the "hash") that depends on both the changes made in the commit and the previous state of the repository. Pulling from and pushing to other repositories effectively means exchanging DAG nodes.
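To illustrate why the hash depends on the previous state of the repository, here is a toy sketch in Python. The payload layout and the `commit_id` helper are made up for this example; neither Git nor Mercurial serializes commits this way. The only point is that the parent ids are part of what gets hashed.

```python
import hashlib


def commit_id(parent_ids, changes, message):
    """Toy commit hash covering the parents' ids, the changes and the metadata.

    Hypothetical layout, for illustration only.
    """
    payload = "\n".join([
        "parents:" + ",".join(parent_ids),
        "changes:" + changes,
        "message:" + message,
    ])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Two histories that start differently...
root = commit_id([], "add README", "initial commit")
other_root = commit_id([], "add LICENSE", "initial commit")

# ...and then receive the very same change with the very same message.
child_a = commit_id([root], "fix typo", "fix typo")
child_b = commit_id([other_root], "fix typo", "fix typo")

# Same change, different parent: the ids differ.
print(child_a != child_b)
```

Because every id covers the ids of its parents, changing any ancestor changes the ids of all its descendants as well.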

The first area where Git and Mercurial differ (and where a lot of confusion can arise) is what the elements of the DAG are called. Both systems let you assign user-defined names, called tags, to specific nodes - but this is where the similarities end.

Mercurial commits have ephemeral handles

In Mercurial, every commit has, in addition to its hash, a unique numerical id that is valid until the next non-trivial (i.e., anything but a simple commit with one ancestor) change. This number is obtained by simply enumerating the ids starting from the root on a "best effort" basis. Typically, successive commits have successive numerical ids. This is especially handy in interactive use (e.g. hg log followed by hg diff), but comes with the risk of accidentally re-using a numeric id after it is no longer valid (or, worse, refers to a different commit). In Git, the only identifier that every commit has is its alphanumerical hash - but luckily, the hash can be abbreviated as long as it remains unique within the repository.
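The "abbreviate as long as it remains unique" rule can be sketched as a simple prefix lookup. This is a simplified model in Python, not the actual lookup code of either tool; `resolve_prefix` and the short example hashes are hypothetical.

```python
def resolve_prefix(prefix, known_hashes):
    """Return the single hash starting with `prefix`.

    Raises KeyError if nothing matches and ValueError if the
    abbreviation is ambiguous.
    """
    matches = [h for h in known_hashes if h.startswith(prefix)]
    if not matches:
        raise KeyError(f"unknown revision: {prefix}")
    if len(matches) > 1:
        raise ValueError(f"ambiguous prefix {prefix!r}: {len(matches)} candidates")
    return matches[0]


# Shortened, made-up hashes standing in for a repository's commits.
hashes = ["a1b2c3d4", "a1f00dee", "9c0ffee1"]

print(resolve_prefix("9c", hashes))   # two characters are enough here
print(resolve_prefix("a1b", hashes))  # "a1" alone would be ambiguous
```

As the repository grows, a prefix that used to be unique can become ambiguous, so abbreviations that end up in documents or scripts are best kept a little longer than strictly necessary.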

Git branches are Mercurial bookmarks

In Git, a branch is a pointer to a node of the DAG with a user-defined name. In Mercurial, the same thing is called a bookmark. Technically, a bookmark and a branch are both the same thing as a tag (though living in different namespaces), but the expected usage is different: while a tag is typically assigned to one commit and stays there, bookmarks/branches generally move to different commits over time. A bookmark/branch identifies the "most recent" commit in a particular line of development. Both Git and Mercurial allow the user to declare one branch/bookmark as active, which means that every time the user commits, the bookmark/branch will be moved from the parent commit to the freshly-created descendant.
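The difference between a tag and an active branch/bookmark can be sketched with a toy repository model in Python. The `Repo` class and its attribute names are invented for this illustration and do not correspond to any real Git or Mercurial API.

```python
class Repo:
    """Toy repository: commits know their parent, refs are name -> commit id."""

    def __init__(self):
        self.parents = {}        # commit id -> parent id (None for the root)
        self.refs = {}           # tag/branch/bookmark name -> commit id
        self.active = None       # name of the active branch/bookmark, if any
        self.checked_out = None  # commit the working directory is based on
        self._next = 0

    def commit(self):
        new_id = f"c{self._next}"
        self._next += 1
        self.parents[new_id] = self.checked_out
        self.checked_out = new_id
        if self.active is not None:
            # The active pointer follows the freshly created commit.
            self.refs[self.active] = new_id
        return new_id


repo = Repo()
repo.active = "main"
first = repo.commit()
repo.refs["v1.0"] = first   # a tag: stays where it was put
second = repo.commit()

print(repo.refs)  # "main" has moved to the second commit, "v1.0" has not
```

Both names are stored the same way; the only thing that makes "main" behave like a branch is that it is the active pointer when the commit is made.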

Mercurial does not have the concept of branches - every mention of branches in the context of Mercurial actually refers to named branches.

Git does not have named branches

The concept of named branches exists only in Mercurial, and does not have an equivalent in Git. Much time has been wasted by people trying to use Mercurial's named branches like Git's branches.

Every DAG node in Mercurial belongs to exactly one named branch (which may be the default branch, in which case it is often not displayed explicitly). The branch name is part of the commit's metadata. A named branch thus refers not to a specific commit, but to a specific set of commits that may grow (but not shrink) over time. Branch names are often used to distinguish different lines of development, for example the Python Mercurial repository has named branches for each major Python release (2.7, 3.4, 3.5 etc).
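A small Python sketch may help to see why a named branch is a set of commits rather than a pointer. The commit dictionaries below are a made-up representation; the only assumption taken from Mercurial is that the branch name is stored in each commit's metadata.

```python
from collections import defaultdict

# Each commit records its branch name in its own metadata.
commits = [
    {"id": "c0", "parent": None, "branch": "default"},
    {"id": "c1", "parent": "c0", "branch": "default"},
    {"id": "c2", "parent": "c1", "branch": "3.5"},
    {"id": "c3", "parent": "c2", "branch": "3.5"},
    {"id": "c4", "parent": "c1", "branch": "default"},
]


def named_branches(commits):
    """Group commit ids by the branch name stored in their metadata."""
    members = defaultdict(set)
    for commit in commits:
        members[commit["branch"]].add(commit["id"])
    return dict(members)


branches = named_branches(commits)
print(sorted(branches["3.5"]))
print(sorted(branches["default"]))
```

Since the name is baked into the commit, a commit cannot leave its named branch later on. A Git branch or Mercurial bookmark, in contrast, is stored outside the commits and can be moved or deleted without touching them.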

Git DAG leaves must be referenced to survive

In Mercurial, leaves of the commit DAG (i.e., nodes without descendants) do not have any special status other than their name. Mercurial calls them heads, and has a hg heads command to list them (so that it is easy to switch to and pick up development from them). Mercurial allows you to "close" a head, but technically this is implemented as an additional commit (which becomes the new head) that has a flag in its metadata which causes it to be ignored by commands like hg heads.

In Git, leaves of the commit DAG are only kept alive by references. A reference can be a tag, a branch, or the automatically managed "HEAD" pointer that references the commit that is checked out in the working directory. A leaf commit that is not referenced by anything will eventually be garbage-collected and disappear. For that reason, Git repositories typically have a number of branches which ensure that "important" DAG leaves do not disappear. For the same reason, Git also makes it difficult to create DAG leaves without at the same time starting a new branch. This is often confusing to Mercurial users who are used to easily making new commits starting from any commit in the DAG. Correspondingly, Git users should keep in mind that they do not need to create a bookmark (or, even worse, a named branch) every time they want to start a new development line: Mercurial allows you to start new heads anywhere, anytime.
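The "kept alive by references" rule is essentially a reachability computation. The following Python sketch is a deliberately simplified model: the `reachable` helper and the commit names are made up, and real Git has additional safety nets (such as the reflog and a grace period before pruning) that are ignored here.

```python
def reachable(parents, refs):
    """Return all commits reachable from any ref by following parent links."""
    seen = set()
    stack = list(refs.values())
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return seen


# c2 and c3 are both leaves that descend from c1.
parents = {"c0": [], "c1": ["c0"], "c2": ["c1"], "c3": ["c1"]}
refs = {"main": "c2", "HEAD": "c2"}

# Nothing points at c3, so it is a candidate for garbage collection.
garbage = set(parents) - reachable(parents, refs)
print(garbage)

# Creating a branch for the second leaf keeps it alive.
refs["experiment"] = "c3"
garbage_after = set(parents) - reachable(parents, refs)
print(garbage_after)
```

In the Mercurial model there is no such computation: c3 would simply be a second head and would stay in the repository whether or not anything points at it.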

Mercurial does not have annotated tags

In both Git and Mercurial, tags can in principle be moved from one commit to another (called "re-tagging"). Git, however, has the additional concept of an "annotated tag". An annotated tag is not just a name assigned to a specific commit, but an object that lives in Git's plumbing layer. This object contains not just the hash of the commit and the name of the tag, but can carry additional information like a GPG signature, which makes it possible to sign a tag (and the associated state of the repository). There is no such thing in Mercurial.

Git supports changing history on remotes

Once you push a commit to a remote Mercurial repository, there is no way back. Unless you have additional channels (like SSH access or a web interface), you cannot make a commit disappear from the remote server. This can be a source of a lot of frustration, but it also ensures that once you have cloned a repository you can be sure that every commit you have locally will also persist on the server - i.e., there is no chance that someone else will rebase some of the commits and leave you with two divergent heads on the next pull.

In Git, this is not the case. This is a consequence of the requirement for DAG leaves to be branches. If you have accidentally pushed a commit to a remote server, all you have to do is re-assign the branch names (which can be pushed as well) so that the commit is no longer reachable from any branch and it will eventually be garbage collected. You can also delete a branch completely by pointing it at a special "this does not exist" node. Either operation can be very useful and very dangerous for the reasons given above.

Git tracks upstream status in the revision graph

As mentioned before, Git requires every commit without descendants to have a branch name pointing at it. This means that when you have done (and committed) some work in your local repository, and then pull new commits from a remote repository, Git needs to ensure that both your most recent commit and the most recently pulled commit have associated branch names (the latter are called "remote tracking branches"). On the plus side, this means you can always tell which commits had not been pushed to a particular remote repository the last time you pulled from it. On the minus side, this means that you have to juggle a lot of branches (keep in mind that you may be interacting with multiple remote repositories). To help prevent name clashes, Git allows the local branch names to be different from the branch names of the remote server, i.e. you can instruct Git to always assign the branch name "foo" to the most recent commit in the "bar" branch on the remote server (in practice, one would obviously not use "foo" but something more informative like "origin/bar" or "remotename/bar"). Similarly, you can tell Git to assign a different branch name on the remote server when pushing commits.
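The mapping between remote and local names is configured with a pattern of the form "source:destination" (Git calls it a refspec). The Python function below is a rough sketch of how such a pattern with a single "*" wildcard could be applied; `map_ref` is a hypothetical helper that ignores many details of the real rules. The pattern used in the example is the form that Git sets up by default for a remote called "origin".

```python
def map_ref(refspec, remote_ref):
    """Map a remote ref name to a local one using a 'src:dst' pattern.

    Simplified: exactly one '*' on each side, optional leading '+'.
    Returns None if the remote ref does not match the source pattern.
    """
    src, dst = refspec.lstrip("+").split(":")
    prefix, suffix = src.split("*")
    if not (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
        return None
    end = len(remote_ref) - len(suffix)
    wildcard = remote_ref[len(prefix):end]
    return dst.replace("*", wildcard)


default_fetch = "+refs/heads/*:refs/remotes/origin/*"

print(map_ref(default_fetch, "refs/heads/bar"))  # lands in the origin/ namespace
print(map_ref(default_fetch, "refs/tags/v1"))    # not covered by this pattern
```

Each remote gets its own destination namespace, which is what keeps a "bar" branch on one server from clashing with a "bar" branch on another.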

Since Mercurial heads (commits without descendants) are not required to have bookmarks pointing at them, the situation is simpler. Pulling and pushing commits simply adds the commits to the respective repository without the need to create or move any bookmarks. This means that the Mercurial DAG does not tell you what commits are available in which remote repository. Instead, you can use the hg incoming and hg outgoing commands to connect to a specific repository and determine what would be pulled or pushed. The advantage of this is that in contrast to Git's remote branches, the information is always up-to-date. The drawback is that it requires a network connection to the remote.

Mercurial bookmarks can also be exchanged with remote repositories, but there is no way to set up a mapping between remote and local bookmarks: local and remote bookmarks always have the same name.

Git does not have phases

One advantage of Git's remote branches is that they can be used to determine if a given commit has already been pushed to a remote repository. However, the generality of remote branches makes this a little cumbersome: one has to examine the ancestors of every remote branch to determine if any of them contain the commit one is interested in.

While Mercurial does not have remote branches, it does have a different feature that makes answering this specific question very simple: phases. In Mercurial, every commit has an associated "phase" that's local to the repository. A commit can be either in "draft", "public" or "secret" phase. Draft commits have not yet been pushed to any remote, but will be included in the next push. Public commits have been pushed to at least one remote repository. Secret commits have not yet been pushed, and will not be included in any push until they have explicitly been marked as drafts.

Therefore, to determine if a Mercurial commit has been pushed anywhere it's sufficient to look at its phase. The primary use of phases in Mercurial is to prevent accidental modification of history: if a commit is public, Mercurial will complain very loudly if you attempt to change its history.
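The two ways of answering "has this commit been pushed?" can be contrasted in a short Python sketch. Both functions and the example data are invented for illustration; in practice one would ask the tools themselves (for example with git branch -r --contains or hg phase).

```python
def is_pushed_git_style(commit, parents, remote_branches):
    """Search the ancestors of every remote tracking branch for `commit`."""
    stack = list(remote_branches.values())
    seen = set()
    while stack:
        current = stack.pop()
        if current == commit:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return False


def is_pushed_hg_style(commit, phases):
    """Look at the locally stored phase of the commit."""
    return phases[commit] == "public"


# c0 <- c1 <- c2, where only c0 and c1 have been pushed.
parents = {"c0": [], "c1": ["c0"], "c2": ["c1"]}
remote_branches = {"origin/main": "c1"}
phases = {"c0": "public", "c1": "public", "c2": "draft"}

for commit in parents:
    print(commit,
          is_pushed_git_style(commit, parents, remote_branches),
          is_pushed_hg_style(commit, phases))
```

The first function has to walk the graph, while the second is a single lookup. The price is that the phase only says that the commit has been pushed somewhere, not where.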

Git has a plumbing layer

In Mercurial, the only exposed representation of the repository is the DAG of commits. In Git, on the other hand, the DAG is actually constructed on top of a different data structure that is also exposed to the user - the so-called "plumbing layer".

The plumbing layer is essentially just an object storage system that (internally) uses some techniques to efficiently store similar (or somewhat similar) objects. Every commit in the DAG is stored in a plumbing layer object. The plumbing layer does not know anything about how the different objects relate to each other.
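As a rough illustration of what "object storage" means here, the following Python sketch implements a tiny in-memory store in which every object is filed under the hash of its contents. The `ObjectStore` class is made up; the one detail taken from Git is the way a blob's id is computed (SHA-1 over a short header followed by the data). Compression, packing and everything else the real store does are left out.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key is derived from the content."""

    def __init__(self):
        self._objects = {}

    def put(self, kind, data):
        # Header of the form "<kind> <size>\0", followed by the raw data.
        header = f"{kind} {len(data)}".encode("ascii") + b"\0"
        key = hashlib.sha1(header + data).hexdigest()
        self._objects[key] = (kind, data)
        return key

    def get(self, key):
        return self._objects[key]


store = ObjectStore()
empty = store.put("blob", b"")
hello = store.put("blob", b"hello world\n")
again = store.put("blob", b"hello world\n")

print(empty)           # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(hello == again)  # identical content is stored only once
```

Note that the store itself has no idea which objects are files, directory listings or commits that refer to each other; that interpretation happens in the layer above.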

All of the "version-control" layer commands (like commit) can be expressed as a series of plumbing-layer commands (and, in some cases, are even implemented as such). By working directly with the plumbing layer one can therefore do interesting things. For example, git-annex is a tool to store large files outside of the Git repository. To the version-control layer, all the files still appear to reside in the local repository - but on the plumbing layer, one can see that there is only a reference to the actual location of the data (which may be on an external hard disk, or on a remote server) that is resolved on-demand to retrieve the actual data. Alternatively, one can add arbitrary additional data in the object storage that is entirely ignored by the upper "version-control" layer.

Obviously, working at the plumbing layer also enables one to do all sorts of things to the DAG that the version control layer would never allow to happen - resulting in a Git repository that's valid as far as the plumbing layer is concerned, but has all sorts of inconsistencies or peculiarities when interpreted by the version control layer.

In Mercurial, none of the above is possible. The way the DAG is stored is an implementation detail, and there are only version-control layer commands to access it.

Git has a staging area

Git has something called the "staging area". Changes to a Git repository are not directly committed, but first moved to this staging area and then committed from there. The contents of the staging area are stored as objects in the plumbing layer, but are not yet part of the commit DAG. Many Git commands accept an option that tells them to act on the working directory, but that just means that they automatically first move all changes from the working directory to the staging area.
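The relationship between working directory, staging area and commits can be modelled in a few lines of Python. `ToyGit` is a hypothetical class for illustration only; its `all_tracked` flag is meant to mimic options like git commit -a, which first stage the changes to already tracked files and then commit.

```python
class ToyGit:
    """Toy model with three places a file version can live in."""

    def __init__(self):
        self.working = {}   # path -> content in the working directory
        self.index = {}     # path -> content in the staging area
        self.commits = []   # committed snapshots, oldest first

    def add(self, path):
        # Copy the current working-directory version into the staging area.
        self.index[path] = self.working[path]

    def commit(self, all_tracked=False):
        if all_tracked:
            # Stage every already tracked file first...
            for path in self.index:
                if path in self.working:
                    self.index[path] = self.working[path]
        # ...then commit whatever the staging area contains.
        self.commits.append(dict(self.index))


g = ToyGit()
g.working = {"a.txt": "1", "b.txt": "1"}
g.add("a.txt")
g.add("b.txt")
g.commit()

# Both files change, but only one of the changes is staged.
g.working["a.txt"] = "2"
g.working["b.txt"] = "2"
g.add("a.txt")
g.commit()
print(g.commits[-1])   # b.txt is still at its old content

g.commit(all_tracked=True)
print(g.commits[-1])   # now both changes are committed
```

The commit step never looks at the working directory directly, which is why the second commit does not contain the change to b.txt.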

For newcomers to Git it sometimes feels appealing to entirely ignore the concept of the staging area and just use these options. That is generally not a good idea, because understanding the effect of commands (like git reset) requires understanding the staging area.

For Mercurial users, it can be helpful to think of the staging area as a mandatory head commit in "secret" phase. Adding and removing changes to the staging area corresponds to amending the secret head, and what Git calls "commit" corresponds to changing the phase of the head to draft (so now it can be pulled and pushed), and creating a new, secret head (which initially has no contents).

Git users coming to Mercurial may be used to the staging area as a way to commit only selected chunks of a file. In Mercurial, the way to do such "partial commits" is to use the --interactive option (which queries for each chunk whether it should be included in the commit). Alternatively, the TortoiseHG GUI (see below) offers an excellent GUI for both partial commits and selective stashing/unstashing of chunks.

Git does not have patch management

Mercurial comes with a built-in patch management extension called Mercurial Queues ("MQ"). MQ manages an arbitrary number of "patch queues", which each contain an ordered set of patches. A patch can either be "applied" (including its predecessors) or "unapplied". If it is applied, then it appears as a specially marked commit in the DAG that can be neither pushed nor pulled. When a patch is unapplied, it lives in a different area that is only visible to the MQ commands. MQ provides commands to "pop" (unapply) and "push" (apply) patches, to convert patches to regular commits, to refresh patches, and to reorder them. Strictly speaking, MQ provides no functionality that could not be implemented by rebasing and history editing (rebasing and interactive rebasing in Git lingo) together with dedicated branches for unapplied patch queues. However, MQ automates the necessary bookkeeping (which commit is a patch commit and must not be pulled/pushed, to which patch queue does an applied patch belong, etc) and provides dedicated commands that don't require you to re-express the desired operation in terms of rebases and history edits.
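The applied/unapplied bookkeeping of a single patch queue behaves like a stack on top of an ordered list. The Python class below is a minimal sketch of that idea; `PatchQueue` and the patch names are hypothetical, and applying a patch to actual files is not modelled at all.

```python
class PatchQueue:
    """Toy patch queue: an ordered series, of which a prefix is applied."""

    def __init__(self, patches):
        self.series = list(patches)
        self.applied_count = 0

    @property
    def applied(self):
        return self.series[:self.applied_count]

    @property
    def unapplied(self):
        return self.series[self.applied_count:]

    def push(self):
        """Apply the next unapplied patch (compare hg qpush)."""
        if self.applied_count == len(self.series):
            raise IndexError("all patches are already applied")
        self.applied_count += 1
        return self.series[self.applied_count - 1]

    def pop(self):
        """Unapply the topmost applied patch (compare hg qpop)."""
        if self.applied_count == 0:
            raise IndexError("no patches applied")
        self.applied_count -= 1
        return self.series[self.applied_count]


queue = PatchQueue(["fix-a", "feature-b", "cleanup-c"])
queue.push()
queue.push()
print(queue.applied)    # the first two patches
popped = queue.pop()
print(popped, queue.unapplied)
```

A patch can only be applied if all its predecessors in the series are applied, which is why a single counter is enough to describe the state of the queue.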

Git does not have dedicated patch management functionality. However, as explained above, patch management can still be done by treating patches as regular commits in dedicated branches and doing manual bookkeeping and rebasing.

Mercurial is less complex

This is probably the most subjective point in this article, but I believe that it nevertheless reflects consensus. Generally, Mercurial is less complex than Git and thus faster to learn and less likely to put non-expert users into perplexing situations. This is the result of several factors:

  • Mercurial has fewer concepts that the user needs to grasp to use it effectively. There is no staging area, no mapping between remote and local branches, no plumbing layer, no "fast-forward" merge, and no remote tracking branches.
  • Mercurial commands are more consistent. This is probably because most of them were designed together, while Git's user interface has grown and changed over time.
  • Mercurial documentation is easier to understand. This is partially a consequence of the previous two points (there is less complexity that needs to be documented, and often the meaning of a command or option is intuitive), but also the result of a very different writing style. Compare, for example, the help for hg revert with the help for git reset.

Obviously, less complexity comes at a price: some functions that Git provides have simply no equivalent in Mercurial. Typically, the absence of these functions gets more noticeable when projects get bigger. For example, if there are many development lines (production, qa, bugfix, development) as well as many different interacting repositories, having distinct namespaces for the branch names in each repository becomes very handy.

Mercurial users coming to Git should expect a steep learning curve - it will take you a while to memorize the commands, and for a while you will occasionally encounter situations that require you to go back to the documentation to figure out what just happened.

Git users coming to Mercurial are in for a treat: you will be able to work productively pretty quickly and with few surprises. However, over time you may notice (and have to work around) the absence of some concepts that you used to rely on.

Mercurial has a nice GUI

If you are used to something like TortoiseHG under Linux, the GUIs that are available for Git will be sorely disappointing. The best that I was able to find is Emacs' Magit and Gitk (shipped with Git), but even together they are very far from what TortoiseHG provides.

Under Windows, the situation is somewhat better because there is GitHub Desktop and Sourcetree, but both are proprietary tools. When coming from Git to Mercurial, make sure to take a look at TortoiseHG (available for Linux, macOS and Windows) - you may be pleasantly surprised.

BitBucket sucks, GitHub rules

Comparing BitBucket and GitHub probably provides enough material for another article, but I'd like to at least briefly mention them here. Essentially, the situation here is the opposite of what I said about the available GUIs: even though they superficially provide the same service, GitHub is far ahead of BitBucket in terms of usability as well as features. For example, merging pull-requests in BitBucket always creates a named branch that one has to manually close after merging, and that sticks around forever (note that there is no need to do that as far as Mercurial is concerned, the deficiency is in BitBucket).

Non-destructive history editing

In Mercurial, the evolve extension implements non-destructive history editing. This means that a later commit can "obsolete" one or more earlier commits. The obsolete commit is not removed from the DAG, but will be ignored by most Mercurial commands by default. With changeset evolution enabled, there are effectively two layers of history: the history of the managed content, and the history of the DAG itself. This means that just like you can use Mercurial to determine the contents of a file several commits ago, you can query for the status of the commit DAG prior to e.g. a rebase. The big advantage of this is that it allows history-mutating operations (like a rebase) to be pulled and pushed between repositories.
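The idea of obsolescence can be sketched as a mapping from old commits to their replacements that is kept next to the DAG. The Python code below is a simplified model under the assumption that every obsolete commit has exactly one successor; the function names and commit ids are made up and do not reflect how the evolve extension stores its data.

```python
def visible(commits, obsoleted_by):
    """Commits that most commands would show: everything not obsoleted."""
    return [c for c in commits if c not in obsoleted_by]


def latest_successor(commit, obsoleted_by):
    """Follow the obsolescence markers to the newest version of a commit."""
    while commit in obsoleted_by:
        commit = obsoleted_by[commit]
    return commit


# c1 was rewritten twice (e.g. amended and then rebased).
commits = ["c0", "c1", "c1_v2", "c1_v3"]
obsoleted_by = {"c1": "c1_v2", "c1_v2": "c1_v3"}

print(visible(commits, obsoleted_by))
print(latest_successor("c1", obsoleted_by))
```

Nothing is deleted: the old versions and the markers are still there. Because the markers are ordinary data, they can be exchanged between repositories, so that a repository that still has c1 can learn that it has been superseded.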

While it sounds exciting, the evolve extension is not yet sufficiently stable to be included in the Mercurial core so it needs to be downloaded and installed separately. It is possible that a similar out-of-tree extension exists for Git - if you know about it, please leave a comment below!

Things that are not different

In the past, Git and Mercurial had some additional significant differences. While these are no longer present in current versions, the internet hasn't quite caught up with that, so it's worth listing these non-differences here explicitly.

  • Mercurial repositories are not orders of magnitude bigger than Git repositories. This used to be the case, but the difference has been reduced to about 30% in more recent Mercurial releases. The GNU Emacs repository takes 239 MB in a fresh Git clone and 336 MB when stored in Mercurial. Git repositories, however, need occasional "repack" operations to minimize space usage, so when the Git repository is actively used the above number is more of a lower bound for the actual size.
  • Mercurial lets you change history. This has always been the case, but the feature has to be enabled in Mercurial's configuration file first and is labeled as an "extension" (even though it is fully supported and shipped with Mercurial).
  • Git is well documented. The documentation of early Git versions was atrocious, but these days you can actually learn Git by reading the manpages.
  • Git works well under Windows. This used to be different, but has also been rectified quite some time ago.
