posted 3 years ago in Dev Platform category by Ji Woong Jeong
With conventional version control systems (VCS), merging branches often costs more than expected, because agile-style development produces frequent conflicts. Distributed VCS (DVCS) is gaining attention as an alternative in this situation. In this article, we compare two of the leading DVCS products, Git and Mercurial.
Migration to DVCS
The version control system ("VCS" hereinafter) is one of the tools developers encounter most often, alongside the Integrated Development Environment (IDE). Unlike the IDE, the VCS leaves developers little freedom of choice, because all members of an organization must use the same one. Ironically, the harder developers work, the more problems the existing VCS causes. When feature-level changes are made simultaneously and in parallel under the Agile development method, conflicts occur frequently. Because so many conflicts occur while merging branches and the cost of merging is high, developers rarely use branching and merging as originally intended.
In this environment, DVCS is gaining attention as an alternative. In particular, open tools such as Git and Mercurial, which can replace commercial DVCSs such as BitKeeper and Plastic SCM, have been introduced and used in large-scale projects including the Linux kernel, and they have started to spread rapidly through the industry. DVCS has a different concept of history from the existing VCS, and its main strength is powerful branching and merging. Its biggest advantage is that it is built for a distributed environment: it runs perfectly well with only a local repository, and it supports various forms of cooperation with the existing VCS to compensate for its relatively short track record. These advantages let developers try it without high cost.
If you want to use DVCS, you may wonder which tools are available. There are a number of open tools, such as Darcs, Bazaar, arch, SVK, Fossil, Mercurial, and Git, and commercial tools, such as BitKeeper and Plastic SCM. Of these, we will compare the most popular open tools, Git and Mercurial.
Git and Mercurial
Git and Mercurial share an almost identical philosophy, probably because both aim at what a DVCS should be: a lightweight, easy-to-scale VCS. Both fully embody the 'D' (Distributed) in DVCS. In particular, both present the change history as a graph in which a change can have multiple parents and multiple children, rather than as a time-ordered linear sequence.
The main difference between the two is that Git was designed from the outset on the assumption of numerous parallel branches. Mercurial lacks that advantage but is easier to learn and use. These are the most important differences to consider when choosing between Git and Mercurial.
Git is written entirely in C, Bourne shell (sh), and Perl. Most of Mercurial is written in Python.
'Most of' is used above because the parts of Mercurial related to binary diff are written in C. This does not remove or reduce Mercurial's advantage for cross-platform work.
In other words, Git is more Linux-friendly than Mercurial. On Windows, Git requires an emulation environment such as mingw32, and Mercurial performs relatively better than Git there. However, Git's commands can be extended without any C, as long as developers can write shell scripts. In Mercurial, by contrast, extending commands is harder than in Git unless developers understand the Python core code or how commands are composed.
Mercurial provides a rich set of bundled extensions. However, its freedom of extension is somewhat limited.
From the perspective of usability, Mercurial can be compared to a comprehensive tool set, while Git can be compared to a Swiss Army knife.
Mercurial is a comprehensive tool set made up of highly polished tools. It can be extended, of course, but writing extension commands is not easy. Instead, Mercurial provides the most frequently required functions as bundled extensions.
Git, on the other hand, is a kind of Swiss Army knife that packs the essential functions into a compact and stylish form. Git may therefore seem limited, since it provides only a few basic functions. However, developers can create new commands by extending or combining the basic commands with shell scripts, building handy, customized tools. In this way, Git can sometimes provide usability equal to or better than that of Mercurial.
Mercurial comes with a well-organized set of predefined aliases: 'st' for 'status', 'ci' for 'commit', 'up' for 'update', and so on.
In most cases, Mercurial provides a ready-to-start environment with no configuration at all, or simply by setting the user name. Git, on the other hand, provides no such default configuration and requires more setup than Mercurial.
Deeper and Deeper
So far we have reviewed the differences between Git and Mercurial at a high level. They differ in how they are implemented as well as in implementation language. The biggest differences in implementation are the structure of the repository and the handling of branches.
First, the Git repository is based on snapshots. Every change and every file is represented as an object. There are four object types: Commit, Tree, BLOB, and Tag. A BLOB, which is a leaf node, holds the contents of a managed file. The other objects refer to other objects and so form the following tree structure.
Figure 1: Git Object Tree Structure
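As a minimal sketch of how a BLOB is identified, the Python below computes the object ID that Git assigns to a file's contents: the SHA-1 of a short header (`blob`, a space, the byte length, and a NUL byte) followed by the contents. The function name `git_blob_id` is hypothetical, and the sketch covers only the naming of a BLOB, not Git's on-disk compression or the Commit, Tree, and Tag formats.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a header ("blob <size>\0") followed by the raw file contents.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The ID depends only on the contents, so identical files share one BLOB.
    print(git_blob_id(b""))
    print(git_blob_id(b"same text\n") == git_blob_id(b"same text\n"))
```

Because the ID is derived from the whole contents, any edit to a file, however small, yields a new BLOB that holds the entire file again.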
Here, what a BLOB manages is the entire contents of a file at a specific point in time. As a result, the size of a Git repository increases quickly. The gc command is provided to address this problem. Running gc removes unreachable branches and stores old changesets as compressed diffs, which improves repository efficiency.
Mercurial, on the other hand, simply tracks the changed parts of each file. For each managed file, the repository holds a change-recording file with the same name plus the extension .i. This file stores the change history of the corresponding managed file in a binary format. The repository therefore grows in proportion to the size of the changes, at a relatively moderate pace compared to Git.
After the gc command has been run, a Git repository is about the same size as, or smaller than, the equivalent Mercurial repository.
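The difference in growth between snapshot storage and change-based storage can be illustrated with a toy model. The sketch below is an assumption-laden simplification: it uses text diffs from Python's `difflib` in place of Mercurial's binary format, ignores compression and Git's gc entirely, and all function names are hypothetical.

```python
import difflib


def make_revisions(n_lines=200, n_revs=20):
    """Build toy revisions of one file; each revision edits a single line."""
    lines = [f"line {i}\n" for i in range(n_lines)]
    revisions = [list(lines)]
    for rev in range(1, n_revs):
        lines[rev % n_lines] = f"line {rev % n_lines} edited in rev {rev}\n"
        revisions.append(list(lines))
    return revisions


def snapshot_size(revisions):
    """Characters stored when every revision keeps the whole file."""
    return sum(len("".join(rev)) for rev in revisions)


def delta_size(revisions):
    """Characters stored when only the first revision is kept whole
    and each later revision is kept as a diff against its predecessor."""
    total = len("".join(revisions[0]))
    for old, new in zip(revisions, revisions[1:]):
        total += len("".join(difflib.unified_diff(old, new, n=0)))
    return total


if __name__ == "__main__":
    revs = make_revisions()
    print("snapshots:", snapshot_size(revs))
    print("deltas:   ", delta_size(revs))
```

In this model the snapshot total grows with the file size for every revision, while the delta total grows only with the size of each change, which matches the growth pattern described above.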
Figure 2: Mercurial Meta Data Structure
In the stored filenames, '_' in front of a letter means that the letter is uppercase, and '__' represents a literal '_'. This encoding keeps filenames distinct on file systems that are not case-sensitive.
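The naming rule in this note can be sketched in a few lines of Python. This covers only the two rules stated here (uppercase letters and underscores); Mercurial's actual store encoding handles further cases that are not modeled, and the names `encode_name` and `decode_name` are hypothetical.

```python
def encode_name(name):
    """Encode a filename so that case survives on a case-insensitive file system."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")              # a literal underscore is doubled
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())  # uppercase becomes '_' + lowercase
        else:
            out.append(ch)
    return "".join(out)


def decode_name(encoded):
    """Reverse encode_name; assumes the input was produced by encode_name."""
    out = []
    i = 0
    while i < len(encoded):
        if encoded[i] == "_":
            nxt = encoded[i + 1]
            out.append("_" if nxt == "_" else nxt.upper())
            i += 2
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(encode_name("README_v2.txt"))  # _r_e_a_d_m_e__v2.txt
    print(encode_name("Makefile"), encode_name("makefile"))  # _makefile makefile
```

Two names that differ only in case, such as `Makefile` and `makefile`, end up as different lowercase strings, so they cannot collide on disk.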
Mercurial's advantage, therefore, is that no additional repository maintenance is required. Creating patches and tracing change history are also easier with Mercurial than with Git. In most cases, however, creating, updating, or committing a snapshot costs more with Mercurial than with Git.
Merge and Change History
Figure 3: Difference between Linear Structure and Revision Structure
In the linear structure there is always exactly one latest changeset. A DAG, however, can have several latest changesets at the same time.
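To make the idea of several latest changesets concrete, here is a small Python sketch that finds them in a history graph. The graph is given as a mapping from each changeset to its parents; the changeset names and the helper `find_heads` are hypothetical and not part of either tool.

```python
def find_heads(parents):
    """Return the changesets that no other changeset lists as a parent."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)


# Linear history: a <- b <- c
linear = {"a": [], "b": ["a"], "c": ["b"]}

# DAG: b and c both branch from a; e merges b and c; d continues from b.
dag = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": ["b", "c"]}

print(find_heads(linear))  # ['c']
print(find_heads(dag))     # ['d', 'e']
```

The linear history has a single latest changeset, while the DAG has two, `d` and `e`, until someone merges them.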
Merging and branching are the biggest advantages of DVCS over the existing VCS, such as SVN. Because these systems were developed on the premise of a distributed environment, all change history is represented as a DAG (Directed Acyclic Graph). This allows smarter merging than the simple snapshot-based 3-way merge provided by SVN, because each changeset holds a reference to its parent changeset, which significantly reduces the potential for conflict. The difference between the two tools lies not in implementation but in operation. Git assumes large-scale branching and supports n-way merge. Mercurial, in contrast, is built around the anonymous branch (or dynamic branch) and supports 2-way merge only, so merging N branches takes N-1 merges.
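The merge count can be checked with a short Python sketch. It only models the shape of the resulting history, not the actual content merging, and the function names and branch names are hypothetical.

```python
def merge_pairwise(heads):
    """Combine heads two at a time; return the merge changesets created."""
    merges = []
    current = heads[0]
    for other in heads[1:]:
        merged = f"merge({current}, {other})"
        merges.append((merged, (current, other)))  # each merge has two parents
        current = merged
    return merges


def merge_n_way(heads):
    """Combine all heads in a single merge changeset with many parents."""
    return [("merge(" + ", ".join(heads) + ")", tuple(heads))]


if __name__ == "__main__":
    branches = ["feature-a", "feature-b", "feature-c", "feature-d"]
    print(len(merge_pairwise(branches)))  # 3 merges for 4 branches
    print(len(merge_n_way(branches)))     # 1 merge for 4 branches
```

With two-parent merges only, four branches leave three merge changesets in the history, whereas a single n-way merge leaves one.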
Generally, the change history graph in Mercurial is narrower than that in Git and closely resembles the change history management of SVN. The most visible consequence is that Mercurial assigns revision numbers, similar to those of SVN, to the change history saved in the local repository. However, a revision number cannot be used as an identifier, since there is no guarantee that the history has the same order in another replicated repository. This difference may help developers while they are getting accustomed to the product, but it can also cause misunderstanding in Mercurial.
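Why a local revision number is not a reliable identifier can be shown with a toy model. In the sketch below, the class `ToyRepo` is hypothetical: it derives a global ID from the changeset's content alone (real Mercurial changeset IDs cover more than that) and hands out revision numbers in the order in which each repository received the changesets.

```python
import hashlib


class ToyRepo:
    """Toy repository: global IDs come from content, revision numbers from arrival order."""

    def __init__(self):
        self.order = []  # changeset IDs in the order this repository received them

    def add(self, content):
        cid = hashlib.sha1(content.encode("utf-8")).hexdigest()[:12]
        if cid not in self.order:
            self.order.append(cid)
        return cid

    def revision_number(self, cid):
        return self.order.index(cid)


alice, bob = ToyRepo(), ToyRepo()
fix_a = alice.add("fix A")  # Alice commits "fix A" first...
fix_b = bob.add("fix B")    # ...while Bob commits "fix B" first.
alice.add("fix B")          # Each then receives the other's changeset.
bob.add("fix A")

print(alice.revision_number(fix_a), bob.revision_number(fix_a))  # 0 1
```

Both repositories agree on the ID of each changeset, but the same changeset is revision 0 for Alice and revision 1 for Bob, so only the ID is safe to share.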
A DVCS branch is fundamentally based on replicating the repository. However, both Git and Mercurial also provide a branching technique based on tagging: when the specific commit at which a branch starts is tagged, the tag acts as the name of the branch. The two tools work almost identically here; they differ in how branches are handled when a local repository is pushed to the remote repository, or pulled from it. Mercurial transfers all branches at once, whereas Git transfers only the branch being worked on.
The size of a Git repository increases relatively quickly as the project scale grows, whereas Mercurial shows a moderate, linear increase.
When Mercurial is used along with SVN, it stores the SVN metadata internally, so its repository may grow faster than the Git one.
Because Git is based on snapshots, it maintains consistent performance in large-scale projects and is faster than Mercurial in most cases. Mercurial, being based on differences, needs little disk I/O and shows relatively stable performance even under heavy read/write load. However, its patch merge costs increase as the project grows and a long change history accumulates. Mercurial has two further advantages: its repository is append-only, and it is not greatly affected by disk errors.
Link with Subversion
As mentioned previously, in most cases developers cannot choose the VCS they want, so they must use the new VCS alongside the existing one. The most popular VCS is SVN (Subversion), so we will compare Git and Mercurial in terms of interworking with SVN.
Both tools basically fetch each SVN commit and commit it again to the local repository. With Git, this process used to stop partway at times, but it now works reliably for projects of a reasonable scale. Because of the basic difference in performance, Git is a little faster than Mercurial. Mercurial stores the SVN metadata inside its own metadata, so, contrary to the general case, the Git repository is the more compact one when used with SVN. However, Mercurial is more stable than Git here, since its SVN interworking started earlier and has been in use longer.
This holds only as long as commands are executed properly, however; once a command has been executed incorrectly, recovery is more difficult than in Git.
Problems of this kind occur because SVN manages its list of changes linearly. When a DVCS is used together with SVN, the most frequent problem is converting the graph-shaped change history into a linear one. Both Mercurial and Git solve this with rebase. In Mercurial, the rebase must be run explicitly. In Git, the rebase is part of the operation that applies the change history to the remote repository.
The environment is generally configured as follows.
Figure 4: Central Repository Configuration
Of course, since these are DVCSs, a configuration without a central repository is also possible. If SVN is used along with a DVCS, however, the configuration shown in Figure 4 is the practical one. When SVN and a DVCS are used together, or when migrating from SVN to a DVCS, remember that a DVCS does not provide some SVN functions, such as svn:externals and checkout by subdirectory. Checkout by subdirectory must be addressed by reorganizing the repository, and functions like svn:externals can be covered by a similar feature (subrepositories in Mercurial, submodules in Git). Evaluate these carefully before adopting them, though, because the functional differences are not small.
For Mercurial, conversion of svn:externals to subrepositories was added in version 1.9.1 (2011-08-01).
Git and Mercurial both represent DVCS and are actively developed. They are used in many projects, and large-scale projects such as the Linux kernel, Mozilla, Android, and OpenJDK have adopted or migrated to them. In addition, they can be set up so that they affect only an individual developer's environment, and they interwork reasonably well with SVN. Like other software, the two are quickly converging by absorbing each other's strengths, so you can choose whichever you prefer. To ease the stress of choosing, however, the comparison points, including those discussed above, are summarized in the following table.
For reference, ++ means that the item is provided by default or cannot be changed in any case, and + means that the item requires additional configuration, can be changed, or is only partially supported depending on the situation.
Table 1: Comparison of Git and Mercurial
|Has vast experience using SVN and wants to reduce learning curve as much as possible||+|
|The OS: Windows or *NIX||*NIX||Windows||On Windows, Mercurial is far faster, so Mercurial is recommended.|
|Wants to experience a new solution or examine the concept of DVCS.||+|
|Wants to use the solution for various purposes besides VCS, or require powerful customization and comment sets||+|
|Subdirectory checkout||- submodule||- subrepo||Included as bundle extension|
|Preservation of file permissions||Only the execute permission||Only the execute permission|
|Change history model||Snapshot||Patch|
|Language||C, Bourne Shell, Perl||Python||For Mercurial, only binary diff has been implemented in C language.|
By Ji-woong Chung, EC Display Service Development Team, NHN Corp.
Mercurial Bundle Extension List
The following table lists the most frequently used of Mercurial's bundled extensions. For a more detailed list, see the official Wiki site (http://mercurial.selenic.com/wiki/CategoryBundledExtension).
|children||ChildrenExtension||Shows the child revisions of a specific revision.|
|churn||ChurnExtension||Shows change statistics by user.|
|convert||ConvertExtension||Converts a repository from another VCS to Mercurial. Supports CVS, SVN, Git, Bazaar, and Perforce.|
|color||ColorExtension||Displays the output of some commands such as diff and status in color.|
|eol||EolExtension||Converts end-of-line characters between the working files and the repository.|
|extdiff||ExtdiffExtension||Outputs the diff result by using an external program.|
|fetch||FetchExtension||Pulls, merges, and updates in one step.|
|gpg||GpgExtension||Signs changesets by using GPG, and verifies the signatures.|
|graphlog||GraphlogExtension||Outputs the revision graph in ASCII.|
|hgcia||HgciaExtension||Sends a notice to CIA (http://cia.navi.cx/), a service that provides various statistical information on activities that use VCS.|
|highlight||HighlightExtension||Provides syntax highlight in the display of file contents on Mercurial Web server.|
|mq||MqExtension||Allows management of the patch in the form of a queue.|
|progress||ProgressExtension||Displays the progress when executing some commands (1.5 or higher).|
|rebase||RebaseExtension||Changes the parent of a changeset.|
|win32mbcs||Win32mbcsExtension||Allows file names to be written in shift_jis/big5 on Windows.|