Concurrent XML Authoring with (Fairly) Easy Merges

Here in suburbia on a quiet Sunday, the neighbors and I have been very productive. One’s been mowing the lawn; another’s cleaned his car; and I’ve figured out how to string some free and low-cost tools together in a way that makes collaborative authoring of XML content a bit easier.


I got into distributed version control (DVCS) a few years ago, when I needed an easy way to develop code collaboratively. If you’re not familiar with DVCS, it’s basically the approach used by Git (as in GitHub). If you haven’t heard of Git either, or of its less well-known but arguably superior cousin, Mercurial, these are systems that remove much of the frustration of old-style version control for software development. They let you keep working on the same bits of code as the rest of your team without a network connection, merge your work later without problems, and still get really good tracking of your project history (even better, in fact). Four years ago, Joel Spolsky wrote that DVCS was here to stay. Since then it has gone through the roof, taking much of modern software development with it.

The idea with DVCS is that everyone has the whole repository locally and can work on it at any time. It’s really easy to merge your code with other developers’ versions (or changesets, to use a more accurate term, since distributed version control works with changes rather than monolithic, linear versions). There are a lot of good three-way diff and merge tools, which compare your version and the other version against their common ancestor. Some are free, and some are worth more than their modest cost. They often mean you don’t have to merge anything manually at all. Even when they need your help to resolve something (e.g. when two people make incompatible changes to the same line of code), they make it fairly easy. At least, that’s true for line-based content such as source code.
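To make the per-line idea concrete, here is a toy Python sketch of the decision a three-way merge makes for each line. It assumes the three versions still line up one-to-one, with no inserted or deleted lines. Real tools first align the versions with a diff algorithm, and this sketch skips that step. The function name `merge3_aligned` is made up for illustration.

```python
from typing import List, Tuple


def merge3_aligned(base: List[str], mine: List[str],
                   theirs: List[str]) -> Tuple[List[str], List[int]]:
    """Toy line-based three-way merge.

    Assumes the three versions still line up one-to-one (no inserted or
    deleted lines). Real merge tools first align the versions with a diff
    algorithm; this sketch only shows the per-line decision that follows.
    Returns the merged lines and the indices of conflicting lines.
    """
    if not (len(base) == len(mine) == len(theirs)):
        raise ValueError("this sketch only handles versions of equal length")
    merged: List[str] = []
    conflicts: List[int] = []
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        if m == t or t == b:
            merged.append(m)      # both agree, or only my side changed the line
        elif m == b:
            merged.append(t)      # only the other side changed the line
        else:
            merged.append(m)      # both changed it differently: flag for a human
            conflicts.append(i)
    return merged, conflicts


base = ["def greet():", "    print('hi')", "    return None"]
mine = ["def greet():", "    print('hello')", "    return None"]
theirs = ["def greet(name):", "    print('hi')", "    return None"]

merged, conflicts = merge3_aligned(base, mine, theirs)
print(merged)     # ['def greet(name):', "    print('hello')", '    return None']
print(conflicts)  # []
```

Each side changed a different line, so both edits are kept without asking. Only a line changed differently on both sides lands in `conflicts`.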

Though content developed in XML isn’t code, there’s loads of potential for managing it using DVCS. Adrian Warman gave a fascinating presentation on this last year, and a number of other people have been trying it on their content.

However, merging XML content isn’t as easy as merging code. The trouble with regular merge tools is that they only see changes at the line level. If a sequence of lines has changed in one version but not in the other, the change is automatically included in the merged file. That’s fine for a simple case such as an added element. But what if two authors make different changes to child elements, such as list items within an unordered list? If no single line conflicts, the automatically merged file may end up with mismatched or overlapping element tags. And if a conflict is detected (or if you enforce a manual checking stage), it may take a fair bit of copying and pasting to fix the XML and get the merge result you need.

Even if the content doesn’t change at all but the XML authoring tool changes the line breaks, a line-based tool will see that as a change. The following two snippets carry the same content, assuming (as is usual for documents like this) that the indentation whitespace isn’t significant. A line-based merge algorithm, however, won’t treat them as equivalent:

<ul>
    <li>
        Here's some good stuff here.
    </li>
    <li>
        And here.
    </li>
</ul>

<ul><li>Here's some good stuff here.</li><li>And here.</li></ul>
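As a quick illustration, here is a small Python sketch that compares the two snippets above in two ways: line by line with `difflib`, and as parsed trees with layout whitespace normalized. The helper names are made up. Collapsing whitespace is an assumption that suits documents like this one but not every XML vocabulary, since whitespace can be significant, for example under `xml:space="preserve"`.

```python
import difflib
import xml.etree.ElementTree as ET

PRETTY = """<ul>
    <li>
        Here's some good stuff here.
    </li>
    <li>
        And here.
    </li>
</ul>"""

COMPACT = "<ul><li>Here's some good stuff here.</li><li>And here.</li></ul>"


def squash(text):
    """Collapse runs of whitespace; None and whitespace-only text become ""."""
    return " ".join((text or "").split())


def normalized(elem):
    """Turn an element into nested tuples, ignoring layout whitespace.

    Assumption: indentation and line breaks are not significant here.
    """
    return (elem.tag, dict(elem.attrib), squash(elem.text), squash(elem.tail),
            [normalized(child) for child in elem])


def line_diff_count(a, b):
    """Count the lines a line-based diff reports as removed or added."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


print(line_diff_count(PRETTY, COMPACT))  # 9: all 8 lines removed, 1 added
print(normalized(ET.fromstring(PRETTY)) == normalized(ET.fromstring(COMPACT)))  # True
```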

What’s really needed is a merge tool that’s aware of the tree structure of XML, and there aren’t many such tools around. DeltaXML is an amazing enterprise-level product with great potential for integration with authoring tools and CMSs. It’s not really designed for DVCS integration, though, at least at the moment. But another tool is. It’s called Project: Merge.

Project: Merge has a very simple UI. For each element or attribute that differs between your version, the other new version, and the original, all three are shown and you pick the one you want to keep. If you prefer to compare just the two new versions, it can do that too, though I find it helps to have the original there as well.
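Here is a minimal Python sketch of the node-level idea: a three-way choice made separately for each element’s text and each attribute, rather than for each line. It is not Project: Merge’s actual algorithm, which isn’t described here. The function names are hypothetical. The sketch also assumes all three versions have the same elements in the same order, a restriction real XML merge tools don’t have.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def pick(base: Optional[str], mine: Optional[str], theirs: Optional[str],
         where: str, conflicts: List[str]) -> Optional[str]:
    """Three-way choice for a single value (element text, tail, or attribute)."""
    if mine == theirs or theirs == base:
        return mine            # both agree, or only my side changed it
    if mine == base:
        return theirs          # only the other side changed it
    conflicts.append(where)    # both changed it differently: a human must choose
    return mine                # provisional value until someone decides


def merge_node(base: ET.Element, mine: ET.Element, theirs: ET.Element,
               path: str, conflicts: List[str]) -> ET.Element:
    """Merge three versions of one element into a new element.

    Simplifying assumption: all three versions have the same child elements
    in the same order. Real XML merge tools also match inserted, deleted and
    moved nodes; this sketch raises ValueError instead.
    """
    if not (base.tag == mine.tag == theirs.tag
            and len(base) == len(mine) == len(theirs)):
        raise ValueError(f"structure differs at {path}; out of scope for this sketch")
    out = ET.Element(mine.tag)
    out.text = pick(base.text, mine.text, theirs.text, f"{path}/text()", conflicts)
    for name in sorted(set(base.attrib) | set(mine.attrib) | set(theirs.attrib)):
        value = pick(base.get(name), mine.get(name), theirs.get(name),
                     f"{path}/@{name}", conflicts)
        if value is not None:  # None means the attribute was deleted
            out.set(name, value)
    for i, (b, m, t) in enumerate(zip(base, mine, theirs)):
        child_path = f"{path}/{m.tag}[{i + 1}]"
        child = merge_node(b, m, t, child_path, conflicts)
        child.tail = pick(b.tail, m.tail, t.tail, f"{child_path} (tail)", conflicts)
        out.append(child)
    return out


BASE = "<ul><li>Apples</li><li>Pears</li></ul>"
MINE = "<ul><li>Green apples</li><li>Pears</li></ul>"
THEIRS = "<ul><li>Apples</li><li>Ripe pears</li></ul>"

conflicts: List[str] = []
merged = merge_node(ET.fromstring(BASE), ET.fromstring(MINE),
                    ET.fromstring(THEIRS), "/ul", conflicts)
print(ET.tostring(merged, encoding="unicode"))  # <ul><li>Green apples</li><li>Ripe pears</li></ul>
print(conflicts)                                 # []
```

Each version here is a single line, so a line-based tool would see both sides changing the same line and report a conflict. At the node level, the two edits touch different list items and merge cleanly. A non-empty `conflicts` list corresponds to the case where a tool like Project: Merge has to ask you to choose.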


(Screenshot: Project: Merge)

It’s taken me a little while to get it hooked up to SourceTree, the GUI tool I’m using to manage my Mercurial repositories.


(Screenshot: SourceTree, about to open external merge tool)

Being on a Mac made it more complicated, as Project: Merge is Windows-only. CrossOver, an excellent Wine-based compatibility layer for running Windows apps, got it running smoothly, including the crucial bit of passing command-line arguments through to the app. A shell script sorted out some environment variables, and I was done. (If this paragraph sounds scary, setting things up should be much easier on Windows. If you’re on OS X and want to try it, get in touch and I’ll be happy to list the steps!)

Now I have a very nice toolchain set up. Whenever there’s a merge conflict on a file, Project: Merge does one of two things. If all the changes are on different XML nodes and don’t actually conflict, it resolves them silently by itself. Otherwise it brings up the window where I can pick the bits I want from my latest version, the other version, or the original version.

This isn’t for everyone who needs to author in XML, to be sure. The nuts and bolts on the XML side are somewhat exposed, and the different approach to version control takes some learning. There’s no checking in or out anymore! But for someone willing to put in a couple of days to get familiar with Mercurial (and GUI tools such as SourceTree and TortoiseHg do make it super easy), the reward could be collaborative XML authoring that’s fast, reliable, and clear.