Tuesday, August 28, 2018 Eric Richards

CLRS Algorithm a Day: Mergesort


MergeSorting a hand of playing cards

So we're still not up to an algorithm a day, but I'm gaining... So far, we've been looking at relatively simple quadratic (i.e. O(n2)) sorting algorithms. These have been incremental algorithms, so that we sort one element, then we sort the next unsorted element, and so on. Another category of algorithms are those known as divide-and-conquer algorithms. In this style of algorithm, we take our entire data set and partition it into smaller, simpler portions, solve each subportion, and then integrate the subportions back together to yield the full result.

Mergesort is a simple example of a divide-and-conquer algorithm, described in Chapter 2.3.1 of CLRS. Perhaps this should really be called divide-conquer-combine, as that more accurately describes what we are doing:

  1. Divide: Split our items in half, giving us a left set and a right set each of n/2 items.
  2. Conquer: We then need to sort each of these smaller sets, so we recursively apply the divide step on each of the subsets. Trivially, once we have reached subsets of a single item, that subset is sorted, so that is our base case where we stop dividing.
  3. Combine: When the two subsets are sorted, we then merge them back together to produce the full sorted set.
You can see this procedure in action in the GIF to the left. Dividing into the left and right subsets is trivial, so the real key is the merging operation. Using cards as an example, this is quite intuitive; when we are merging, what we do is look at the first card in the left set and the first card in the right set. Whichever card is smaller, we strip it off and add it to the merged set, and continue in that fashion until one of the sets is empty, then add all the remaining cards in the non-empty set.

Simple, right? Conceptually, it certainly is, and if I were going to do this with some higher-level constructs like LINQ, it would be a snap to translate almost directly into code, but we're going to do this the hard way, and so that makes it a little more complicated ☺.

Simple MergeSort with Sentinels

In CLRS, the pseudocode for mergesort is somewhat obtuse, primarily because they stick to a math/Fortran-esque aversion to meaningful variable names and the convention of using 1-based arrays, rather than the 0-based species you will most likely be familiar with. Makes the exercise more difficult rather more difficult than it really ought to be...

As before, we'll start with the simple case of sorting a plain-old integer array, implemented as an extension method as we have done before. The first step is our top-level entry point:

public static void MergeSort(this int[] a) {
    a.MergeSort(0, a.Length - 1);
}
Because mergesort is a recursive algorithm, we'll be making use of a private helper function to do the actual work, and pass the indices of the first and last items in the array as parameters to this helper:
private static void MergeSort(this int[] a, int min, int max) {
    if (min < max) { // we have at least two items
        var mid = (min + max) / 2; // *naive way of finding the median*
        a.MergeSort(min, mid); // Sort the left half
        a.MergeSort(mid + 1, max); // Sort the right half
        a.Merge(min, mid, max); // Merge the left and right halves into the final sorted array
    }
}
The first thing we do is check that we have at least two elements in the portion of the array that we are looking at; remember that if we have only a single element, the array is already sorted.

Then we figure out the middle index of the array, which we will use to partition the array into the left (a[min-mid]) and right (a[mid+1, a[max]) sub-arrays. Finally we call a second helper function, Merge, which we'll get to in just a second, to integrate the two sorted sub-arrays back together into the final sorted array.

One of the tricky bits that can cause some confusion here is understanding how the partitioning actually works here. When we divide the array, we are not actually creating new arrays yet, we are in a sense virtually partitioning via index pointers into the original array. In some sense, this is a limitation of restricting ourselves to using low-level constructs; this could be written more intuitively if we had access to constructs like easy array slicing, and wrote this in a more functional style; I'll do that towards the end as an example.

Another stumbling block may be understanding how the control actually flow, due to the recursive nature of the algorithm. I've tried to mimic in the animation above how the actual call-stack would be constructed for the example. You may find it helpful to pick an example and work through on paper what the recursion looks like by inlining each function call, like so:

var a = new[] { 4, 6, 1, 3, 2 };
a.MergeSort()=> {
    a.MergeSort(0, 4) => {
        a.MergeSort(0,2) => {
            a.MergeSort(0,1) => {
                a.MergeSort(0,0);
                a.MergeSort(1,1);
                a.Merge(0,0,1);
            a.MergeSort(2,2);
            a.Merge(0,1,2);
        }
        a.MergeSort(3,4) => {
            a.MergeSort(3,3);
            a.MergeSort(4,4);
            a.Merge(3,3,4);
        }
        a.Merge(0, 2, 4);
    }
}

Merge function

The real meat of this type of MergeSort implementation is in the Merge function, which takes the sorted subarrays and merges them back into a bigger sorted array.

private static void Merge(this int[] a, int min, int mid, int max) {
    // calculate the size of the left and right subarrays
    var lSize = mid - min + 1;
    var rSize = max - mid;
    // create the left and right subarrays
    var left = new int[lSize + 1];
    var right = new int[rSize + 1];
    // set sentinel values
    left[lSize] = int.MaxValue;
    right[rSize] = int.MaxValue;
    // populate the subarrays
    int leftIndex;
    int rightIndex;
    for (leftIndex = 0; leftIndex < lSize; leftIndex++) {
        left[leftIndex] = a[min + leftIndex];
    }

    for (rightIndex = 0; rightIndex < rSize; rightIndex++) {
        right[rightIndex] = a[mid + rightIndex + 1];
    }

    // merge the left and right subarrays back into the original array
    leftIndex = 0;
    rightIndex = 0;
    for (var i = min; i <= max; i++) {
        if (left[leftIndex] < right[rightIndex]) {
            a[i] = left[leftIndex];
            leftIndex++;
        } else {
            a[i] = right[rightIndex];
            rightIndex++;
        }
    }
}
Here is where we actually create temporary arrays to hold the left and right subarrays. We are using a bit of a trick here to make the code a bit simpler, by making each subarray one element larger than it should be, and sticking a sentinel value in the final position. This is a value larger than anything that the array would actually hold. This allows us, when we are merging the arrays back together, to avoid worrying about indexing outside either of the subarrays, because when we reach the sentinel value in one subarray, all of the remaining non-sentinel values in the other subarray will be less than it. And because the original array is smaller than the combined lengths of the sentinel'd subarrays, we will stop before we try to compare the two sentinels against each other.

Complexity Analysis

I will try to spare you the pain of drawing out recursion trees and doing proofs... If you are here for a quick big-O runtime of MergeSort, then the long and the short of it is that it's runtime is always going to be O(n lgn). For SelectionSort, we had a runtime that was always O(n2), and for InsertionSort, an average runtime also of O(n2), although a best case of O(n) if the array was already sorted. So from a purely theoretical standpoint, MergeSort is the superior algorithm, since O(nlgn) is smaller than O(n2). This is of course, not always strictly true in the real-world, or for all input sizes, as constant factors are real; with luck I'll have some time to explore that at a later date. But your general rule of thumb will be that MergeSort should sort things faster than either InsertionSort or SelectionSort.

Why is that? I'll try to keep this succinct...

Looking back to our MergeSort() function, we have an array of n elements, so we'll give it a placeholder runtime that we'll call T(n). Within MergeSort, we are doing essentially three operations:

  1. MergeSort(n/2) - sort the left half recursively.
  2. MergeSort(n/2) - sort the right half recursively.
  3. Merge(n) - merge the two halves of the array back together.
In our base case, where there is only one element of the array, we don't actually do any work, so we can count that as a constant. So T(1) = c.

If MergeSort of n elements is T(n), then MergeSort of half the elements should be T(n/2). So the time for the left and right sorts adds up to being 2T(n/2). That leaves us to figure out what the runtime of the Merge() function is.

I am really not great at proving these sorts of things formally and rigorously, but just examining the code and following some rules-of-thumb get you a long way. Probably the most basic rule of thumb is to count how deep your loops are nested; if you've got a loop one level deep, then you are probably O(n) - if you've got nested loops, as we do in Insertion and Selection Sorts, then you are likely O(n2). Caveat emptor - you do need to know about the runtime of any code you call into inside your loops... So let's think about what we're doing in the Merge() function:

  • We're allocating two arrays, and between the two, we're copying in n elements from the original array. That's basically an O(n) operation.
  • Then we have a loop where we go from the start of the original array to the end of it, and copy elements from one or the other of the subarrays back in. That also looks like O(n)
  • We do some other additions and calculations, but that's always the same, regardless of the number of items, so we can fudge it and call that a constant c.
So if we just add that all up, then we've got about 2n + c, which reduces down to being O(n). So that means our Merge() function is O(n) == cn.

Getting back to looking at MergeSort(), our total is now up to T(n) = 2T(n/2) + cn. And we still have T(1) = c. So let's plug some numbers.

Imagine that we have a list of four elements:

  • T(4) expands out to: T(2) +T(2) + 4c
  • T(2) +T(2) + 4c == T(1) + T(1) + 2c + T(1) +T(1) + 2c + 4c
  • T(1) + T(1) + 2c + T(1) +T(1) + 2c + 4c == c + c + 2c + c + c + 2c + 4c
  • Rearranging things slightly, this gives us 4(c) + 2(2c) + 4c

So, off the bat, you may notice that each time we expanded the T(n) fragment of the equation, we ended up adding 4c or n*c. That would suggest that in our big-O runtime we have at least one n term, so the next thing to determine is how many times we end up having to expand out the recursive term to reach the base case. One example isn't really enough to get a good feel for how that grows, so let's work a couple more examples:

  1. n = 2: 2(c) + 2c = 2 expansions
  2. n = 8: 8(c) + 4(2c) + 2(4c) + 8c = 4 expansions
  3. n = 16: 16c + 8(2c) + 4(4c) + 2(8c) + 16c = 5 expansions
At this point, a bit of a pattern starts to become evident. The number of expansions matches the base-2 logarithm of n, plus one => log(2) = 1, log(4) = 2, log(8) = 3, log(16) = 4. So this gives us a total runtime of n*(log(n) + 1), or n*log(n) + n, which for purposes of big-O notation, reduces down to O(n*log(n)).

Generic Mergesort without Sentinels

In the previous version, we used sentinel values to simplify some of the logic in our Merge routine. There are a couple of drawbacks to this kind of an implementation:

  • We need to have some kind of reasonable value that we can use as a sentinel value. When we are only considering numeric types, this is not too big a problem - particularly in .NET, where we have MaxValue constants that we can make use of. But what would the reasonable value be for a string? Or for a more complex type? We could possibly fudge it and abuse null as a sentinel, but then we run into problems if we are trying to use such a sort with value-types or structs.
  • It's also somewhat wasteful of space. Allocating two extra elements on every call to Merge, even if they are temporary, and we have a garbage collector to clean up after us, can amount to a lot of wasted memory.
Fortunately, it is not wildly more complicated to code a Merge routine which can avoid allocating the extra sentinel values. The main difference is that we now have to make sure that when we are merging the arrays back together, we do not index past the end of our temporary arrays. For simplicity, we'll just show the implementation that works on IComparables, although you could use the same strategies as shown before with Insertion Sort to use IComparers or comparison lambdas.

FYI, this covers Exercise 2.3-2 of CLRS.

public static void MergeSort<T>(this T[] a) where T : IComparable<T> {
    a.MergeSort(0, a.Length - 1);
}

private static void MergeSort<T>(this T[] a, int min, int max) where T : IComparable<T> {
    if (min < max) {
        var mid = min / 2 + max / 2 + (min & max & 1); // *slightly more sophisticated way of finding the median for very large sets*
        a.MergeSort(min, mid);
        a.MergeSort(mid + 1, max);
        a.Merge(min, mid, max);
    }
}

private static void Merge<T>(this T[] a, int min, int mid, int max) where T : IComparable<T> {
    var lSize = mid - min + 1;
    var rSize = max - mid;
    var left = new T[lSize];
    var right = new T[rSize];
    int leftIndex;
    int rightIndex;
    for (leftIndex = 0; leftIndex < lSize; leftIndex++) {
        left[leftIndex] = a[min + leftIndex];
    }

    for (rightIndex = 0; rightIndex < rSize; rightIndex++) {
        right[rightIndex] = a[mid + rightIndex + 1];
    }


    leftIndex = 0;
    rightIndex = 0;
    for (var i = min; i <= max; i++) {
        // We now have to do bounds checking to make sure that we stay within our right and left arrays
        if (leftIndex < lSize && rightIndex < rSize && left[leftIndex].CompareTo(right[rightIndex]) < 0 || leftIndex < lSize && rightIndex >= rSize) {
            a[i] = left[leftIndex];
            leftIndex++;
        } else if (rightIndex < rSize) {
            a[i] = right[rightIndex];
            rightIndex++;
        }
    }
}

You may also notice that we are changing the way that we calculate the midpoint; before we were using the simple (a + b) / 2 form that we all know and love from elementary school. The problem with that is if we have very large arrays that we are sorting, this calculation could potentially overflow. In reality, on .NET, this is not actually a problem, however - due to framework limitations, arrays are limited to at most int.MaxValue entries, and there is additionally a restriction on the total amount of memory that the array can allocate (around 2GB), which makes the effective size of the largest array that you could create much lower. Still, this is worth remembering in case things change in the future, or you are working with large arrays on a platform that doesn't have these restrictions. For more information, check out this StackOverflow post (also, to find an example of how trivial and all-encompassing software patents can be...).

Functional MergeSort with LINQ

Earlier, I mentioned that if we were using some slightly higher-level constructs, we could describe MergeSort in a very succinct way, so I took a crack at it using LINQ also. For the back-to-basics approach I'm targeting, this is kind of cheating, but this is also very similar to the classical functional style of approach that you might see in languages like Lisp or Haskell.

public static List<T> MergeSort<T>(this List<T> a) where T:IComparable<T> {
    // base case
    if (a.Count == 1) {
        return new List<T>(a);
    }
    // find the split point
    var mid = a.Count / 2;
    var left = MergeSort(a.Take(mid).ToList());
    var right = MergeSort(a.Skip(mid).ToList());

    // merge the sorted subarrays
    return Merge(left, right);
}

private static List<T> Merge<T>(List<T> left, List<T> right) where T : IComparable<T> {
    if (right.Count == 0) {
        return left;
    }
    if (left.Count == 0) {
        return right;
    }
    var l = left.First();
    var r = right.First();
    return l.CompareTo(r) < 0 
        ? new List<T>{l}.Concat(Merge(left.Skip(1).ToList(), right)).ToList() 
        : new List<T>{r}.Concat(Merge(left, right.Skip(1).ToList())).ToList();
}

Well, that's enough MergeSort, at least for today. Next up I'll continue churning through the exercises and problems from CLRS Chapter 2. There are some fun ones coming up; I'm finding that I'm enjoying this a whole lot more than I did originally ten years ago. It's certainly a lot more fun working through them with code and unit tests than it was on paper and chalkboard, even when I'm eleventy thousand times more busy and have less time than I did then.