Recursion: Merge Sort

A recursive algorithm for sorting arrays

As mentioned in the previous set of notes, recursion involves noting how a problem can be described in terms of smaller versions of the same problem. This can be illustrated by considering a recursive algorithm to sort an array as opposed to the previously considered iterative sorting algorithm.

If we have an array of n items, we can "break" it into two arrays, and then sort each of those arrays. Then we can put the two arrays together by merging them. This means taking an item from the two sorted arrays, at any time whichever is smallest and has yet to be taken, and putting it into our final array.

This is recursive, because we have described sorting in terms of sorting. So we can do it by making a recursive call to our original algorithm. With recursion we always need a base case, a case which is solved without recursion. The base case here is when we are sorting an array of length 1. The idea of an array which has only one element in it is a bit strange, but is allowable in Java. Obviously such an array is already in order because there is only one possible order for it.

This sorting algorithm is known as merge sort. Here is a start at designing a Java method to sort an array of integers using merge sort (we will assume, as we did before, that we are sorting arrays of integers, although the same algorithm could be used to sort an array of anything so long as we have a way of ordering the array contents):

public static void sort(int[] a)
{
 if(a.length>1)
    {
     Break a into two halves
     Sort each of the two halves
     Merge the two halves back into one
    }
}

Breaking an array into two

Now let us be a little more precise about what we mean by "breaking an array into two". Java doesn't allow us to do this directly (though we could imagine a programming language that did: renaming the space used to store one array as two arrays). So what we will do is create two new arrays, each of length half that of the original array. We will then copy the first half of the items of the original array into one of the new arrays, the second half into the other. So, for example, if we start off with this situation:

we will end up with the situation:

Note that we cannot guarantee the two new arrays will be of identical lengths, since if the original array is of odd length, we will have to break it into two arrays, one of length one greater than the other.

Here is some Java code to do this breaking task:

 int i,mid = a.length/2;
 int[] half1 = new int[mid];
 int[] half2 = new int[a.length-mid];
 for(i=0; i<mid; i++)
    half1[i]=a[i];
 for(; i<a.length; i++)
    half2[i-mid]=a[i];
I haven't expressed this in terms of a separate method, since that would raise the problem of how to get the method to return two arrays. The code fragment here assumes the initial array is called a and names the two new arrays half1 and half2. Note that because variable i is declared outside the for loops, the value it has when the first for loop finishes (which must be equal to mid) remains when the second for loop starts.

Actually, for this algorithm it doesn't matter how the items in the original array are distributed into the two new arrays, so long as they are of equal size (with the possibility that one is one longer than the other, as mentioned above), and each item in the original array is found just once in just one of the new arrays. The following would be a perfect acceptable alternative:

 int mid = a.length/2;
 int[] half1 = new int[mid];
 int[] half2 = new int[a.length-mid];
 for(int i=0, j=0; i<a.length-1; i+=2,j++)
    {
     half1[j]=a[i];
     half2[j]=a[i+1];
    }
 if(a.length%2==1)
    half2[mid]=a[a.length-1];
which would lead to the two new arrays in the case above being:

This code fragment shows how a for loop can have multiple variable declarations in the first part of its header, and multiple variable updates in the last part.

Merging two sorted arrays into one

Sorting the two arrays we obtained from the original one is easy:
 sort(half1);
 sort(half2);
In the usual way with recursion, we will assume our recursive call(s) work, and if under this assumption the call that made them works, then the whole recursive algorithm must work. So, assuming the breaking of the array into a front and back half as first proposed, after the recursive calls we have the situation:

Our algorithm does sorting in place, as we did with selection sort, so these two arrays will be named half1 and half2, the recursive calls will have moved their contents around. Since we want sorting in place, we now have to copy the contents of half1 and half2 back into the array a, making sure that they keep numerical order. So after this, a will have the same contents it initially held, but in sorted order.

The merge part of the algorithm involves having three variables as indexes to the three arrays a, half1 and half2. We will use the names i, j and k respectively for these. At any point in the merging, i will index the cell in a into which the next integer from either half1 or half2 will be put, while j and k will index the integers which are being considered for putting into a. Note that what is in a before the merging, in other words the original integers, is irrelevant at this point and plays no part in the merging. Since half1 and half2 are in order, j and k can both start off at 0 and be increased by 1 each time they index the lowest number next to be put in a.

So here is our initial position (the cells in a are shown blank because what is in them is irrelevant):

At this point it can be seen that half2[k] is less than half1[j] so half2[k] is put into a[i] and i and k are increased by one, giving:

Now half1[j] is less than half2[k], so half1[j] is put into a[i], and i and j are increased by one, giving:

The lines across the diagrams of the arrays are added, similar to the way we added lines across the diagram of the array when demonstrating the the selection sort example, to show that i can be considered as the index dividing a into the part to which the sorted integers have been written and the part to which sorted integers are going to be written, while j and k divided half1 and half2 into the parts from which integers have alresdy been copied into a and the parts from which integers are going to be copied into a.

The next step also takes the next integer from half1, giving:

This demonstrates it is not the case, as might naïvely be assumed, that integers are taken alternatively from half1 and half2.

Now we have a choice between putting 7 and 8 as the next integer into a, and 7 is the lowest of these, so that gives us:

Next we put in the 8, giving:

But now we are in the situation where there is no half1[j] to compare with half2[k], because j has been increased to the point where it is equal to half1's length, so it no longer indexes a cell. At this point, the main merge loop stops, but an extra loop is needed to copy the remainder of half2 into a. So the point at which the main merge loop halts is when either j is the equal to length of half1, or k is equal to the length of half2.

The loop invariant in the merge loop is that each integer that occurs in the array half1 from half1[0] up to but not including half1[j], and each integer that occurs in the array half2 from half2[0] up to but not including half2[k], also occurs in the section of the array a from a[0] up to but not including a[i], and this section of the array a is ordered in numerical ascending order.

Code for merge sort

Having gone through the merge part of the algorithm in detail, we can now write code for it. Putting it with the previous code, gives the following complete code for sort to give sorting in place implemented by the merge sort algorithm:
 public static void sort(int[] a)
 // Sort the contents of array a in ascending numerical order
 {
  if(a.length>1)
     {
      int i,mid = a.length/2;
      int[] half1 = new int[mid];
      int[] half2 = new int[a.length-mid];
      for(i=0; i<mid; i++)
         half1[i]=a[i];
      for(; i<a.length; i++)
         half2[i-mid]=a[i];
      sort(half1);
      sort(half2);
      int j=0, k=0;
      for(i=0; j<half1.length&&k<half2.length; i++)
         if(half1[j]<half2[k])
            {
             a[i]=half1[j];
             j++;
            }
         else
            {
             a[i]=half2[k];
             k++;
            }
      for(; j<half1.length; i++, j++)
         a[i]=half1[j];
      for(; k<half2.length; i++, k++)
         a[i]=half2[k];
     }
 }
The code for this method could be put in the place of the code for the sort method using selection sort developed previously to give a complete class with a main method to test it, and the program would compile and appear to run exactly the same. As the method has the same header, it is called in the same way, and as it sorts the integers in the array in ascending order in place, it has exactly the same effect when execution of the method concludes. But in the computer, the sorting is done in a completely different way. This shows that you can have more than one algorithm to solve the same problem.

Constructive merge sort

It is very easy to convert the merge sort method we have developed into one that is constructive rather than destructive. All that is needed is for a new array of length equal to the original array to be constructed, then the numbers from the arrays resulting from the recursive calls are merged into this new array rather than back into the space of the original array. The recursive calls will return new arrays rather than sort in place the arrays resulting from breaking the original array, but the local variables can be assigned to refer to these new arrays, so with half1=sort(half1) takes the place of sort(half1) for example.

Below is the code for a constructive sort method using the merge sort algorithm, contained inside a class which includes a main method so that it can be demonstrated. The main method shows that the original array is unchanged by the sorting:

class SortIntArrayRec2
{

 public static void main(String[] args)
 throws IOException
 // Illustrates constructive merge sort to sort an array of integers
 {
  int n;
  int[] array, array1;
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.print("Enter number of integers : ");
  n = Integer.parseInt(in.readLine());
  array = new int[n];
  for(int i=0; i<n; i++)
     {
      System.out.print("Enter integer "+i+": ");
      array[i] = Integer.parseInt(in.readLine());
     }
  System.out.println("The array you entered is:");
  System.out.println(arrayToString(array));
  array1 = sort(array);
  System.out.println("After sorting, the array is:");
  System.out.println(arrayToString(array));
  System.out.println("The new array is:");
  System.out.println(arrayToString(array1));
 }

 private static String arrayToString(int[] a)
 {
  String str="[";
  if(a.length>0)
     {
      str+=a[0];
      for(int i=1; i<a.length; i++)
         str+="|"+a[i];
     }
  return str+"]";
 }

 public static int[] sort(int[] a)
 // Sort the contents of array a in ascending numerical order
 // and return in a new array
 {
  if(a.length>1)
     {
      int i,mid = a.length/2;
      int[] half1 = new int[mid];
      int[] half2 = new int[a.length-mid];
      for(i=0; i<mid; i++)
         half1[i]=a[i];
      for(; i<a.length; i++)
         half2[i-mid]=a[i];
      half1=sort(half1);
      half2=sort(half2);
      int j=0, k=0;
      int[] b = new int[a.length];
      for(i=0; j<half1.length&&k<half2.length; i++)
         if(half1[j]<half2[k])
            {
             b[i]=half1[j];
             j++;
            }
         else
            {
             b[i]=half2[k];
             k++;
            }
      for(; j<half1.length; i++, j++)
         b[i]=half1[j];
      for(; k<half2.length; i++, k++)
         b[i]=half2[k];
      return b;
     }
  else
     {
      int[] b = new int[1];
      b[0]=a[0];
      return b;
     }
 }

}
Actually, this code does some unnecessary copying, but we'll leave off considering that until we have covered binary search.
Matthew Huntbach

Last modified: 8 August 2001