Iteration and Recursion: the Powers Example

What is an Algorithm?

An algorithm (the word is derived from the name of the scholar Muhammad ibn Musa al-Khwarizmi) is a clearly defined set of steps to perform some task. A recipe in a cookery book is an example of an algorithm: it involves a set of steps which tell you how to create some dish. Recipes take some food items, and tell you how you can perform operations on them to construct your finished dish. The instructions will be simple ones which it is assumed you understand how to do without further explanation, like "add an egg", "cook at gas mark 4 for 30 minutes" and so on.

Algorithms on computers take some data and perform operations on it to produce some final data which is the answer to some problem. If the data consists of a collection of data items, we also have the related issue of how the data is structured, that is, data structures. An algorithm is not a program: it will generally be expressed in terms of operations which we humans understand can be done on computers. Converting those instructions to something written in a programming language is a separate task. So an algorithm is a concept that can be translated into any programming language.

Where do algorithms come from?

If we want to solve a problem on a computer, we must first find an algorithm to do it. Having found an algorithm, we must write a computer program that implements that algorithm.

Computer scientists have spent a lot of time working out algorithms. One of the most common things that is done on a computer, for example, is to sort some data into some order. No computer programmer ought to sit down and work out from scratch how to sort data. He or she could go to a book on algorithms and find a sort algorithm there. Actually, sorting is so common that any good programmer ought to know several sorting algorithms, but there is nothing wrong with looking in reference books (or web sites) for other algorithms if you are asked to solve a problem and you believe someone else has worked out how to solve that problem, or something similar, before.

Your reference will probably not tell you how to solve exactly the same problem as you have. For example, if you wanted to sort a list of student records in order of the birth date of the student, you wouldn't find an algorithm telling you how to sort a list of student records. You would find an algorithm telling you how to sort a list in general. One of the steps in that algorithm would involve comparing two of the items in the list being sorted. If it were a list of numbers, it would involve checking whether one number is less than another; with your student records it would involve checking whether the birth date in one student record is after the birth date in another. This idea of generalising, so that we express something in a general way and can fit it into a variety of different situations as needed, is known as abstraction, and is one of the key concepts in computer science.
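
As an illustration of this kind of abstraction, here is a small sketch in Java. The StudentRecord class and its fields are invented purely for this example; the general sorting algorithm lives in the library method Collections.sort, and all we have to supply is the one specialised step, the comparison of two records by birth date:

import java.util.*;

class SortByBirthDate
{
 // A made-up record type, purely for this illustration
 static class StudentRecord
 {
  String name;
  int birthDate; // e.g. 19850317 for 17 March 1985

  StudentRecord(String name, int birthDate)
  {
   this.name=name;
   this.birthDate=birthDate;
  }
 }

 public static void main(String[] args)
 {
  List<StudentRecord> records = new ArrayList<StudentRecord>();
  records.add(new StudentRecord("Ann",19860101));
  records.add(new StudentRecord("Bob",19850317));
  // The general sort algorithm is in the library; we supply only
  // the comparison step, specialised to birth dates
  Collections.sort(records, new Comparator<StudentRecord>()
     {
      public int compare(StudentRecord a, StudentRecord b)
      {
       return a.birthDate-b.birthDate;
      }
     });
  for(StudentRecord r : records)
     System.out.println(r.name); // prints Bob then Ann
 }
}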

It is important as a computer scientist to understand how algorithms work so that you can make good use of them. You may also reach a point where you have a problem to solve that hasn't already been researched and so you need to devise an algorithm. You will need to appreciate that often there is more than one algorithm to solve a problem, but one algorithm may be better than another in that it takes less time, or less computer memory to solve the problem. Some would say that the study of algorithms is the most fundamental aspect of computer science, the spirit of computing.

Dividing problems into sub-problems

So, given a problem, how do we solve it? One way is to divide the problem into smaller problems and solve those. It may be that we can solve the subproblems separately and then put their answers together to get the solution to the whole problem. This is concurrent problem solving, and one reason it is interesting (not one we will have time to explore, though) is that it opens up the possibility of solving the subproblems on separate computers all at the same time (parallel processing), thus saving time. In other cases it may be that the solution of one subproblem is needed as the data to start solving another subproblem; in this case we have no alternative but to solve one subproblem after the other has finished (sequential problem solving).

Of course, if you have divided a problem into subproblems, you still have to work out how to solve those subproblems. It may be the case that you can go to your algorithms reference and find algorithms that solve each of your subproblems. Or you may have to break a subproblem up into smaller pieces (subsubproblems?) and continue in that way. In a recipe book, you will often find something similar. In one recipe you may find a reference to another recipe, for example in a recipe for lasagne, you will probably find it telling you to make a cheese sauce as described elsewhere in a recipe for cheese sauce. This is obviously helpful: we save space by having just the one recipe for cheese sauce which is referred to in all those other recipes that involve cheese sauce, rather than copy out the same instructions for making cheese sauce time and time again. It also helps us understand recipes and maybe gives us the confidence to try them if we can see how they are made up out of other recipes we already know.

In programming we talk of bottom-up and top-down programming. Bottom-up programming is when we put together solutions to small problems we already know to make a solution to a big problem. Top-down programming is when we divide a big problem into smaller problems and carry on dividing it up until we get to problems which we know how to solve, or are so simple that we don't have to think any further how to solve them.

In computer programming terms, both bottom-up and top-down programming mean that we don't make the code to solve a problem just one big complicated method (in the object-oriented paradigm of Java we talk of "methods", in other languages we might talk of "functions", "procedures" or "subroutines", the basic idea is a separate named piece of code). Rather we will have it broken up into smaller simpler methods. Dividing code into logically connected separate parts like this will often make it easier to understand, just like it was easier to understand the recipe for lasagne when we saw that one part of it called on you to use the recipe for cheese sauce.

When devising an algorithm to solve a problem, we will quite often find the subproblems we divide it into are versions of the main problem. The interesting thing is that we then have a complete solution to our problem: we don't have to do any more work in devising one. In programming terms, this involves a method "calling itself"; we don't have to do any more coding once we have got this in place. Newcomers to programming often find this extremely mysterious and hard to understand, and maybe it isn't helped by the old joke about recursion. Recursion is the name used for this technique of solving a problem by breaking it down into subproblems of the same type and solving those in the same way. We will leave off discussing this important topic and why and how it works until later.

Iterative Algorithms

Iteration means repeatedly doing something. In computing terms it must involve some state in the computer, and there will be a condition on the state which is tested each time something is done to it, giving the point at which the iteration should stop. Many algorithms are expressed using iteration. Here, for example, is an algorithm for evaluating n^m, where m is a non-negative integer:

Start off with store locations acc initially set to 1 and count initially set to 0.
While count is not equal to m, multiply acc by n and increase count by 1.
The value of acc is now n^m.

A state refers to some setup in the computer which can be changed, here just the two variables acc and count. The change made in the iteration must bring the state closer to the condition which causes the iteration to stop, otherwise it will never halt. Here the change is to increase the value of count by 1, which will bring it closer to the value of m, the stop condition being when it becomes equal to m.

We can convince ourselves informally that this will work by noting that as count is increased by 1 on each iteration, there will be exactly m iterations. On each of these iterations acc, which is initially 1, is multiplied by n, so after m iterations it must have the value 1×n×...×n, with m × signs, which is the definition of n^m.
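
For example, tracing the algorithm with n = 2 and m = 3:

 count  acc
   0     1   (initially)
   1     2
   2     4
   3     8   (count now equals m, so stop: acc = 2^3 = 8)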

A more formal way of convincing ourselves this will work goes as follows. After each iteration, we can say acc×n^(m-count) = n^m. This is because if acc×n^(m-count) = n^m before an iteration, then after it (acc×n)×n^(m-(count+1)) = n^m. So what happens in the iteration makes no difference to the equation, since in effect we multiply the left hand side by n but also divide it by n (remember n^(m-(count+1)) is n×...×n with one less n than n^(m-count)). We know at the start that acc×n^(m-count) = n^m, since with acc initially 1 and count initially 0, that is just equivalent to n^m = n^m. Iteration finishes when count becomes equal to m, but it will still be the case then that acc×n^(m-count) = n^m. Now with count having the value m, and of course n^0 being equal to 1, the equation reduces to acc = n^m.

The condition acc×n^(m-count) = n^m is known as a loop invariant: that is, something that, if it is true before an iteration, is also true after it. Then if we can show it is true before we started all the iterations, we know it must be true when they have all finished. We know also that at that point the condition that was tested before making each iteration must have become false, and with this we can show the result we want must be true.
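
As an illustration (this checking code is no part of the algorithm itself, just a sketch), the invariant can be written directly into the Java code given later using an assert statement, which is checked when the program is run with java -ea. Note that Math.pow works on doubles, so the check is exact only while the values involved stay small:

 public static int power(int n, int m)
 {
  int acc=1, count=0;
  while(count!=m)
     {
      acc=acc*n;
      count=count+1;
      // loop invariant: after each iteration, acc * n^(m-count) == n^m
      assert acc*Math.pow(n,m-count)==Math.pow(n,m);
     }
  return acc;
 }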

Recursive Algorithms

Now that we have developed an iterative algorithm to calculate n^m, we can also consider a recursive algorithm. One thing to note is that n^0 is 1. So if m is equal to 0, the algorithm does no work: it just returns 1. But note also that n^m is equal to n×n^(m-1). So this gives us our recursive algorithm:

If m is 0, just return 1
Otherwise get the result of the algorithm applied recursively with m set to the old m less 1.
Multiply the result by n.

Our equation tells us this should work. One way of understanding how it works is to note that with iteration there is just one state, one set of variables, which are altered continuously. But with recursion, each time the algorithm is invoked, there is a new state. The new state in a recursive call has its own variable m which is given a value one less than the variable m in the call to the algorithm that made the recursive call. Actually, it also has its own variable n, which has the same value as the n in the call that made the recursive call, but is not the same variable (in computer terms, it will occupy a different place in the computer's memory). Our call to evaluate n^m makes a recursive call to evaluate n^(m-1) and multiplies that by n; the recursive call makes a further recursive call to evaluate n^(m-2) and multiplies that by n, and so on. Eventually we will hit the case where we are evaluating n^(m-m), which is n^0, which is 1, so we are sure that the recursion will come to an end. We will then have m multiplications by n, which gives us our n^m.
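
For example, with n = 2 and m = 3, the chain of calls works out as:

 power(2,3) = 2×power(2,2)
            = 2×(2×power(2,1))
            = 2×(2×(2×power(2,0)))
            = 2×(2×(2×1))
            = 8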

However, a simpler way of convincing ourselves this works is just to note that if the call to evaluate n^(m-1) works, then we know the call to evaluate n^m returns n×n^(m-1), which is the correct answer. So we take it on trust that the recursive call works, rather than think about chains of recursion. The important thing is that we are sure that each recursive call takes us closer to the base case where there is no recursion, in this case evaluating n^0, which immediately gives us 1.

Formally, we use the principle of induction to prove that a recursive algorithm works. The idea is that we assume the algorithm works for an arbitrary value k, then prove that with this assumption, it works for k+1. Then we show it works for 0, and if it works for 0, it must work for 1 (since if it works for k, we have shown it works for k+1); if it works for 1 it must work for 2, and so on for ever. We know that n^(k+1) is n×n^k both mathematically and as given in this algorithm, so by induction the algorithm works.
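
Written out for this algorithm, the induction is:

 Base case:      power(n,0) returns 1, and n^0 = 1.
 Inductive step: assume power(n,k) returns n^k; then power(n,k+1)
                 returns n×power(n,k) = n×n^k = n^(k+1).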

We can note that this recursive algorithm is going to require a lot more store than the iterative one, because the iterative one just has one state which it changes, while the recursive one creates new states on every recursive call. That, along with the book-keeping necessary to keep track of the repeated method calls, is why algorithms involving recursion are often less efficient than algorithms involving iteration.

Efficient Recursive Algorithms

A more efficient recursive algorithm to calculate n^m involves noting that, as n^m is n×n×...×n with m multiplications, for even m it must be n×n×...×n with m÷2 multiplications, multiplied by itself. More formally, since n^a×n^b is n^(a+b), then n^m must be n^(m÷2)×n^(m÷2), or (n^(m÷2))^2. However, if we are using integer division, where m/2 is half of m if m is even, but ½ less than half of m if m is odd, then we need to use the following algorithm to calculate n^m:

If m is 0, just return 1
Otherwise, get the result of the algorithm applied recursively with m set to the old m/2.
If the old m is even, return the square of the result of the recursive call, otherwise return the square of the result of the recursive call multiplied by n.
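
For example, evaluating 3^13, the exponents used are 13, 6, 3, 1, 0:

 power(3,13) = power(3,6)^2 × 3   (13 is odd)
 power(3,6)  = power(3,3)^2       (6 is even)
 power(3,3)  = power(3,1)^2 × 3   (3 is odd)
 power(3,1)  = power(3,0)^2 × 3   (1 is odd)
 power(3,0)  = 1

so power(3,1) = 3, power(3,3) = 27, power(3,6) = 729, and power(3,13) = 729^2×3 = 1594323, which is indeed 3^13.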

To consider why this is more efficient, both recursive algorithms make one recursive call and perform one multiplication. But the first one reduces the parameter of the algorithm by one on each recursive call until it reaches 0, while the second one halves it. Obviously, you are going to get to 0 much quicker by repeatedly halving a number (rounding down to the nearest whole number) than by repeatedly reducing it by one.

How many times can you halve a number, rounding down fractions, until you get to 0? Consider dividing by 10 in a similar way. You can do it once with a one-digit number (decimal digits), twice with a two-digit number, three times with a three-digit number and so on. This is roughly the logarithm of the number to the base 10. The logarithm of a number a to the base b, which is written log_b(a), is defined as c where a = b^c. Similarly, the number of times you can halve a number m until you get to zero is roughly log_2(m). In fact, when considering the efficiency of algorithms, the base of the logarithm used generally isn't considered important, since there is a constant relationship between the logarithm of a number to one base and its logarithm to another: for any numbers a, b and c, log_c(a) can be obtained from log_b(a) by dividing it by log_b(c).
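
For example, halving 1000 gives the sequence 1000, 500, 250, 125, 62, 31, 15, 7, 3, 1, 0: that is ten halvings, and log_2(1000) is approximately 9.97.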

Code for the algorithms

Here is the Java code for the algorithm to calculate n^m iteratively, together with a main method that enables you to demonstrate the algorithm directly from the Linux command line:
import java.io.*;

class PowerIt1
{
 public static void main(String[] args)
 throws IOException
 {
  int n,m;
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.print("Enter number n : ");
  n = Integer.parseInt(in.readLine());
  System.out.print("Enter number m : ");
  m = Integer.parseInt(in.readLine());
  System.out.print(n+" to the power of "+m+" is: ");
  System.out.println(power(n,m));
 }

 public static int power(int n, int m)
 // Calculates n to the power of m iteratively
 {
  int acc=1, count=0;
  while(count!=m)
     {
      acc=acc*n;
      count=count+1;
     }
  return acc;
 }
}
Please note, the important thing here is the method called power. As it is a public static method in the class PowerIt1 it can be called directly from other code using PowerIt1.power(a,b), where a and b are any expressions that evaluate to integers (actually b must evaluate to a non-negative integer).
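
For example, a (hypothetical) fragment of code elsewhere might contain:

 int p = PowerIt1.power(5,3); // p is now 125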

****************************** IMPORTANT ******************************

The method called main is not part of the implementation of the algorithm, it is just there so that in Linux you can type:
java PowerIt1
(assuming you have already compiled the class, producing the file PowerIt1.class) and get a little dialogue that runs a call of the static method power. Please note, for example, that you do not need to know what the line in the method main which reads
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
does. For the purpose of this course, you do not need to know what a BufferedReader is. If this were a course on the intricacies of the Java language you might need to know that, but it isn't: it's a course on Algorithms and Data Structures. It's important that you note this, since as this course goes on you will see some other examples of code which are written purely to enable things to be read from or written to the screen in order to run other code that illustrates various algorithms and data structures. Please don't be worried or confused by, or think you have to learn or memorise, any code I give you which is purely there to help you run the examples by giving a simple text-based interface.


***********************************************************************

Here is the code for the first recursive algorithm to calculate n^m:

import java.io.*;

class PowerRec1
{
 public static void main(String[] args)
 throws IOException
 {
  int n,m;
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.print("Enter number n : ");
  n = Integer.parseInt(in.readLine());
  System.out.print("Enter number m : ");
  m = Integer.parseInt(in.readLine());
  System.out.print(n+" to the power of "+m+" is: ");
  System.out.println(power(n,m));
 }

 public static int power(int n, int m)
 // Calculate n to the power of m using recursion
 {
  if(m==0)
     return 1;
  else
     return n*power(n,m-1);
 }
}
Note that the main method is exactly the same as previously, as it has nothing to do with the algorithm; it just exists to call the method power. Of course, here as previously, the line beginning // has nothing to do with the algorithm either: it's just a comment, that is, an annotation added to point out to the human reader of the code what it is for. The code for the more efficient recursive algorithm to calculate n^m is:
public static int power(int n, int m)
// Efficiently calculate n to the power of m using recursion
{
 if(m==0)
    return 1;
 else
    {
     int m2=m/2;
     int p=power(n,m2);
     if(m%2==0)
        return p*p;
     else
        return p*p*n;
    }
}
You could put this method in a file identical to the ones above except for the class name, in place of the power methods above, to get a program which demonstrates the efficient recursive algorithm.

Note that it isn't essential to use recursion to get a more efficient algorithm than our initial one which performs m multiplications. Here is the code which implements an efficient iterative algorithm to calculate n^m:

public static int power(int n, int m)
// Efficiently calculates n to the power of m iteratively
{
 int pow=n, acc=1, count=m;
 while(count!=0)
    {
     if(count%2==1)
        acc=acc*pow;
     pow=pow*pow;
     count=count/2;
    }
 return acc;
}
While this is based on the same idea as the efficient recursive algorithm, it is not so obvious why it works. This illustrates the advantage of using recursion to derive algorithms. Recursive algorithms flow naturally from breaking a problem into subproblems of the same type, so it is often easier to derive a recursive algorithm than an iterative one.
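
One way to convince yourself that it does work is to find a loop invariant, as we did for the first iterative algorithm. A suitable invariant here is acc×pow^count = n^m. It holds at the start, since acc is 1, pow is n and count is m. If count is even, an iteration squares pow and halves count, which leaves pow^count, and so the whole left hand side, unchanged. If count is odd, say count = 2k+1, the iteration also multiplies acc by pow, and (acc×pow)×(pow^2)^k = acc×pow^(2k+1) = acc×pow^count, so again the left hand side is unchanged. When count reaches 0, the invariant reduces to acc = n^m.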

Note that if you run any of the code here, you won't actually notice any efficiency differences. This is because the amount of computation required in all of them is so small that it appears to be done instantaneously, and none of them take up enough computer memory to make any visible difference.

Saving method call values

Let us look again at the first (inefficient) recursive method for calculating n^m:
 public static int power(int n, int m)
 {
  if(m==0)
     return 1;
  else
     return n*power(n,m-1);
 }
The method call power(n,m-1) is used directly as part of a further expression. We could have put its value into a local variable which exists temporarily to hold it:
 public static int power(int n, int m)
 {
  if(m==0)
     return 1;
  else
     {
      int p=power(n,m-1);
      return n*p;
     }
 }
but there would have been no point in this. Sometimes, if expressions are complicated, it helps make the code more understandable to break them up into parts and use temporary variables to hold the values of the parts before they are put together to make the final value. In this case, however, declaring this temporary variable called p, which exists just to hold the result of the call power(n,m-1) and pass it to be multiplied by n in the return statement return n*p, is pointless. It just makes the code look more complex. It is important to note that it has no effect on efficiency; in fact the Java compiler will quite likely compile both versions to exactly the same Java byte code. It's just a matter of taste when and when not to introduce a temporary variable if all that variable does is to hold a value to be used once elsewhere.

The situation is completely different, though, if we have a variable which holds the value of a method call which is then used more than once. In the code for the efficient recursive method for calculating n^m, the variable m2 is used just to hold the value of m divided by 2; it was not really needed, and we could just as well have written the code as:

 public static int power(int n, int m)
 {
  if(m==0)
     return 1;
  else
     {
      int p=power(n,m/2);
      if(m%2==0)
         return p*p;
      else
         return p*p*n;
     }
 }
that is, using m/2 directly where it is needed. It would have made no difference to the amount of calculation that takes place. However, suppose we decided not to have a separate variable p to store the result of evaluating power(n,m/2) but instead used this method call just where the value was needed, as in:
 public static int power(int n, int m)
 {
  if(m==0)
     return 1;
  else
     {
      if(m%2==0)
         return power(n,m/2)*power(n,m/2);
      else
         return power(n,m/2)*power(n,m/2)*n;
     }
 }
Here each call of power(n,m/2) will be done separately: the Java execution mechanism will not "remember" that it has already evaluated power(n,m/2); it will go ahead and completely recalculate it, even though it will end up with exactly the same value. You might suppose that this would just double the amount of work done, but if you do, think a little more carefully. Each call of power(n,m/2), unless m/2 is 0, will make two calls of power(n,m/4), so in all there is a total of four calls of power(n,m/4). By a similar argument, there are eight calls of power(n,m/8), sixteen calls of power(n,m/16), and so on until the division of m (rounding down) has got down to 0. So instead of about log_2(m) calls we make about 2 raised to the power of log_2(m) calls, which is about m calls: all the advantage of the halving has been lost. If m is large, there will be a huge increase in the amount of computation done.

As you can see, if a method has two recursive calls, there is an "explosion" of recursive calls. In some algorithms, this can't be helped, but here since the two recursive calls have the same argument, obviously the explosion can be removed by making one recursive call and saving the result.
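
If you want to see the explosion for yourself, here is a small sketch (the class and method names are invented for this demonstration) which uses a static counter to count the calls made by each version:

class PowerCount
{
 static long calls;

 // efficient version: one recursive call, with its result saved in p
 public static int powerSaved(int n, int m)
 {
  calls=calls+1;
  if(m==0)
     return 1;
  int p=powerSaved(n,m/2);
  if(m%2==0)
     return p*p;
  else
     return p*p*n;
 }

 // wasteful version: the same recursive call written out twice
 public static int powerTwice(int n, int m)
 {
  calls=calls+1;
  if(m==0)
     return 1;
  if(m%2==0)
     return powerTwice(n,m/2)*powerTwice(n,m/2);
  else
     return powerTwice(n,m/2)*powerTwice(n,m/2)*n;
 }

 public static void main(String[] args)
 {
  calls=0;
  powerSaved(2,30);
  System.out.println("calls with the result saved: "+calls);
  calls=0;
  powerTwice(2,30);
  System.out.println("calls with the call repeated: "+calls);
 }
}

With m set to 30 here, the first version makes 6 calls (the exponents used are 30, 15, 7, 3, 1, 0) while the second makes 63.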


Matthew Huntbach

Last modified: 21 January 2003