Searching arrays

The array search problem

A common problem is to find whether an item with a particular property occurs in an array. Suppose, for example, we had an array of records of cars and we wanted to search it to see if it had a record for a car with a particular registration number. In this case, the registration number is used as a key to retrieve a complete object containing more complex data.

In order to concentrate on the basics of searching an array, just as we looked at sorting arrays of integers, so we shall look at searching an array of integers to see of one of its cells contains a particular integer.

Linear search

If the integers in the array do not occur in any particular order, there is no choice but to look at each one until either we have found the item we are looking for, or we have looked at every item in the array. Here is a method which does this (using searching for a particular integer in an array of integers as an illustration of the general problem):

public static boolean search(int n,int[] a)
// Return true if integer n is found in array a, false otherwise
{
 int i;
 for(i=0; i<a.length&&a[i]!=n; i++) {}
 return i<a.length;
}

That method returned a boolean, true if the integer was found, false otherwise. A variant on this method returns the index of the cell in the array where the integer is found. In this case, it needs to return some special value, which must be an integer as the array index it returns is an integer, if the integer being searched for is not in the array. By convention, -1 is often used as the special value returned from a method which would otherwise return a non-negative integer, to indicate a special circumstance. Here is a method that does this:

public static int search(int n,int[] a)
// Return the index of the first occurrence of n in array a
// or -1 if n doesn't occur in array a
{
 int i;
 for(i=0; i<a.length&&a[i]!=n; i++) {}
 if(i==a.length)
    return -1;
 else
    return i;
}

The general pattern is the same in both cases. Note the use of a for loop which has an empty body: all that is required is for i to be increased until either it is equal to the length of the array, or it indexes a cell in which the desired number is found. The short-circuit effect of the && is useful here, when we check i<a.length&&a[i]!=n if i is not less than a.length we know the whole expression is false, we don't go on to check a[i]!=n, which is just as well as since with i having the value a.length, the cell a[i] does not exists, and attempting to access it will cause an ArrayIndexOutOfBoundsException to be thrown.

Note in the first method we write return i<a.length; which we can do because i<a.length is a boolean value and the method returns a boolean. There is no need to write the longer if(i<a.length) return true; else return false;.

Here is a class incorporating the search method plus a main method that can be used to test it. The main method generates an array of random numbers, and you can specify what range the numbers will fall in - they will fall evenly in the range specified.

import java.io.*;
import java.util.*;

class SearchIntArray2
{

 public static void main(String[] args)
 throws IOException
 // Tests searching an unsorted array of integers to find the
 // position of a given integer
 {
  int n,max,num,pos;
  int[] array;
  Random rand = new Random();
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.print("Enter number of integers: ");
  n = Integer.parseInt(in.readLine());
  array = new int[n];
  System.out.print("Enter highest integer: ");
  max = Integer.parseInt(in.readLine());
  for(int i=0; i<n; i++)
     array[i] = (int)(rand.nextDouble()*max)+1;
  System.out.println("The array is:");
  System.out.println(arrayToString(array));
  System.out.print("Enter the number you want: ");
  num = Integer.parseInt(in.readLine());
  pos = search(num,array);
  if(pos==-1)
     System.out.println("The number is not in the array");
  else
     System.out.println("The number is at position "+pos);
 }

 private static String arrayToString(int[] a)
 // Return string representation of array a
 {
  String str="[";
  if(a.length>0)
     {
      str+=a[0];
      for(int i=1; i<a.length; i++)
         str+="|"+a[i];
     }
  return str+"]";
 }

 public static int search(int n,int[] a)
 // Return the index of the first occurrence of n in array a
 // or -1 if n doesn't occur in array a
 {
  int i;
  for(i=0; i<a.length&&a[i]!=n; i++) {}
  if(i==a.length)
     return -1;
  else
     return i;
 }

}

Executing this program is not time-consuming, since unlike the sort algorithms this is not O(n²) or even O(n log n). In the worst case, the program will have to make n comparisons before concluding the integer is not in the array, so the problem is O(n). A naïve analysis might suggest on average you would have to look at ½n cells to find n, but this would only be so if we guaranteed that n occurred exactly once in the array with an equal probability of any position. If the numbers in array are taken from the range 1 to m, and we are searching for a number in that range, the likelihood that we find it in the first cell is, of course 1/m. The likelihood that the first occurrence of the number is in the second cell is ((m-1)/m)×(1/m), i.e. the chance that it is not in the first cell multiplied by the chance of finding it in any particular cell. The chance that it is not in any of the n cells is ((m-1)/m) ⁿ.

Suppose we have an array of a million integers each chosen randomly from the range 1 to a million, what is the chance that if we pick a number in that range, it won't be found in the array? As it happens, the value of ((n-1)/n) ⁿ approaches 1/e (where e is the mathematical constant 2.71828... etc) as n increases, and by the time we get to a million, it's pretty close. So the chance is actually a little over a third.

This is not a maths course, so you won't have to prove that, but here's a program to test it out:

import java.io.*;
import java.util.*;

class SearchIntArray3
{

 public static void main(String[] args)
 throws IOException
 // Tests searching for an integer in an array of integers
 // Can be used for large numbers, includes a time check
 {
  int n,max,num;
  boolean flag;
  int[] array;
  long time1,time2;
  Random rand = new Random();
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.print("Enter number of integers: ");
  n = Integer.parseInt(in.readLine());
  array = new int[n];
  System.out.print("Enter highest integer: ");
  max = Integer.parseInt(in.readLine());
  for(int i=0; i<n; i++)
     array[i] = (int)(rand.nextDouble()*max)+1;
  System.out.print("Enter the number you want: ");
  num = Integer.parseInt(in.readLine());
  time1 = new Date().getTime();
  flag = search(num,array);
  time2 = new Date().getTime(); 
  if(flag)
     System.out.println("The number IS in the array");
  else
     System.out.println("The number is NOT in the array");
  System.out.println("Time difference is: "+(time2-time1)+"ms");
 }

 public static boolean search(int n,int[] a)
 // Return true if integer n is found in array a, false otherwise
 {
  int i;
  for(i=0; i<a.length&&a[i]!=n; i++) {}
  return i<a.length;
 }

}

This differs from the previous code in including a time check, but not displaying the contents of the array, so you can use it for very large collections of numbers, and check the time taken to find whether a particular number is in a (randomly-generated) collection.

Try this program with a million numbers each in the range of one to a million. The amount of store available for Java may vary, so the biggest array you can try this on may vary, if you go above the limit you will get an OutOfMemoryError being thrown. Note that new Date().getTime() returns a long value which is the current time in terms of milliseconds since midnight on 1st January 1970. So calling it twice, once at the start of the search then again at the end, gives us an estimate of the time taken. It's a crude estimate, but in this case sufficient to illustrate that the search isn't instantaneous, with a million numbers you will find a few milliseconds difference. It's fast enough though that, unlike our sorting programs, the time taken to deal with a million numbers is so short that you wouldn't notice the delay when running the program and waiting for the result. You might notice a slight delay while it's generating a million random numbers, however.

Binary Search

Suppose you know the integers in the array you're searching are in ascending numerical order. Then you know you can stop searching and conclude the number you're looking for isn't in the array when, starting at the beginning, you look at each one and come across one greater than the number you're searching for. All the rest of the numbers you haven't looked at yet must be even greater. Below is a class that includes a method for searching based on this insight. There is also a main method which generates an array of integers distributed randomly in a range given by the user but in ordered position. The ordering of the numbers is done by one of the sort methods provided in the Arrays class of the Java library, demonstrating that while we have looked at sorting arrays of integers as an exercise in understanding algorithms, we don't really need to write a method to do it as one is already provided s part of Java.

import java.io.*;
import java.util.*;

class SearchIntArray4
{
// Demonstrate naive linear search of ordered array of integers

 public static void main(String[] args)
 throws IOException
 {
  int n,max,num;
  int[] array;
  Random rand = new Random();
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  System.out.print("Enter number of integers: ");
  n = Integer.parseInt(in.readLine());
  array = new int[n];
  System.out.print("Enter highest integer: ");
  max = Integer.parseInt(in.readLine());
  for(int i=0; i<n; i++)
     array[i] = (int)(rand.nextDouble()*max)+1;
  Arrays.sort(array);
  System.out.print("Enter the number you want: ");
  num = Integer.parseInt(in.readLine());
  if(search(num,array))
     System.out.println("The number IS in the array");
  else
     System.out.println("The number is NOT in the array");
 }
 
 public static boolean search(int n,int[] a)
 // Return true if integer n is found in array a, false otherwise
 // Assumes array a is in ascending numerical order
 {
  int i;
  for(i=0; i<a.length&&a[i]<n; i++) {}
  return (i<a.length&&a[i]==n);
 }
}

You still have to check that i is less than the length of the array, just in case the number you're searching for is greater than the last number (and hence all the others) in the array.

But this is still a very stupid way of searching the array. Consider the case of searching for a name in a telephone directory. The above method is like starting on the first page and reading every single name until either you come to the end of the directory (unlikely unless the name you're looking for begins with 'Z', but programmers have to be careful to include unlikely but possible cases), or you come across either the name you want or one which comes past it in alphabetical order.

What you actually do is open the telephone directory somewhere near the middle. If you're lucky, the name you want is on the page you opened it at. Otherwise, if the name you want comes before the names on that page alphabetically, you repeat the search process on the pages currently in your left hand. And if the name you want comes after the names you've found, you repeat the search process on the pages currently in your right hand. So there we are - we have described search in terms of search, searching for a name in a telephone directory is naturally done using recursion. This is called binary search

The algorithm is something like the merge sort algorithm, where we cut the problem in half each time. Howver, in that case we then worked on both halves. In this case, we only work on one half. A naïve solution might create a new array of half the size of the old one, copy the half of the old array into it, and apply the search method recursively. Here's the code for doing that:

public static boolean search(int n,int[] a)
// Naive recursive method to search for integer n in array a
// Binary search, but does unnecessary copying
{
 if(a.length==0)
    return false;
 else
    {
     int mid=a.length/2;
     if(a[mid]==n)
        return true;
     else if(a[mid]>n)
        {
         int b[] = new int[mid];
         for(int i=0; i<mid; i++)
            b[i]=a[i];
         return search(n,b);
        }
     else
        {
         int b[] = new int[a.length-mid-1];
         for(int i=0; i<b.length; i++)
            b[i]=a[i+mid+1];
         return search(n,b);
        }
    }
}

It is naïve because actually we don't need to go to the effort of copying half the array. We can just use the same array in our recursive call, but just pass to it limits telling it which portion of the array to search. Here's the code for this binary search without copying:

public static boolean search(int n,int[] a)
// Return true if integer n is found in array a, false otherwise
{
 return search(n,a,0,a.length);
}

private static boolean search(int n,int[] a,int from,int to)
// Return true if integer n is found in the section of array a
// starting at a[from] and up to and including a[to-1].
// Return false otherwise
{
 if(from==to)
    return false;
 else
    {
     int mid = (from+to)/2;
     if(a[mid]==n)
        return true;
     else if(a[mid]>n)
        return search(n,a,from,mid);
     else
        return search(n,a,mid+1,to);
    }
}

Note, the public version of the search method does not have these extra parameters giving the section of the array being searched. It would just add an unnecessary complication if the user had to put in 0 and a.length as extra parameters when finding whether an integer occurs in array a. It would also mean we couldn't just replace the old search method with the new one without altering any code that called it. So all the work in searching is done by this subsidiary private method, also called search, but we can have two methods with the same name in Java so long as they different in the number or types of their arguments. The public method just contains an initial call to the private method which sets the extra parameters as necessary.

If you replace the code for search in the program above which includes the time checks with the code here, you will find that even with a million integers in the array, hardly any time is taken to search it. The reason for this is that each recursive call divides the array into half. A million can only be divided into half twenty times before it gets to one. So this algorithm is O(log n), because the number of significant operations it does (i.e. checking a number in the array against the nunber we are searching for) is equal to the logarithm to base 2 of the number of items in the arry.

Note that the same technique used here can be used to get a more efficient version of the constructive merge sort discussed previously. Here is the more efficient code which can be put in the place of the constructive sort method:

public static int[] sort(int[] a)
// Sort the contents of array a in ascending numerical order
// and return in a new array
{
 return sort(a,0,a.length);
}

private static int[] sort(int[] a,int from,int to)
// Sort the contents of array a from position 'from' up to but
// not including position 'to' in ascending numerical order
// and return in a new array
{
 if(to-from>1)
    {
     int i,mid = (from+to)/2;
     int[] half1=sort(a,from,mid);
     int[] half2=sort(a,mid,to);
     int j=0, k=0;
     int[] b = new int[to-from];
     for(i=0; j<half1.length&&k<half2.length; i++)
        if(half1[j]<half2[k])
           {
            b[i]=half1[j];
            j++;
           }
        else
           {
            b[i]=half2[k];
            k++;
           }
     for(; j<half1.length; i++, j++)
        b[i]=half1[j];
     for(; k<half2.length; i++, k++)
        b[i]=half2[k];
     return b;
    }
 else
    {
     int[] b = new int[1];
     b[0]=a[from];
     return b;
    }
}

In the old version of constructive merge sort, the integers were copied into two new arrays, half1 and half2, and these were sorted before being merged into another new array b. But the copying of the original data into new arrays was unnecessary, because in this contructive sort, the original data is not altered. Therefore, we can safely pass the original array to the recursive calls, in the one hand setting the limits to sort from the start to the mid-point, on the other from the mid-point to the end. With further recursive calls, the range is divided up into smaller portions, until the base case is when it is divided up into portions of length 1, and only then are new arrays created to build up the output array.

Note, however, that the efficiency gains made here are a constant factor, not a change in the order of the efficiency of the algorithm. The new version is still O(n log n).

Matthew Huntbach

Last modified: 16 January 2003