Implementation and application of abstract data types: Sets using Arrays, part 2

A simple set manipulation program

In our previous example, we showed how a set of integers could be represented by an abstract data type. The advantage of this was that there was a clear separation between the application of sets and the implementation of the type (which we called IntSet1 etc). A program could be written which makes use of the type, but the person writing the program need not know what is in the file that actually provides the type. The person writing this application program need not worry about keeping the data representing a set in correct format, since the only way of manipulating that data is through the public methods provided by the implementation. The implementation has to be written so that the methods are guaranteed to perform correctly and always leave data in a correct format. The actual representation of the type in terms of arrays and so on is declared as private in the implementation so it cannot be accessed by any code except the methods in the class of that type.

As well as clarity, another advantage of separating application from implementation code is that the implementation can be changed without having to go through the application code taking account of the changes. For example, here is another version of the simple application using sets we discussed previously:

import java.io.*;

class UseIntSets5
{
// Set maintenance program with separate set class
// Uses set class IntSet3

 public static void main(String[] args)
 throws IOException
 {
  int n=0;
  char ch;
  String line;
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  IntSet3 mySet = new IntSet3();
  do {
      System.out.print(": ");
      line = in.readLine();
      ch = line.charAt(0);
      if(line.length()>1)
         n = Integer.parseInt(line.substring(1).trim());
      switch(ch)
         {
          case 'q' : 
            break;
          case 'd' :
            mySet.delete(n);
            break;
          case 'a' :
            mySet.add(n);
            break;
          case 'm' :
            if(mySet.member(n))
               System.out.println("Is a member");
            else
               System.out.println("Not a member");
            break;
          case 'p':
            System.out.println(mySet);
            break;
          default:
            System.out.print("d - delete, a - add, m - member,");
            System.out.println(" p - print, q - quit");
            break;
         }
     }
  while(ch!='q');
 }

}

The only change made is that a new type for sets is used, which we call IntSet3. All we need to know here is that class IntSet3 provides the methods we need to implement sets: add, delete, member and toString, and these behave correctly according to how we want sets to behave.

A set implementation using ordered arrays

In our previous implementation of sets, the order in which the numbers in the set were actually stored in the array was the order in which they were added to the set. But in the implementation IntSet3 the numbers are stored in the array in ascending numerical order. This means a bit more effort is required to add new numbers to the set and delete numbers from it, but if we want to check whether a number is a member of the set, we can do so using the much more efficient binary search algorithm rather than have to go through the whole array one by one. This would be particularly useful in a case where in the application program most of the time we wanted to check whether an integer was a member of a set, and we rarely changed the membership of the set.

Here is this implementation:

class IntSet3
{
// Set class, sets stored as ordered partly-filled arrays

 private int[] array;
 private int count;
 final static int MAX=100;

 public IntSet3()
 {
  array = new int[MAX];
 }

 public void delete(int n)
 {
  int i;
  for(i=0; i<count&&array[i]<n; i++) {}
  if(i<count&&array[i]==n)
     {
      for(;i<count; i++)
         array[i]=array[i+1];
      count--;
     }
 }

 public void add(int n)
 {
  int i;
  for(i=0; i<count&&array[i]<n; i++) {}
  if(i==count||array[i]>n)
     {
      for(int j=count; j>i; j--)
         array[j]=array[j-1];
      array[i]=n;
      count++;
     }
 }

 public boolean member(int n)
 {
  int from=0, to=count;
  int mid=(from+to)/2;
  while(from!=to&&array[mid]!=n)
     {
      if(array[mid]>n)
         to=mid;
      else
         from=mid+1;
      mid=(from+to)/2;
     }
  return from!=to;
 }

 public String toString()
 {
  int i;
  String str="{";
  if(count>0)
     {
      str+=array[0];
      for(i=1;i<count;i++)
         str+=","+array[i];
     }
  return str+"}";
 }
}

Because numbers are now stored in order, if a new number is added, it can't just be added to the end of the used section of data. Instead, it is necessary to find the position where is should be added, which is before the first number in the set which is higher than it. Note we also have to allow for the possibility that the number being added is higher than any of the numbers already in the set. Having found the position in which the new number should be put, we have to "move up" all the numbers following it in the used portion of the array. We do this by starting with the highest one and copying it into the next cell in the array. Then its original position can be overwritten by copying the next highest one into it, and so on down to the number at the position where the new number is to be put. Then the new number is put in its place. Similarly, when a number is deleted, all those above it must be "moved down" one position in the array.

A set implementation using an array of booleans

To show how the separation of application and implementation allows us to change the implementation without changing the application, let us consider a completely different way of representing a set of integers. We will represent them by an array of booleans, where if the array is called a then if integer i is in the set, a[i] is true, otherwise a[i] is false.

This implementation is extremely efficient in terms of time, since each of the operations on it (apart from giving the string representation) requires just one operation. Adding the integer i to the set involves just setting a[i] to true, deleting the integer i from the set involves just setting a[i] to false, and testing whether i is a member of the set just involves returning the value of a[i]. Here is the code:

class IntSet4
{
 // Implementation of set of integers using array of booleans
 // Integers restricted to range 0 to MAX-1

 private boolean[] array;
 public static final int MAX=100;
 
 public IntSet4()
 {
  array = new boolean[MAX];
 }

 public void delete(int n)
 {
  array[n]=false;
 }

 public void add(int n)
 {
  array[n]=true;
 }

 public boolean member(int n)
 {
  return array[n];
 }

 public String toString()
 {
  String str="{";
  int i;
  for(i=0; i<MAX&&!array[i]; i++) {}
  if(i<MAX)
     {
      str=str+i;
      for(i++;i<MAX;i++)
         if(array[i])
            str+=","+i;
     }
  return str+"}";
 }
}

The disadvantage to this representation is that as arrays are of fixed size, and there has to be a cell in the array for each possible integer that could be in the set, it can only represent sets storing a limited range of integers. That is represented by the constant MAX in the code (where the range of integers that can be stored in the set is from 0 to MAX-1). To make the code above simple, there is no check on whether an integer added or deleted or tested for membership of the array is within the range, but that means a program using this representation crashes if an attempt is made to use an integer outside the range as an arguemnt to the set methods, wich for any real use is unacceptable. However, without any additional checks, the code for the set application above can be very simply changed to change the way sets are represented. Just change the line

IntSet3 mySet = new IntSet3();

IntSet4 mySet = new IntSet4();

Sets using fully filled arrays

We decided previously to represent sets by objects which have as their internal data structure an array of fixed size and a count saying how much of the array is currently in use. An alternative would be to use an array of the size of the number of items in the set. The advantage of this is that it does not waste memory. In the previous case, every set object had to have an array of the maximum size a set could be. In this alternative representation, only as much memory as is needed is taken. The disadvantage is that whenever an item is deleted from the set or a new item added to it, a new array has to be created to replace the old one and all the items in the old array have to be copied into the new one, which takes more time. So this new way of storing sets would be useful in an application where we have a lot of set objects so ensuring we don't use up too much memory. It isn't so useful in the application we have been considering where there is only one set so memory use isn't an issue.

Here is an implementation of sets of integers using arrays of the same size as the sets:

class IntSet5
{
// Set class, sets stored as ordered fully-filled arrays

 protected int[] array;

 public IntSet5()
 {
  array = new int[0];
 }

 public void delete(int n)
 {
  int i;
  for(i=0; i<array.length&&array[i]<n; i++) {}
  if(i<array.length&&array[i]==n)
     {
      int[] array1 = new int[array.length-1];
      for(int j=0; j<i; j++)
         array1[j]=array[j];
      for(i++;i<array.length; i++)
         array1[i-1]=array[i];
      array=array1;
     }
 }

 public void add(int n)
 {
  int i;
  for(i=0; i<array.length&&array[i]<n; i++) {}
  if(i==array.length||array[i]>n)
     {
      int[] array1 = new int[array.length+1];
      for(int j=0; j<i; j++)
         array1[j]=array[j];
      array1[i]=n;
      for(int j=array.length; j>i; j--)
         array1[j]=array[j-1];
      array=array1;
     }
 }

 public boolean member(int n)
 {
  int from=0, to=array.length;
  int mid=(from+to)/2;
  while(from!=to&&array[mid]!=n)
     {
      if(array[mid]>n)
         to=mid;
      else
         from=mid+1;
      mid=(from+to)/2;
     }
  return from!=to;
 }

 public String toString()
 {
  int i;
  String str="{";
  if(array.length>0)
     {
      str+=array[0];
      for(i=1;i<array.length;i++)
         str+=","+array[i];
     }
  return str+"}";
 }
}

Note that the numbers in the array are stored in ascending order, so the membership test can be done using binary search.

As previously, in the methods add and delete, a check is made to see if the array needs to be changed. It doesn't if we are adding a number to a set when that number is already there, or deleting a number from the set when that number isn't there. If the array does need to be changed, we create a new array, temporarily referred to by the variable array1, which is one greater in size than the original array when we are adding a number, and one less when we are deleting one. We then copy the numbers from the original array into the one referred to by array1 either missing out the number to be deleted, or adding in the number to be added. The array field of the object is then set to refer to array1, which means the original array is no longer referred to (so the space used for it is recycled by Java's "garbage collection" mechanism). The array now stored as part of the object is the new one which was created in the method.

An inheritance example

As we have seen, assignment of one object variable to another in Java means the two object variables refer to the same object. So if we had two variables, s1 and s2 of one of our set types, say IntSet5, executing

s2=s1;

would mean that s1 and s2 refer not only to the same set but to the same object. So if we then executed:

s2.add(7);

the number 7 would be added to the set represented by s1 and the set represented by s2 because they are the same set. If we wanted to create a new copy of the set for s2 to refer to, so it could be changed without affecting the set s1 refers to, we need to add a method to the set class to provide a copy operation. Then:

s2=s1.copy();

provides the desired effect:

We could just add the code for the method copy to the class IntSet5. But an alternative way is to define a new class which extends class IntSet5. Note that the array in IntSet5 is declared as protected rather than private which means it can be accessed by code in subclasses of IntSet5. Here is code which works like this:

class IntSet6 extends IntSet5
{
 public IntSet6()
 {
  super();
 }

 private IntSet6(int[] a)
 {
  array=a;
 }
 
 public IntSet6 copy()
 {
  int[] array1 = new int[array.length];
  for(int i=0; i<array.length; i++)
     array1[i]=array[i];
  return new IntSet6(array1);
 }
}

An IntSet6 object can have all the methods from IntSet5 called on it. But it can also have the additional method copy called on it, and when this method is called in returns an IntSet6 object. The public constructor for an IntSet6 object just uses the public constructor for an IntSet5 object, that is what the statement super() does. But there is also a private constructor for IntSet6 objects which takes an array of integers as an argument and sets the array of integers in the new IntSet6 to that array. Since this constructor is private, it can only be called within the code for IntSet6. It is called within the code for the method copy, which creates a new array the same size as the array in the object the method is called on, and then copies the numbers from that array into the new one. This is then used to make the new IntSet6 object which is returned.

Below is some code which provides an application similar to the set maintenance application given previously, but adds an undo command to the commands available on the set. This command appears to undo whatever was done by the previous command. How it actually works is that a copy of the set is made whenever it is changed by an add or delete command, and the set is replaced by this copy if an undo command is given. The current set is referenced by the variable mySet in the code, while the variable oldSet references the copy made of the mySet before it was altered. Note that the sets referenced by mySet and oldSet are swapped round when an undo command is given. This means that if an undo command is given after another undo command, it will cause the situation to revert to the position before the undo commands were given, that is the second undo command undoes the first undo command.

import java.io.*;

class UseIntSets8
{
// Set maintenance program, includes 'undo' command
// Uses set class with copy method

 public static void main(String[] args)
 throws IOException
 {
  int n=0;
  char ch;
  String line;
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  IntSet6 mySet = new IntSet6(),oldSet = new IntSet6(),tempSet;
  do {
      System.out.print(": ");
      line = in.readLine();
      ch = line.charAt(0);
      if(line.length()>1)
         n = Integer.parseInt(line.substring(1).trim());
      switch(ch)
         {
          case 'q' : 
            break;
          case 'd' :
            oldSet = mySet.copy();
            mySet.delete(n);
            break;
          case 'a' :
            oldSet = mySet.copy();
            mySet.add(n);
            break;
          case 'm' :
            if(mySet.member(n))
               System.out.println("Is a member");
            else
               System.out.println("Not a member");
            break;
          case 'u':
            tempSet = mySet;
            mySet = oldSet;
            oldSet = tempSet;
            break;
          case 'p':
            System.out.println(mySet);
            break;
          default:
            System.out.print("d - delete, a - add, m - member,");
            System.out.println("u - undo,  p - print, q - quit");
            break;
         }
     }
  while(ch!='q');
 }

}

Matthew Huntbach

Last modified: 22 January 2003