Intersection and union: Sets using Arrays, part 4

A full set specification

In the previous set of notes, we gave an axiomatic specification of sets involving just the operations member, add and delete. However, those familiar with the mathematical notion of sets will be aware that we generally consider the operations union and intersection to be an essential part of set theory. An item is a member of the union of two sets, S and T if and only if it is a member of S or a member of T. An item is a member of the intersection of two sets, S and T if and only if it is a member of S and a member of T. We can express this more formally in terms of axioms:

6. s.union(t).member(i) == s.member(i)||t.member(i)
7. s.intersection(t).member(i) == s.member(i)&&t.member(i)

Here is a first attempt to implement a version of sets of integers including union and intersection methods, with the data structure representing the set being that we considered before, an array of length the number of items in the array with the items stored in ascending order (assuming we are dealing with sets of integers):

import java.util.*;

class IntSet8 
{

 private int[] array;

 public IntSet8()
 {
  array = new int[0];
 }

 private IntSet8(int[] a)
 {
  array = a;
 }

 public IntSet8 delete(int n)
 {
  int i;
  for(i=0; i<array.length&&array[i]<n; i++) {}
  if(i<array.length&&array[i]==n)
     {
      int[] array1 = new int[array.length-1];
      for(int j=0; j<i; j++)
         array1[j]=array[j];
      for(i++;i<array.length; i++)
         array1[i-1]=array[i];
      return new IntSet8(array1);
     }
  return this;
 }

 public IntSet8 add(int n)
 {
  int i;
  for(i=0; i<array.length&&array[i]<n; i++) {}
  if(i==array.length||array[i]>n)
     {
      int[] array1 = new int[array.length+1];
      for(int j=0; j<i; j++)
         array1[j]=array[j];
      array1[i]=n;
      for(int j=array.length; j>i; j--)
         array1[j]=array[j-1];
      return new IntSet8(array1);
     }
  return this;
 }

 public boolean member(int n)
 {
  int from=0, to=array.length;
  int mid=(from+to)/2;
  while(from!=to&&array[mid]!=n)
     {
      if(array[mid]>n)
         to=mid;
      else
         from=mid+1;
      mid=(from+to)/2;
     }
  return from!=to;
 }

 public String toString()
 {
  int i;
  String str="{";
  if(array.length>0)
     {
      str+=array[0];
      for(i=1;i<array.length;i++)
         str+=","+array[i];
     }
  return str+"}";
 }

 public static IntSet8 fromString(String str) 
 {
  IntSet8 set = new IntSet8();
  str=str.trim();
  if(str.charAt(0)!='{'||str.charAt(str.length()-1)!='}')
     return new IntSet8();
  str = str.substring(1,str.length()-1);
  StringTokenizer toks = new StringTokenizer(str,",");
  try {
       while(toks.hasMoreTokens())
          {
           String num = toks.nextToken().trim();
           set=set.add(Integer.parseInt(num));
          }
       return set;
      }
  catch(NumberFormatException e)
      {
       return new IntSet8();
      }
 }

 public IntSet8 union(IntSet8 set)
 {
  for(int i=0; i<array.length; i++)
     set = set.add(array[i]);
  return set;
 }

 public IntSet8 intersection(IntSet8 set)
 {
  IntSet8 newSet = new IntSet8();
  for(int i=0; i<array.length; i++)
     if(set.member(array[i]))
        newSet = newSet.add(array[i]);
  return newSet;
 }
}
Apart from adding methods union and intersection to our class for sets, we have taken the opportunity to add a new method called fromString which takes a string and returns the equivalent set, the reverse of the toString operation. The methods add, delete and member are coded as before.

It is not important now to understand how fromString works, though you should be able to if you are familiar with string tokenizers, basic string methods, and Java's try-catch mechanism for dealing with exceptions. The method expects sets to be written starting with the character {, ending with the character } and with integers separated by commas inside. Blank spaces can occur anywhere in the string (except between the digits of individual numbers). It would be more in keeping with the spirit of Java if this method were to throw an exception if the string is not in a format which correctly represents a set of integers, but for simplicity in this case it just returns an empty set. The method fromString is declared as static because it isn't a method that operates on any particular object.

Below is a simple program which can be used to demonstrate these sets with the union and intersection operations:

import java.io.*;

class TestIntSets1
{
 public static void main(String[] args)
 throws IOException
 {
  BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
  String str;
  IntSet8 s1, s2, s3;
  System.out.println("Enter set 1: ");
  str = in.readLine();
  s1 = IntSet8.fromString(str);
  System.out.println("Enter set 2: ");
  str = in.readLine();
  s2 = IntSet8.fromString(str);
  s3 = s1.union(s2);
  System.out.println("The union of the two is: ");
  System.out.println(s3);
  System.out.println("The intersection of the two is: ");
  System.out.println(s1.intersection(s2));
 }
}
Note that it is not necessary in this program to have the separate statement s3 = s1.union(s2) followed by System.out.println(s3). We could have had just System.out.println(s1.union(s2)) which would have created a reference to the union of s1 and s2 and immediately passed that to be the argument to the println method. Assigning the result of a method call to a variable is useful if you want to use the result of that method call again. Sometimes it is done, however, just to prevent very complex statements with method calls inside method calls. Here it is done just as a demonstration. As the method fromString is a static method, the call to it is attached to the name of the class where the method is found, rather than to an object reference.

The advantage of the union and intersection methods as defined in class IntSets8 is that they are very simple to write, since they make use of the member and add methods we already have. We cannot, however, define union and intersection entirely using methods we already have, since they require us to go through a set an item at a time, and we do not have methods that enable us to do that. Therefore to write these methods we had to make use of the fact we knew sets were represented in this case by arrays and go through the arrays directly.

The disadvantage of the methods as given here is that they are very inefficient. Consider that every time we call the method add to add an item to a set, we copy the entire array of the old set to a new array for the new set. We then copy that array again and discard it when we add aother item to the set.

An efficient implementation of union and intersection

More efficient implementations of the methods union and intersection work directly constructing new arrays to represent the new sets created by the operations. We can make use of the fact we know the arrays are in ascending numerical order. The code below is taken from a class which otherwise is identical to IntSet8, expect that all references to IntSet8 there become references to the new class, IntSet9.
 public IntSet9 union(IntSet9 set)
 {
  int[] a1 = new int[this.array.length+set.array.length];
  int i=0,j=0,k=0;
  for(; i<this.array.length&&j<set.array.length; k++)
     if(this.array[i]==set.array[j])
        {
         a1[k]=this.array[i];
         i++;
         j++;
        }
     else if(this.array[i]<set.array[j])
        {
         a1[k]=this.array[i];
         i++;
        }
     else
        {
         a1[k]=set.array[j];
         j++;
        } 
  for(;i<this.array.length; k++)
     {
      a1[k]=this.array[i];
      i++;
     }
  for(;j<set.array.length; k++)
     {
      a1[k]=set.array[j];
      j++;
     }
  int[] a2 = new int[k];
  for(i=0; i<k; i++)
     a2[i]=a1[i];
  return new IntSet9(a2);
 }

 public IntSet9 intersection(IntSet9 set)
 {
  int[] a1 = new int[this.array.length];
  int i=0,j=0,k=0;
  while(i<this.array.length&&j<set.array.length)
     if(this.array[i]==set.array[j])
        {
         a1[k]=this.array[i];
         i++;
         j++;
         k++;
        }
     else if(this.array[i]<set.array[j])
        i++;
     else
        j++;
  int[] a2 = new int[k];
  for(i=0; i<k; i++)
     a2[i]=a1[i];
  return new IntSet9(a2);
 }
The code for merging two ordered arrays into one ordered array representing union is very similar to the code we used to merge arrays when we were considering merge sort here. We have an index going through each of the two arrays being merged (i and j in this case) and an index going through the new array being given values (k). However, in this case when we come across equal values in the two arrays being merged only one copy of that value is put in the new array. This means that at the start we don't know the size of the new array. It will be the length of the sum of the two initial arrays, less one for every integer that occurs in both of these arrays, but we don't know how many of these there are. So we initially create an array, called a1, of the size of the sum of the lengths of the two initial arrays to hold the array for the union. This is the maximum size the union array could possibly be. Once we have complete the merger and know exactly how many integers will be in the array representing the intersection, we create an array of that size, called a2, and copy the integers from a1 to a2.

Union here is defined as a non-static method, so to get the union of two sets s1 and s2 we call the method union on s1 with s2 as its argument, that is, s1.union(s2).Then inside the code for this method when it is executed, this.array refers to the array of s1 and set.array refers to the array of s2. It would be possible just to write array rather than this.array since array on its own in a non-static method in a class refers to the field array of the object the method is called on. But writing this.array makes that a little clearer. Note that the code for a method can refer to the private fields of the object that methid is called on, and also to the private fields of objects of the same type which are passed as arguments to that method, which is why in the code for union we can refer to set.array.

The code for intersection works in a similar way to the code for union, but in this case an integer is copied into the new array only if the same integer is found in both the initial arrays. Again, at the start we don't know the size of the final array. We know, however that it can be no bigger than either of the initial arrays, so we create an array called a1 of the size of the initial array of the object the method is called on, and at the end copy integers from it into the possibly smaller final array called a2.

The fromString method given previously also worked on the basis of using the add method to add new numbers as they are read, which is inefficient when doing that means copying the whole array. So here's a more efficient fromString method:

public static IntSet9 fromString(String str)
 {
  IntSet9 set = new IntSet9();
  str=str.trim();
  if(str.charAt(0)!='{'||str.charAt(str.length()-1)!='}')
     return new IntSet9();
  str = str.substring(1,str.length()-1);
  StringTokenizer toks = new StringTokenizer(str,",");
  int[] a = new int[toks.countTokens()];
  try {
       for(int i=0; toks.hasMoreTokens(); i++)
          {
           String num = toks.nextToken().trim();
           a[i]=Integer.parseInt(num);
          }
       Arrays.sort(a);
       return new IntSet9(a);
      }
  catch(NumberFormatException e)
      {
       return new IntSet9();
      }
 }
The numbers are read into an array which is then used to create a new IntSet9 object. It is necessary that the array used to create the set object is ordered, but we don't make that a necessity of the numbers in the string representation of the set. So we sort the array, using a sort method from Java's library rather than writing our own, before creating the set object.

Constructive implementation of set of integers using array of booleans

We showed previously an alternative representation of a set of integers using an array of booleans. That implementation of sets used destructive methods, but we could have a constructive implementation of sets of integers as an array of booleans. In this case, the add and delete operations involve making a copy of the array with the array element indexed by n changed to true to represent adding n to the set, and changed to false to represent deleting it. As before, we can return a reference to the unchanged set if we are attempting to add an integer that is already there, or delete one that isn't there. The union and intersection operations are particularly easy with this representation. Since an element is in the union of two sets if it is in one or in the other, if the two sets are represented by arrays of booleans a and b, and the union represented by the array of booleans c then for any n c[n] is set to a[n]||b[n]. Similarly, the intersection of two sets can be obtained by setting each of the cells in the array representing the intersection to a[n]&&b[n] where n is the index of the cell. This gives us the following implementation:
import java.util.*;

class IntSet10
{
 // Implementation of set of integers using array of booleans
 // Integers restricted to range 0 to MAX-1
 // Uses constructive methods, including union and intersection

 private boolean[] array;
 public static final int MAX=100;
 
 public IntSet10()
 {
  array = new boolean[MAX];
 }

 private IntSet10(boolean[] a)
 {
  array=a;
 }

 public IntSet10 delete(int n)
 {
  if(array[n])
     {
      int i;
      boolean[] a = new boolean[MAX];
      for(i=0;i<n;i++)
         a[i]=array[i];
      a[i++]=false;
      for(;i<MAX;i++)
         a[i]=array[i];
      return new IntSet10(a);
     }
  else
     return this;
 }

 public IntSet10 add(int n)
 {
  if(!array[n])
     {
      int i;
      boolean[] a = new boolean[MAX];
      for(i=0;i<n;i++)
         a[i]=array[i];
      a[i++]=true;
      for(;i<MAX;i++)
         a[i]=array[i];
      return new IntSet10(a);
     }
  else
     return this;
 }

 public boolean member(int n)
 {
  return array[n];
 }

 public String toString()
 {
  String str="{";
  int i;
  for(i=0; i<MAX&&!array[i]; i++);
  if(i<MAX)
     {
      str=str+i;
      for(i++;i<MAX;i++)
         if(array[i])
            str+=","+i;
     }
  return str+"}";
 }
 
 public static IntSet10 fromString(String str)
 {
  boolean[] a = new boolean[MAX];
  str=str.trim();
  if(str.charAt(0)!='{'||str.charAt(str.length()-1)!='}')
     return new IntSet10(a);
  str = str.substring(1,str.length()-1);
  StringTokenizer toks = new StringTokenizer(str,",");
  try {
       while(toks.hasMoreTokens())
          {
           String num = toks.nextToken().trim();
           a[Integer.parseInt(num)]=true;
          }
      }
  catch(NumberFormatException e)
      {
       for(int i=0; i<MAX; i++)
          a[i]=false;
      }
  return new IntSet10(a);
 }

 public IntSet10 union(IntSet10 set)
 {
  boolean[] a = new boolean[MAX];
  for(int i=0; i<MAX; i++)
     a[i]=this.array[i]||set.array[i];
  return new IntSet10(a);
 }

 public IntSet10 intersection(IntSet10 set)
 {
  boolean[] a = new boolean[MAX];
  for(int i=0; i<MAX; i++)
     a[i]=this.array[i]&&set.array[i];
  return new IntSet10(a);
 }
}

Matthew Huntbach

Last modified: 27 January 2003