Linked lists: Sets using Lists

Linked data structures

We will continue with the theme of representing sets of integers, but consider a completely different form of representation. Note that this change of data structure used inside set objects is invisible to programs which use sets, which will continue to see the familiar methods as discussed previously. The whole point of abstract data types is that you can change how an object is represented internally without having to make any changes to how it is used.

We now consider the idea of linked data structures. A linked data structure usually involves an object type which has a field of the same type within it, in other words a recursive type. Less usually it could involve the equivalent of indirect recursion, for example, an object which has a field of another type and that other type has a field of the first type.

The simplest linked data structure is the linked list. A linked list consists of an object with two fields, the first a piece of data, the second a linked list. As for simplicity we concentrated on sets of integers, so for simplicity we will concentrate on linked lists of integers, though the data could actually be of any type. So our linked list consists of an integer and a linked list, and the linked list which is part of it consists of an integer and a linked list, and the linked list which is part of that consists of an integer and a linked list ... and so on. While it seems this must go on for ever, we need to remember that an object field can be set to refer to an object or set to null meaning it is not set to refer to anything. This is the equivalent of the base case in recursion. We do not follow an endless chain of linked lists because eventually we will come to one set to null and thus come to a halt. Actually, we could have a linked list structure which is endless, but not infinite, since a list can point back to a cell which is already part of the list, creating a circular structure.

Here is the class for a linked list:

class Cell
{
 int first;
 Cell next;

 Cell(int f,Cell n)
 {
  first=f;
  next=n;
 }
}

We have called the class Cell rather than LinkedList to indicate that we can think of things of this type as objects with two parts, one an integer, the other a Cell. Note too that the fields of the Cell object are not declared as private. This means that they can be changed by code outside the class Cell. Suppose we have three variables of type Cell:

Cell c1, c2, c3;

and that after some code which sets c2 and c3 to refer to something, we set c1 to refer to a new Cell object, with 5 and c2 as the arguments to the constructor:

c1 = new Cell(5,c2);

Now the situation is:

The object that c1 refers to has two parts, and as they are not private to the object, they can be treated just like variables called c1.first and c1.next. The lack of a line around the boxes representing these fields shows they are not protected by being private to the object. Note that the arrows always point to the whole object even if it is shown to have separate parts; in Java (unlike C) there is no concept of a pointer to part of an object. The constructor sets c1.first to 5 and c1.next to refer to the same object that c2 refers to. But we can change either of them. If we first execute:

c1.first = 8;

we will get the situation:

If we then execute

c1.next = c3;

we will get the situation:

Of course, the objects that c2 and c3 refer to are themselves cells with two parts. Suppose we now execute:

c3 = new Cell(10,null);

The situation becomes:

Note that although c3 now refers to a new object, c1.next continues to refer to the object it was pointing to, which is the object c3 used to refer to. The diagonal line across the second part of the object c3 refers to is used to indicate an object value which is set to null.

Now if we execute

c1.next = c3;

again, we will get the situation:

The object that c1.next originally pointed to is not drawn because now none of the variables here point to it. If nothing at all points to it, the Java interpreter will remove the space allocated to it altogether, allowing it to be reused, a process known as garbage collection.

If after this we set c3.next to refer to a new Cell object, for example by executing:

c3.next = new Cell(4,null);

we will get the following situation:

It can be seen that even though c1 wasn't mentioned in the last statement, what it refers to is changed by it, because c1 refers to an object that refers to the object that c3 refers to, and that object was changed.

Following the links here, we can think of c1 as referring to a list [8,10,4]. This is why this sort of data structure is called a "linked list".

Representing sets by linked lists

We can represent sets by linked lists rather than arrays. The code below does just this. It provides the same destructive add and delete methods, along with a member and toString method, so the class here could be used in place of one of the destructive set classes we used earlier just by changing the type name in the application program.

class IntSet11
{
// Destructive Set class, sets stored as unordered linked lists

 Cell list;

 public void delete(int n)
 {
  if(list!=null)
     if(list.first==n)
        list=list.next;
     else
        {
         Cell ptr;
         for(ptr=list; ptr.next!=null&&ptr.next.first!=n; ptr=ptr.next) {}
         if(ptr.next!=null)
            ptr.next=ptr.next.next;
        }
 }

 public void add(int n)
 {
  Cell ptr;
  for(ptr=list; ptr!=null&&ptr.first!=n; ptr=ptr.next) {}
  if(ptr==null)
     list = new Cell(n,list);
 }

 public boolean member(int n)
 {
  Cell ptr;
  for(ptr=list; ptr!=null&&ptr.first!=n; ptr=ptr.next) {}
  return (ptr!=null);
 }

 private static class Cell
 {
  int first;
  Cell next;

  Cell(int f,Cell n)
  {
   first=f;
   next=n;
  }
 }

 public String toString()
 {
  String str="{";
  if(list!=null)
     {
      str+=list.first;
      for(Cell ptr=list.next; ptr!=null; ptr=ptr.next)
          str+=","+ptr.first;
     }
  return str+"}";
 }

}

What is happening here is that a set object has a field which refers to a linked list inside it. For example, if we had a variable of type IntSet11 called s which represented the set {5,8,3,6}, we could represent this diagrammatically as:

One new thing here is that the class Cell is actually inside the class IntSet11. To make this quite clear, it is put between the method member and the method toString, although this is just for demonstration and it would be more normal practice to put it either before or after all the methods. This is a simple example of a nested class (sometimes referred to as an "inner class" though that term is often used only for certain sorts of nested classes). The class is declared as private which means it cannot be used outside the class IntSet11 where it is declared. Nested classes can be declared as static or not, just like methods and fields in classes. If an inner class is declared as static, as Cell is, it cannot refer to non-static methods or non-static fields of the class. Making a class a nested class is a way of indicating it exists only in order to be used within its enclosing class.

There is no constructor given for the class IntSet11. If there is no constructor for a class it is given a default constructor. The default constructor for any class has no arguments and sets all numerical fields in the new object created to 0 and all object fields to null. So new IntSet11() will create a new IntSet11 object, with its field called list, which is of type Cell set to null. An IntSet11 object with its list field set to null represents the empty set (the set with no integers in it). It can be seen that the add method just adds the new number to the front of the list of numbers representing the set. So the list does not store the numbers in any particular order. It is, however, necessary to test first whether the number is already in the list, if it is it us not added again at the front (otherwise the delete method wouldn't work properly, because deleting one cell containing the number wouldn't delete the other).

The other methods use the technique of setting a variable to refer to each of the items in the list in turn, moving down the list until either it reached the end of the list or it reaches some point where it needs to halt. Consider the loop in the method member:

for(ptr=list; ptr!=null&&ptr.first!=n; ptr=ptr.next) {}

Note this is a loop without a body, indicated by the {} at the end. The variable being moved down the list is ptr standing for "pointer". It could be called anything, but it is commonly called ptr. At first ptr is set to be equal to the list field of the object. Then a check is made to see if either ptr is null, which would only happen if the object represented the empty set, or the first field of the Cell object ptr refers to is equal to the number n. If not, the update operation ptr=ptr.next sets ptr to refer to the second Cell object in the linked list. This process is repeated until either ptr refers to a Cell object which stores the number being tested or membership of the set, or it has been moved right through the list and reached the end of it. In the former case, the number is a member of the list, in the latter it isn't. So ptr!=null is true if ptr does not refer to null and instead refers to some cell storing n, it is false otherwise so the value of this boolean expression is returned as the value of the method call.

The code for delete simply sets the list field of the IntSet11 object to the next field of the Cell object it is pointing to if it happens that the number to be deleted is at the front of the list. Otherwise, it moves ptr until it references the Cell before the one to be deleted. For example, if we want to delete 4 from the list representing [5,8,3,4,7], the variable ptr is made to move down the list until it points to the Cell object storing 3:

Now executing ptr.next=ptr.next.next will set the next field of the cell ptr is referencing, that is the cell holding 3, to be equal to the next field of the cell that ptr.next was originally referring to, that is to refer to the cell holding 7. Since the cell holding 3 is part of the list which starts at the field list of the IntSet11 object, the effect of this is to cut the cell holding 4 out of the list:

In fact, since nothing points to the cell holding 4, we can think of it as disappearing and being recycled in garbage collection:

Note that in these "cells and pointers" diagrams, the actual placing of the cells and shape of the lines does not mean anything. All that matters is where the lines representing pointers come from (individual variables or fields within cells) and where they point to (whole cells).

When we are deleting, an additional check needs to be made that ptr is not pointing to the last cell of the list. If it is, then ptr has been moved right through the list and the number to be deleted has been found not to be present in the list.

Representing sets by ordered linked lists

Sets as shown above were represented by unordered linked lists. We could represent sets by linked lists stored in numerical order However, this isn't very helpful. The major reason it was helpful to represent sets by ordered arrays was that it made searching for a member in the set quick as we could use the binary search algorithm. Binary search works because given a position in an array we can get to it in one step. However, we can't do that with a linked list - the only way we can get to an element in the linked lists we have discussed here is to start at the beginning of the list and follow the links down one at a time. If we know a list representing a set is ordered, if we are searching for a particular set member either to test its membership or to delete it from the set, we can stop searching if on going through the set we find a number greater than it (assuming the order is ascending numerical) as we know the number won't occur after that. So on average this will cut the amount of search done by a half. However, using the big-O notation this is still O(n), so it is not an equivalent gain in efficiency to that caused by introducing binary search with its O(log n) time.

For illustration, here is come code that represents sets by ordered linked lists:

class IntSet12
{
// Destructive Set class, sets stored as ordered linked lists

 Cell list;

 public void delete(int n)
 {
  if(list!=null)
     if(list.first==n)
        list=list.next;
     else
        {
         Cell ptr;
         for(ptr=list; ptr.next!=null&&ptr.next.first<n; ptr=ptr.next) {}
         if(ptr.next!=null&&ptr.next.first==n)
            ptr.next=ptr.next.next;
        }
 }

 public void add(int n)
 {
  if(list==null)
     list = new Cell(n,null);
  else if(n<list.first)
     list = new Cell(n,list);
  else
     {
      Cell ptr;
      for(ptr=list; ptr.next!=null&&ptr.next.first<n; ptr=ptr.next) {}
      if(ptr.next==null||ptr.next.first>n)
         ptr.next = new Cell(n,ptr.next);
     }
 }

 public boolean member(int n)
 {
  Cell ptr;
  for(ptr=list; ptr!=null&&ptr.first<n; ptr=ptr.next) {}
  return (ptr!=null&&ptr.first==n);
 }

 public String toString()
 {
  String str="{";
  if(list!=null)
     {
      str+=list.first;
      for(Cell ptr=list.next; ptr!=null; ptr=ptr.next)
          str+=","+ptr.first;
     }
  return str+"}";
 }

 private static class Cell
 {
  int first;
  Cell next;
  
  Cell(int f,Cell n)
  {
   first=f;
   next=n;
  }
 }
}

Note the code is more complex, in methods delete and member representing the two ways in which the loop may terminate without finding the desired number: either the pointer sent through the list reaches the end of it, or it reaches the position of pointing to a larger number. The code for add shows how an item may be inserted into a linked list. First a pointer is set to point to the cell before the position the new cell is to go in, then ptr.next = new Cell(n,ptr.next) is executed, where n is the item to be inserted. In the case where the insertion point is before all current cells, a special section of code needs to be written to join the new cell to the front. Suppose we want to insert 5 in numerical order in the linked list representing [2,3,6,9]. The method add checks the list is not null and the number to be inserted is not less than the first number in the list. Having checked this, it moves ptr to the point where it points to the cell whose next field is greater than the number to be inserted, by executing the loop:

for(ptr=list; ptr.next!=null&&ptr.next.first<n; ptr=ptr.next) {}

resulting in the position being:

Then executing ptr.next = new Cell(n,ptr.next) where n holds 5 gives:

Deletion is similar to deletion from a non-ordered list, except if the pointer is moved to a position where the cell referenced by the next field of the cell referenced by the pointer holds a number greater than the number being deleted, we know that number cannot be in the list. So in the simple form of sets we have been considering, we leave the situation unchanged and do not report any error or exceptional condition.

Copying linked lists

Suppose we wanted to make a copy method for sets represented with linked lists, as we did previously for sets represented using arrays. Again, we could use inheritance to give us a new class which adds a copy method:

class IntSet11a extends IntSet11
{
 public IntSet11a()
 {
  super();
 }

 private IntSet11a(Cell l)
 {
  list=l;
 }

 public IntSet11a copy()
 {
 Left blank for now
 }
}

Note that since this class refers to the Cell class and the list field from the class IntSet11a, they will need to be of status protected in that class rather than private.

As in our previous example, we have a private constructor which takes the data structure used to represent the new set as its argument, and creates a new object around that data structure. If the following code were used for the copy method:


 public IntSet11a copy()
 {
  Cell list1=list;
  return new IntSet11a(list1);
 }

it would not work properly, because although we would create a new object, the linked list in the new object would share the linked list from the old object. So the following:

s2=s1.copy();

with s1 representing the set {2,7,4,3} would result in the structure:

So we have two separate objects, referred to by s1 and s2, each with their own list field. But the list fields refer to the same Cell object. Since the delete method alters the linked list of a set object, calling, for example

s1.delete(4);

would give the following situation, changing as a side-effect change what s2 represents, which is clearly not the situation we want:

What we need is to copy the list completely, so that calling s2=s1.copy() will result in the list field of s2 referring to a list of cells with the same contents as that of s1 but not actually the same cells, that is, using our previous example:

The following code for copy will achieve this:

 public IntSet11a copy()
 {
  if(list==null)
     return new IntSet11a(null);
  else
     {
      Cell list1 = new Cell(list.first,null);
      Cell ptr,ptr1;
      for(ptr1=list1,ptr=list; ptr.next!=null; ptr=ptr.next, ptr1=ptr1.next)
         ptr1.next = new Cell(ptr.next.first,null);
      return new IntSet11a(list1);
     }
 }

As can be seen, if the linked list is null, representing the empty set, a new object whose list field is set to null is created. Otherwise, a new Cell object is created to be the front of the new linked list, its integer is the integer from the cell at the front of the old linked list. Two pointers ptr1 and ptr are then set to point to the first cell in the new linked list and the old linked list respectively. A new cell for the next field of ptr1 is created with the integer from the next field of ptr, this gives the situation:

Then both pointers are advanced down the list, and the process repeated until ptr points to the last cell in its list. The effect is that the copy is built up in the order the cells come. This copy is then used to create a new object using the private constructor.

We shall use the techniques used here to copy a linked list again when we consider implementing a constructive set class using recursion.

Matthew Huntbach

Last modified: 4 March 2005