Foundation · Uncategorized

Expand your Collections collection – Part 1: a generic set

Welcome to this first post in a series about custom (generic) collections in Delphi. These are part of our Grijjy Foundation library of classes and utilities that are used throughout our code base. Other Grijjy Repositories often depend on this library.

Introducing TgoSet

Have you ever doubted between using a TList<T> or TDictionary<T> to store a collection of items? You want to quickly check if the collection contains a specific item, so a dictionary would make sense. But on the other hand, you don’t want to store key/value pairs, just items, so a regular list would be more appropriate.

You may have “solved” this issue by creating a dictionary with a dummy value type (like a TDictionary<String, Integer>), and just add 0 values with all your strings. While this works, it feels like kind of a hack, and it wastes memory and performance.

That’s why we created a generic set called TgoSet<T>. You can find it in the Grijjy.Collections unit.

Like a dictionary, a set is a collection of unordered items that uses hashing to quickly lookup items. And like a list, it just stores items and no key/value pairs. But unlike a list, the order of the items in the collection is undefined, and you cannot access the items by index. You can enumerate all items though using a for..in loop or the ToArray method.

Simple API

Using a TgoSet is pretty easy. The API is similar to that of a dictionary:

var
  Keywords: TgoSet<String>;
  Keyword: String;
begin
  Keywords := TgoSet<String>.Create;
  try
    Keywords.Add('if');
    Keywords.Add('case');
    Keywords.Add('repeat');
    Keywords.Add('while');
    Keywords.Remove('repeat');

    if (Keywords.Contains('if')) then
      WriteLn('"if" is a Delphi keyword');

    Write('Delphi keywords:');
    for Keyword in Keywords do
      Write(' ', Keyword);
  finally
    Keywords.Free;
  end;
end;

The output of this code is:

"if" is a Delphi keyword
Delphi keywords: while if case

As you can see, the order of the items in the set is undefined. It (usually) does not match the addition order or alphabetic order. Like a dictionary, the order of the items is determined by their hash codes.

Note that, like a dictionary, the Add method will raise an exception if the item already exists in the set. Use AddOrSet instead to avoid this (although it is a bit slower).

Also, like a dictionary, you can pass your own comparer (IEqualityComparer<T>) to the constructor if you don’t want to use the default comparer. You may want to do this for example, when you want to check for strings case-insensitively. Or when you want to use a faster hash function than the one Delphi provides (we may present such one in a future blog article).

Other Set types

If you look at the source code for TgoSet<T>, you will see that it is derived from TgoReadOnlySet<T>. This means you can present a read-only view of the set as a property or parameter, so that others cannot inadvertently modify the set.

As with other collections in Delphi, there is also a version that can hold objects only, and that takes ownership of these objects. This version, TgoObjectSet<T> will automatically destroy its objects when the set is destroyed, cleared or when objects are removed from the set. To remove an object without destroying it, use the Extract API.

How a Set works

Please feel free to skip the remainder of this post if you don’t care about the inner workings of this set.

A set most closely resembles a dictionary without values. So we can start by looking at TDictionary<TKey, TValue> and strip out all value-related code. Delphi’s dictionary is implemented using a hash table and linear probing to resolve hash collisions. We use the same model in our sets.

Our hash table is just a dynamic array of TItem records:

type
  TgoReadOnlySet<T> = class(TEnumerable<T>)
  private type
    TItem = record
      HashCode: Integer;
      Item: T;
    end;
  private
    FItems: TArray<TItem>;
    ...
  end;

A TItem keeps track of its value and hash code. The length of the hash table (FItems) is always a power of two.

Then, adding an item involves these steps:

  1. Calculate the hash code for the item (using IEqualityComparer<T>.GetHashCode).
  2. Transform the hash code to an index into the hash table. Since the length of the hash table is always a power of two, we can just logical and the hash code with the length of the hash table minus 1.
  3. Lookup the corresponding item in the hash table. If it is empty, we can store our item in there. Otherwise, we keep increasing the index until and empty item is found (this is the linear probing part).

In code, this looks like this:

procedure TgoSet<T>.Add(const AItem: T);
var
  Mask, Index, HashCode, HC: Integer;
begin
  if (FCount >= FGrowThreshold) then
    Resize(Length(FItems) * 2);

  HashCode := FComparer.GetHashCode(AItem) and $7FFFFFFF;
  Mask := Length(FItems) - 1;
  Index := HashCode and Mask;

  while True do
  begin
    HC := FItems[Index].HashCode;
    if (HC = EMPTY_HASH) then
      Break;

    if (HC = HashCode) and FComparer.Equals(FItems[Index].Item, AItem) then
      raise EListError.CreateRes(@SGenericDuplicateItem);

    Index := (Index + 1) and Mask;
  end;

  FItems[Index].HashCode := HashCode;
  FItems[Index].Item := AItem;
  Inc(FCount);
end;

First, we check if we need to grow the hash table. We do this once the hash table becomes 75% full. At that point, the chance of hash collisions is big enough to warrant an expansion.

Also note that we and the hash code with $7FFFFFFF. This basically clears the sign-bit, making the code non-negative. This is needed because we reserve hash code -1 (EMPTY_HASH) to signify an empty hash table entry.

The while loop performs the linear probing to find an empty entry in the hash table. If there already exists an entry in the hash table with the same hash code, then we use IEqualityComparer<T>.Equals to compare its item with the item we want to add. If they are the same, we raise an exception because duplicate items are not allowed in a set.

Those are the basics of implementing a hash table in a collection. Of course, you also need to take care of growing the set when needed, and removing items from the collection. If you are interested in the algorithms involved here, please take a look at the source code. It shouldn’t be too difficult to follow now you know how the hash table is organized.

Coming up

In our next post in this series, we will present a generic ring buffer.

4 thoughts on “Expand your Collections collection – Part 1: a generic set

  1. What happens if you add an element to a set when there are N elements in it, and then try to find it when the set has N * 4 elements? In this situation, the initial indices will not match and the search will start farther than the element is located?..

    Like

    1. Don’t know exactly what you mean here, but the set uses hashing to quickly retrieve items. When the set grows, all existing items are re-hashed, and should have very little impact on retrieval performance.

      Like

Leave a comment