Dictionaries, Hashing and Performance

Arnaud Bouchez · February 12, 2021

New hasher in town, to test and benchmark:
https://blog.synopse.info/?post/2021/02/12/New-AesNiHash-for-mORMot-2

Murmur and xxHash just far away from it, in terms of speed, and also in terms of collisions I guess... 15GB/s on my Core i3, on both Win32 and Win64.

The smallest length of 0-15 bytes are taken without any branch, 16-128 bytes have no loop, and 129+ are hashed with 128 bytes per iteration.
Also note its anti-DOS abilities, thanks to its random seed at process startup.
So it was especially tuned for a hashmap/dictionary.

Edited February 12, 2021 by Arnaud Bouchez

Marat1961 · February 14, 2021

The search speedup for hashmaps in this case is ~ 1/N, where N is the size of the table.

As long as the table is not completely filled with less than < 0.7 the table works well, and then comes performance degradation.

In such cases, the table size is increased and all data is moved to the new table.

This takes extra time and also means that while you do this, your system will not be able to serve clients.

In my opinion, up to about 16 elements, it is better to use a regular unsorted list and a linear search.
Then it makes sense to use either a TList with sorting each time a new key is added, ~ N * log2(N) for quicksort, or a tree where search and insertion with order maintenance is an inexpensive log2(N).

Advantages of hashmaps.
Search time for hashmap includes: hash generation + 1 to m equality comparisons.

Ideal hash function:
1. must be fast
2. Does not create collisions (does not generate the same key for different keys).

I have compared many hash functions in my time and realized that even the simplest ones provide good results on real data.

I have tested it on a database of addresses with about a million entries and for a client base with about 500,000 entries.
The algorithms included md5, sha-2, sha-3, src32, multiplicative hash.

function HashMultiplicative(const key: PByte; Size: Cardinal): Cardinal;
var
  i, hash: Cardinal;
  p: PByte;
begin
  hash := 5381;
  p := key;
  for i := 1 to Size do
  begin
    hash := 33 * hash + p^;
    Inc(p);
  end;
  Result := hash;
end;

For strings, a function that takes string length in characters into account when calculating the hash is useful.
For floating point numbers, it is better to use an algorithm that hashes the exponent and the mantissa of the number separately.
If it's an object or a record, sometimes it makes sense to write your own function that hashes selected key fields of the object.

I like hash table implementations with chains.

I usually know the approximate number of objects in the system and can immediately set the required input table size N.
This will require a bit more memory, but you know right away what the speedup will be and there will be no time wasted on dynamically increasing the table.

There is no point in increasing the size of the table much more than N.

Since the input index is determined by the search
index_input := hash mod N;

You can't choose a table size multiple of a byte, it increases the crowding of data in the table.

For many hashing algorithms, if you choose a size other than a prime number, the probability of collisions will increase and it will degrade the statistics.

PolywickStudio · March 19, 2023

I've been doing some simple testing.

Delphi 11.3

2000000 TStringList.AddObject in 141.17ms (14,167,115.00/s)
TStringList.Rehash in 1us
TStringList.Clear in 47.36ms
first TStringList.IndexOf in 7us
20000 TStringList.IndexOf in 8.35s (2,392.00/s)

2000000 TStringListHash.AddObject in 139ms (14,387,661.00/s)
TStringListHash.Rehash in 0us
TStringListHash.Clear in 46.22ms
first TStringListHash.IndexOf in 275.24ms
20000 TStringListHash.IndexOf in 5.60ms (3,570,790.00/s)

2000000 TDictionary.AddObject in 788.58ms (2,536,188.00/s)
TDictionary.Clear in 292.92ms
20000 TDictionary.IndexOf in 3.24ms (6,172,839.00/s)

Spring4D's IDictionary

20000 IDictionary.Add in 3.38ms (5,917,159.00/s)
first IDictionary.TryGetValue in 1us
20000 IDictionary.TryGetValue in 753us (26,560,424.00/s)

Use Spring?

Cedomir Plavljanic · March 25, 2023

Can you correct values for spring4D tests. Numbers element in add parts is to small in comparison with other (probably not problem with test but with text in message).

Stefan Glienke · March 25

On 2/16/2020 at 3:41 PM, vfbb said:

It is not an issue, it is the cost benefit of higher performance x less memory. But the optimization of creating it with the capacity is valid considering that the performance increases is on average 30%, both for small collections and for large ones.


uses
  System.SysUtils, System.Generics.Collections, System.Diagnostics, FMX.Dialogs;

procedure TestWithInitialCapacity;
var
  I, J, LNewId: Integer;
  LDictionary: TDictionary<Integer, Boolean>;
  LStopwatch: TStopwatch;
begin
  LStopwatch := TStopwatch.StartNew;
  for I := 0 to 10000 do begin
    LDictionary := TDictionary<Integer, Boolean>.Create(200);
    for J := 0 to 100 do begin
      repeat
        LNewId := Random(High(Integer));
      until not LDictionary.ContainsKey(LNewId);
      LDictionary.Add(LNewId, True);
    end;
    FreeAndNil(LDictionary);
  end;
  showmessage(Format('Test with initial capacity: %g', [LStopwatch.Elapsed.TotalMilliseconds]));
end;

procedure TestWithoutInitialCapacity;
var
  I, J, LNewId: Integer;
  LDictionary: TDictionary<Integer, Boolean>;
  LStopwatch: TStopwatch;
begin
  LStopwatch := TStopwatch.StartNew;
  for I := 0 to 10000 do begin
    LDictionary := TDictionary<Integer, Boolean>.Create;
    for J := 0 to 100 do begin
      repeat
        LNewId := Random(High(Integer));
      until not LDictionary.ContainsKey(LNewId);
      LDictionary.Add(LNewId, True);
    end;
    FreeAndNil(LDictionary);
  end;
  showmessage(Format('Test without initial capacity: %g', [LStopwatch.Elapsed.TotalMilliseconds]));
end;

I just came across this thread while searching for something else, and while reading this code, I want to make a few points:

I agree on the essential point: when you know how many items you will add to a collection, it is always beneficial to set the capacity upfront.

Now, even though this is some naive benchmark code, it shows a few things:

- Calling ContainsKey before the attempt to call Add means performing the lookup twice, which has not been necessary at all since Delphi 10.3 because of the new method TryAdd. On my machine, the tests go down from 51ms and 31ms to 44ms and 26ms.

- The hash function is one of the key parts that makes or breaks a dictionary performance-wise. The numbers I posted are from 12.3, which already uses a better hash algo (FNV1a) than it did years ago (BobJenkins) but is still far away from ideal.

Especially for Integer keys, if the hashtable is designed in a robust way of dealing with hash collisions and clustering, you can get away with not hashing anything but using the value itself. Unfortunately, the RTL one is not built in that way, but for this benchmark, it is fine to replace the eq compare it uses with the one from Spring4d, which does what I mentioned before. This brings down the numbers to 36ms and 22ms. That is another 15% improvement.

Now Spring4d would not be Spring4d if it did not have another trick up its sleeve: when it uses the default eq compare, the hashtable does not need to call GetHashCode and Equals for some intrinsic types such as Integer because the value itself is used as hashcode and for checking Integer equality it does not need to call some method. That leads to the above test with initial capacity taking 14ms.

Rollo62 · March 26

16 hours ago, Stefan Glienke said:

... it uses with the one from Spring4d, which does what I mentioned before. This brings down the numbers to 36ms and 22ms. That is another 15% improvement.

Now Spring4d would not be Spring4d if it did not have another trick up its sleeve: ...
That leads to the above test with initial capacity taking 14ms.

Thanks for these enlightments.
If Spring4D already optimizes to 14ms, how did you get the only 2nd best results at 22ms?

Is there any configuration setting needed, to switch on or off the Hash-Algorithm, how to use Spring4D in the right way with Integers?

Lars Fosdal · March 26

And... now in the correct thread...

Did you guys read the article on the undergraduate that suggested new improvements to hash tables?
Article: https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/
"Tiny Pointers" Paper: https://arxiv.org/abs/2111.12800

Stefan Glienke · March 26

36 minutes ago, Rollo62 said:

Thanks for these enlightments.
If Spring4D already optimizes to 14ms, how did you get the only 2nd best results at 22ms?

Is there any configuration setting needed, to switch on or off the Hash-Algorithm, how to use Spring4D in the right way with Integers?

The numbers before the last one are all based on the quoted code using RTL Dictionary with the tweaks I explained.

There is no setting needed. When you are using collections in spring4d without an explicit comparer, there are several places where they use an optimized path without a comparer at all - for example, when using an Integer as a key in a dictionary or when sorting a list of integers.

33 minutes ago, Lars Fosdal said:

Did you guys read the article on the undergraduate that suggested new improvements to hash tables?
Article: https://www.quantamagazine.org/undergraduate-upends-a-40-year-old-data-science-conjecture-20250210/
"Tiny Pointers" Paper: https://arxiv.org/abs/2111.12800

Did not read the paper, but I watched his presentation. I have to admit I only understand half of all that theory stuff - I am still waiting to see this in an actual implementation because theory-wise things are all fine, but when it comes to the actual code reality kicks in. For example, hardware wants to have a word on these different arrays, which ideally would be laid out in a contiguous block, and then you need some indexing algorithm for the subparts and so on.

Edited March 26 by Stefan Glienke

Lars Fosdal · March 26

2 minutes ago, Stefan Glienke said:

I am still waiting to see this in an actual implementation because theory-wise things are all fine, but when it comes to the actual code reality kicks in.

Yeah, that was my thoughts as well. Still, it is fascinating to see innovative ideas on old subjects.

Lars Fosdal · March 26

https://www.theregister.com/2025/02/13/hash_table_breakthrough/ adds some further detail on a different paper, "Optimal Bounds for Open Addressing Without Reordering" which is related to the first "Tiny Pointers" paper.
And these Python implementations derive from that paper: https://github.com/sternma/optopenhash

Sign In

Dictionaries, Hashing and Performance

Recommended Posts

Arnaud Bouchez 409

Share this post

Link to post

Marat1961 17

Share this post

Link to post

PolywickStudio 3

Share this post

Link to post

Cedomir Plavljanic 0

Share this post

Link to post

Stefan Glienke 2142

Share this post

Link to post

Rollo62 581

Share this post

Link to post

Lars Fosdal 1865

Share this post

Link to post

Stefan Glienke 2142

Share this post

Link to post

Lars Fosdal 1865

Share this post

Link to post

Lars Fosdal 1865

Share this post

Link to post

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity