Mike Torrettinni

Fast lookup tables - TArray.BinarySearch vs Dictionary vs binary search


Good point - I will go even further and suggest that all the lookup methods should return the index rather than a specific property value, which a) improves performance, b) makes them better comparable and c) makes this code more flexible.

Most of them then even turn into easily inlinable one-liners.
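The suggestion - have the lookup return an index and let the caller pick whichever field it needs - can be sketched like this (a Python sketch rather than the thread's Delphi code; the names and data are made up):

```python
from bisect import bisect_left

def index_of(sorted_names, name):
    """Return the index of name in sorted_names, or -1 if absent.
    Returning the index (not a property value) keeps the lookup tiny
    and easily inlinable, and the caller chooses which field to read."""
    i = bisect_left(sorted_names, name)
    return i if i < len(sorted_names) and sorted_names[i] == name else -1

names = ["Alice", "Bob", "Carol"]   # sorted key column
ids = [17, 4, 23]                   # parallel data column

i = index_of(names, "Bob")
bob_id = ids[i] if i >= 0 else None
```

The same index can then serve any parallel column (IDs, display text, etc.) without one lookup function per property.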

5 hours ago, balabuev said:

Here is my try. I've changed a lot in the existing code (the comparers and equality comparers mentioned earlier, etc.). I've also replaced the string return type of the functions with PString, because otherwise string handling takes too much time and the algorithm comparison becomes uninteresting.

I've also added my own minimalistic implementation of hash maps (not universal).

 

mapstest.zip

 

 

Some very impressive numbers, x32 and x64!

Posted (edited)

FWIW, the approach of using objects as bucket items for a hash map is a no-go in real code, as your memory will be scattered all over the place, plus you pay the memory overhead of objects. In a microbenchmark you get away with it because there isn't much other memory in play.

On my i7 the TNameMap already drops behind the RTL dict at around 50000k items. The same seems to be the case for TSynDict, which drops even more.

Edited by Stefan Glienke

Posted (edited)

@balabuev is this something you came up with on the spot, or are you using something similar, or is it from another library? It's not licensed, right - can I use it in commercial software?

Edited by Mike Torrettinni

Posted (edited)
5 hours ago, Mike Torrettinni said:

is this something you came up with on the spot, or are you using something similar, or is it from another library? It's not licensed, right - can I use it in commercial software?

 

No library used - just the simplest implementation of a hash map. Use it as you need.

 

7 hours ago, Stefan Glienke said:

the approach of using objects as bucket items for a hash map is a no-go in real code

 

This is the first time I've ever heard of this, and it seems very strange to me. Also, a single hash map can be organized on top of the TDataLine user objects without any additional objects (by adding Next and Hash fields directly to TDataLine). Two maps, however, would need another pair of such fields, which is less graceful, but also possible if we are talking about the extreme case.
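The intrusive idea - adding Next and Hash fields directly to the data object so the map needs no separate bucket-item allocations - can be sketched like this (Python rather than Delphi, purely illustrative; class and field names follow the post's TDataLine example):

```python
class TDataLine:
    """Data record carrying its own chaining fields, so the map
    allocates no bucket-item objects of its own (intrusive design)."""
    def __init__(self, name):
        self.name = name
        self.hash = hash(name)
        self.next = None      # intrusive chain link used by the name map

class IntrusiveNameMap:
    """Chained hash map that links TDataLine objects directly."""
    def __init__(self, bucket_count=64):
        self.buckets = [None] * bucket_count

    def add(self, item):
        b = item.hash % len(self.buckets)
        item.next = self.buckets[b]   # push onto the bucket's chain
        self.buckets[b] = item

    def find(self, name):
        h = hash(name)
        node = self.buckets[h % len(self.buckets)]
        while node is not None:
            if node.hash == h and node.name == name:
                return node
            node = node.next
        return None
```

As the post notes, a second map over the same objects would need a second pair of such fields, since each object can sit on only one chain per field pair.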

 

7 hours ago, Stefan Glienke said:

On my i7 the TNameMap already drops behind the RTL dict at around 50000k items.

 

I've only tested with 100k items. I can see the effect; however, it's not clear to me why this happens.

 

 

 

Edited by balabuev
Edited by balabuev
On 3/6/2021 at 2:32 PM, Mike Torrettinni said:

I'm refactoring the usage of various different look-up tables, and I'm trying to come up with one fast look-up table design. Most of them have one thing in common: search by integer (ID) and search by string (Name).

All data is sorted.

 

So, I benchmarked TArray.BinarySearch, TDictionary and a custom binary search.

Even though a flaw in the design of TArray.BinarySearch was pointed out (see https://en.delphipraxis.net/topic/4575-why-is-tarraybinarysearch-slower-than-normal-binary-search-function/), I still wanted to compare the results:

 

For search by integer, the custom binary search seems to be a little faster than TDictionary, until the tables get large:


 

For search by string, the custom binary search wins by even more, again for tables that are not large:


 

Interestingly, for smaller tables (10 records) a sequential search is faster than TDictionary.

 

I also tried TGpStringHash (by Primoz Gabrielcic), which is very fast, but unfortunately it has no implementation for integers.

TSynDictionary (from mORMot) is also very fast, but I don't use mORMot and its license is not friendly to my commercial project.

 

Are there any other fast search implementations - not too specialized, and ready to use out of the box - that I could try?

 

 

 

 

Take a look at the Binary Array Set - a rare but very useful data structure.
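For reference, the three contenders in the quoted benchmark - binary search over a sorted array, a dictionary, and plain sequential search - can be sketched for the integer-ID case like this (a Python sketch with made-up data, not the actual benchmark code):

```python
from bisect import bisect_left

def binary_find(sorted_ids, target):
    """O(log n) lookup over a sorted array - the 'custom binary search'."""
    i = bisect_left(sorted_ids, target)
    return i if i < len(sorted_ids) and sorted_ids[i] == target else -1

def sequential_find(ids, target):
    """O(n) scan - can still beat a hash map for ~10 records,
    because it has no hashing overhead and perfect locality."""
    for i, v in enumerate(ids):
        if v == target:
            return i
    return -1

ids = [3, 8, 15, 21, 42]                     # sorted IDs
by_id = {v: i for i, v in enumerate(ids)}    # the dictionary counterpart

assert binary_find(ids, 21) == sequential_find(ids, 21) == by_id[21] == 3
```

All three return the same index; the interesting part, as the benchmarks show, is how the constant factors shift as the table grows.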

On 5/2/2021 at 4:39 AM, julkas said:

Take a look at the Binary Array Set - a rare but very useful data structure.

That is interesting. Do you use it? For what cases do you prefer it?

 

I can't find a Delphi implementation. Do you know if one exists?

22 hours ago, Mike Torrettinni said:

That is interesting. Do you use it? For what cases do you prefer it?

 

I can't find a Delphi implementation. Do you know if one exists?

There is no ready-to-use Pascal implementation 😞. You can start from, or take ideas from, existing C++ and Java code and create a good ordered set or order-statistics set (without remove/delete 😞; the array bisection algorithm may be helpful) as an Object Pascal class.
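A minimal ordered set over a sorted array, in the spirit suggested above, could look like this (a Python sketch of the array-bisection idea, not an existing Pascal implementation; as noted, removal is left out):

```python
from bisect import bisect_left, insort

class SortedArraySet:
    """Minimal ordered set on a single sorted array.
    Membership is O(log n) by bisection; insert keeps the array sorted.
    No remove/delete, matching the simplification suggested above."""
    def __init__(self):
        self.items = []

    def add(self, value):
        i = bisect_left(self.items, value)
        if i == len(self.items) or self.items[i] != value:
            insort(self.items, value)   # O(n) shift, O(log n) search

    def __contains__(self, value):
        i = bisect_left(self.items, value)
        return i < len(self.items) and self.items[i] == value

    def rank(self, value):
        """Order-statistics query: number of elements < value."""
        return bisect_left(self.items, value)
```

The rank query is what makes this an order-statistics set; a plain hash set cannot answer it without a full scan.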

2 hours ago, julkas said:

There is no ready to use Pascal implementation 😞 . You can start / take ideas from existing C++ & Java code and create good (without remove/delete 😞) ordered set or order statistics set (array bisection algorithm may be helpful) Object Pascal class.

And then, after you have spent quite some time implementing this without defects, it may turn out that it is not any faster than the ready-to-use collection types, or that the actual performance improvement is completely negligible.

Posted (edited)

I must say, when I'm in the get-stuff-to-work phase, I throw TDictionaries all over the place. They are fast to code and legible. Sometimes I pair them with a TList or TArray for sorted output or some such. Of course, if it's just 10 items, it's an array from the start. Just make it readable and working!

 

Then, when the project has been or is being deployed to some alpha/beta users and is stable enough (in order not to confuse the testers or myself), I do some extensive monitoring. That shows where the points of interest are - where, for example, to ditch a dict and use an array, or to delve deep into the book for inspiration.

 

What I mean to say is: use the most "logical" way of storing data in the beginning. Then focus on the parts of the system where it really pays. That way, you will have legible code in most places, and "tuned" code (more complex to read, more sensitive to maintain) where it really matters.

 

Just my $0.05.

Edited by Dany Marmur

@Dany Marmur Agreed - it's the job of runtime library developers to get the most out of their data structures and algorithms, so that the users of those libraries don't have to worry 99.9% of the time but can just choose the reasonable type/algorithm for the job.

 

Funnily enough, I am currently in the process of doing exactly that for my library, and I can tell you that for the RTL it's certainly not the case, unfortunately.


An impressive number of implemented collections, for sure, but it's only compatible with FreePascal, and from a quick look you won't easily make that code Delphi compatible.

But thanks for mentioning it - I will certainly run some benchmarks to compare.

11 hours ago, Stefan Glienke said:

An impressive number of implemented collections, for sure, but it's only compatible with FreePascal, and from a quick look you won't easily make that code Delphi compatible.

But thanks for mentioning it - I will certainly run some benchmarks to compare.

FPC benchmarks

Posted (edited)
On 3/18/2021 at 2:19 AM, balabuev said:

Here is my try. I've changed a lot in the existing code (the comparers and equality comparers mentioned earlier, etc.). I've also replaced the string return type of the functions with PString, because otherwise string handling takes too much time and the algorithm comparison becomes uninteresting.

I've also added my own minimalistic implementation of hash maps (not universal).

 

mapstest.zip

 

 

Hi. I've downloaded and looked at your NameMapUnit unit, and I have the following comments:

1. Firstly, great speed, so thanks and well done.

2. I suggest adding an "InitialCapacity" parameter to your TMapBase constructor.

3. It seems to me that your TMapBase.Rehash method, which removes and reinserts items into FItems concurrently, risks some duplication. (OK, on further testing this doesn't seem to be a problem.) Anyhow, there may be a marginal improvement in reinserting the TMapItem items instead of recreating them.

Cheers.

Edited by angusj

2 hours ago, angusj said:

Anyhow, there may be a marginal improvement in reinserting the TMapItem items instead of recreating them

I tried to keep the code reasonably simple. Yes, sure, some improvements are possible. Instead of objects we could at least use records for the map items, with New/Dispose (or even better, with GetMem/FreeMem), as @Stefan Glienke mentioned before (if I've understood correctly). We could also go further and try to minimize heap allocations, use some sort of map item pool, etc. We could also use an initial capacity, as you propose.

 

But this would complicate the demo code, hiding the main idea it tries to show. Moreover, map filling is not critical for this particular test; we measure only search speed.

 

 

3 hours ago, balabuev said:

But this would complicate the demo code,

Noted. Cheers, and thanks again.

On 5/7/2021 at 8:43 AM, balabuev said:

I tried to keep the code reasonably simple. Yes, sure, some improvements are possible. Instead of objects we could at least use records for the map items, with New/Dispose (or even better, with GetMem/FreeMem), as @Stefan Glienke mentioned before (if I've understood correctly). We could also go further and try to minimize heap allocations, use some sort of map item pool, etc. We could also use an initial capacity, as you propose.

Neither - all hash tables that are competitive when it comes to performance use some contiguous block of memory (i.e. an array/vector). Otherwise, any possible cache locality is just completely destroyed.
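The contiguous-block approach usually means open addressing: all slots live in one flat array and collisions are resolved by probing within it. A toy sketch (Python, illustrative only; no deletion, no growth, no load-factor handling):

```python
class OpenAddressingMap:
    """Toy hash map keeping every (key, value) slot in ONE flat list,
    resolving collisions by linear probing - the contiguous-memory
    layout argued for above. Sketch only: no deletion or resizing,
    so it must stay well below full."""
    EMPTY = object()   # sentinel marking an unused slot

    def __init__(self, capacity=16):
        self.slots = [self.EMPTY] * capacity   # one contiguous block

    def put(self, key, value):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not self.EMPTY and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)      # linear probe to next slot
        self.slots[i] = (key, value)

    def get(self, key):
        i = hash(key) % len(self.slots)
        while self.slots[i] is not self.EMPTY:
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % len(self.slots)      # keep probing the same array
        return None
```

Because a probe sequence walks adjacent array slots rather than chasing heap pointers, lookups touch far fewer cache lines than a chained map with individually allocated nodes.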

On 5/19/2021 at 12:03 AM, Stefan Glienke said:

Neither - all hash tables that are competitive when it comes to performance use some contiguous block of memory

I suspected you would say this. But anyway, in real life we use a lot of heap-allocated stuff, like objects or strings, so I think there will be no big difference.

Posted (edited)
1 hour ago, balabuev said:

I suspected you would say this. But anyway, in real life we use a lot of heap-allocated stuff, like objects or strings, so I think there will be no big difference.

We can certainly argue over the term "big" - but performance is either CPU bound or memory bound, and with the hash table items scattered all over the heap it will most certainly be memory bound.

Hash tables are complex beasts and there are many considerations when designing one, but usually you want to avoid blasting your items all over the heap.

Interesting read: https://leventov.medium.com/hash-table-tradeoffs-cpu-memory-and-variability-22dc944e6b9a

Edited by Stefan Glienke
On 5/26/2021 at 12:59 AM, Stefan Glienke said:

with the hash table items scattered all over the heap it will most certainly be memory bound

All of this is true when working with tiny things - for example, mapping int to int. But in the more usual case, like string keys, the effect is shadowed by the string-handling overhead and by the fact that strings themselves are heap allocated.


If you say so - people have profiled the crap out of hash tables, and none that is halfway decent uses unnecessary memory indirections or wastes space by storing pointers.

The point about strings might be true, but then you only pay the price when using strings - and chances are that the string you are using to look up the item is already in your cache.

Posted (edited)

The question is the overall size. If the size is not too big - which, again, is typical - then the table with "unnecessary memory indirections" will work faster, because the logic is simpler.

 

A table which inlines everything must also pay its own price to prevent "wasting space by storing pointers": introducing additional fields, doing bit manipulation and performing more array element accesses. But such a table also wastes space due to its lower load factor, and in still-empty slots when SizeOf(Value) > SizeOf(Pointer).

 

Some well-known examples:

- Java HashMap and Hashtable: heap-allocated entries. Well, they have almost no choice, because Java has no structs; the only way to inline would be to keep several parallel arrays.

- C# Dictionary<K, V>: rather tricky code, which preallocates all entries in a separate array and then uses a kind of tiny embedded memory manager to hand out entries from this array. As a result it stores two separate arrays (the same waste of space) and indices instead of direct pointers (a waste of space, and slower). And since all entries are preallocated, it wastes even more space when SizeOf(Value) > SizeOf(Pointer).

 

 

Edited by balabuev

Your assessment of the C# dictionary is incorrect - in fact it uses a pretty nice design: the actual "hashtable" is just an array of integers. Pretty compact. They store indices into the other array, where the actual items reside, and there are no gaps because those items are stored contiguously. No wasted space if you have a low load factor. Yes, there is one indirection, but unless you have so many items that these two arrays no longer fit into cache, this is pretty darn fast. Collisions are resolved by linking the items that collided - you can certainly argue there is some wasted space, because all items that were never subject to a collision carry an unused next pointer. There are other implementations, such as the one in Python or the one we implemented in Spring4D (which is very similar to Python's), that use probing with some factor to avoid clustering.
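The buckets-plus-entries layout described above can be sketched like this (Python, illustrative only; the real .NET implementation adds a freelist, resizing, and more, none of which is shown here):

```python
class BucketEntryMap:
    """Sketch of the layout described above: a 'buckets' array of
    integer indices plus one contiguous 'entries' list; collisions
    chain through an integer 'next' index stored per entry.
    No removal or resizing in this sketch."""
    def __init__(self, bucket_count=8):
        self.buckets = [-1] * bucket_count   # -1 marks an empty bucket
        self.entries = []                    # contiguous (hash, next, key, value)

    def put(self, key, value):
        h = hash(key)
        b = h % len(self.buckets)
        i = self.buckets[b]
        while i != -1:                       # walk the collision chain
            eh, nxt, k, _ = self.entries[i]
            if eh == h and k == key:
                self.entries[i] = (eh, nxt, k, value)   # update in place
                return
            i = nxt
        # append to the gap-free entries array; chain via the old head index
        self.entries.append((h, self.buckets[b], key, value))
        self.buckets[b] = len(self.entries) - 1

    def get(self, key):
        h = hash(key)
        i = self.buckets[h % len(self.buckets)]
        while i != -1:
            eh, nxt, k, v = self.entries[i]
            if eh == h and k == key:
                return v
            i = nxt
        return None
```

Note how the entries array stays dense regardless of the load factor of the buckets array, which is the "no gaps" property being argued about.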

 

More on the C# implementation https://blog.markvincze.com/back-to-basics-dictionary-part-2-net-implementation/

Posted (edited)
4 hours ago, Stefan Glienke said:

Your assessment of the C# dictionary is incorrect

Why? I claimed that it uses the same number of fields as a conventional hash table with heap-allocated entries - and that's true. There is an array of indices, called "buckets"; this is the main table array, which is analogous to an array of pointers to entries (nothing "pretty compact" about it). There is the "next" field in each entry. So the waste of space is roughly the same. The levels of indirection are the same, or even worse. You also ignored the fact that all entries are preallocated - thus even more wasted space.

 

All entries are stored contiguously in a single array - better cache locality, yes, but I was not talking about that. Overall, this topic is a big question. In a real app we have a highly unpredictable sequence of memory accesses, because an app is really a big mess (a graph) of heap-allocated objects, and the hash map access code is mixed with accesses to data in other objects. Moreover, since a hash map has O(1) average access performance, you do not typically access more than a single entry at a time.

 

So, speaking more generally, cache locality in an OOP language is more a task for the memory manager. And overall, its importance for real code is, IMHO, exaggerated.

 

 

Edited by balabuev

