Implementing a fast hash table involves far more, and more important, considerations than picking the proper size and growth factor.

 

Oh and proving that something is faster than another thing is far from trivial (as with most benchmarks).

 

Interesting read: https://probablydance.com/2017/02/26/i-wrote-the-fastest-hashtable/

Edited by Stefan Glienke

I would expect the hash generator you apply to your key to be pretty significant.

 

My hashing needs typically involve strings shorter than 100-something characters, Int32 references, or Int64 references, with anywhere from a few hundred to less than twenty thousand elements.

Guest
11 minutes ago, Lars Fosdal said:

I would expect the hash generator you apply to your key to be pretty significant.

That is relative, but for your short strings of 100-something characters I recommend going with xxHash: https://github.com/Xor-el/xxHashPascal
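
In practice that just means handing the hasher the raw bytes of the key. A rough sketch, based on the CalculateHash32 call that appears later in this thread (the exact xxHashPascal unit name is my assumption, check the repo):

function HashShortKey(const Key: string): Cardinal;
begin
  // Hash the UTF-16 payload of the string: character count times SizeOf(Char) bytes.
  // Requires the xxHashPascal unit that declares TxxHash32 in the uses clause.
  Result := TxxHash32.CalculateHash32(PChar(Key)^, Length(Key) * SizeOf(Char));
end;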

 

20 minutes ago, Stefan Glienke said:

Oh and proving that something is faster than another thing is far from trivial (as with most benchmarks).

I disagree. Wasn't speed the reason Delphi won hearts away from VB?

Now Java is in some places comparable to Delphi in performance, C#/.NET competes with it in every aspect, and natively compiled C# leaves Delphi to bite the dust.

 

The question is: why should we be the ones concerned about this, and not the R&D at Embarcadero?

Can you imagine where we would be if there hadn't been FastMM and the FastCode project back in the day?

Show me a real client/server application with real and huge data throughput, built out of the box with Delphi/RAD. There is none, and don't fool yourself: the real applications that can compete in this world are either heavily optimized, with parts compiled with FPC and other compilers, or... I am tired now!

 

Hell, do you know that the LLVM part of the RAD IDE is compiled with Visual Studio?!

Seattle was compiled with Visual Studio 2010; ExeInfo reports "Microsoft Visual C++ v.10 DLL ( 8B ) Visual Studio 2010 - Name : Embarcadero cross-platform compiler backend."

XE8, like Seattle, used the same compiler!

And I really don't think they have changed much in the latest IDEs, aside from whatever might or might not be compilable with C++Builder; and if it is, it will be useless in terms of speed. I am inclined to doubt they are using the latest LLVM repository.

 

So yes, those optimizations done by the community are important. Maybe, just maybe, some day those optimizations will reach ""Embarcadero R&D"" (note the extra double quotes) in copy-and-paste format and they will decide to paste them in, the way things have gone for years. They are trying to optimize the themes and, lo and behold, it is not working as it should. What a **! (just two letters)

 

 

Always search, read, test and benchmark, and be forever in search of knowledge; don't be like Embarcadero. That's a piece of advice for everyone.

 

To add beautiful, working themes to your VCL application, use third-party components. There are a few, and personally I think all of them would work just fine; in every case you are one email away from getting your problem fixed. Personally I have used AlphaSkins for more than 13 years; there were many problems over those years, and every one was fixed with a single email.

 

There are many projects that take speed seriously and are worth using, or at least browsing: DWScript, and of course the mother lode of optimization, all the libraries from https://github.com/synopse . I can only imagine the number of hours that have been poured in there. There are many more libraries and sources out there.

 

Sorry Stefan, I am really not attacking anyone or any personal view, only pointing out Embarcadero's wrong direction and asking when they are going to change that policy (I mean the head-in-the-sand "it looks like it is working, ship it").

29 minutes ago, Kas Ob. said:

I disagree. Wasn't speed the reason Delphi won hearts away from VB?

You missed my point (I blame the language barrier): I was saying that different implementations of some algorithms (such as hash tables) are not trivial to benchmark and compare. I have seen (and, before I knew better, written) too many biased benchmarks that test only fractions, ignore certain aspects, use specific data, simply compare apples and oranges, or, last but not least, are fooled by hardware effects like branch predictors or cache behavior.

And that is regardless of the programming language.

Edited by Stefan Glienke

This is not the final name for the thread; it's just all I came up with in my ignorance. I am open to improvements.

7 hours ago, Kas Ob. said:

Just ran a TDictionary comparison using Murmor3 vs. TxxHash32.CalculateHash32:

    One Million (xxHash,PreAlloc):   01.446-8681
    One Million (Murmor3,PreAlloc):  00.743-3334

    Four Million (Murmor3):          04.895-7561
    Four Million (Murmor3,PreAlloc): 03.130-7519
    Four Million (xxHash,PreAlloc) : 04.491-7058

Your mileage may vary..

Edited by FredS
added link to Murmor3 source

Guest
7 minutes ago, FredS said:

Just ran a TDictionary comparison using Murmor3 (Assembly) vs. TxxHash32.CalculateHash32:


    One Million (xxHash,PreAlloc):   01.446-8681
    One Million (Murmor3,PreAlloc):  00.743-3334

    Four Million (Murmor3):          04.895-7561
    Four Million (Murmor3,PreAlloc): 03.130-7519
    Four Million (xxHash,PreAlloc) : 04.491-7058

Your mileage may vary..

Theoretically xxHash should perform better, but Murmor is good too. Can you please repeat your test with FPC, if you have it installed of course, with optimization enabled?

24 minutes ago, Kas Ob. said:

Can you please repeat your test with FPC?

Not installed.

 

There is mention that xxHash32 is slower on x64, which is most likely the reason..
The equality comparer in System.Generics.Defaults is limited to a 32-bit hash..
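
Not from this thread, but for illustration: if one did want to feed a 64-bit hash such as xxHash64 into that comparer, it would first have to be folded down to 32 bits, e.g. by mixing the two halves:

function Fold64To32(const Hash64: UInt64): Integer;
begin
  // XOR the high half into the low half, then truncate to the 32 bits the comparer expects.
  Result := Integer(Cardinal(Hash64 xor (Hash64 shr 32)));
end;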

Guest

Let's try something else.

Would you please repeat your test with a small change?

 

Replace RotateLeft32 with this version and remove inline from the declaration (because, you know, inlining assembly is something no compiler ever did):

class function TxxHash32.RotateLeft32(value: LongWord; count: Integer) : LongWord;
asm
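  // Assumption: RotateLeft32 is a static class function compiled for Win32, so
  // under the register calling convention value arrives in EAX and count in EDX,
  // and the rotated result is returned in EAX. As written, this targets Win32 only.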
  mov ecx,count
  rol value,cl
end;

 

18 minutes ago, Kas Ob. said:

replace RotateLeft32

 

    Four Million (Murmor3,PreAlloc): 03.021-4807
    Four Million (xxHash,PreAlloc):  12.250-4320

Enough play time; if you can get xxHash32 to beat Murmor3, let me know 🙂

Guest

Thank you, it is nice to know that Murmor3 is faster than xxHash.

 

 


First of all, I must interject here.
MurmurHash is not faster than xxHash, never.
The implementation being used here is naive and has never been optimized.
I suggest you use the version in HashLib4Pascal, as that one has been optimized over time.

 

Benchmarks

 

Results on my PC are attached below.

As you can see in the benchmark, xxHash blows MurmurHash out of the water, especially in FPC. :classic_biggrin:

 

(Attached benchmark screenshots: Delphi.jpg and FPC.jpg)

Edited by Ugochukwu Mmaduekwe

Guest

@Ugochukwu Mmaduekwe That is more than nice!

 

Now I see you are a person who appreciates speed too, so here are a thing or two to read; I hope you will enjoy them.

https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush/

https://lemire.me/blog/2019/03/20/arm-and-intel-have-different-performance-characteristics-a-case-study-in-random-number-generation/

https://github.com/wangyi-fudan/wyhash

 

I myself use that wyhash random number generator extensively, though not as a cryptographically secure random number generator, not yet anyway; I combine it with a different beast.

 

Anyway it is small and very fast and very portable.

 

I hope you enjoy reading about it and implementing it!


@Ugochukwu Mmaduekwe

 

I am probably doing it wrong, but using either TMurmurHash3_x86_32 or TXXHash32 in a dictionary equality comparer for Integer performs super poorly compared to either THashBobJenkins.GetHashValue from the RTL or FNV1A_Hash_Meiyan from FastCompare.

That is obviously because of the rather heavy OOP architecture, which causes several (virtual) calls and allocations for every Compute call.

 

This is what I put into my GetHashCode function for the equality comparer:

 

hash.ComputeUntyped(value, SizeOf(Integer)).GetHashCode; // hash is an IHash instance where I created a TXXHash32 for.
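
For context, that line sits inside a comparer built roughly like the sketch below; the HashLib4Pascal unit and factory names are my assumption, while TEqualityComparer<Integer>.Construct comes from System.Generics.Defaults:

uses
  System.Generics.Collections, System.Generics.Defaults,
  HlpIHash, HlpHashFactory; // HashLib4Pascal unit names - assumed

function MakeIntComparer: IEqualityComparer<Integer>;
var
  Hash: IHash;
begin
  Hash := THashFactory.THash32.CreateXXHash32; // assumed factory call for a TXXHash32 instance
  Result := TEqualityComparer<Integer>.Construct(
    function(const Left, Right: Integer): Boolean
    begin
      Result := Left = Right;
    end,
    function(const Value: Integer): Integer
    begin
      // One interface call plus an IHashResult allocation per key - the overhead in question.
      Result := Hash.ComputeUntyped(Value, SizeOf(Integer)).GetHashCode;
    end);
end;

// usage: TDictionary<Integer, Integer>.Create(MakeIntComparer);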

Edited by Stefan Glienke

21 minutes ago, Stefan Glienke said:

hash.ComputeUntyped(value, SizeOf(Integer)).GetHashCode; // hash is an IHash instance where I created a TXXHash32 for.

@Stefan Glienke

The above should be

hash.ComputeUntyped(value, SizeOf(Integer)).GetUInt32; // hash is an IHash instance where I created a TXXHash32 for.

//or

hash.ComputeUntyped(value, SizeOf(Integer)).GetInt32; // hash is an IHash instance where I created a TXXHash32 for.

Also, you could use TXXHash32 directly and avoid the whole slowdown associated with interface virtual calls, but then you will have to manage the memory yourself.
Doing what I described above, you will have:

 

hash.ComputeUntyped(value, SizeOf(Integer)).GetUInt32; // where hash is a THash instance.

Could you please try this and post your results?

Edited by Ugochukwu Mmaduekwe


Still something like >20 times slower than using the one from your xxHash32 unit.

 

THash.TransformUntyped has huge overhead from that dynamic array and the implicit try/finally associated with it, and probably way more going on besides, while TxxHash32.CalculateHash32 only works with some variables on the stack.

 

BTW, for the xxHash32 unit: the Delphi compiler, at least, will never ever inline RotateLeft32, because the body comes after the call site, and in that situation it never inlines (see http://docwiki.embarcadero.com/RADStudio/Rio/en/Calling_Procedures_and_Functions_(Delphi)#Using_the_inline_Directive).
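
For comparison, a plain-Pascal rotate that the compiler can inline, provided its body is placed above the first call site in the unit as that rule requires, would look something like this (just a sketch, not the unit's actual code):

function RotateLeft32(Value: Cardinal; Count: Integer): Cardinal; inline;
begin
  // Mask the count so the shift amounts stay well-defined for any input, including 0.
  Count := Count and 31;
  Result := (Value shl Count) or (Value shr ((32 - Count) and 31));
end;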

10 minutes ago, Stefan Glienke said:

Still something like >20 times slower than using the one from your xxHash32 unit.

 

THash.TransformUntyped has huge overhead from that dynamic array and the implicit try/finally associated with it, and probably way more going on besides, while TxxHash32.CalculateHash32 only works with some variables on the stack.

 

BTW, for the xxHash32 unit: the Delphi compiler, at least, will never ever inline RotateLeft32, because the body comes after the call site, and in that situation it never inlines (see http://docwiki.embarcadero.com/RADStudio/Rio/en/Calling_Procedures_and_Functions_(Delphi)#Using_the_inline_Directive).

>20 times slower is really strange, as that is not what happens on my end.

As regards the dynamic array, I am aware it has its overhead, but I added it since it gives people the ability to adjust the buffer size to one of their choosing.

Edited by Ugochukwu Mmaduekwe


Keep in mind that I have a dict<int, int>, so all I do is hash an integer, which means the overhead of each GetHashCode call is significant for the overall result. If I were hashing larger data, the overhead would shrink - that's what I mentioned earlier about benchmarks and comparing things 🙂

5 minutes ago, Stefan Glienke said:

Keep in mind that I have a dict<int, int>, so all I do is hash an integer, which means the overhead of each GetHashCode call is significant for the overall result. If I were hashing larger data, the overhead would shrink - that's what I mentioned earlier about benchmarks and comparing things 🙂

So for your use case, the other leaf xxHash unit will be more suitable after some optimizations, due to its simplistic stack-based approach.

6 minutes ago, Ugochukwu Mmaduekwe said:

So for your use case, the other leaf xxHash unit will be more suitable after some optimizations, due to its simplistic stack-based approach.

Well, technically for all use cases of a hash function for a hash table, unless the data you are hashing is super large (which is rather rare, I think) - which is what this topic started with.

 

Anyway, that is why I asked FredS for his benchmark code: to be able to compare the same things.

Edited by Stefan Glienke

7 hours ago, Stefan Glienke said:

or we are talking apples and orange

How come I always get homework when I come here 🙂

 

The code used was originally written to test a home-rolled implementation of 'AddIfNotExists' in Berlin.
In short, that particular test won't do you any good.

 

But here are the steps used:

  1. Turned OFF Overflow and Range checking in xxHash32
  2. Changed the test from using the default Jenkins hash to use (that) xxHash32.
  3. This was done by copying the TDelegatedEqualityComparer used with Murmor3 and changing a single line (see the sketch after these steps):
    Result := Integer(TxxHash32.CalculateHash32(PChar(Value)^, Length(Value) * CharSize));
    // instead of:
    Result := Murmor3(PChar(Value)^, Length(Value) * CharSize, 0);

     

  4. The test itself is based on a TDictionary<string,integer> where the string is a GUID appended with the for loop value.
    This was originally done since one version of Murmor had issues with collisions when the first part of the string was identical.

That's it, as before: your mileage may vary..
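
Put together, steps 2-4 amount to roughly the following. This is an illustrative reconstruction rather than the actual benchmark code; it assumes System.SysUtils, System.Generics.Collections, System.Generics.Defaults and the xxHashPascal unit in the uses clause, with SizeOf(Char) standing in for CharSize:

var
  Comparer: IEqualityComparer<string>;
  Dict: TDictionary<string, Integer>;
  Guid: string;
  i: Integer;
begin
  // Step 3: delegated comparer whose hash function is the single line that gets swapped.
  Comparer := TDelegatedEqualityComparer<string>.Create(
    function(const Left, Right: string): Boolean
    begin
      Result := Left = Right;
    end,
    function(const Value: string): Integer
    begin
      Result := Integer(TxxHash32.CalculateHash32(PChar(Value)^, Length(Value) * SizeOf(Char)));
    end);

  // Step 4: every key shares the same GUID prefix with the loop index appended.
  Guid := TGUID.NewGuid.ToString;
  Dict := TDictionary<string, Integer>.Create(4000000, Comparer); // the PreAlloc variant
  try
    for i := 1 to 4000000 do
      Dict.Add(Guid + IntToStr(i), i);
  finally
    Dict.Free;
  end;
end;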

 


Thank you. I am just trying to put test results into perspective (*) - just posting some numbers is almost useless until one knows what was tested. That's all I wanted to know.

 

Especially when people compare different hash algorithms based on how much throughput they have for large data - which is interesting but almost irrelevant for hash tables.

 

(*) The reason is that I am still evaluating whether it might be worthwhile to also drop System.Generics.Defaults from Spring.Collections and roll our own comparers with more optimized code.

Edited by Stefan Glienke

11 minutes ago, Stefan Glienke said:

roll our own comparers

Yup, especially if it is true that xxHash64 outperforms on x64..

4 hours ago, Stefan Glienke said:

compare different hash algorithms

 

I haven't looked at FastCode for ages; today I tried out the 'System.Generics.FastDefaults' unit, only to find that the implementation of Murmur3 has changed.

The latest one actually did NOT work at all for me...
 

The one used in my tests is from the Initial commit:

Alpha version 0.1

https://github.com/JBontes/FastCode/commit/7c47518e93870aaf155e70b6f43e3ba1dfb93898

 

Edit:

It turned out that Berlin needed a restart; results of the same test (NewM3 being the one in the current master):

Four Million (Murmor3,PreAlloc): 03.124-1398
Four Million (NewM3,PreAlloc):   03.078-7503

 

Edited by FredS

