pre-generic dictionary class

Attila Kovacs · February 25, 2020

14 minutes ago, Anders Melander said:

better to add unsorted and then sort at the end

The "Sorted" property changes the behavior how stringlist works.

If it's not set, it won't yield on duplicate values (looking up would take too much time), and Thomas obviously needs this function as he tries to mimic a TDictionary.

Edited February 25, 2020 by Attila Kovacs

dummzeuch · February 25, 2020

14 minutes ago, Anders Melander said:

Okay, but setting Sorted=True and then adding to the list will keep the list sorted; It's using binary search to find the insertion point so it's not that bad, but it might be better to add unsorted and then sort at the end. It depends on the dataset.

Obviously, as you pointed out yourself, I am adding duplicates, which the string list should ignore. (That actually is based on my "real world" usage of the list which I want to replace.) I know of no way to achieve that otherwise, or is there one that I don't know about?

Anders Melander · February 25, 2020

3 minutes ago, Attila Kovacs said:

Thomas obviously needs

IMO it's not that clear what he needs.

dupError is set which means that duplicates should cause an exception, but the code explicitly produce duplicates - and prevents them from being added.

Anyhow, I don't think I have anything to contribute on the topic that a quick google search couldn't have provided in the first place.

Anders Melander · February 25, 2020

7 minutes ago, dummzeuch said:

I know of no way to achieve that otherwise, or is there one that I don't know about?

Probably not, but either use Duplicates=dupIgnore or do the test manually. Don't do both.

Attila Kovacs · February 25, 2020

7 minutes ago, Anders Melander said:

IMO it's not that clear what he needs

I must have misunderstood the topic title. My apologies.

dummzeuch · February 25, 2020

50 minutes ago, Anders Melander said:

Probably not, but either use Duplicates=dupIgnore or do the test manually. Don't do both.

Good point, I added the manual test later to make THashedStringList work with the same code as a sorted TStringList. It shouldn't hurt performance though since the expensive search code is executed anyway to find the place for inserting the new string.

btw: I appreciate your feedback on the testing code, even if it might not have sounded like I did.

Edwin Yip · February 25, 2020

5 hours ago, Fr0sT.Brutal said:

I believe mORMot has the fastest implementation of everything :))

There also big libs like Griijy , TheUnknownOnes etc. Check https://github.com/Fr0sT-Brutal/awesome-pascal

Agreed.

Check TSynDictionary from SynCommons.pas of the mORMot framework. SynCommons.pas is self-contained without any dependency.

Thijs van Dien · February 26, 2020

I was surprised by @dummzeuch's benchmark results, so ran my own. In short, I created 44350 strings by taking 8870 English words and appending 1 to 5 to them, added half of them (index odd) to the data structure, to finally check for all of them if they were present. The experiment was repeated 100 times, and the listed results are the average total time for each action:

TStringList
- Add: 143.44 ms
- TryGet: 123.09 ms

TStringHash (Size = 256)

- Add: 23.58 ms

- TryGet: 62.56 ms

TStringHash (Size = 512)

- Add: 17.09 ms

- TryGet: 40.54 ms

TStringHash (Size = 1024)

- Add: 13.47 ms

- TryGet: 29.3 ms

TStringHash (Size = 2048)

- Add: 11.45 ms

- TryGet: 21.68 ms

TStringHash (Size = 4096)

- Add: 10.2 ms

- TryGet: 18.41 ms
TStringHash (Size = 8192)
- Add: 9.49 ms
- TryGet: 16.28 ms
TStringHash (Size = 16384)
- Add: 9.17 ms
- TryGet: 15.21 ms
TStringHash (Size = 32768)
- Add: 8.73 ms
- TryGet: 14.24 ms
TStringHash (Size = 65536)
- Add: 8.68 ms
- TryGet: 14.03 ms
TDictionary
- Add: 10.2 ms
- TryGet: 16.09 ms

TSynDictionary

- Add: 9.02 ms

- TryGet: 14.99 ms

If you're really interested, I can send you the code, but I think these results make sense. As long as the number of buckets is somewhat appropriate, a TStringHash outperforms TDictionary even. Possibly the former has a more specialized hash function, or simply does less other work (e.g. no accounting for reallocation). Note that I did make it fair by checking for duplicates in TStringHash too. If you don't need that, it becomes even faster.

Edited February 26, 2020 by Thijs van Dien

timfrost · February 26, 2020

I got similar results for loading (6.25ms) and finding (9.06ms) in a ternary search tree. And the StringHash(65536) load time for my list on my machine was 9.4ms. Looking at the StringHash source, the memory usage of both that and TST is the same order of magnitude, but the StringHash makes much more use of strings, of course. The TST meets my needs, and I have no plans to switch, mainly because my function builds in more load helpers and matching features, which fit with how I use it.

Georgge Bakh · March 1, 2020

My collections library based on pseudo-templates has a HashMap implementation. Besides of key and value types it can be parametrized with a hash function of the default one is not suitable for some reason.

https://github.com/casteng/tacl

Andrea Raimondi · March 14, 2020

Yo!

Use an interface and then create two different implementing classes, one for DXE+ with a TDictionary and another one using some minimal custom list. Then you compile it accordingly.

Sign In

pre-generic dictionary class

Recommended Posts

Attila Kovacs 693

Share this post

Link to post

dummzeuch 1720

Share this post

Link to post

Anders Melander 2114

Share this post

Link to post

Anders Melander 2114

Share this post

Link to post

Attila Kovacs 693

Share this post

Link to post

dummzeuch 1720

Share this post

Link to post

Edwin Yip 154

Share this post

Link to post

Thijs van Dien 10

Share this post

Link to post

timfrost 81

Share this post

Link to post

Georgge Bakh 29

Share this post

Link to post

Andrea Raimondi 15

Share this post

Link to post

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity