Jump to content
dummzeuch

pre-generic dictionary class

Recommended Posts

14 minutes ago, Anders Melander said:

better to add unsorted and then sort at the end

The "Sorted" property changes the behavior how stringlist works.

If it's not set, it won't yield on duplicate values (looking up would take too much time), and Thomas obviously needs this function as he tries to mimic a TDictionary.

Edited by Attila Kovacs

Share this post


Link to post
14 minutes ago, Anders Melander said:

Okay, but setting Sorted=True and then adding to the list will keep the list sorted; It's using binary search to find the insertion point so it's not that bad, but it might be better to add unsorted and then sort at the end. It depends on the dataset.

Obviously, as you pointed out yourself, I am adding duplicates, which the string list should ignore. (That actually is based on my "real world" usage of the list which I want to replace.) I know of no way to achieve that otherwise, or is there one that I don't know about?

Share this post


Link to post
3 minutes ago, Attila Kovacs said:

Thomas obviously needs

IMO it's not that clear what he needs.

dupError is set which means that duplicates should cause an exception, but the code explicitly produce duplicates - and prevents them from being added.

 

Anyhow, I don't think I have anything to contribute on the topic that a quick google search couldn't have provided in the first place.

Share this post


Link to post
7 minutes ago, dummzeuch said:

I know of no way to achieve that otherwise, or is there one that I don't know about?

Probably not, but either use Duplicates=dupIgnore or do the test manually. Don't do both.

Share this post


Link to post
7 minutes ago, Anders Melander said:

IMO it's not that clear what he needs

I must have misunderstood the topic title. My apologies.

  • Haha 1

Share this post


Link to post
50 minutes ago, Anders Melander said:

Probably not, but either use Duplicates=dupIgnore or do the test manually. Don't do both.

Good point, I added the manual test later to make THashedStringList work with the same code as a sorted TStringList. It shouldn't hurt performance though since the expensive search code is executed anyway to find the place for inserting the new string.

btw: I appreciate your feedback on the testing code, even if it might not have sounded like I did.

Share this post


Link to post

I was surprised by @dummzeuch's benchmark results, so ran my own. In short, I created 44350 strings by taking 8870 English words and appending 1 to 5 to them, added half of them (index odd) to the data structure, to finally check for all of them if they were present. The experiment was repeated 100 times, and the listed results are the average total time for each action:

 

TStringList
- Add: 143.44 ms
- TryGet: 123.09 ms

TStringHash (Size = 256)

- Add: 23.58 ms

- TryGet: 62.56 ms

TStringHash (Size = 512)

- Add: 17.09 ms

- TryGet: 40.54 ms

TStringHash (Size = 1024)

 - Add: 13.47 ms

- TryGet: 29.3 ms

TStringHash (Size = 2048)

- Add: 11.45 ms

- TryGet: 21.68 ms

TStringHash (Size = 4096)

- Add: 10.2 ms

- TryGet: 18.41 ms
TStringHash (Size = 8192)
- Add: 9.49 ms
- TryGet: 16.28 ms
TStringHash (Size = 16384)
- Add: 9.17 ms
- TryGet: 15.21 ms
TStringHash (Size = 32768)
- Add: 8.73 ms
- TryGet: 14.24 ms
TStringHash (Size = 65536)
- Add: 8.68 ms
- TryGet: 14.03 ms
TDictionary
- Add: 10.2 ms
- TryGet: 16.09 ms

TSynDictionary

- Add: 9.02 ms

- TryGet: 14.99 ms

 

If you're really interested, I can send you the code, but I think these results make sense. As long as the number of buckets is somewhat appropriate, a TStringHash outperforms TDictionary even. Possibly the former has a more specialized hash function, or simply does less other work (e.g. no accounting for reallocation). Note that I did make it fair by checking for duplicates in TStringHash too. If you don't need that, it becomes even faster.

Edited by Thijs van Dien
  • Thanks 1

Share this post


Link to post

I got similar results for loading (6.25ms) and finding (9.06ms) in a ternary search tree.  And the StringHash(65536) load time for my list on my machine was 9.4ms. Looking at the StringHash source, the memory usage of both that and TST is the same order of magnitude, but the StringHash makes much more use of strings, of course.  The TST meets my needs, and I have no plans to switch, mainly because my function builds in more load helpers and matching features, which fit with how I use it.

 

  • Like 1

Share this post


Link to post

Yo!

Use an interface and then create two different implementing classes, one for DXE+ with a TDictionary and another one using some minimal custom list. Then you compile it accordingly. 

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×