VSoft.Ulid - A Delphi Implementation of ULID for Delphi XE2 or later.

Vincent Parrett · August 5, 2024

Hi All

I created a Delphi implementation of ULID - A Universally Unique Lexicographically Sortable Identifier. ULID's are a better option than GUIDs where sorting is needed

128-bit compatibility with UUID
1.21e+24 unique ULIDs per millisecond
Lexicographically sortable!
Canonically encoded as a 26 character string, as opposed to the 36 character UUID
Uses Crockford's base32 for better efficiency and readability (5 bits per character)
Case insensitive
No special characters (URL safe)
Monotonic sort order (correctly detects and handles the same millisecond)

https://github.com/VSoftTechnologies/VSoft.Ulid

Vincent Parrett · August 5, 2024

Out of curiosity I decided to benchmark TUlid - using the excellent Spring4D Benchmark - some interesting results

32bit

64 bit

Not being a performance guru - I am guessing the difference in the TUlid.Create performance is down to the Random number generator - @Stefan Glienke

Kas Ob. · August 6, 2024

@Vincent Parrett I would suggest to benchmark the following

class function TUlid.InternalNewUlid(timestamp: UInt64): TUlid;
var
  random : UInt64;
  ts : Int64Rec absolute timestamp;
begin
  result := default(TUlId);
  //reverse order!
  result.FTimeStamp0 := ts.Bytes[5];
  result.FTimeStamp1 := ts.Bytes[4];
  result.FTimeStamp2 := ts.Bytes[3];
  result.FTimeStamp3 := ts.Bytes[2];
  result.FTimeStamp4 := ts.Bytes[1];
  result.FTimeStamp5 := ts.Bytes[0];
  random := FXorShift64.Next;
  //Move(random,result.FRandomness0, 2); // randomness 0-1
  //PWORD(@result.FRandomness0)^ := UInt16(random);      // the exact size
  PNativeUInt(@result.FRandomness0)^ := NativeUInt(random);  // it is safe to overflow here, might yeild better and simpler asm instruction
  random := FXorShift64.Next;
  //Move(random,result.FRandomness2, 8); // randomness 2-9
  PUInt64(@result.FRandomness0)^ := random;
end;

Removing the overhead of Move should count for something, i guess !!

Vincent Parrett · August 6, 2024

1 minute ago, Kas Ob. said:

@Vincent Parrett I would suggest to benchmark the following


  random := FXorShift64.Next;
  //Move(random,result.FRandomness0, 2); // randomness 0-1
  //PWORD(@result.FRandomness0)^ := UInt16(random);      // the exact size
  PNativeUInt(@result.FRandomness0)^ := NativeUInt(random);  // it is safe to overflow here, might yeild better and simpler asm instruction
  random := FXorShift64.Next;
  //Move(random,result.FRandomness2, 8); // randomness 2-9
  PUInt64(@result.FRandomness0)^ := random;

Removing the overhead of Move should count for something, i guess !!

@Stefan Glienke is way ahead of you already sent me a few suggestions. Using a variant record

    case Boolean of
        True: (
          FRandomness0 : byte;
          FRandomness1 : byte;
          FRandomness2 : byte;
          FRandomness3 : byte;
          FRandomness4 : byte;
          FRandomness5 : byte;
          FRandomness6 : byte;
          FRandomness7 : byte;
          FRandomness8 : byte;
          FRandomness9 : byte);
        False: (
          FRandomness0_1 : Word;
          FRandomness2_9 : UInt64);

Did squeeze a out few ns better performance.

I'll test your idea though - because micro optimising is more fun the boring stuff I was working on... 😉

Kas Ob. · August 6, 2024

Here another idea.

The implementation require 2 consequence random then calculate them in one hit, instead of calling FXorShift64.Next in the middle, in other words add/use Random1 and Random2, use only one Next.

Vincent Parrett · August 6, 2024

2 hours ago, Kas Ob. said:

in other words add/use Random1 and Random2, use only one Next.

This made no discernable difference. Your version of avoiding the move was slightly faster than the variant array idea.

Thanks to a bunch of suggestions from Stefan & Kas it is now a lot faster than the c# version I was comparing it to (64bit create - 63ns) - I did also add some validation to the parse method which slowed it down again!

The big speedup in the Parse method was avoiding TEncoding.GetBytes - which uses a dynamic array (slow).

D 11.3 - 32bit
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
TUlid.Create                   52.1 ns         51.6 ns     10000000
TUlid.Parse - AnsiString       28.4 ns         28.5 ns     23578947
TUlid.Parse                    33.7 ns         33.0 ns     20363636
TUlid.ToString                 32.0 ns         32.5 ns     23578947

D 11.3 64bit
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
TUlid.Create                   10.9 ns         11.0 ns     64000000
TUlid.Parse - AnsiString       28.1 ns         28.5 ns     23578947
TUlid.Parse                    29.2 ns         29.5 ns     24888889
TUlid.ToString                 29.4 ns         29.3 ns     22400000

D12.1 32bit
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
TUlid.Create                   51.2 ns         51.6 ns     10000000
TUlid.Parse - AnsiString       28.5 ns         28.3 ns     24888889
TUlid.Parse                    30.2 ns         30.5 ns     23578947
TUlid.ToString                 30.2 ns         29.3 ns     22400000

D 12.1 64bit
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
TUlid.Create                   8.94 ns         8.79 ns     74666667
TUlid.Parse - AnsiString       28.1 ns         28.5 ns     26352941
TUlid.Parse                    32.8 ns         32.2 ns     21333333
TUlid.ToString                 33.0 ns         33.0 ns     20363636

Anyway, that's more than fast enough - given that this is more than likely used to generate id's for use with databases or in messaging - it's unlikely you would ever hit a performance issue with this!

https://github.com/VSoftTechnologies/VSoft.Ulid/tree/f-optimise

I'll merge into main shortly.

Edit : forgot to mention, the reason 32bit is still slower than 64bit is due to way the compiler does Int64 division (calls a helper function).

Edited August 6, 2024 by Vincent Parrett
addition

Kas Ob. · August 6, 2024

18 minutes ago, Vincent Parrett said:

3 hours ago, Kas Ob. said:

in other words add/use Random1 and Random2, use only one Next.

This made no discernable difference.

It is expected with your modern CPU, but the difference will be noticeable when the CPU under high usage, this is just for information and you don't need to do it, i liked to share.

Anyway, on my XE8 i found one of these loch ness monsters, the one that i saw in real life and in dreams, the ones that very hard to catch red handed, when the compiler throw the towel and silently produce unoptimized ugly assembly instructions.

This is reproducible on my XE8 and would love to know if this case is still in newer versions:

1) create new and empty console applicaiton.

2) add VSoft.Ulid

3) put this line in the main procedure,

uses
  System.SysUtils,
  VSoft.Ulid;

begin
  TUlid.Create;  // we don't care now for memory leak or whatevery
end.

4) add 64-bit platform and enable optimize, keep stack frame enabled.

5) use this create, the same i used earlier

class function TUlid.InternalNewUlid(timestamp: UInt64): TUlid;
var
  random : UInt64;
  ts : Int64Rec absolute timestamp;
begin
  result := default(TUlId);
  //reverse order!
  result.FTimeStamp0 := ts.Bytes[5];
  result.FTimeStamp1 := ts.Bytes[4];
  result.FTimeStamp2 := ts.Bytes[3];
  result.FTimeStamp3 := ts.Bytes[2];
  result.FTimeStamp4 := ts.Bytes[1];
  result.FTimeStamp5 := ts.Bytes[0];
  random := FXorShift64.Next;

  PNativeUInt(@result.FRandomness0)^ := NativeUInt(random);
  random := FXorShift64.Next;
  Move(random,result.FRandomness2, 8); //  comment and uncomment this line and see the difference in assembly code
  PUInt64(@result.FRandomness0)^ := random;
end;

The result is astonishing !

Here the difference from my XE8

The compiler with move behaved differently and stupidly.

One side note this is also really annoying, though i don't know if this still the same for the newer and enhanced versions

image.png.f9f26e21e43c5d6f7f8d496156720064.png

it could be handled way better for these limited size like up to 128bit.

Vincent Parrett · August 6, 2024

You have a typo

  PUInt64(@result.FRandomness0)^ := random; << should be FRandomness2

I did actually implement that and it did make a big difference - it's in the f-optimise branch - I haven't merged the changes in to main just yet.

I also removed the call to default(TUlid) since we're setting all the fields anyway (as pointed out by Stefan!).

I would say that codegen on later versions of delphi is only slightly better than XE8 - mostly it's improved rtl methods. I'm no assembly expert though so take that comment with a pinch of salt. 12.1 is slightly faster due to some div by const optimizations (again thanks to Stefan).

Kas Ob. · August 6, 2024

You still can (and may be should) use the record variant as Stefan suggested, only extended it a little from case boolean to case integer with 0,1 and 2 where you can align the needed types, in the third (additional) case just put

    case Integer of
        0: (
          FRandomness0 : byte;
          FRandomness1 : byte;
          FRandomness2 : byte;
          FRandomness3 : byte;
          FRandomness4 : byte;
          FRandomness5 : byte;
          FRandomness6 : byte;
          FRandomness7 : byte;
          FRandomness8 : byte;
          FRandomness9 : byte);
        1: (
          FRandomness0_1 : Word;
          FRandomness2_9 : UInt64)
        2: (
          FRandomness0_N : NativeUInt);

There is no need for padding, the compiler should take care of that.

ap2021 · Friday at 05:33 AM

Good stuff! And a couple of additional points:

1) You could use hardware-generated random seed from either a) CPU or b) the Trusted Platform Module, although as it can be supplied in the constructor, it's immaterial.

2) It would really add a lot of value to have a Timestamp property or function and maybe also UTCTimestamp property as well, possibly also 2 of each: returning TDateTime and integer Unix timestamps. Ideally as class functions, doing TryParse internally on a supplied ULID. - this way, the data with ULID primary keys can be directly used for plotting things on a real time scale.

BTW, have you tested it under high-concurrency loads? - i.e.: for the uniqueness of the results, or anything unexpected, like '', or '0' and some such results?

Vincent Parrett · Friday at 08:14 AM

2 hours ago, ap2021 said:

1) You could use hardware-generated random seed from either a) CPU or b) the Trusted Platform Module, although as it can be supplied in the constructor, it's immaterial.

I have no idea how to do that.

2 hours ago, ap2021 said:

2) It would really add a lot of value to have a Timestamp property or function and maybe also UTCTimestamp property as well, possibly also 2 of each: returning TDateTime and integer Unix timestamps. Ideally as class functions, doing TryParse internally on a supplied ULID. - this way, the data with ULID primary keys can be directly used for plotting things on a real time scale.

Happy to take a pull request to implement this. FWIW My UUID v7 library does expose the UTCTimestamp. It uses a v54 UUID under the hood to create the random part - which on windows uses the windows cng crypto library.

2 hours ago, ap2021 said:

BTW, have you tested it under high-concurrency loads? - i.e.: for the uniqueness of the results, or anything unexpected, like '', or '0' and some such results?

Other than a quick n dirty test application, no.

ap2021 · Saturday at 07:42 AM

Vincent, thanks, I saw you have created an issue yourself (and I still have no account for GitHub). Here's a public converter, which may be helpful: https://ugai.github.io/ulid-timestamp-converter/

As for the random numbers, CPU is supposed to have RDRAND and RDSEED instructions, although they are flagged in Delphi asm as invalid - they do not keep up, but I assume that that can be plugged with the bytes in a DB statement - it's at the edge of my experience, so I'm not pushing. It's probably different for x32 vs. x64 and could malfunction cross-platform because of parameter passing conventions vs. the register(s) it uses. Apparently it has its own limitations and could return 0, if not ready, so a fallback would be required anyway. All hardware random numbers are also slow, so this only makes sense for initial seeding. And they all have adverse reports, so some may prefer one over the other, so it's probably best left to the caller do do what they want.

DelphiUdIT · Saturday at 09:30 AM

On 10/3/2025 at 10:14 AM, Vincent Parrett said:

I have no idea how to do that.

If you want to use RDRAND and RDSEEK, I attach the sources so you can see how they works.

These is another topic where we talked about: https://en.delphipraxis.net/topic/10271-getting-rdseed-with-delphi/?do=findComment&comment=81868

These two instructions works in Intel and Amd CPU only (x86 or x64). In Arm they were introduced recently and the behave in slightly different way.

In the source there are some lines commented that you can delete.

These have been in use for at least 4 years in all my production programs without ever having given me any problems. They have also been in use and constantly monitored for a year on a test web server (to check the workload and performance in a multithreaded environment).

Unit11.zip

Sign In

VSoft.Ulid - A Delphi Implementation of ULID for Delphi XE2 or later.

Recommended Posts

Vincent Parrett 909

Share this post

Link to post

Vincent Parrett 909

Share this post

Link to post

Kas Ob. 160

Share this post

Link to post

Vincent Parrett 909

Share this post

Link to post

Kas Ob. 160

Share this post

Link to post

Vincent Parrett 909

Share this post

Link to post

Kas Ob. 160

Share this post

Link to post

Vincent Parrett 909

Share this post

Link to post

Kas Ob. 160

Share this post

Link to post

ap2021 0

Share this post

Link to post

Vincent Parrett 909

Share this post

Link to post

ap2021 0

Share this post

Link to post

DelphiUdIT 267

Share this post

Link to post

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity