Vincent Parrett 909 Posted August 5, 2024 Hi All I created a Delphi implementation of ULID - A Universally Unique Lexicographically Sortable Identifier. ULID's are a better option than GUIDs where sorting is needed 128-bit compatibility with UUID 1.21e+24 unique ULIDs per millisecond Lexicographically sortable! Canonically encoded as a 26 character string, as opposed to the 36 character UUID Uses Crockford's base32 for better efficiency and readability (5 bits per character) Case insensitive No special characters (URL safe) Monotonic sort order (correctly detects and handles the same millisecond) https://github.com/VSoftTechnologies/VSoft.Ulid 6 2 Share this post Link to post
Vincent Parrett 909 Posted August 5, 2024 Out of curiosity I decided to benchmark TUlid - using the excellent Spring4D Benchmark - some interesting results 32bit 64 bit Not being a performance guru - I am guessing the difference in the TUlid.Create performance is down to the Random number generator - @Stefan Glienke Share this post Link to post
Kas Ob. 160 Posted August 6, 2024 @Vincent Parrett I would suggest to benchmark the following class function TUlid.InternalNewUlid(timestamp: UInt64): TUlid; var random : UInt64; ts : Int64Rec absolute timestamp; begin result := default(TUlId); //reverse order! result.FTimeStamp0 := ts.Bytes[5]; result.FTimeStamp1 := ts.Bytes[4]; result.FTimeStamp2 := ts.Bytes[3]; result.FTimeStamp3 := ts.Bytes[2]; result.FTimeStamp4 := ts.Bytes[1]; result.FTimeStamp5 := ts.Bytes[0]; random := FXorShift64.Next; //Move(random,result.FRandomness0, 2); // randomness 0-1 //PWORD(@result.FRandomness0)^ := UInt16(random); // the exact size PNativeUInt(@result.FRandomness0)^ := NativeUInt(random); // it is safe to overflow here, might yeild better and simpler asm instruction random := FXorShift64.Next; //Move(random,result.FRandomness2, 8); // randomness 2-9 PUInt64(@result.FRandomness0)^ := random; end; Removing the overhead of Move should count for something, i guess !! Share this post Link to post
Vincent Parrett 909 Posted August 6, 2024 1 minute ago, Kas Ob. said: @Vincent Parrett I would suggest to benchmark the following random := FXorShift64.Next; //Move(random,result.FRandomness0, 2); // randomness 0-1 //PWORD(@result.FRandomness0)^ := UInt16(random); // the exact size PNativeUInt(@result.FRandomness0)^ := NativeUInt(random); // it is safe to overflow here, might yeild better and simpler asm instruction random := FXorShift64.Next; //Move(random,result.FRandomness2, 8); // randomness 2-9 PUInt64(@result.FRandomness0)^ := random; Removing the overhead of Move should count for something, i guess !! @Stefan Glienke is way ahead of you already sent me a few suggestions. Using a variant record case Boolean of True: ( FRandomness0 : byte; FRandomness1 : byte; FRandomness2 : byte; FRandomness3 : byte; FRandomness4 : byte; FRandomness5 : byte; FRandomness6 : byte; FRandomness7 : byte; FRandomness8 : byte; FRandomness9 : byte); False: ( FRandomness0_1 : Word; FRandomness2_9 : UInt64); Did squeeze a out few ns better performance. I'll test your idea though - because micro optimising is more fun the boring stuff I was working on... 😉 1 1 Share this post Link to post
Kas Ob. 160 Posted August 6, 2024 Here another idea. The implementation require 2 consequence random then calculate them in one hit, instead of calling FXorShift64.Next in the middle, in other words add/use Random1 and Random2, use only one Next. Share this post Link to post
Vincent Parrett 909 Posted August 6, 2024 (edited) 2 hours ago, Kas Ob. said: in other words add/use Random1 and Random2, use only one Next. This made no discernable difference. Your version of avoiding the move was slightly faster than the variant array idea. Thanks to a bunch of suggestions from Stefan & Kas it is now a lot faster than the c# version I was comparing it to (64bit create - 63ns) - I did also add some validation to the parse method which slowed it down again! The big speedup in the Parse method was avoiding TEncoding.GetBytes - which uses a dynamic array (slow). D 11.3 - 32bit ------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------- TUlid.Create 52.1 ns 51.6 ns 10000000 TUlid.Parse - AnsiString 28.4 ns 28.5 ns 23578947 TUlid.Parse 33.7 ns 33.0 ns 20363636 TUlid.ToString 32.0 ns 32.5 ns 23578947 D 11.3 64bit ------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------- TUlid.Create 10.9 ns 11.0 ns 64000000 TUlid.Parse - AnsiString 28.1 ns 28.5 ns 23578947 TUlid.Parse 29.2 ns 29.5 ns 24888889 TUlid.ToString 29.4 ns 29.3 ns 22400000 D12.1 32bit ------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------- TUlid.Create 51.2 ns 51.6 ns 10000000 TUlid.Parse - AnsiString 28.5 ns 28.3 ns 24888889 TUlid.Parse 30.2 ns 30.5 ns 23578947 TUlid.ToString 30.2 ns 29.3 ns 22400000 D 12.1 64bit ------------------------------------------------------------------- Benchmark Time CPU Iterations ------------------------------------------------------------------- TUlid.Create 8.94 ns 8.79 ns 74666667 TUlid.Parse - AnsiString 28.1 ns 28.5 ns 26352941 TUlid.Parse 32.8 ns 32.2 ns 21333333 TUlid.ToString 33.0 ns 33.0 ns 20363636 Anyway, that's more than fast enough - given that this is more than likely used to generate id's for use with databases or in messaging - it's unlikely you would ever hit a performance issue with this! https://github.com/VSoftTechnologies/VSoft.Ulid/tree/f-optimise I'll merge into main shortly. Edit : forgot to mention, the reason 32bit is still slower than 64bit is due to way the compiler does Int64 division (calls a helper function). Edited August 6, 2024 by Vincent Parrett addition 1 Share this post Link to post
Kas Ob. 160 Posted August 6, 2024 18 minutes ago, Vincent Parrett said: 3 hours ago, Kas Ob. said: in other words add/use Random1 and Random2, use only one Next. This made no discernable difference. It is expected with your modern CPU, but the difference will be noticeable when the CPU under high usage, this is just for information and you don't need to do it, i liked to share. Anyway, on my XE8 i found one of these loch ness monsters, the one that i saw in real life and in dreams, the ones that very hard to catch red handed, when the compiler throw the towel and silently produce unoptimized ugly assembly instructions. This is reproducible on my XE8 and would love to know if this case is still in newer versions: 1) create new and empty console applicaiton. 2) add VSoft.Ulid 3) put this line in the main procedure, uses System.SysUtils, VSoft.Ulid; begin TUlid.Create; // we don't care now for memory leak or whatevery end. 4) add 64-bit platform and enable optimize, keep stack frame enabled. 5) use this create, the same i used earlier class function TUlid.InternalNewUlid(timestamp: UInt64): TUlid; var random : UInt64; ts : Int64Rec absolute timestamp; begin result := default(TUlId); //reverse order! result.FTimeStamp0 := ts.Bytes[5]; result.FTimeStamp1 := ts.Bytes[4]; result.FTimeStamp2 := ts.Bytes[3]; result.FTimeStamp3 := ts.Bytes[2]; result.FTimeStamp4 := ts.Bytes[1]; result.FTimeStamp5 := ts.Bytes[0]; random := FXorShift64.Next; PNativeUInt(@result.FRandomness0)^ := NativeUInt(random); random := FXorShift64.Next; Move(random,result.FRandomness2, 8); // comment and uncomment this line and see the difference in assembly code PUInt64(@result.FRandomness0)^ := random; end; The result is astonishing ! Here the difference from my XE8 The compiler with move behaved differently and stupidly. One side note this is also really annoying, though i don't know if this still the same for the newer and enhanced versions it could be handled way better for these limited size like up to 128bit. Share this post Link to post
Vincent Parrett 909 Posted August 6, 2024 You have a typo PUInt64(@result.FRandomness0)^ := random; << should be FRandomness2 I did actually implement that and it did make a big difference - it's in the f-optimise branch - I haven't merged the changes in to main just yet. I also removed the call to default(TUlid) since we're setting all the fields anyway (as pointed out by Stefan!). I would say that codegen on later versions of delphi is only slightly better than XE8 - mostly it's improved rtl methods. I'm no assembly expert though so take that comment with a pinch of salt. 12.1 is slightly faster due to some div by const optimizations (again thanks to Stefan). Share this post Link to post
Kas Ob. 160 Posted August 6, 2024 You still can (and may be should) use the record variant as Stefan suggested, only extended it a little from case boolean to case integer with 0,1 and 2 where you can align the needed types, in the third (additional) case just put case Integer of 0: ( FRandomness0 : byte; FRandomness1 : byte; FRandomness2 : byte; FRandomness3 : byte; FRandomness4 : byte; FRandomness5 : byte; FRandomness6 : byte; FRandomness7 : byte; FRandomness8 : byte; FRandomness9 : byte); 1: ( FRandomness0_1 : Word; FRandomness2_9 : UInt64) 2: ( FRandomness0_N : NativeUInt); There is no need for padding, the compiler should take care of that. Share this post Link to post
ap2021 0 Posted Friday at 05:33 AM Good stuff! And a couple of additional points: 1) You could use hardware-generated random seed from either a) CPU or b) the Trusted Platform Module, although as it can be supplied in the constructor, it's immaterial. 2) It would really add a lot of value to have a Timestamp property or function and maybe also UTCTimestamp property as well, possibly also 2 of each: returning TDateTime and integer Unix timestamps. Ideally as class functions, doing TryParse internally on a supplied ULID. - this way, the data with ULID primary keys can be directly used for plotting things on a real time scale. BTW, have you tested it under high-concurrency loads? - i.e.: for the uniqueness of the results, or anything unexpected, like '', or '0' and some such results? Share this post Link to post
Vincent Parrett 909 Posted Friday at 08:14 AM 2 hours ago, ap2021 said: 1) You could use hardware-generated random seed from either a) CPU or b) the Trusted Platform Module, although as it can be supplied in the constructor, it's immaterial. I have no idea how to do that. 2 hours ago, ap2021 said: 2) It would really add a lot of value to have a Timestamp property or function and maybe also UTCTimestamp property as well, possibly also 2 of each: returning TDateTime and integer Unix timestamps. Ideally as class functions, doing TryParse internally on a supplied ULID. - this way, the data with ULID primary keys can be directly used for plotting things on a real time scale. Happy to take a pull request to implement this. FWIW My UUID v7 library does expose the UTCTimestamp. It uses a v54 UUID under the hood to create the random part - which on windows uses the windows cng crypto library. 2 hours ago, ap2021 said: BTW, have you tested it under high-concurrency loads? - i.e.: for the uniqueness of the results, or anything unexpected, like '', or '0' and some such results? Other than a quick n dirty test application, no. Share this post Link to post
ap2021 0 Posted Saturday at 07:42 AM Vincent, thanks, I saw you have created an issue yourself (and I still have no account for GitHub). Here's a public converter, which may be helpful: https://ugai.github.io/ulid-timestamp-converter/ As for the random numbers, CPU is supposed to have RDRAND and RDSEED instructions, although they are flagged in Delphi asm as invalid - they do not keep up, but I assume that that can be plugged with the bytes in a DB statement - it's at the edge of my experience, so I'm not pushing. It's probably different for x32 vs. x64 and could malfunction cross-platform because of parameter passing conventions vs. the register(s) it uses. Apparently it has its own limitations and could return 0, if not ready, so a fallback would be required anyway. All hardware random numbers are also slow, so this only makes sense for initial seeding. And they all have adverse reports, so some may prefer one over the other, so it's probably best left to the caller do do what they want. Share this post Link to post
DelphiUdIT 267 Posted Saturday at 09:30 AM On 10/3/2025 at 10:14 AM, Vincent Parrett said: I have no idea how to do that. If you want to use RDRAND and RDSEEK, I attach the sources so you can see how they works. These is another topic where we talked about: https://en.delphipraxis.net/topic/10271-getting-rdseed-with-delphi/?do=findComment&comment=81868 These two instructions works in Intel and Amd CPU only (x86 or x64). In Arm they were introduced recently and the behave in slightly different way. In the source there are some lines commented that you can delete. These have been in use for at least 4 years in all my production programs without ever having given me any problems. They have also been in use and constantly monitored for a year on a test web server (to check the workload and performance in a multithreaded environment). Unit11.zip Share this post Link to post