Arnaud Bouchez

New asm for AES-HASH, AES-PRNG, AES-CTR and AES-GCM for Delphi/FPC


Two blog posts to share:






TL;DR: new AES assembly code boosts the AES-CTR, AES-GCM, AES-PRNG and AES-HASH implementations, especially on x86_64, for mORMot 2.
It outperforms OpenSSL for AES-CTR and AES-PRNG, and is an order of magnitude faster than every other Delphi library I know about.

6 hours ago, Arnaud Bouchez said:


Nicely done by the book, very nice.


One thing though, and I have to ask: NIST established two limits for using CTR_DRBG safely before reseeding. One is the maximum number of bits per request, which is 2^19, and the other is the reseed interval, counted in generate requests, which is 2^48. Both apply to AES128, AES192 and AES256. Yet I see that the mORMot2 AES-PRNG uses 2^25 bytes as its reseed threshold, so would you please point me to where this 2^25 comes from? I am very interested and would really appreciate it.


And if I may, I would suggest this small change:

procedure TAesPrng.FillRandom(out Block: TAesBlock);
    DoBlock(rk, iv, Block{%H-}); // block=AES(iv)

    inc(iv.b[14], Ord(iv.b[15] = 0));
    inc(iv.b[13], Ord(iv.b[15] or iv.b[14] = 0));
    //if iv.b[15] = 0 then
  inc(fBytesSinceSeed, 16);
  inc(fTotalBytes, 16);

If we assume a limit of 2^24 for the counter, then a 3-byte counter is safe and cannot overflow. This removes the need for the heavyweight CtrNistCarryBigEndian function and saves a few cycles per block, but it lowers the reseed limit. I think you get the idea and can find a good spot between 24 bits and 48 bits.
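To make the trade-off concrete, here is a minimal standalone sketch that compares a branchless 3-byte increment with a full 16-byte big-endian carry. It is not mORMot code: TBlock, IncFull and IncThreeBytes are hypothetical names, and IncFull is only an assumed stand-in for what a full NIST-style carry does, not the actual CtrNistCarryBigEndian implementation.

```pascal
program CtrCarrySketch;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}
{$R-}{$Q-} // byte wrap-around from 255 to 0 is intended here

type
  TBlock = array[0..15] of Byte; // hypothetical stand-in for a 16-byte AES block

// Assumed reference: big-endian increment with carry across all 16 bytes.
procedure IncFull(var iv: TBlock);
var
  i: Integer;
begin
  for i := 15 downto 0 do
  begin
    Inc(iv[i]);
    if iv[i] <> 0 then
      Exit; // no carry into the next byte
  end;
end;

// Branchless variant: the carry only propagates through bytes 15, 14 and 13.
procedure IncThreeBytes(var iv: TBlock);
begin
  Inc(iv[15]);
  Inc(iv[14], Ord(iv[15] = 0));
  Inc(iv[13], Ord((iv[15] or iv[14]) = 0));
end;

function SameBlock(const a, b: TBlock): Boolean;
var
  i: Integer;
begin
  Result := True;
  for i := 0 to 15 do
    if a[i] <> b[i] then
    begin
      Result := False;
      Exit;
    end;
end;

var
  full, fast: TBlock;
  n: Integer;
  agree: Boolean;
begin
  FillChar(full, SizeOf(full), 0);
  FillChar(fast, SizeOf(fast), 0);
  agree := True;
  // Step both counters through every value below 2^24.
  for n := 1 to (1 shl 24) - 1 do
  begin
    IncFull(full);
    IncThreeBytes(fast);
    if not SameBlock(full, fast) then
      agree := False;
  end;
  WriteLn('identical below 2^24 blocks: ', agree);
  // One more step reaches 2^24: the full carry moves into byte 12,
  // while the 3-byte variant wraps back to zero.
  IncFull(full);
  IncThreeBytes(fast);
  WriteLn('identical at 2^24 blocks: ', SameBlock(full, fast));
end.
```

The two routines should agree for every counter value below 2^24 and diverge on the next step, which is why the shortcut is only safe when a reseed is guaranteed to happen before 2^24 blocks have been produced.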


If you look at the asm - at least on FPC - CtrNistCarryBigEndian() is in fact inlined, so it has very little impact. It is called only once every 256 blocks, and otherwise only adds a couple of inc/test opcodes.
Using branchless instructions seems pointless in this part of the loop: DoBlock() takes dozens of cycles for sure, and the bottleneck is likely to be the critical section.

Also note that the 2^24 assumption depends on the re-seed parameter, which may be set to more than 2^24*16 bytes (even NIST seems to allow up to 2^48), so a 3-byte counter won't be enough.
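As a quick check of the sizes involved, the following sketch works out the arithmetic. It assumes the 2^25-byte default threshold mentioned in this thread and 16-byte AES blocks; the variable names are illustrative only.

```pascal
program ReseedArithmetic;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifdef MSWINDOWS}{$APPTYPE CONSOLE}{$endif}

const
  BlockBytes = 16; // one AES block

var
  defaultReseedBytes, threeByteLimitBytes: Int64;
begin
  // Default reseed threshold discussed in this thread: 2^25 bytes.
  defaultReseedBytes := Int64(1) shl 25;
  // Output a 3-byte block counter can cover before wrapping: 2^24 blocks.
  threeByteLimitBytes := (Int64(1) shl 24) * BlockBytes;

  WriteLn('default reseed threshold: ', defaultReseedBytes shr 20, ' MB = ',
    defaultReseedBytes div BlockBytes, ' blocks');
  WriteLn('3-byte counter covers:    ', threeByteLimitBytes shr 20, ' MB');
end.
```

With the default threshold, a reseed happens after 2^21 blocks, well inside what three bytes can count. Any reseed value above 256 MB would need a wider counter.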

CtrNistCarryBigEndian() is a nice and readable solution, in the context of filling a single block of 16 bytes.

The current 32MB default for the reseed value is still far below the NIST limit of 2^48. We chose 32MB from a user perspective - the previous limit was 1MB, which was really paranoid.
Anyway, an application that needs a lot of random values should instantiate its own TAesPrng, with an appropriate reseed value, for each large request.

Edited by Arnaud Bouchez
