Jump to content
Tommi Prami

FYI: Stumbled upon interesting ASM optimization trick LLVm can do (most likely others also)

Recommended Posts

Posted (edited)

image.thumb.png.65318404e5157495e03a208928ab7c0f.png

Apparently LAEL ("address calculator") can do multiply by 2, 4 and 8 very fast. 

Don't know anything about details, difference to other op codes etc...

Subject starts from here, at C to llvm IR -> to asm : 

 

-Tee-

Edited by Tommi Prami
Link added

Share this post


Link to post

Dang, watching this led me to buy yet another book 🤖

  • Haha 1

Share this post


Link to post
15 minutes ago, Lars Fosdal said:

Dang, watching this led me to buy yet another book 🤖

Hackers Delight?

  • Like 1

Share this post


Link to post

I am afraid so, even if it probably is somewhat outdated by now, in the light of new CPUs.

  • Like 1

Share this post


Link to post
Posted (edited)
22 hours ago, Lars Fosdal said:

I am afraid so, even if it probably is somewhat outdated by now, in the light of new CPUs.

That might be true, I should have it somewhere. Should read it again. I have only checked few specific things from it.

 

Edited by Tommi Prami
Typo

Share this post


Link to post
1 hour ago, Lars Fosdal said:

I am afraid so, even if it probably is somewhat outdated by now, in the light of new CPUs.

I have no idea what is in that book, but I doubt that it is very much obsolete. Math behind it does not change. Nor the fact that some operations will always be more expensive than others, no matter the CPU.

Share this post


Link to post
1 hour ago, Dalija Prasnikar said:

I have no idea what is in that book, but I doubt that it is very much obsolete.

I have it and many of the techniques in it that were great 10-15 years ago are totally unnecessary today and even detrimental with modern CPUs.

It's still a good read and even though some techniques are obsolete, knowing them can help solving other similar problems simply because they can make you think about problems differently.

 

 

2 hours ago, Dalija Prasnikar said:

Nor the fact that some operations will always be more expensive than others, no matter the CPU.

Um... Always? There used to be a time, not that long ago, when integer math was far superior to floating point math (see: Hackers Delight). Not so anymore.

via https://stackoverflow.com/questions/2550281/floating-point-vs-integer-calculations-on-modern-hardware

 

 

3 hours ago, Tommi Prami said:

Apparently LAEL ("address calculator") can do multiply by 2, 4 and 8 very fast. 

The video is a bit long so I haven't watched it yet, but it's LEA: Load Effective Address, "LEAL" is AT&T syntax for LEA (Intel syntax).

Here's a good comment on the topic by Peter Cordes: Using LEA on values that aren't addresses / pointers?

 

Before you start replacing all your shifts with LEA you should be aware that it isn't always faster. As with almost everything in modern CPUs it depends on what the CPU is otherwise busy with.

https://stackoverflow.com/questions/70316686/assembly-why-is-lea-eax-eax-eaxconst-shl-eax-eax-const-combined-fast

  • Thanks 2

Share this post


Link to post
37 minutes ago, Anders Melander said:

It's still a good read and even though some techniques are obsolete, knowing them can help solving other similar problems simply because they can make you think about problems differently.

This is always the greatest benefit.

 

42 minutes ago, Anders Melander said:

Um... Always? There used to be a time, not that long ago, when integer math was far superior to floating point math (see: Hackers Delight). Not so anymore.

Integer math is still faster. Just not that much. And division is still more expensive than multiplication.

 

Anyway, when it comes to any optimizations, it always goes hand in hand with measuring. and making sure that you really need optimization in the first place and are not making a bad trade-off by complicating the code.

Share this post


Link to post

Interestingly the Delphi compiler did some optimization for multiplication by const for quite a while but it was weird and not very well - see https://quality.embarcadero.com/browse/RSP-38636.

Unfortunately, when implementing this someone missed looking into the instruction timings of imul vs the replacements and so we got a degradation in some cases - see https://embt.atlassian.net/servicedesk/customer/portal/1/RSS-1011.

  • Like 2

Share this post


Link to post
7 minutes ago, Stefan Glienke said:

Interestingly the Delphi compiler did some optimization for multiplication by const for quite a while but it was weird and not very well - see https://quality.embarcadero.com/browse/RSP-38636.

Unfortunately, when implementing this someone missed looking into the instruction timings of imul vs the replacements and so we got a degradation in some cases - see https://embt.atlassian.net/servicedesk/customer/portal/1/RSS-1011.

If I remember from  the video, the LLVM will use shl/shr (never remember which) tric on the intermediate version (Like it'äs bytecode), but it'll use the LEA at the binary if it just can.

I always thought shl/shr would be the fastest way.. 

 

-Tee-

Share this post


Link to post
Posted (edited)

There are multiple considerations - I don't know what compiler version, target, and option he was using to conclude that it will use a lea rather than shift - at least with -O3 it will use shift for multiplications by 4 and 8 although lea would also be applicable - for mul by 2 most likely add is being emitted because that is just the smaller instruction. Another consideration is if the value is needed further - for example, x * 7 is implemented as x * 8 - x - and here it cannot use shift for the * 8 because it needs the original value of x to subtract, therefor it uses lea to store the result in another register and subtract the original register from it.

 

Regarding the LEA instruction - I just remembered that I also reported that it should utilize this instruction when doing pointer math - see https://quality.embarcadero.com/browse/RSP-34820

Edited by Stefan Glienke
  • Like 1

Share this post


Link to post

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×