Profile it.
The significance of the overhead really depends on what the routine is doing and how often it is called. For example if the call overhead is X but the routine itself takes 1000*X to execute then the overhead is insignificant.
Apart from the overhead of the call itself there's the overhead of passing parameters. You need to learn how parameters are passed in order to optimize them. Not all parameter types can be passed in registers, Delphi doesn't use all available registers, 32-bit and 64-bit does things differently, etc.
Also be aware that if you pass literal floating point numbers as parameters, the compiler might not consider the literal values to be the same type as the parameter which means that it will produce code to convert the values to the correct type.
For example, if the parameter type is Single and you need to pass 0.5, then pass it as Single(0.5) - or declare a typed constant with the value and pass that.
Set Code Inlining Control=Auto to have the compiler automatically inline small routines.
With regard to writing stuff in assembler be aware that assembler routines can't be inlined so the call overhead becomes mandatory. For example, in my code one constant bottleneck is calls to Round and Trunc. I have assembler versions of these two functions which are much faster but unfortunately the call overhead completely eliminate the performance gain so they are basically useless. It's beyond me why Delphi doesn't implement these two as intrinsics. They are listed as such but they are implemented as regular functions.
Not only that, but the Pascal version will provide a reference implementation to test and benchmark against and it will help documenting what the assembler versions does.