You are mostly right about this. If you have a small list size, you are mostly measuring creation/destruction time. But that is exactly one of the reasons you may want to use a stack-based list instead.
Although my main motivation wasn't about speed (it was more about memory fragmentation, which is hard to test as you mentioned), I added some speed tests to the repo anyway.
Like you said, any speed test in this area is biased, and does not represent a real-life situation, so you should take the results with a grain of salt. I tested 4 stack list implementations against Delphi's TList<Integer>:
TStackList1 is the original stack list from example 2 (with a fixed configurable size, but doesn't allow managed types).
TStackList2: as TStackList1 but takes Stefan's "put exception at the bottom" suggestion into account.
TStackList3: as TStackList2 but uses inlining
TStackList4: as TStackList1 but uses inlining
I tested these lists with a buffer of 256 bytes on Win32 using Delphi Rio. This shows how much faster each list is compared to Delphi's TList<Integer>.
TStackList1: 2.78 times faster
TStackList2: 2.77 times faster
TStackList3: 2.81 times faster
TStackList4: 3.03 times faster
Note that measurements vary a bit from session to session. Sometimes, "putting the exception at the bottom" has a positive impact, sometimes it doesn't. TStackList4 is always the fastest on Win32 though, but the differences aren't that big. On Win64, the results are about the same, but Delphi's TList<Integer> seems to perform a little bit better than on Win32.
Also, as you said, performance depends a lot on the list size. For a list size of 32 bytes, the stack lists are about 10x faster, but for a list size of 2048 bytes, they are only about 1.7x faster. This reinforces my suggestion that stack lists are most suitable for smaller temporary lists. For larger lists, just stick to heap-based collections instead. And you can tweak the capacity as you mentioned to avoid re-allocations.
You can't really test heap usage and fragmentation with these tests. Because we are creating and destroying lists in a loop, a smart memory manager just reuses the same heap memory over and over again without fragmentation. In real-life however, there will be lots of other things going on between the recreation of lists, and fragmentation will be more likely.