Thanks for this interesting topic, especially for someone like me who works with image processing and related algorithms and is a Delphi/FPC fan. In my tests, the reference pixel-access code is the fastest, and the difference is clearer on a bigger image (12000x8143 pixels); please see the results below (each test was run twice in succession). I also tested FastMM4 and FastMM5 with these tests; FastMM5 clearly gave the best times (I have not included the FastMM4 results here), so I used FastMM5 for the runs below.
Using original image:
-----------------------
Running test: "Reference" (RELEASE build Win64, used with Normal Image(5600x3800))
Run count: 5
Min: 78.990ms, Average: 79.810ms, Max: 80.594ms
Running test: "Reference" (RELEASE build Win64, used with Normal Image(5600x3800))
Run count: 5
Min: 80.314ms, Average: 84.316ms, Max: 91.712ms
Running test: "ReferenceWithScanlineHelper" (RELEASE build Win64, used with Normal Image(5600x3800))
Run count: 5
Min: 93.387ms, Average: 93.930ms, Max: 94.503ms
Running test: "ReferenceWithScanlineHelper" (RELEASE build Win64, used with Normal Image(5600x3800))
Run count: 5
Min: 93.118ms, Average: 93.848ms, Max: 94.309ms
Using bigger image:
-----------------------
Running test: "Reference" (RELEASE build Win64, used with Bigger Image(12000x8143))
Run count: 5
Min: 356.202ms, Average: 361.204ms, Max: 378.090ms
Running test: "Reference" (RELEASE build Win64, used with Bigger Image(12000x8143))
Run count: 5
Min: 352.916ms, Average: 367.075ms, Max: 385.400ms
Running test: "ReferenceWithScanlineHelper" (RELEASE build Win64, used with Bigger Image(12000x8143))
Run count: 5
Min: 422.031ms, Average: 429.115ms, Max: 438.597ms
Running test: "ReferenceWithScanlineHelper" (RELEASE build Win64, used with Bigger Image(12000x8143))
Run count: 5
Min: 423.584ms, Average: 426.645ms, Max: 430.645ms
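
As a rough cross-check of the numbers above, here is a small Python sketch (not part of the benchmark itself) that averages the two reported "Average" values per test and expresses the ReferenceWithScanlineHelper time as a percentage overhead over Reference. The names AVERAGES_MS, overhead_percent and summarize are just illustrative; only the timings come from the logs.

```python
# Average times in ms, copied from the runs above (two runs per test).
AVERAGES_MS = {
    "normal (5600x3800)": {
        "Reference": [79.810, 84.316],
        "ReferenceWithScanlineHelper": [93.930, 93.848],
    },
    "bigger (12000x8143)": {
        "Reference": [361.204, 367.075],
        "ReferenceWithScanlineHelper": [429.115, 426.645],
    },
}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def overhead_percent(reference_ms, helper_ms):
    """How much slower helper_ms is than reference_ms, in percent."""
    return (helper_ms / reference_ms - 1.0) * 100.0


def summarize(averages):
    """Map each image label to the helper's overhead over Reference."""
    return {
        image: overhead_percent(
            mean(tests["Reference"]),
            mean(tests["ReferenceWithScanlineHelper"]),
        )
        for image, tests in averages.items()
    }


if __name__ == "__main__":
    for image, pct in summarize(AVERAGES_MS).items():
        print(f"{image}: helper is {pct:.1f}% slower than Reference")
```

With the figures above this comes out at roughly 14% overhead on the normal image and roughly 17.5% on the bigger one, so the gap grows in relative terms as well as in absolute milliseconds.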