Mike Torrettinni 198 Posted September 24, 2020 (edited) I know I should never rely on iterator value to be kept as iterated value when normal For loop ends, but what about in For in loop? I never rely on i being the same value after loop as when 'condition' is met, so I assign i value to variable (vIndex) so I can use the value of i after the loop: for i := 1 to n do if condition then begin vIndex := i; Break; end; But in For..in loop I assume vItem still holds the same value as when condition is met, so I can use vItem afer loop for vItem in Items do if condition then Break; All my tests show that vItem keeps the value and can be used after For...in loop. Am I right and I can use vItem without worries, or is there a case where vItem should not be expected to be the same value as when 'condition' is met? Edited September 24, 2020 by Mike Torrettinni Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 (edited) It's the same issue for both types of for loops. The value of the loop variable is undefined if the for loop terminates normally. In your case that corresponds to the condition never having been met. Unfortunately though, whilst you are aware that there is an issue here, you have not correctly understood what the issue is. 1 hour ago, Mike Torrettinni said: I never rely on i being the same value after loop as when 'condition' is met, so I assign i value to variable (vIndex) so I can use the value of i after the loop That reasoning is incorrect. The loop variable i does not change in case of a normal termination, e.g. the break in your code. Before you consider for in loops I recommend that you correct your misunderstanding of for loops. Executive summary. There is an issue with for loops of both kind. It is the same issue for both kind of for loop. But the issue is not what you have described. Edited September 24, 2020 by David Heffernan 1 Share this post Link to post
Lars Fosdal 1792 Posted September 24, 2020 http://docwiki.embarcadero.com/RADStudio/Sydney/en/Declarations_and_Statements_(Delphi)#For_Statements For to rules are pretty clear: Quote After the for statement terminates (provided this was not forced by a Break or an Exit procedure), the value of counter is undefined. Warning: The iteration variable counter cannot be modified within the loop. This includes assignment and passing the variable to a var parameter of a procedure. Doing so results in a compile-time warning. For in rules are a bit general: Quote Warning: The iteration variable cannot be modified within the loop. This includes assignment and passing the variable to a var parameter of a procedure. Doing so results in a compile-time warning. I am not sure how this would be a problem if we are talking about a loop over a list of objects, as we then would be passing a reference to the list object? I have not really thought about how the variable of a for in loop over an array of records is kept. Do we see a deep copy of the array element in the loop variable, or is there pointer magic involved? I can see how this could be a problem if the reference passed is to a local stack copy. 1 Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 55 minutes ago, Lars Fosdal said: I have not really thought about how the variable of a for in loop over an array of records is kept. Do we see a deep copy of the array element in the loop variable, or is there pointer magic involved? I can see how this could be a problem if the reference passed is to a local stack copy. Loop variables are assigned using assignment semantics. That's all you need to know. Share this post Link to post
Lars Fosdal 1792 Posted September 24, 2020 21 minutes ago, David Heffernan said: Loop variables are assigned using assignment semantics. That's all you need to know. So, looping an array of records using for v in array, can be significantly more expensive than indexed access using for vx := Low(array) to High(array), given the size of the record? 1 Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 8 minutes ago, Lars Fosdal said: So, looping an array of records using for v in array, can be significantly more expensive than indexed access using for vx := Low(array) to High(array), given the size of the record? Correct. 1 Share this post Link to post
Sherlock 663 Posted September 24, 2020 So actually, while looking cool and trendy it really is slow and not so cool at all? I got some refactoring to do... gnarf! 2 Share this post Link to post
Fr0sT.Brutal 900 Posted September 24, 2020 6 minutes ago, Sherlock said: So actually, while looking cool and trendy it really is slow and not so cool at all? I got some refactoring to do... gnarf! Almost all syntax sugar is expensive. for-in means creating iterator, calling its methods etc instead of plain index addressing. But in most cases it doesn't matter. Anyway if speed is important, pointer-based iteration probably will be faster 4 Share this post Link to post
Lars Fosdal 1792 Posted September 24, 2020 Which means lists and arrays of objects may be preferable over lists and arrays of records in certain situations. Share this post Link to post
Lars Fosdal 1792 Posted September 24, 2020 15 minutes ago, Sherlock said: So actually, while looking cool and trendy it really is slow and not so cool at all? I got some refactoring to do... gnarf! Yeah... you are not the only one. 1 Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 19 minutes ago, Fr0sT.Brutal said: for-in means creating iterator This is why I always implement my enumerators as records so that they can be automatically allocated. 19 minutes ago, Fr0sT.Brutal said: Anyway if speed is important, pointer-based iteration probably will be faster Will it? Faster than array indexing? That seems to be something of a myth in my experience. Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 53 minutes ago, Sherlock said: So actually, while looking cool and trendy it really is slow and not so cool at all? Not necessarily. And in any case, if you wanted performance, you'd not have chosen Delphi. 1 Share this post Link to post
Sherlock 663 Posted September 24, 2020 Ah well, I chose Delphi because I like it. 😄 And if I can avoid a slow down whilst using Delphi I will try to do so. So thanks for the insight, and probably I'll do some benchmarking for my specific application, just to see if this refactoring is justified. Share this post Link to post
Stano 143 Posted September 24, 2020 If you use an iterator outside the loop, the compiler will warn you of the incorrect use. At least in 10.4. Share this post Link to post
Mike Torrettinni 198 Posted September 24, 2020 You guys are right, it really creates the whole record structure for variable iterator and copies the content, for each item iteration. So, if you have really big record structure it will use some memory, or will be considerably slower than using normal iterator. Had a suspicion on this, but never really thought about it enough (or test it myself), but now I know for sure. 7 hours ago, David Heffernan said: It's the same issue for both types of for loops. The value of the loop variable is undefined if the for loop terminates normally. Thanks, I had a feeling is good to check with experts 5 hours ago, Lars Fosdal said: After the for statement terminates (provided this was not forced by a Break or an Exit procedure), the value of counter is undefined. Warning: The iteration variable counter cannot be modified within the loop. This includes assignment and passing the variable to a var parameter of a procedure. Doing so results in a compile-time warning. or read the documentation 🙂 Share this post Link to post
Fr0sT.Brutal 900 Posted September 24, 2020 3 hours ago, Lars Fosdal said: Yeah... you are not the only one. Major rule of optimization says: Benchmark First Before Trying To Optimize 😉 . Syntax sugar is created for code simplicity and shortness and in most cases perf degradation will be insignificant 3 hours ago, David Heffernan said: Will it? Faster than array indexing? That seems to be something of a myth in my experience. I've really seen perf gain in string processing. With index loop you have ArrPtr + I*SizeOf(Item) each time which could, f.ex., cause cache misses. OTOH, Inc(CurrPtr, SizeOf(Item)) always operates on near memory area. Honestly I'm not a CPU-level expert and array loops could be well optimized already but in my case benches shown that pointer loop is faster. 3 hours ago, David Heffernan said: 2 Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 (edited) 11 minutes ago, Fr0sT.Brutal said: With index loop you have ArrPtr + I*SizeOf(Item) each time which could, f.ex., cause cache misses. OTOH, Inc(CurrPtr, SizeOf(Item)) always operates on near memory area. Honestly I'm not a CPU-level expert and array loops could be well optimized already but in my case benches shown that pointer loop is faster. The hardware is designed to handle indexed access efficiently. I've seen plenty of cases where incrementing a pointer inside the loop is slower. After all, you have an extra variable that needs to be incremented. Hard to see why that would be faster. 11 minutes ago, Fr0sT.Brutal said: With index loop you have ArrPtr + I*SizeOf(Item) each time which could, f.ex., cause cache misses. No. The memory access pattern is identical. It's just a sequential pass across the array. Edited September 24, 2020 by David Heffernan 1 Share this post Link to post
Anders Melander 1782 Posted September 24, 2020 8 minutes ago, Fr0sT.Brutal said: With index loop you have ArrPtr + I*SizeOf(Item) each time which could, f.ex., cause cache misses Why would calculating a pointer cause a cache miss? Share this post Link to post
Mike Torrettinni 198 Posted September 24, 2020 In this simple example you can see the the memory consumption jump when starting to iterate and also the time it takes to copy each record (not too much, but noticeable under debugger): The memory consumption of each record is around 390MB so the effect of iterating with variable is very noticeable, while normal iterator is fast and no additional memory consumption: const cMaxItems = 100000000; type TData = record Values: array[1..cMaxItems] of string; end; var gData: TArray<TData>; vItem: TData; procedure TForm2.FormCreate(Sender: TObject); var c: integer; begin SetLength(gData, 3); gData[0].Values[1] := 'a'; gData[1].Values[1] := 'b'; gData[2].Values[1] := 'c'; for vItem in gData do // slow and watch task manager for memory consumption inc(c); for c := Low(gData) to High(gData) do // fast sleep(0); end; Share this post Link to post
Anders Melander 1782 Posted September 24, 2020 22 minutes ago, Mike Torrettinni said: The memory consumption of each record is around 390MB I'd say that if you have records that size then the iterator is not the problem you should be focusing on 1 Share this post Link to post
Mike Torrettinni 198 Posted September 24, 2020 23 minutes ago, Anders Melander said: I'd say that if you have records that size then the iterator is not the problem you should be focusing on If the iteration is the only thing that is affected with such large records, then is good to know what could be performance bottleneck. I don't have any records of such size, so it is good to see what happens if you have nested records of large size, or smaller records but you have a large number of them, so iteration copies smaller memory but a lot of times - could be even worse than a few of 390MB records. Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 14 minutes ago, Mike Torrettinni said: or smaller records but you have a large number of them, so iteration copies smaller memory but a lot of times Why would there be an issue for small records? Presumably you are iterating over the array because you want to look at the content of the record. For a small enough record, there won't be any difference in perf between reading a field and copying the entire record. Share this post Link to post
Mike Torrettinni 198 Posted September 24, 2020 6 minutes ago, David Heffernan said: Why would there be an issue for small records? Presumably you are iterating over the array because you want to look at the content of the record. For a small enough record, there won't be any difference in perf between reading a field and copying the entire record. Iterating a few large records (390MB) or lots of small records (10000x 0.039MB), could have similar performance effects. Share this post Link to post
David Heffernan 2345 Posted September 24, 2020 40 minutes ago, Mike Torrettinni said: Iterating a few large records (390MB) or lots of small records (10000x 0.039MB), could have similar performance effects. No. Because when you iterate over an array, you typically don't read all of the content of each item. Imagine that all you do is look inside each record for an integer ID. If the record is huge, then you can just read a single integer, and move on, if using a classic for loop with an array of record. But if you use a for in loop you have to copy the entire record before reading the single integer ID. That's wasteful. In the case of a smaller record, let's say small enough to fit into a cache line, then reading an integer from the record has the essentially same cost as copying the entire record and then picking out the integer. So the trade off depends hugely on the size of the record, and what proportion of it you actually need to access. 1 Share this post Link to post
Mike Torrettinni 198 Posted September 24, 2020 (edited) 51 minutes ago, David Heffernan said: No. Because when you iterate over an array, you typically don't read all of the content of each item. I was referring to variable iterating For..in loop - in this case full array record content is copied into variable. No? At least that's what I see from the simple example above. Edited September 24, 2020 by Mike Torrettinni Share this post Link to post