A.M. Hoornweg 144 Posted August 9
Hello all, I have a situation where I have a dynamic array containing small fixed-size records (17 bytes in size, no managed types). The array behaves like a queue: new elements get appended at the end, old elements get removed from the start. This happens 24/7 and the application is mission critical. There may be anything from 0-5000 elements in the array. My logic for appending is simply "MyArray := MyArray + [element]" and my logic for removing the oldest element is simply "Delete(MyArray, 0, 1)". I'm worried that there might be a risk of excessive memory fragmentation and I'd like to hear your opinions on this. Thanks in advance!
Lajos Juhász 295 Posted August 9
This is most probably going to lead to memory fragmentation. My advice is to watch the working set size in Task Manager. If the memory grows very fast, change the implementation; if not, live with it. Having said that, I have an application that is a bridge between two databases. It suffers from slow memory fragmentation that requires me to restart it 2-3 times every year.
David Schwartz 430 Posted August 9
What you want is a circular buffer that points to buffers in a memory pool if they're all the same size, or pools of different sizes if they vary. The pool pre-allocates (static) buffers that are re-used, so no fragmentation occurs. Memory fragmentation is not your biggest issue; it's atomic reads and writes, or test-and-sets, to avoid corruption and gridlock from race conditions.
Kas Ob. 124 Posted August 9
21 minutes ago, A.M. Hoornweg said: I'm worried that there might be a risk of excessive memory fragmentation and I'd like to hear your opinions on this.
Don't worry there. As long as you are using FastMM (or the default memory manager, they are the same), there is no memory fragmentation; in fact it is impossible to fragment memory when using FastMM with one allocation. These lists/arrays are one allocation from the point of view of the MM, so however you operate on them, like delete and append, it is always one continuous allocation.
Dalija Prasnikar 1404 Posted August 9
1 hour ago, A.M. Hoornweg said: I have the situation where I have a dynamic array containing small fixed-size records (17 bytes in size, no managed types). The array behaves like a queue, new elements get appended at the end, old elements get removed from the start. This happens 24/7 and the application is mission critical. There may be anything from 0-5000 elements in the array. My logic for appending is simply "MyArray:=MyArray+[element]" and my logic for removing the oldest element is simply "Delete(MyArray,0, 1)".
If you are frequently adding/removing items and the maximum occupied memory is not an issue, I would use TList<T> instead of a dynamic array, because it supports Capacity. Set the Capacity to the maximum number of elements you are expecting (if there are more, the underlying dynamic array will grow automatically). That will prevent frequent allocations/deallocations and improve performance. You may also use TQueue<T>, which might fit your needs.
1 hour ago, A.M. Hoornweg said: I'm worried that there might be a risk of excessive memory fragmentation and I'd like to hear your opinions on this.
The whole array needs to be allocated in one place, so there will be no memory fragmentation just because of that array's reallocations.
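A minimal sketch of the TQueue<T> suggestion, assuming a stand-in record type TMyRec since the OP's actual type isn't shown:

```pascal
uses
  System.Generics.Collections;

type
  TMyRec = packed record          // stand-in for the OP's 17-byte record
    Data: array[0..16] of Byte;
  end;

var
  Queue: TQueue<TMyRec>;
  Item: TMyRec;
begin
  Queue := TQueue<TMyRec>.Create;
  try
    Queue.Capacity := 5000;       // pre-allocate for the expected maximum
    Queue.Enqueue(Item);          // append at the end
    if Queue.Count > 0 then
      Item := Queue.Dequeue;      // remove the oldest element
  finally
    Queue.Free;
  end;
end;
```

TQueue<T> already stores its elements in a circular buffer internally, so Dequeue does not move the remaining elements the way Delete on a dynamic array does.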
A.M. Hoornweg 144 Posted August 9
1 hour ago, Kas Ob. said: Don't worry there, as long you are using FastMM (or the default memory manager, they are the same) , there is no memory fragmentation, in fact it is impossible to fragment the memory when using FastMM with one allocation, these lists/arrays... are one allocation from the point of view of MM, hence however you operate on them like delete and append, it always one continuous allocation.
My worry is: whenever I delete an element at the beginning, would the RTL just allocate a new block and copy the old array minus the first element to the new location?
A.M. Hoornweg 144 Posted August 9
1 hour ago, David Schwartz said: What you want is a circular buffer that points to buffers in a memory pool if they're all the same size, or pools of different sizes if they vary. The pool pre-allocates (static) buffers that are re-used, so no fragmentation occurs. Memory fragmention is not your biggest issue, it's atomic reads and writes, or test-and-sets to avoid corruption and gridlock from race conditions.
Access to the queue is only done by a single thread. A circular buffer is not handy because this queue keeps growing until a certain condition is met. Then a number of elements are removed from the beginning of the queue until the condition is no longer met.
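For that grow-then-trim pattern, one alternative (a sketch, not the OP's code) is a dynamic array with a head index: single front removals just advance the index, and the array is only compacted when it runs out of room, so the per-delete move and SetLength disappear entirely:

```pascal
uses
  System.Math;                   // for Max

type
  TMyRec = packed record         // stand-in for the OP's 17-byte record
    Data: array[0..16] of Byte;
  end;

  TRecQueue = record
    Items: TArray<TMyRec>;
    Head, Count: Integer;        // live elements: Items[Head..Head+Count-1]
    procedure Push(const AItem: TMyRec);
    procedure PopFront;
  end;

procedure TRecQueue.Push(const AItem: TMyRec);
begin
  if Head + Count = Length(Items) then
  begin
    // compact first; grow only if the array is genuinely full
    if Head > 0 then
    begin
      Move(Items[Head], Items[0], Count * SizeOf(TMyRec));
      Head := 0;
    end;
    if Count = Length(Items) then
      SetLength(Items, Max(16, Length(Items) * 2));
  end;
  Items[Head + Count] := AItem;
  Inc(Count);
end;

procedure TRecQueue.PopFront;
begin
  if Count > 0 then
  begin
    Inc(Head);                   // O(1): no move, no reallocation
    Dec(Count);
  end;
end;
```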
Stefan Glienke 2019 Posted August 9 (edited)
16 hours ago, A.M. Hoornweg said: My worry is, whenever I delete an element at the beginning, would the RTL just allocate a new block and copy the old array minus the first element to the new location?
Not exactly: it moves everything in the array to the front and then calls SetLength with the new length. What that does internally depends on the refcount of the array. If it's 1, then it does a Realloc (shrinking in place, though it depends on the memory manager what exactly happens; typically for a small downsize it does it in place, thus basically resulting in a no-op). If it's greater than 1, it allocates a new block and copies the content over. I would say calling SetLength inside of Delete is a bug, because everything was already moved in place, thus the behavior of SetLength is not desired in this case. In fact this leads to a corrupted copy lying around. Reported https://embt.atlassian.net/servicedesk/customer/portal/1/RSS-1532 Edit: apparently this is already known since 2020 - https://quality.embarcadero.com/browse/RSP-27870 Edited August 10 by Stefan Glienke
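Put as a sketch, the behavior described above means Delete(MyArray, 0, 1) on an unmanaged element type is roughly equivalent to this (simplified; the real RTL code also handles managed element types and multiple deleted elements):

```pascal
// Rough sketch of Delete(Arr, 0, 1) for an unmanaged element type
procedure DeleteFirst(var Arr: TArray<TMyRec>);
var
  N: Integer;
begin
  N := Length(Arr);
  if N = 0 then
    Exit;
  // 1. move everything down by one slot
  Move(Arr[1], Arr[0], (N - 1) * SizeOf(TMyRec));
  // 2. shrink by one: with refcount = 1 this is a Realloc
  //    (usually in place); with refcount > 1 it allocates a
  //    new block and copies the (already moved) content over
  SetLength(Arr, N - 1);
end;
```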
Anders Melander 1815 Posted August 9
1 hour ago, Stefan Glienke said: I would say calling SetLength inside of Delete is a bug
Nice catch!
Kas Ob. 124 Posted August 9
2 hours ago, A.M. Hoornweg said: My worry is, whenever I delete an element at the beginning, would the RTL just allocate a new block and copy the old array minus the first element to the new location? .
Even when that is the case, i.e. it allocates new memory and copies the content, FastMM has a very nice algorithm for recycling the system-allocated blocks. FastMM requests and allocates chunks from the OS; say we have 1 MB allocated, FastMM will slice it into smaller chunks and serve the application the requested sizes. When a requested size is not available, it will request another big 1 MB chunk and repeat. The probability of you causing fragmentation is very low, as it can only happen with the 1 MB chunks, and to do that you would have to request and not free very many blocks (a huge number of small sizes), then free half of them in a specific order. And when your list grows above a certain size, FastMM will not use the sliced chunks at all; it will request new memory from the OS and free it (return it to the OS) when done. Hence, no matter how you change or resize your list, or even tens/hundreds of them, you will not be able to cause any memory fragmentation or memory-access degradation; performance will stay more or less the same.
Reallocating on a single delete, as Stefan pointed out, is a problem, so do as Dalija suggested or implement your own raw list that moves elements on delete without reallocating; it is easy.
One more thing: many think linked lists are good and fast, and they are, but only when the item count reaches something around 500k items.
Another thing: if your list doesn't need to be ordered, then on delete move/copy the last item on top of the deleted one and decrease the count, removing a few overheads. In that case your list needs a size (capacity) and a count (actual item count); grow and resize only when count equals size, and never downsize! The logic is simple: this size was needed once, so it could be needed again.
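The unordered-delete trick in the last paragraph can be sketched like this (FItems and FCount are hypothetical fields of a hand-rolled list, not from the thread):

```pascal
// Unordered removal: overwrite the doomed slot with the last item.
// Element order is not preserved, but no block move of the tail is
// needed, and the capacity (Length(FItems)) is never shrunk.
procedure TRawList.DeleteUnordered(Index: Integer);
begin
  Dec(FCount);
  if Index < FCount then
    FItems[Index] := FItems[FCount];  // copy last item over the hole
  // grow elsewhere only when FCount = Length(FItems); never downsize
end;
```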
Kas Ob. 124 Posted August 9
Another thing: having a 17-byte record is a little strange and means you are using a packed record. If you are trying to save some performance and minimize time spent copying, then let me assure you, you are doing the opposite. Remove the packed modifier and/or make sure the record is padded to a multiple of 4 bytes, meaning 20 bytes should be your minimum; you might see a little better performance right away, and even if it is very small, it is better. Or remove the packed modifier and let the compiler align it for you — unless you are exchanging the same records between the 32-bit and 64-bit versions, in which case see how the compiler did it and replicate the padding yourself in the packed one.
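To illustrate the padding idea (a sketch only; the field layout is invented since the OP's record isn't shown):

```pascal
type
  TPackedRec = packed record      // 8 + 8 + 1 = 17 bytes, no padding
    Id: Int64;
    Value: Double;
    Flag: Byte;
  end;

  TPaddedRec = packed record      // same fields + explicit 3-byte pad = 20
    Id: Int64;
    Value: Double;
    Flag: Byte;
    Filler: array[0..2] of Byte;  // pads SizeOf to a multiple of 4
  end;

// SizeOf(TPackedRec) = 17, SizeOf(TPaddedRec) = 20.
// Without packed, the compiler would align the record itself
// (here to 24 bytes, since the largest field is 8 bytes wide).
```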
David Heffernan 2353 Posted August 9
57 minutes ago, Kas Ob. said: Remove the packed and/or make sure it is padded to multiple of 4 bytes, meaning 20 bytes should be your minimum, you might see better a little performance right away
This was true once upon a time but is not true for modern x86-64 architectures.
Kas Ob. 124 Posted August 9
8 minutes ago, David Heffernan said: This was true once upon a time but is not true for modern x86-64 architectures.
True for almost all cases, but not all; there are still a few edge cases. Another point: it will work for both old and new CPUs. Another point: it will help any optimized memory-copy function use its SIMD path without having to handle the first, unaligned 16 bytes — but for that the data should be aligned to 16 bytes to get the most juice, meaning the record should be 32 bytes.
Stefan Glienke 2019 Posted August 9
4 hours ago, Kas Ob. said: Another point, it will help any new optimized memory copy functions to use the SIMD version without going through the first and unaligned 16 bytes, but for this it should be aligned to 16bytes to get the most of juice, meaning that record should be 32bytes.
That would be true if dynamic array allocation were done in a way that ensures element [0] is at a 16-byte-aligned address, which is not the case.
David Heffernan 2353 Posted August 9
1 hour ago, Stefan Glienke said: That would be true if dynamic array allocation would be done in a way that ensures that element [0] would be at a 16 byte aligned address, which is not the case
It is if you use a proper memory manager
David Heffernan 2353 Posted August 9
5 hours ago, Kas Ob. said: Another point, it will works for old and new CPU.
Except that aligned is now slower on modern CPUs because of the less efficient cache usage when there is padding.
Stefan Glienke 2019 Posted August 10 (edited)
6 hours ago, David Heffernan said: It is if you use a proper memory manager
Only on 64-bit, though: on 32-bit the size of TDynArrayRec is 8 bytes, so if the memory manager's GetMem returns 16-byte-aligned memory, that makes elem[0] 8-byte aligned. This is why I said the RTL has to take care of it — the same is the case for strings. Relying only on the memory manager, which is replaceable, prevents the RTL from doing things it could do if certain data had certain alignments, such as using SSE for some string-handling functions. Edited August 10 by Stefan Glienke
Kas Ob. 124 Posted August 10
12 hours ago, Stefan Glienke said: That would be true if dynamic array allocation would be done in a way that ensures that element [0] would be at a 16 byte aligned address, which is not the case
Alas, that is the case, and this is why, when performance is the target, I don't use any managed type: a small record with size, count and a pointer to the raw records at item[0], with a record helper to provide addition and deletion.
11 hours ago, David Heffernan said: It is if you use a proper memory manager
All memory managers I know of, including the system heap ones, do 16-byte alignment. The problem is the outdated and inefficient RTL managed-type design; these should be updated. No memory manager can help with an unaligned structure to begin with.