Kas Ob.

Kas Ob. last won the day on January 10

Kas Ob. had the most liked content!

Community Reputation

241 Excellent

1 Follower


  1. Exactly, there is too much discussion around this. What I wrote is a simplified and shortened description of what I am familiar with, since it mostly depends on the use case (the length of the code, the thread count, whether it involves another nested wait loop, whether an I/O operation is involved, etc.).
For me, Sleep(1) is out of the question in a spinlock, simply because a critical section is better and faster there. Sleep(0) and SwitchToThread involve the OS, and I explained in some detail how the OS reacts to them. There is also PAUSE (an assembly instruction), which used to cost around 10 cycles, but Intel raised it to around 140 cycles with the Skylake microarchitecture (modern Xeons too). This changes a lot for spinlocks. Intel advocates it as an instruction targeted at spinlocks, and it does decrease power consumption while looping without any OS call.
I use a few different spinlocks, handmade for specific code: some depend only on PAUSE, some spin in place with breaks, and some use SwitchToThread. In a few cases I looped on PAUSE with a counter up to 4000 before calling SwitchToThread (assuming that 4000*10 cycles means there is a hog somewhere and the OS is starving), as it made sense.
Yes, you are right about the circumstances. If you can change the design with a different approach, I believe out-of-the-box thinking will help more in many cases. For example, sometimes it is faster to use the spinlock only to copy the data to the stack or to newly allocated per-thread memory. Here the spinlock acts as a gate in two places, on entry before the processing and on exit after it, so the lock protects only a small portion of code that copies in and copies out. Even if the data has to be discarded as outdated on copy-out, this is sometimes faster. The memory that receives the data is preallocated before the spinlock-protected code, so the spinlock merely protects a copy. But again, this is just one example.
The best way to benchmark threading code is on a completely blank Hyper-V Windows installation without the shell. You can copy the binaries using PowerShell and execute them. It is a time-consuming process and requires a spare device or a dedicated/hosted one, but it is the way to go if you want the best accuracy available.
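The "loop on PAUSE up to a limit, then hand the time slice to the OS" idea can be sketched roughly as below. This is a portable C++ illustration, not the Delphi code described above: `cpu_relax`, `SpinThenYieldLock` and `run_counter_demo` are hypothetical names, `std::this_thread::yield()` stands in for SwitchToThread, and the limit of 4000 is simply the figure mentioned in the post.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: emits a CPU spin-wait hint where one is available.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();  // x86 PAUSE via the GCC/Clang builtin
#endif                       // elsewhere this is a no-op (assumption)
}

class SpinThenYieldLock {
public:
    void lock() {
        int spins = 0;
        // test_and_set returns the previous value: true means someone holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++spins < kSpinLimit) {
                cpu_relax();                // cheap wait, no OS call
            } else {
                std::this_thread::yield();  // stand-in for SwitchToThread
                spins = 0;
            }
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    static constexpr int kSpinLimit = 4000;  // figure taken from the post
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Usage sketch: several threads bump a plain counter under the lock.
long run_counter_demo(int threads, int iterations) {
    SpinThenYieldLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;  // protected section kept tiny on purpose
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The protected section is a single increment, which is the kind of very short code where spinning can pay off. With a longer section the waiting threads would mostly end up in the yield branch.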
  2. That can and might happen in different scenarios, and here are my two cents:
1) Sleep(1) will ruin performance in production! That is simply because you are testing while the Windows timer resolution has been changed by the Delphi IDE to 1 ms instead of 15 ms. Repeat the same test between Sleep(0) and Sleep(1) with all IDEs closed and the result will shock you. So never use Sleep(1) unless it is well thought out and needed. It has its merits, but definitely not in synchronization code.
2) Sleep(0) surrenders execution to the system in an unpredictable way; not in a bad way, but in an uncontrolled way. I will try to explain with my English: Sleep(0) gives the OS a hint that you want to wait, just like that, and the OS is free to choose whether to give up the execution or not. Here a huge difference will be seen depending on the priority of the thread against all other threads running on the OS. My Windows has 1600+ threads at this moment, and priority plays a huge role with Sleep(0), which makes it different from SwitchToThread.
3) What you really want to use is SwitchToThread https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-switchtothread This instructs the OS to give the execution time slice to another thread, as mentioned in the documentation, based on priority, but it will activate any ready-to-run thread or one that was already stopped (one that did not use Sleep(x)). Think of SwitchToThread as telling the OS "there is another thread disturbing me, please try to give it a kick", and that is the main difference from Sleep(0).
4) Also, many think a spinlock is faster than a critical section (CS). Both are slow or fast depending on where they are used and on the code they protect. A CS might cause a trip to the kernel and cost thousands or even hundreds of thousands of cycles, but again not always. A spinlock, on the other hand, might burn the same number of cycles without any productive gain, simply because the protected section has a long execution time (more cycles). As an example, say you are protecting a section of code that calculates the sum of 1000 integers, and say that algorithm takes 5k cycles. Then any thread waiting on that code with a spinlock will in theory burn the same amount. I think the idea is clear now: with more threads waiting and burning cycles in place, the CPU will be at 100% with only one thread calculating integers at a time, and even the OS threads will suffer. So use a spinlock only on code that executes very quickly. In other words, with critical sections you run the risk of wasting time and underloading the CPU, and with spinlocks you run the risk of overloading the CPU and also wasting time.
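One way to see how far a "sleep 1 ms" request can drift from 1 ms is to measure it, ideally with the IDE closed as suggested in point 1. The sketch below is portable C++ rather than Delphi: `std::this_thread::sleep_for` stands in for Sleep(1), and `average_sleep_ms` is a hypothetical helper name. The number it reports depends entirely on the OS timer resolution in effect when it runs.

```cpp
#include <chrono>
#include <thread>

// Returns the average real duration, in milliseconds, of `rounds`
// consecutive requests to sleep for 1 ms.
double average_sleep_ms(int rounds) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < rounds; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    const std::chrono::duration<double, std::milli> total = Clock::now() - start;
    return total.count() / rounds;
}
```

Calling `average_sleep_ms(100)` once with the IDE open and once with it closed should make the difference described above visible on a Windows machine.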
  3. Kas Ob.

    Quickly zero all local variables?

    Now I see, it does rearrange the managed types to the top of the local stack. You can see it in the assembly:

        procedure TTT;
        var
          A: Integer;
          B: Integer;
          st: string;
          C: Integer;
          D: Integer;
          mt: string;
          E: Integer;
          F: Integer;
        begin
          NilTheLocals(@F); // F is the last declared var

        Gimli.dpr.291: begin
        0041C288 55               push ebp
        0041C289 8BEC             mov ebp,esp
        0041C28B 83C4EC           add esp,-$14
        0041C28E 53               push ebx
        0041C28F 56               push esi
        0041C290 57               push edi
        0041C291 33C0             xor eax,eax
        0041C293 8945FC           mov [ebp-$04],eax   // st
        0041C296 8945F8           mov [ebp-$08],eax   // mt
        0041C299 33C0             xor eax,eax
        0041C29B 55               push ebp
        0041C29C 68C1C34100       push $0041c3c1
        0041C2A1 64FF30           push dword ptr fs:[eax]
        0041C2A4 648920           mov fs:[eax],esp
        Gimli.dpr.292: NilTheLocals(@F); // F is the last declared var
        0041C2A7 8D4504           lea eax,[ebp+$04]
        0041C2AA 8BD0             mov edx,eax
        0041C2AC 8D45EC           lea eax,[ebp-$14]
        0041C2AF 2BD0             sub edx,eax
        0041C2B1 83EA04           sub edx,$04
        0041C2B4 8D4504           lea eax,[ebp+$04]
        0041C2B7 8D4DEC           lea ecx,[ebp-$14]
        0041C2BA 2BC1             sub eax,ecx
        0041C2BC 50               push eax
        0041C2BD 8D4504           lea eax,[ebp+$04]
        0041C2C0 59               pop ecx
        0041C2C1 2BC1             sub eax,ecx
        0041C2C3 B901000000       mov ecx,$00000001
        0041C2C8 E8B784FEFF       call @FillChar
  4. Kas Ob.

    Quickly zero all local variables?

    Interesting question. It might be possible to some extent, but it is dangerous and not recommended. I am pasting it here as food for thought for curious minds:

        // S should be the last declared local variable
        procedure NilTheLocals(S:Pointer); inline;
        begin
          FillMemory(PNativeUInt(NativeUInt(AddressOfReturnAddress) - (NativeUInt(AddressOfReturnAddress) - NativeUInt(S))),
            NativeUInt(AddressOfReturnAddress)-NativeUInt(S)-SizeOf(Pointer),0); // change 0 to 1 to see the effect
        end;

        procedure TTT;
        var
          A:Integer;
          B:Integer;
          C:Integer;
          D:Integer;
          E:Integer;
          F:Integer;
        begin
          NilTheLocals(@F); // F is the last declared var
          //FillMemory(PNativeUInt(NativeUInt(AddressOfReturnAddress) - (NativeUInt(AddressOfReturnAddress) - NativeUInt(@F))),
          //NativeUInt(AddressOfReturnAddress)-NativeUInt(@F)-SizeOf(Pointer),0);
          A:=1;
          B:=2;
          C:=3;
          D:=4;
          E:=5;
          F:=6;
          Writeln(A);
          Writeln(B);
          Writeln(C);
          Writeln(D);
          Writeln(E);
          Writeln(F);
        end;

    This might work fine when compiler optimization is disabled, but you should know the following:
    1) With optimization enabled, not all locals are stored on the stack; some are just reserved CPU registers, and that is why it might not work. You can switch between 32-bit and 64-bit with optimization on and off and watch the result yourself in the debugger's local variables view.
    2) While it looks safe for the example above, I have no idea whether the compiler rearranges the managed types. The above does work and will not overflow the stack, but the question is whether it completely covers all the locals when there are managed types in between. I can't give an answer.
  5. First, let's see if we can find which one has the right length and which is wrong:

        012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567012345670123456701234567
        S o m e s h o r t d e s c r i p t i o n w i t h l o o o o o o o o o n g a d d i t i o n a l d a t a l i k e p o l i s h d i a c r i t i c a l c h a r s . . . ł ó ż ź ć ę ó ł a n d d i g i t s 0 1 2 3 4 5 6 7 8 9
        B623A479FC657E31F219287CD191075575B2FB56485D0C22E9168A2BF2289C7165CDA67586A486E14115C754ABA158A84A8C3B521E0DF87505D77649A8F1CB52A03D41E205849F28BCA2DE189A9C65CDB648DBC9F7D49AF2F1704B491E9E2DE6FC357ADC8E15733394C3C75B45570AE77A2A6CB6CC4418A558A78313C0C16478A7D61538B88B486BCAE89235D8FCEEB8
        B623A479FC657E31F219287CD191075575B2FB56485D0C22E9168A2BF2289C7165CDA67586A486E14115C754ABA158A84A8C3B521E0DF87505D77649A8F1CB52A03D41E205849F28BCA2DE189A9C65CDB648DBC9F7D49AF2F1704B491E9E2DE6FC357ADC8E15733394C3C75B45570AE77A2A6CB6CC4418A558A78313C0C16478
        B623A479FC657E31F219287CD191075575B2FB56485D0C22E9168A2BF2289C7165CDA67586A486E14115C754ABA158A84A8C3B521E0DF87505D77649A8F1CB52A03D41E205849F28BCA2DE189A9C65CDB648DBC9F7D49AF2F1704B491E9E2DE6FC357ADC8E15733394C3C75B45570AE7

As you can see, the third one is shorter than the plaintext, which means it has been truncated and data has been lost. This is wrong and will not restore the data in full.
The second one is the right one: its length is longer than the plaintext length but fits the block length for AES. This is right, but it raises a question about which padding has been used.
The first one is the longest and has one extra full block, which should mean a padding scheme has been used for sure.
Now that I have explained what is going on with the lengths, the question is this: what padding are you using?
Also, googling pyaes, I landed here https://github.com/ricmoo/pyaes/blob/master/README.md and I would suggest that you read it carefully and try to understand the padding usage and its importance.
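The length reasoning above can be checked with a little arithmetic. The sketch below is C++ and assumes a block mode with 16-byte AES blocks. The helper names are hypothetical, and PKCS#7 is only one possible padding scheme, not necessarily the one used by either side in this thread.

```cpp
#include <cstddef>

constexpr std::size_t kAesBlock = 16;  // AES block size in bytes

// PKCS#7 always appends between 1 and 16 bytes, so an input that is already
// a multiple of the block size grows by one full extra block.
constexpr std::size_t pkcs7_padded_len(std::size_t plain_len) {
    return (plain_len / kAesBlock + 1) * kAesBlock;
}

// Zero padding (or no padding on aligned input) only rounds up to the next
// block boundary and adds nothing when the input is already aligned.
constexpr std::size_t rounded_up_len(std::size_t plain_len) {
    return (plain_len + kAesBlock - 1) / kAesBlock * kAesBlock;
}

// A ciphertext shorter than the plaintext cannot carry all of the data.
constexpr bool looks_truncated(std::size_t plain_len, std::size_t cipher_len) {
    return cipher_len < plain_len;
}
```

Comparing the byte length of each output with both functions is a quick way to guess which padding a library applied: a result equal to `pkcs7_padded_len` on block-aligned input points to an always-pad scheme, while one equal to `rounded_up_len` points to zero padding or none.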
  6. Nicely done, by the book, very nice. One thing though, and I have to ask: NIST established two limits for using CTR_DRBG safely before reseeding. One is the maximum number of bits per request, which is 2^19, and the other is the reseed interval, counted in generate requests, which is 2^48. Both apply to AES128, AES192 and AES256. I see that the mORMot2 AES-PRNG has 2^25 bytes as its threshold to reseed, so would you please point me to where this 2^25 comes from? I am very interested and would really appreciate it.
And, if I may, I suggest this small change:

        procedure TAesPrng.FillRandom(out Block: TAesBlock);
        ...
        begin
          DoBlock(rk, iv, Block{%H-}); // block=AES(iv)
          inc(iv.b[15]);
          inc(iv.b[14], Ord(iv.b[15] = 0));
          inc(iv.b[13], Ord(iv.b[15] or iv.b[14] = 0));
          //if iv.b[15] = 0 then
          //CtrNistCarryBigEndian(iv.b);
        end;
        inc(fBytesSinceSeed, 16);
        inc(fTotalBytes, 16);
        LeaveCriticalSection(fSafe);
        end;

If we assume a limit of 2^24 for the counter, then we are safe with a 3-byte counter and there is no way to overflow. This removes the need for the heavyweight function CtrNistCarryBigEndian, saving a few cycles per block, but it lowers the reseed limit. I think you get the idea and can find a nice spot between 24 bits and 48 bits.
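The three `inc` lines amount to a branch-free increment of a 24-bit big-endian counter held in the last three bytes of the IV. Below is a small C++ model of just that arithmetic, so the carry behaviour can be checked in isolation. `Block`, `inc_counter24` and `counter24` are hypothetical names, and nothing here is mORMot2 code.

```cpp
#include <array>
#include <cstdint>

using Block = std::array<std::uint8_t, 16>;

// Branch-free increment of the low 24 bits, stored big-endian in bytes 13..15.
// Each comparison yields 0 or 1, which is added as the carry into the next byte.
inline void inc_counter24(Block& iv) {
    iv[15] = static_cast<std::uint8_t>(iv[15] + 1);
    iv[14] = static_cast<std::uint8_t>(iv[14] + (iv[15] == 0));
    iv[13] = static_cast<std::uint8_t>(iv[13] + ((iv[15] | iv[14]) == 0));
}

// Reads the same three bytes back as an integer, for checking.
inline std::uint32_t counter24(const Block& iv) {
    return (static_cast<std::uint32_t>(iv[13]) << 16) |
           (static_cast<std::uint32_t>(iv[14]) << 8) |
           static_cast<std::uint32_t>(iv[15]);
}
```

Byte 12 is never touched, so after 2^24 increments the counter silently wraps to zero. That is why the reseed threshold has to stay at or below 2^24 blocks for this shortcut to be safe.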
  7. It should be. I have no idea what TdxAlphaColors is, or why it leads to an array of strings, but maybe color names are being initialized. Anyway, it is irrelevant at this stage.
Now you have the type of the array, so next you need to pinpoint the variable that holds that array, which is either a list inside a generic list or an array. Searching for where that type is used and tracking it is your next step.
Searching the internet for that type returns DevExpress-related pages, so you might need some insight from their support on which generic types might have such double freeing or trimming, assuming you are not using them directly in some sort of singleton or global structure/var. If this is caused by the DevExpress library, then you are better off with their support. If it is related to theming, then you are safe to ignore this error completely, as it will always happen on exit, but again you need their support to confirm it.
  8. My thoughts on this, keeping in mind that I have never used the library, so it is guesswork.
Getting FIN_WAIT_1 should have triggered a flag somewhere, so it is one of the following:
1) An exception was raised and you are silently ignoring it, or kbmMW silently ignored it, which would be wrong. This is unlikely to be the library's doing.
2) kbmMW is capturing it and passing it to an event to be handled or observed, and you missed assigning such an event (callback).
3) kbmMW is buggy. This is unlikely; I think such a library is mature enough after all these years not to miss such basic functionality.
Also, I think there is something missing here, because calling TransportServer.Close; should not be that simple when it has to handle exactly these cases where there is an active connection. So either you should be calling something before Close, like Disconnect, Abort, StopListen..., or Close itself can take a parameter, like Close(True). That is the right and normal design for any socket-handling library, unless there are some properties that can or should control this process of closing/disconnecting.
In any case, I think you should refer to the documentation, browse the demos and look for the right way to close/terminate.
  9. That is the right way to investigate, only you missed the shorter and more accurate way to do it. For exactly that you should focus on the TypeInfo, which I mentioned above. Re-reading what I wrote, I see that I failed to point you in that direction.
So what you need now is a conditional breakpoint on FinalizeArray for when the TypeInfo is $692BB8 (or any other address that is being passed; you can see it in the stack dump). If breakpoints are not your cup of tea, then use DDetours to capture the TypeInfo in question, or all of them if you want, and dump the calls into a file. Later, check for the address and put an if with a breakpoint on it. That address should only change if you change the source, and hooking the function does not change the source, so the addresses will stay the same. Do this not only on FinalizeArray but on other RTL functions like New and InitializeArray. You might need to go through a few of them to capture the parent array, and I still can't say whether it is a generic array or a generic list.
One more thing: have you tried EurekaLog? In case you want to give it a go, I would like to hear what report it generates. I think the trial version should be enough for this case. EurekaLog does memory allocation and tracking differently from FastMM, so it might still have the allocation calls for that array when the free call in question is executed.
  10. I did something not quite the same, but as an idea I think it might work nicely for you.
One client asked, on behalf of a client of his, to add functionality to sort a huge number of small images (like icons and avatars) by color similarity and closeness. I thought and thought and came up with this idea: compress them all into very small images, then analyze the colors and their corners.
It might work for you, as I think you already have access to these big full-screen frames. Try to capture one per second, then compress/re-encode it into a small image like 64x64 or even 32x32 (16x16, 10x10... try to find what works for you) using PNG. JPEG might work, but it is very lossy, so PNG might be the better option. Use Windows WIC, as it is fast.
After that you need to get one number out of these fewer pixels. Light intensity (mentioned above), image brightness or any other formula will do, as long as it reflects the overall brightness. Only keep in mind that the coefficients mentioned on SO might need adjusting for your use.
Now you have one number out of the screen frame; let's call it B. Because of how lighting and illumination work, the following formula might be sufficient:

        T = E*(B^2) + L*B + C, with E, L, C >= 0

You need to find these coefficients manually to fit your needs, as it is impossible to imagine what could work for you and your hardware. They might be unnecessary if your lighting uses thresholds, like three steps, but you will need them for a linear change if your lighting can or must be controlled over a range.
Hope that helps.
  11. To answer that, we first need to analyze your information and get a better understanding of what is going on. Only after that can we see the danger of this, if there is any!
An address alone rarely finds a bug, so let's look at these addresses and see what we can learn from them.
0x02099099: according to the stack screenshot it belongs to the FastMM debug DLL, so it is not important to us here. The only thing we can read from it is that your EXE is smaller than 30MB!
0x74be973f: now this is very interesting, as it is very high for 32-bit and very close to the default location of the OS DLLs (very high). As for it being the same every time, I think that is just luck; it can change after a restart. But to be accurate, and to get confirmation for what I write below: it might be due to FastMM being configured to allocate top-down, so if you disable AlwaysAllocateTopDown, will this address start to be random on every run?
Now let's look at the stack calls. Halt was called, which called a finalization on some unit. The lack of a unit name is interesting and will require some effort from you to track down. Anyway, the sequence of calls shows that:
1) There is a generic list that holds an array of records, and this record has an array of strings.
2) The locations of both arrays are still there, while the location of the array of strings is gone (???) at the call to UStrArrayClr.
3) The record type info is also gone at FinalizeRecord (..., ???).
4) The TypeInfo of both arrays is still there. Judging by the address of the second one, $4012BC, it is supplied by the RTL, while the first, at location $692BB8, is yours.
From the above, the cause of the exception is double freeing, but because of the strange way it happens on termination, FastMM4 failed to catch it. Due to my lack of experience with managed records (I don't have the latest Delphi versions), here is some guessing on my part, and I might be wrong, about how this can happen when FastMM in full debug mode should be able to catch it. There is also a very small chance that this is a bug in managed records, as I have no idea how a global generic list of/with managed records cleans up.
This might be happening because FastMM keeps a record of the free/alloc calls in a separate block, in parallel to its internal block allocations, and while the application is terminating many blocks are freed as long as nothing tells FastMM to keep them. So in theory the items of the second array (the array of strings) may have been freed while the record still refers to an array, where the locations of this record and array ($7F881358 and $7F88135C, only 4 bytes apart) belong to a block that has not been freed yet.
To fix this you first need to identify the culprit array of records (or generic list of records), then track the items of its grandchild strings in the array: where and how are these strings being freed without their array being updated?
Now, to answer whether this is a bomb or not: yes, it is. Unless you are 100% sure that such a free will only happen once, on application termination, there will be a chance of unpredictable behaviour and exceptions that might be a show stopper.
Hope that helps.
  12. Just a few things. In the code handling that particular packet there might be a difference in execution time that leads to this. Scenarios like these come to mind:
1) Are you using Sleep(1)? Sleep(1) is in fact Sleep(15) or Sleep(16) most of the time, and having the IDE open on the device will make it a real Sleep(1). I ask because, if the code processing takes 2ms, that would make sense of these 13ms.
2) What is the fate of that packet's data? Is it being zipped? I remember a bug where the IDE was shipped with an unoptimized zip binary for 64-bit only. Or are you sending the data to a DB? There might also be a difference in the DB engine for 64-bit vs 32-bit, etc. In other words, are you using a 3rd-party library here?
Just food for thought.
  13. @Alain1533 We are getting somewhere now. Now that hardware failure/malfunction is ruled out, we can look deeper into the packets, their order and their timing.
You already marked the delay in the small red rectangle. The first packet is an ACK, a simple acknowledgment of receipt from the PC to the PLC, which also means the network adapter received the data and forwarded it to the application! The next packet is a new series or maybe a response. Its content is irrelevant for us now, but its delay is important.
So what you have to do is log the ICS socket operations within your application using a high-precision timer. You must log them to a file, and each operation should carry a timestamp with at least 1ms precision. After that you can find the code responsible for this delay, which I think is not coming from ICS but from code that is very inefficient on 64-bit. To pinpoint this you need to log these parts of the code and compare them.
  14. Yes, that is true for atomic memory operations, but without an explicit LOCK instruction the operation will silently succeed without any guarantee of atomicity (and without raising an exception). That is the problem, and it is also my argument: when Thomas is asking for an atomic operation, I couldn't care less about its purpose. This discussion is going well so far, with resources explaining how and when a simple load/store can be atomic and when it is not (both cases are equally important).
  15. That is right by default with many other compilers, but what if Value/Target are in a packed record after a one-byte field (or after fields whose lengths sum to any odd number)? Also, is there a guarantee that Target/Value do not cross a cache line (64-byte alignment)? So I would not depend on it, not unless the safety of its use in that context is well documented.
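Both concerns, natural alignment and straddling a 64-byte cache line, can be expressed as simple checks on an address or field offset. The sketch below is C++ with hypothetical helper names. The packed struct mimics a packed record with a one-byte field in front of a 4-byte integer, and 64 bytes is assumed as the cache line size.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// True when the address is a multiple of the operand size (size must be > 0).
constexpr bool is_naturally_aligned(std::uintptr_t addr, std::size_t size) {
    return addr % size == 0;
}

// True when the first and last byte of the operand fall in different lines.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return addr / kCacheLine != (addr + size - 1) / kCacheLine;
}

// Counterpart of a packed record: `value` lands at offset 1.
#pragma pack(push, 1)
struct PackedRec {
    std::uint8_t tag;
    std::int32_t value;
};
#pragma pack(pop)
```

A naturally aligned 4-byte or 8-byte value can never cross a 64-byte line, so the second check only matters once the first one fails, which is exactly the packed-record case raised above.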