Mahdi Safsafi

Members
  • Content Count

    383
  • Joined

  • Last visited

  • Days Won

    10

Posts posted by Mahdi Safsafi


  1. 3 hours ago, dummzeuch said:

    One question that I wanted to ask for a while:

     

    What do those "magic numbers" mean:

    • $0EEDFAE6
    • $0EEDFAE4
    • $0EEDFADE

    I googled them and found that the last two are declared as constants in system.pas:
      

    
    const
      cNonDelphiException = $0EEDFAE4;
      cDelphiException = $0EEDFADE;

     

    Yes, you're right!

    Quote

    So I guess $0EEDFAE6 is an exception number that Delphi / C++ Builder use internally.

     

    Am I right?

    Yes, it's an exception number, but honestly I don't know what it's used for. The explanation you gave is very logical and I'd adopt it until it's proven otherwise. 

    Quote

    So I could declare them as constants for readability:

    I prefer the notation "if x = $0EEDFADE { cDelphiException } then" because it makes comparing against the assembly easier. 


  2. 20 hours ago, Stefan Glienke said:

    I would guess it's related to the type of the exception - I usually see this with hardware exceptions such as AV or SO.

    Yep, you're totally right!

    Unlike software exceptions, hardware exceptions are not tied to a Delphi exception class. So the problem occurs because the expert tries to read (dereference) class information for a hardware exception that has no class associated with it. Why does it try to do that? ... Well, that's my bad; it appears I reversed a branch when translating some assembly code to Pascal :classic_sad: . 

     

    4 hours ago, Kas Ob. said:

    I have not slept since Wednesday, and I think I am out of mental strength!

    I really appreciate your effort ... you did great 🙂 ! Thanks !

    Quote

    But I found a bug in 64-bit where the exception message shows wrong addresses

    This happens even if you don't use GExperts ... So I guess it's coming from the IDE. 

    4 hours ago, Kas Ob. said:

    What was I thinking and how did I miss this?! IsBadReadPtr is wrong to begin with, and should not be used here, as the debugged process is not the one that will be tested against GoodReadPtr.

    For IsBadReadPtr, you can use VirtualQuery to check for read access.
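
    A minimal sketch of that idea, assuming Winapi.Windows is in the uses clause (the helper name is mine; for the debuggee itself you would use VirtualQueryEx with the process handle instead):

    // Returns True if the page containing P is committed and readable.
    // Note: this only inspects the region returned by VirtualQuery; a buffer
    // spanning several regions would need a loop over [P, P + Size).
    function IsReadablePtr(P: Pointer): Boolean;
    var
      MBI: TMemoryBasicInformation;
    begin
      Result := (VirtualQuery(P, MBI, SizeOf(MBI)) = SizeOf(MBI)) and
        (MBI.State = MEM_COMMIT) and
        (MBI.Protect and (PAGE_READONLY or PAGE_READWRITE or
           PAGE_EXECUTE_READ or PAGE_EXECUTE_READWRITE) <> 0) and
        (MBI.Protect and PAGE_GUARD = 0);
    end;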

     

    @dummzeuch Below is a fix!
    BTW Tomas, you should do a cleanup and remove all unused code/features. I see that you're not using GetExceptionObject, yet it's still being called and caused this issue. Same for sl.Add(xxx).
    Another thing: I found that the "Ignore All this Session" button does not scale properly when resizing the exception dialog.

    function GetExceptionObjectNew(Thread: TThread): TAddress;
    var
      I: Integer;
      Src: TThreadOsInfoArray;
      Parsed: TParsedThreadOsInfoArray;
      P: PByte;
      C: Cardinal;
    begin
      // this function should be used only with new delphi versions
      Result := 0;
      ZeroMemory(@Src, SizeOf(Src));
      ZeroMemory(@Parsed, SizeOf(Parsed));
      I := GetThreadOsInfo(Thread, Src);
      if I <> 0 then
      begin
        case I of
          4, 6, 8, 7, 9, 10:
            Exit; // ==>
        end;
        ParseThreadOsInfo(Thread, Src, Parsed);
    
        // disasm TNativeThread::DoGetExceptionName
        P := @Parsed[0];
        Inc(P, $A8);
        P := PPointer(P)^;
        C := PCardinal(NativeInt(P) + $18)^;
        { !!! don't optimize me !!! }
        if (C <> $0EEDFAE6) then
        begin
          if C = $0EEDFADE then
          begin
            Inc(P, $38);
            Result := PUInt64(P)^;
          end
          else if C <> $0EEDFAE4 then
          begin
            Exit; // ==> hex.
          end
          else
          begin
            Inc(P, $38);
            Result := PUInt64(P)^;
          end;
        end
        else
        begin
          C := PCardinal(NativeInt(P) + $34)^;
          if C <> 0 then
          begin
            Inc(P, $48);
            Result := PUInt64(P)^;
          end
          else
          begin
            C := PCardinal(NativeInt(P) + $30)^;
            if C <> 1 then
            begin
              Inc(P, $48);
              Result := PUInt64(P)^;
            end
            else
            begin
              Exit;
            end;
          end;
        end;
      end;
    end;
    {$ENDIF}
    
    function GetExceptionObjectLegacy(Thread: TThread): TAddress;
    begin
      Result := 0;
      // This function should only be used with old Delphi versions, where GetExceptionObjectNew does
      // not work, i.e. ParseThreadOsInfo does not exist.
      FDebugEventCritSect.Enter;
      if FDebugEvent.Exception.ExceptionRecord.NumberParameters > 1 then
      begin
        // Param[1] = Exception object.
        // FDebugEvent.dwProcessId = process id.
        // FDebugEvent.dwThreadId = thread id.
        // FDebugEvent.Exception.ExceptionRecord.ExceptionAddress = exception address.
        // see  TExceptionRecord for more info.
        if FDebugEvent.Exception.ExceptionRecord.ExceptionCode = $0EEDFADE { cDelphiException } then
          Result := FDebugEvent.Exception.ExceptionRecord.ExceptionInformation[1];
      end;
      FDebugEventCritSect.Leave;
    end;

     

     


  3. 2 hours ago, Stefan Glienke said:

    Updated the blog post - thanks

    There is another way, without explicitly specifying the generic type param, and it generates the same output. Internally, Slice requires a dst (open array type), a src (pointer type) and of course a count (integer type). It uses src to get the pointer and dst info to compute the offset, meaning you can provide a fake declaration for the source:

     type
       TOpenArray = array[0 .. 0] of Byte;
       POpenArray = ^TOpenArray;

     MergeSort(Slice(POpenArray(@values[mid])^, len - mid));

     


  4. Very nice article, Stefan, and the way you used Slice to avoid the copy is amazing! 

    Speaking of bugs, I found two drawbacks related to using open array params:

    First, I already wrote a topic on this forum about unnamed types and RTTI. The compiler does not generate RTTI for an unnamed type! This is easy to avoid with most types because all you need to do is declare the type: type x = xxx. But for an open array type that's impossible, because the syntax conflicts with the dynamic array syntax (see the sketch below)! So I suppose there is no official way to get correct RTTI for a function that uses an open array param.
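
    A minimal illustration of the conflict described above (names are mine): at type-declaration scope the "array of" syntax always declares a dynamic array, so an open array parameter can never be given a named, RTTI-visible type.

    type
      TByteDynArray = array of Byte; // declares a dynamic array, not an open array

    // Open array: the construct only exists in a parameter list and has no named type.
    procedure TakesOpenArray(const A: array of Byte);
    begin
    end;

    // Dynamic array parameter: a named type, so RTTI can be generated for it.
    procedure TakesDynArray(const A: TByteDynArray);
    begin
    end;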

    Second, on my D10.3, the TArray<T> example caused an "IDE/Compiler not responding" issue that always ends with an IDE restart:

     

    [Screenshot: OpenArrayCapture.PNG]

    I was able to isolate the line that causes the issue:

    MergeSort(Slice(TOpenArray<T>((@values[mid])^), len - mid)); // if I comment this line everything works great !

    I tried with different build modes (debug, release) and on different projects, and the result is always the same :classic_sad:

    • Like 2

  5. The Delphi IDE uses System.Classes.TStream.WriteDescendentRes to serialize the form/component properties. Properties that belong to the class are handled by System.Classes.TWriter.WriteProperty. However, internal properties such as ExplicitXxx are handled by System.Classes.TWriter.DefineProperty. 

    // TControl.DefineProperties defines the Explicit properties and TWriter.DefineProperty encodes them.
    procedure TControl.DefineProperties(Filer: TFiler);
    begin
      ...
      Filer.DefineProperty('IsControl', ReadIsControl, WriteIsControl, DoWriteIsControl);
      Filer.DefineProperty('ExplicitLeft', ReadExplicitLeft, WriteExplicitLeft, not (csReading in ComponentState) and DoWriteExplicit(edLeft));
      Filer.DefineProperty('ExplicitTop', ReadExplicitTop, WriteExplicitTop, not (csReading in ComponentState) and DoWriteExplicit(edTop));
      Filer.DefineProperty('ExplicitWidth', ReadExplicitWidth, WriteExplicitWidth, not (csReading in ComponentState) and DoWriteExplicit(edWidth));
      Filer.DefineProperty('ExplicitHeight', ReadExplicitHeight, WriteExplicitHeight, not (csReading in ComponentState) and DoWriteExplicit(edHeight));
    end;

    In a nutshell, hooking TWriter.DefineProperty will get rid of the ExplicitXxx properties:

    procedure InterceptDefineProperty(Obj: TWriter; const Name: string; ReadData: TReaderProc; WriteData: TWriterProc; HasData: Boolean);
    begin
      // do nothing !!! or write a filter to allow certain ExplicitXxx.
    end;
    
    // install the hook :
    DefinePropertyPtr := GetProcAddress(GetModuleHandle('rtl260.bpl'), '@System@Classes@TWriter@DefineProperty$qqrx20System@UnicodeStringynpqqrp22System@Classes@TReader$vynpqqrp22System@Classes@TWriter$vo');
    @TrampolineDefineProperty := InterceptCreate(DefinePropertyPtr, @InterceptDefineProperty);
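
    If you only want to drop the ExplicitXxx entries and keep the other defined properties (IsControl, item/string data, ...), the hook can forward everything else to the trampoline declared above; an untested sketch:

    procedure InterceptDefineProperty(Obj: TWriter; const Name: string;
      ReadData: TReaderProc; WriteData: TWriterProc; HasData: Boolean);
    begin
      // Forward everything except the ExplicitXxx pseudo-properties to the original method.
      if Copy(Name, 1, Length('Explicit')) <> 'Explicit' then
        TrampolineDefineProperty(Obj, Name, ReadData, WriteData, HasData);
    end;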

     

    • Like 2
    • Thanks 2

  6. 53 minutes ago, David Heffernan said:

    Why does anybody think that all this guessing would be useful? Does anybody have much experience of success when guessing?

     

    Solving eigen problems is a very challenging numerical problem.  Does anybody really believe that good solutions arise from guesswork?

     

    I am frankly embarrassed by this thread.

    For the same reason anybody should think that your comment is constructive! What does it add for the OP?

    I'm going to ask you gently: please stop trolling. We just ended a heated argument and I said I'm sorry ... Please don't make me regret it.  


  7. 2 hours ago, Stefan Glienke said:

    As a library developer I have this question after reading all this:

    how does this affect me.

     

    For example collections are an essential part of spring4d and they can be of any capacity from just a few items which fit into a small block up to collections that hold thousands of elements and where SetLength causes the used GetMem to use large blocks.

    So if anyone claims that the way the MM does it is not good, the solution for the many places that allocate variable-sized memory is not to write "if size < x then getmem else virtualalloc" but to solve this properly inside GetMem/the memory manager.

    Especially since GetMem is indirectly called by many things such as SetLength - if you directly allocate memory for your own sure you can choose one or the other.

    Yeah! The real problem is not how to allocate the memory but how to free it. In other words, how can one efficiently tell whether a pointer belongs to a small block or to a large block without dereferencing it?

    I believe that understanding the drawbacks is really required, e.g. someone can benefit from it by freeing a collection's items in reverse order.

    Just one question, Stefan: why not let the user specify the nature of the collection (small, large)?


  8. 1 hour ago, David Heffernan said:

    Not obvious at all. Usually you allocated memory because you wanted to use it.

     

    If you've swapped the entire block to disk then bringing a couple of pages back is the least of your worries. Solve your problem by avoiding swapping in the first place.

     

    Nothing you have said, in my view, supports your claim that all allocation of huge blocks should be done using VirtualAlloc.

    So far, we only have your word! Cite a study, a benchmark or a formal statement that backs up your claims. 

    With everything that was pointed out about the difference between GetMemory/FreeMemory and VirtualAlloc/VirtualFree, you still think that GetMemory/FreeMemory is suitable for large data ... WOW!


  9. 7 hours ago, Stefan Glienke said:

    Yes some are written to disk but usually not commonly used memory - especially not when there is plenty of RAM available.

    Yes Stefan, but that does not change the fact that FreeMemory can be a bottleneck.

    The example I showed demonstrates paging. While it's the one that may have the most severe impact, there are other players too (cache thrashing, TLB thrashing, ...). 


  10. 1 minute ago, David Heffernan said:

    I don't see any benchmark to support the assertion that all allocation of huge blocks should be done using VirtualAlloc.

     

    Can you point to it. 

    You and Arnaud have read the article, but neither of you understood it clearly, perhaps because it requires being familiar with some details. That's why I told Arnaud that he is wrong again and that it's not related to the malloc! Even if someone replaces the malloc with FastMM, he is likely going to get the same result.

     

    I'll give a very simple example (I will do my best to make it understandable to anyone) to explain why VirtualFree is much better than FreeMemory.

    Suppose you have allocated a bunch of large data using GetMemory. Obviously, sooner or later, system paging will kick in and start swapping pages from memory to disk. It happens that some of your pages (most likely the first ones allocated) end up on disk instead of in memory. When the time comes for a cleanup, you will call FreeMemory to free the allocated memory:

    // FreeMemory for large block
    function FreeLargeBlock(APointer: Pointer): Integer;
    var
      LPreviousLargeBlockHeader, LNextLargeBlockHeader: PLargeBlockHeader;
    begin
      {Point to the start of the large block}
      APointer := Pointer(PByte(APointer) - LargeBlockHeaderSize);
      {Get the previous and next large blocks}
      LPreviousLargeBlockHeader := PLargeBlockHeader(APointer).PreviousLargeBlockHeader;
      ...
    end;

    1 - As you can see, the function tries to dereference the pointer.

    2 - Because the page is on disk and not in memory, an interrupt (for simplification, think of it as an invisible exception) occurs at CPU level. The CPU then suspends the process that tried to access the memory and sends an interrupt to the OS kernel: "Hi kernel, process "A" is trying to access invalid memory at address "X"."

    3 - The kernel searches for a page "B" associated with that address "X". If the page is not found, it is just an AV exception. Otherwise it proceeds to step 4.

    4 - The kernel moves a page "C" from memory to disk in order to make room for the requested page "B".

    5 - The kernel loads the requested page "B" from disk into memory. 

    6 - The kernel resumes execution of process "A" as if nothing happened.

    7 - Process "A" calls VirtualFree.

    8 - The kernel deallocates page "B".

     

    Now, if you just used VirtualAlloc/VirtualFree, dereferencing the pointer is not required, so steps 1 to 6 aren't necessary at all and the paging doesn't happen either!!!

    The important thing is that some of the above steps are heavy ... and that's why in their analysis it was taking hours to free the memory: a swap between disk and memory was happening all the time.

     

    The "Reverse Memory Free" benchmark on Windows Server 2008 R2 Datacenter took seconds instead of hours because they were clever enough to avoid system paging: they freed the pages in the reverse order of allocation. Meaning the last allocated page gets freed first (the last allocated page most likely resides in memory and not on disk), so steps 2 - 6 may not be necessary for every transaction. Paging may still happen, but nowhere near as much as with the original code.

     

    By understanding this, anyone can quickly see that FreeMemory (which implies using GetMemory) is evil for large blocks when paging is active, and in no case can it compete with VirtualFree. Going from hours to seconds is really a high level of optimization that is worth understanding, and worth changing many bad practices/wrong beliefs for.

     

    A better practice would be to use OS functions for large blocks: 

    - They don't dereference the pointer ... no issue with paging.

    - They don't suffer from fragmentation as GetMemory does. Remember, the last thing anyone wants to see when allocating large data is a fragmentation issue.

    - They are always aligned to the system page size (SPS).

    - Thread safe: they only take one lock, whereas GetMemory/FreeMemory locks twice!

    - They provide additional options.

    - If portability is an issue, a wrapper will be very useful. It's not like we have a bunch of systems to support; mostly Windows and POSIX:

     

    function GetLargeMemory(Size: SIZE_T): Pointer;
    begin
    {$IFDEF MSWINDOWS}
      Result := VirtualAlloc(nil, Size, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
    {$ELSE POSIX}
      // note: MAP_FIXED must not be used with a nil address; on failure mmap returns MAP_FAILED, not nil.
      Result := mmap(nil, Size, PROT_READ or PROT_WRITE, MAP_ANONYMOUS or MAP_PRIVATE, -1, 0);
    {$ENDIF MSWINDOWS}
    end;
    
    procedure FreeLargeMemory(P: Pointer; Size: SIZE_T);
    begin
    {$IFDEF MSWINDOWS}
      VirtualFree(P, 0, MEM_RELEASE); // with MEM_RELEASE the size must be 0
    {$ELSE POSIX}
      munmap(P, Size);
    {$ENDIF MSWINDOWS}
    end;

    I hope by now everyone here understands the difference between the OS functions and the Delphi MM functions.


  11. 1 hour ago, Arnaud Bouchez said:

    @Mahdi Safsafi
    Your article refers to the C malloc on Windows - which is known to be far from optimized - much less optimized than the Delphi MM.
    For instance, the conclusion of the article doesn't apply to the Delphi MM: "If you have an application that uses a lot of memory allocation in relatively small chunks, you may want to consider using alternatives to malloc/free on Windows-based systems. While VirtualAlloc/VirtualFree are not appropriate for allocating less than a memory page they can greatly improve database performance and predictability when allocating memory in multiples of a single page.". This is exactly what FastMM4 does.

    When I wrote fragmentation won't increase for HUGE blocks, I meant > some MB blocks. With such size, I would probably reuse the very same buffer per thread if performance is needed.

     

    @Kas Ob.

    You are just proving my point: if you use very specific OS calls, you may need buffer aligned on memory page.

     

    Again, you're completely wrong! Even if you replace the C MM with the Delphi MM, there is no way FastMM can outperform the OS functions. For allocating memory it may perhaps come close (single-threaded) to what the OS gives, but for freeing memory it will never outperform them, especially in an environment where system paging is actively working. Should I explain more what that means? I'd be very interested if you have some benchmark that proves otherwise.


    What you're missing, Arnaud, is the following: at the software level, it may look at first sight like there is no difference between using the C/Delphi MM and the OS API. But the fact is that memory management is a very complex thing and requires collaboration between different components (RAM, disk, software, OS and even the CPU). Without understanding the nature of, and relationship between, those components ... you will never understand the full story. 
     


  12. 3 hours ago, Arnaud Bouchez said:

    Allocating 4KB more for huge blocks is not an issue.

    First, I'm not sure if you're aware of that, but 4KB is not the only available page size! While most environments support it, there are some that use more than 4KB (link)!!!
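
    On Windows, the actual page size (and allocation granularity) can be queried at run time; a small sketch assuming Winapi.Windows is in the uses clause (the routine name is mine):

    procedure ShowPageInfo;
    var
      SI: TSystemInfo;
    begin
      GetSystemInfo(SI);
      Writeln('Page size: ', SI.dwPageSize);                           // typically 4096
      Writeln('Allocation granularity: ', SI.dwAllocationGranularity); // typically 65536
    end;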

    Second, you should know better than anyone what this means for fragmentation. Try to convince someone running in a constrained environment that an extra 4KB (or whatever) per large allocation is not an issue! Personally, the last thing I want when allocating some GB is to deal with unnecessary fragmentation that can easily be avoided.

    Third, if I tell you that some OS functions are thread safe, what will your answer be? Spoiler: think twice before answering, because the answer you're preparing is not the one I'm expecting!

    Quote

    If you want the buffer aligned with system page granularity, then it is a very specific case, only needed by other OS calls, like changing the memory protection flags. It is theoretically possible, but very rare. This is the only case where the internal MM should not be used.

    You're completely wrong on this! It's not like you only need memory aligned at the SPS for basic stuff like changing memory protection. There are many other cases. Take a look at high-performance I/O: ReadFileScatter, WriteFileGather.
    Do you know that for read/write operations between memory and disk, it's better to have alignment at the SPS? e.g. sections of PE files, large files. 
    Do you know that disk manufacturers today try to align their sector sizes with the SPS (today we have 4Kb, 8Kb)? See File Buffering. 
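
    As an illustration of why page alignment matters for I/O, a hypothetical sketch of an unbuffered read: FILE_FLAG_NO_BUFFERING requires sector-aligned buffers and sizes, and a VirtualAlloc'ed buffer is page-aligned, which satisfies that (the file path, routine name and chunk size are made up):

    procedure UnbufferedRead;
    const
      ChunkSize = 1024 * 1024; // must be a multiple of the sector size
    var
      H: THandle;
      Buf: Pointer;
      BytesRead: DWORD;
    begin
      H := CreateFile('C:\data\large.bin', GENERIC_READ, FILE_SHARE_READ, nil,
        OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, 0);
      if H = INVALID_HANDLE_VALUE then
        Exit;
      try
        // page-aligned (and therefore sector-aligned) buffer
        Buf := VirtualAlloc(nil, ChunkSize, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
        if Buf <> nil then
        try
          ReadFile(H, Buf^, ChunkSize, BytesRead, nil);
        finally
          VirtualFree(Buf, 0, MEM_RELEASE);
        end;
      finally
        CloseHandle(H);
      end;
    end;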

    Quote

     If you expect to see any performance benefit of using memory page-aligned, you are pretty wrong for huge blocks 

    I'm just going to pretend that I didn't hear that. You knew exactly what I was referring to!

     

    One final thing, Arnaud, just to make things clear: I didn't invent the rule that says "using OS functions for large allocations is better than the MM" myself. In fact, if you do some research on the internet, you will find that many developers recommend using OS functions for large data over any MM. I remember I saw a statement from Microsoft too!

    There was also an interesting benchmark comparing C malloc against Linux/Windows functions for allocating/freeing small/large blocks ... you should definitely take a look at it: 

    https://raima.com/memory-management-allocation/

    @David Heffernan I also recommend that you read the above article. It will change your mind about when to call something "premature optimisation" and when not to!


     


  13. On 8/6/2020 at 4:11 PM, RDP1974 said:

    (I cannot understand why they don't invest in the compiler and RTL; that's the core value of the whole toolchain ...)

    I guess so few developers are currently interested in assembly.

    Use nasm (it's very well maintained) to assemble your code and then use gdb/ndisasm to get the opcodes. Also, it's worth checking FPC (if I remember correctly, it already supports AVX).
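
    A small sketch of that workflow: assemble the instruction with nasm, run ndisasm (or a hex dump) on the output to get the byte sequence, then emit the bytes with db in a Delphi asm block (the routine name is mine and the instruction is just an example; always take the bytes from your own ndisasm output):

    procedure AvxAddExample;
    asm
      // vaddps ymm0, ymm1, ymm2 - bytes taken from the nasm/ndisasm output
      db $C5, $F4, $58, $C2
    end;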

     


  14. 56 minutes ago, Arnaud Bouchez said:

    Delphi MM (FastMM4) is just a wrapper around the OS API for big blocks. No benefit of calling the OS function directly, which is system-specific, and unsafe. Just use getmem/freemem/reallocmem everywhere.

    I know that already, and I agree with everything you said except the claim that there is no benefit to using the OS functions:

    Large data tends to be aligned; calling the Delphi MM will likely allocate one extra page to store the pointer info if the requested size is a multiple of the system granularity. Also, the returned pointer is not aligned at the system page granularity. Moreover, the OS functions provide many options that the DMM doesn't.

    So if portability is not an issue ... I really don't understand why someone would not use the OS functions.
