
Mahdi Safsafi

Members
  • Content Count: 383
  • Joined
  • Last visited
  • Days Won: 10

Everything posted by Mahdi Safsafi

  1. @Dave Novo As David pointed out, you may need to use @stk.List[xxx].
  2. Yes that's right.
  3. "@" operator requires a variable/constant/..., Here is your way : type PMyRec = ^TMyRec; TMyRec = record s: string; public function Address(): PMyRec; inline; end; var stk: TStack<TMyRec>; rec: TMyRec; prec: PMyRec; { TMyRec } function TMyRec.Address: PMyRec; begin Result := @Self; end; begin stk := TStack<TMyRec>.Create; rec.s := 'Hello'; stk.push(rec); prec := stk.Peek().Address(); Writeln(prec.s); prec.s := 'Goodbye'; stk.Free; end.
  4. Mahdi Safsafi

    TMemoryStream.Write

     It's the developer's responsibility to do the cleanup. As I said before, I'm not sure (I may be wrong) ... so the best way to find out is to try 😉
  5. Mahdi Safsafi

    TMemoryStream.Write

     @Kas Ob. Well, I haven't tried Delphi Sydney, but I think (I'm not sure) that a managed record doesn't get a try/finally section here because the record isn't allocated dynamically.
  6. Mahdi Safsafi

    TMemoryStream.Write

     This only applies to dcc32 (Win32), as it uses a stack-based exception mechanism: the exception handler executes faster, but this is known to add around 15% overhead when no exception is raised. Win64, on the other hand, uses a table-based exception mechanism: the exception handler executes a little more slowly, since it is handled exclusively by the runtime, but it adds no overhead when no exception is raised.
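
     As a rough way to observe that Win32 cost, one could time a tight loop that enters an empty try/finally block without ever raising. This is only a minimal sketch (the program name, iteration count and numbers are mine; the 15% figure will vary with CPU and compiler settings):

         program TryFinallyOverhead;
         {$APPTYPE CONSOLE}
         uses
           System.SysUtils, System.Diagnostics;
         var
           I, Acc: Integer;
           SW: TStopwatch;
         begin
           // Loop that enters a protected frame on every iteration.
           SW := TStopwatch.StartNew;
           Acc := 0;
           for I := 1 to 100000000 do
           begin
             try
               Inc(Acc, I and 1); // trivial work; no exception is ever raised
             finally
               // empty; on Win32 (dcc32) just setting up/tearing down this frame has a cost
             end;
           end;
           SW.Stop;
           Writeln('With try/finally   : ', SW.ElapsedMilliseconds, ' ms (Acc=', Acc, ')');

           // Same work without the protected frame, for comparison.
           SW := TStopwatch.StartNew;
           Acc := 0;
           for I := 1 to 100000000 do
             Inc(Acc, I and 1);
           SW.Stop;
           Writeln('Without try/finally: ', SW.ElapsedMilliseconds, ' ms (Acc=', Acc, ')');
         end.

     Compiled with dcc32 the first loop should be measurably slower; with dcc64 the difference should mostly disappear, since the unwind information lives in tables instead of being set up at runtime.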
  7. Mahdi Safsafi

    TMemoryStream.Write

     I see 🙂 The first time a CPU sees a branch, it uses a static predictor: almost all CPUs assume that a backward branch is going to be taken, for the reason I explained with the loop example. For a forward branch, many CPUs (but not all) assume it is not going to be taken; some of them, like Core 2, make a random prediction. UPDATE: @Attila Kovacs I forgot to answer your if/else question. Remember what I said: on first sight the CPU assumes the "if" is taken, because we intend to execute the if-statement, so it's the else section that is predicted not to be taken.
  8. Mahdi Safsafi

    TMemoryStream.Write

     Don't worry! I'll try to give a simple explanation 🙂 There are many kinds of jump (direct, indirect, relative, ...). The relative jmp is the most used and the most efficient one. Relative means that the offset is relative to the Program Counter (PC), the register that holds the current instruction pointer ... on x86 it's a protected register; on an ugly implementation such as aarch32 (ARM) it's a public register. A forward jump means the offset of the jmp is positive (hence we are jumping down); a backward jump means the offset is negative (jumping up).

         ; address     opcodes      instruction          ; comment (addresses in decimal)
         backward_label:
         00000000      85C0         test eax,eax
         00000002      7407         jz forward_label     ; PC=00000002 OFFSET=7
                                                         ; dest_addr = PC + OFFSET + SizeOf(CurrentInstruction) = 2 + 7 + 2 = 11
         00000004      B801000000   mov eax,$00000001
         00000009      EBF5         jmp backward_label   ; PC=00000009 OFFSET=0xF5 (-11)
                                                         ; dest_addr = same formula as above = 9 - 11 + 2 = 0
         forward_label:
         00000011      C3           ret

     Now I believe forward/backward branches are clear for you. The interesting part about backward branch prediction is that the CPU assumes it is taken, because backward branches are usually used for loops:

         // Pascal code:
         begin
           for i := 0 to 10 do
           begin
             // dosomething ...
           end;
           // dosomething2 ...
         end;

         // asm version:
             xor ecx,ecx
         backward_label:
             ; dosomething ...
             inc ecx
             cmp ecx, 11
             jnz backward_label   ; backward branch
         state2:
             ; dosomething2 ...

     If the CPU assumes that the backward branch is not taken, that is a performance penalty! On each iteration the CPU wastes time executing the state2 instructions and, once it realizes the prediction was wrong, it has to recover. There are 11 iterations! This can lead to a huge overhead if we are processing a large amount of data. On the other hand, if it assumes that the backward branch is taken, it saves a lot of time and only mispredicts on the last (n+1) iteration. Now I believe you are understanding the concept 🙂 and you can easily answer your own question 😉
  9. Mahdi Safsafi

    TMemoryStream.Write

     They're considered first-seen when there is no previous information available. When the CPU has no prior information about a branch, it assumes that the jump is taken (because most of the time we intend to execute the code inside the "if" statement). The BP uses complex algorithms for prediction (85-90% of predictions are correct!). Modern CPUs achieve up to 95%! Those algorithms have kept improving since they first appeared. As I said before, recent CPUs have a fully specialized CPU unit just for BP. While your program is running, the CPU records all executed branches. When the same logic is called a second, third time, ... the CPU uses the previously recorded information to predict whether a branch is going to be taken or not. Note that this technology is widely used by many architectures (not only x86). However, implementations vary (some have a dedicated BP unit, others don't; some are more capable of OoOE, others have limited support, ...).
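
     A classic way to see this history-based prediction at work is to run the same branch over random data and then over sorted data. This is only a minimal sketch (the program name, array size and threshold are mine, and the exact timings depend on your CPU):

         program BranchPredictionDemo;
         {$APPTYPE CONSOLE}
         uses
           System.SysUtils, System.Diagnostics, System.Generics.Collections;
         const
           N = 10000000;
         var
           Data: TArray<Integer>;
           I, Sum: Integer;
           SW: TStopwatch;
         begin
           Randomize;
           SetLength(Data, N);
           for I := 0 to N - 1 do
             Data[I] := Random(256);

           // Unsorted: the branch below is taken ~50% of the time at random,
           // so the predictor mispredicts a lot.
           SW := TStopwatch.StartNew;
           Sum := 0;
           for I := 0 to N - 1 do
             if Data[I] >= 128 then
               Inc(Sum, Data[I]);
           SW.Stop;
           Writeln('Unsorted: ', SW.ElapsedMilliseconds, ' ms (Sum=', Sum, ')');

           // Sorted: the very same branch becomes highly predictable
           // (not taken for the first half, taken for the second half).
           TArray.Sort<Integer>(Data);
           SW := TStopwatch.StartNew;
           Sum := 0;
           for I := 0 to N - 1 do
             if Data[I] >= 128 then
               Inc(Sum, Data[I]);
           SW.Stop;
           Writeln('Sorted  : ', SW.ElapsedMilliseconds, ' ms (Sum=', Sum, ')');
         end.

     The sorted run should be noticeably faster even though it does exactly the same amount of work; that is the branch predictor doing its job.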
  10. Mahdi Safsafi

    TMemoryStream.Write

     Longint does not have a fixed width on all platforms:

         // Delphi 10.3, System unit, line 242:
         {$IF SizeOf(LongInt) = 8}
           {$DEFINE LONGINT64}
           {$DEFINE LONGINTISCPPLONG}
         {$ENDIF}

     So I prefer to stick with the original declaration.
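
     A quick way to check what you get on the target you're compiling for (a minimal sketch; the program name is mine, and I'd expect 4 on Windows and 8 on 64-bit POSIX targets):

         program LongIntWidth;
         {$APPTYPE CONSOLE}
         begin
           // LongInt follows the platform's C "long": 4 bytes on Windows, 8 bytes on 64-bit POSIX.
           Writeln('SizeOf(LongInt) = ', SizeOf(LongInt));
           Writeln('SizeOf(Integer) = ', SizeOf(Integer));
           Writeln('SizeOf(Int64)   = ', SizeOf(Int64));
         end.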
  11. Mahdi Safsafi

    TMemoryStream.Write

     @Kas Ob. You may want to take a look at the Intel Performance Counter Monitor (PCM). I haven't tried it myself, but a friend of mine recommended it to me.
  12. Mahdi Safsafi

    TMemoryStream.Write

     AFAIK, there is no implementation that can handle such a massive amount of data. Streams use Int64 so that a stream can handle more than 4 GB, but not 2^33 GB. Why did you replace Longint with Integer for Count? Is this a typo?
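
     Just to put numbers on that limit (a minimal sketch; the program name is mine):

         program StreamLimit;
         {$APPTYPE CONSOLE}
         uses
           System.SysUtils;
         begin
           // An Int64 position/size tops out at 2^63 - 1 bytes, i.e. just under 2^33 GiB (8 EiB).
           Writeln('Max Int64 size: ', High(Int64), ' bytes');
           Writeln('          GiB : ', High(Int64) div (Int64(1024) * 1024 * 1024));
         end.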
  13. Mahdi Safsafi

    TMemoryStream.Write

     OoOE is an old technology designed to reduce CPU idle time. It means the CPU is free to rearrange the execution order of a piece of logic, as long as it yields the same result. Branch prediction (BP) was needed to make that feature work smoothly. So in short, there were two important time frames here: an era before the P4, where developers/compilers were forced to cooperate with the CPU in order to benefit from this feature. The CPU provided special opcode prefixes for branch hints (0x2E: branch not taken; 0x3E: branch taken). Compilers such as gcc provided a built-in function to support that (__builtin_expect). Developers were forced to use the asm version or the built-in function to benefit from BP. Obviously, this wasn't good, because only highly qualified developers (who clearly understood the technology) were able to use it. Besides, they couldn't even cover all their logic (it was just impossible to cover all your if statements). CPU makers realized that and replaced the ugly 0x2E/0x3E prefixes with an automatic solution that does not require developer cooperation (> P4). Today, CPU makers are working hard to improve OoOE/BP, because it was clear that over multiple runs (not a single run) this can give high performance (they dedicate a special CPU unit just to BP). Now life is easier, but you still need to understand the concept in order to write a high-performance version of your logic. For example, the second implementation of TMemoryStream.Write has two states and spins on Result := 0. The original implementation has mostly one state and is highly optimized for Count > 0 over multiple runs.

         function TMemoryStream.Write(const Buffer; Count: Integer): Longint;
         var
           Pos: Int64;
         begin
           // There is a high chance that this condition is false.
           // If the CPU predicts it as true ... that's a waste of time.
           if (FPosition < 0) or (Count <= 0) then
           begin
             Result := 0;
             Exit; // CPU must wait to validate BP (idle).
           end
           else
           begin
             // state2:
             Pos := FPosition + Count;
             if Pos > FSize then
             begin
               if Pos > FCapacity then
                 SetCapacity(Pos);
               FSize := Pos;
             end;
             System.Move(Buffer, Pointer(Longint(FMemory) + FPosition)^, Count);
             FPosition := Pos;
             Result := Count;
             // If the CPU prediction was wrong, it must recover from state2 (heavy).
           end;
         end;

         function TMemoryStream.Write(const Buffer; Count: Longint): Longint;
         var
           Pos: Int64;
         begin
           // There is a high chance that this condition is true.
           // If the CPU predicts it as true ... it saved time.
           if (FPosition >= 0) and (Count >= 0) then
           begin
             Pos := FPosition + Count;
             if Pos > 0 then
             begin
               if Pos > FSize then
               begin
                 if Pos > FCapacity then
                   SetCapacity(Pos);
                 FSize := Pos;
               end;
               System.Move(Buffer, (PByte(FMemory) + FPosition)^, Count);
               FPosition := Pos;
               Result := Count;
               Exit; // CPU may wait here to validate the BP(s).
             end;
           end;
           // state2:
           Result := 0; // Recovering from state2 is not heavy.
         end;
  14. Mahdi Safsafi

    TMemoryStream.Write

     Theoretically what you said is true. However, in practice Count could be negative! Moreover, the CPU likes the first implementation better than yours: the probability of calling Write with Count > 0 is much higher than calling it with zero! (out-of-order execution)
  15. Mahdi Safsafi

    Unnamed types and RTTI

     Hello, take a look at the code/outputs below:

         type
           TMyClass = class
             FldEnum: (A, B, C);
             FldSet: set of (D, E, F);
             FldSubRange: 5 .. 10;
             FldRec: record
               FA: Integer;
               FB: Integer;
             end;
             FldInteger: Integer;
             FldString: string;
             FldArray: array [0 .. 2] of Integer;
             FldList: TList<(G, H, I)>;
             FldArrayOfRec: array [0 .. 2] of record
               A: Char;
               B: Char;
             end;
           end;

         type
           TMyClass2<T> = class(TMyClass)
             FldEnum: (A2, B2, C2);
             FldSet: set of (D2, E2, F2);
             FldSubRange: 5 .. 10;
             FldRec: record
               FA: Integer;
               FB: Integer;
             end;
             FldInteger: Integer;
             FldString: string;
             FldArray: array [0 .. 2] of Integer;
             FldList: TList<(G2, H2, I2)>;
             FldArrayOfRec: array [0 .. 2] of record
               A: Char;
               B: Char;
             end;
           end;

         procedure ShowRtti(AObj: TObject);
         var
           LCtx: TRttiContext;
           LType: TRttiType;
           LField: TRttiField;
           LFieldType: TRttiType;
         begin
           LCtx := TRttiContext.Create();
           LType := LCtx.GetType(AObj.ClassInfo);
           Writeln('------------ RTTI for ', AObj.ToString, ' ------------');
           for LField in LType.GetFields() do
           begin
             LFieldType := LField.FieldType;
             if Assigned(LFieldType) then
             begin
               Writeln(LField.Name:15, ' -> ', LFieldType.Name);
             end;
           end;
           Writeln('');
           LCtx.Free();
         end;

         var
           Obj1: TMyClass;
           Obj2: TMyClass2<Integer>;
         begin
           Obj1 := TMyClass.Create();
           Obj2 := TMyClass2<Integer>.Create();
           ShowRtti(Obj1);
           ShowRtti(Obj2);
           Obj1.Free();
           Obj2.Free();
           Readln;
         end.

         // --- outputs ---
         {
           ------------ RTTI for TMyClass ------------
                   FldEnum -> :TMyClass.:1
                    FldRec -> :TMyClass.:3
                FldInteger -> Integer
                 FldString -> string
                   FldList -> TList<Project1.:TMyClass.:4>

           ------------ RTTI for TMyClass2<System.Integer> ------------
                   FldEnum -> TMyClass2<System.Integer>.:1
                    FldSet -> TMyClass2<System.Integer>.:3
               FldSubRange -> TMyClass2<System.Integer>.:4
                    FldRec -> TMyClass2<System.Integer>.:5
                FldInteger -> Integer
                 FldString -> string
                  FldArray -> TMyClass2<System.Integer>.:7
                   FldList -> TList<Project1.TMyClass2<System.Integer>.:8>
             FldArrayOfRec -> TMyClass2<System.Integer>.:11
                   FldEnum -> :TMyClass.:1
                    FldRec -> :TMyClass.:3
                FldInteger -> Integer
                 FldString -> string
                   FldList -> TList<Project1.:TMyClass.:4>
         }

     As you can see, for TMyClass some unnamed types (record, enum) have associated RTTI, but types such as subranges, sets and arrays don't! On the other hand, all fields of TMyClass2 have associated RTTI. This is definitely a bug, as the compiler should accept only one behavior (either enable RTTI for all unnamed types or disable it for all of them) ... but the question I'm asking is: what is the correct behavior? In other words, should an unnamed type have RTTI or not? All the typed languages I'm familiar with solved this by not allowing unnamed/anonymous types. C/C++ are an exception! They allow both unnamed and anonymous types, but they don't have an RTTI system (at least not an advanced one like Delphi's). So it's kind of hard to know what the correct behavior is when there is no reference around. BTW, I'd love to see how FPC handles it. In my opinion, the compiler should generate RTTI for unnamed types ... but when I think deeply about it I say no! It is an unnamed type (most likely anonymous; you declared it implicitly ... why should you expect explicit RTTI in return?). Please guys, I'm not asking for a workaround/good practice/historical reason ... just focus on the question 🙂
  16. Mahdi Safsafi

    Unnamed types and RTTI

     @Kas Ob. For FldList, it sounds OK. The compiler just used Integer for the name ... all the other RTTI information is valid (e.g. MaxValue). BTW, Notepad++ supports user-defined languages, so you can quickly make your own Pascal variant 😉
  17. Mahdi Safsafi

    Unnamed types and RTTI

     Wow! I missed that ... I'll investigate and let you know. No, but it sounds interesting ... I'll take a look at it later.
  18. As long as the compiler allows compiler directives... there can be no reliable tools.
  19. @Stefan Glienke Aha, I see! I didn't mean "TArray<T>" in particular. My example didn't even use it.
  20. Just for clarification, I used two different terms, little overhead and noticeable overhead, to distinguish between two different usages of TArray<T>. You definitely understood my example. In fact, for Unit3 the compiler only emitted the interface for the alias type, without an implementation (no machine code generation). For Unit1 and Unit2 it emitted both the interface and the implementation (generated code).
  21. Yes, there is a disadvantage to using TArray<TItem> instead of TItems. In the same unit where the class is declared, whenever the compiler finds an explicit generic type (TArray<TItem>), it must do extra work: matching arguments, checking constraints, ... In a large unit that uses generics massively, this may add a little overhead. TItems, on the other hand, works as a cache (the compiler does not need to check constraints again, for example). Using TArray<TItem> from another unit adds a noticeable overhead, as the compiler must generate the type in-situ for that unit. In fact, do the following test yourself:

          unit Unit1;

          interface

          uses
            System.SysUtils, System.Generics.Collections, System.Classes;

          type
            TObject<T> = class
              a: T;
              b: T;
              procedure foo(a, b: T);
            end;
            TListOfInteger = TList<TObject<Integer>>;

          implementation

          { TObject<T> }

          procedure TObject<T>.foo(a, b: T);
          begin
          end;

          end.

          // ---------------------------------------

          unit Unit2;

          interface

          uses
            System.SysUtils, System.Generics.Collections, System.Classes, Unit1;

          type
            TListOfInteger2 = TList<TObject<Integer>>;

          implementation

          end.

          // ---------------------------------------

          unit Unit3;

          interface

          uses
            System.SysUtils, System.Generics.Collections, System.Classes, Unit1;

          type
            TListOfInteger3 = TListOfInteger; // alias

          implementation

          end.

      Now check the sizes of Unit1.dcu, Unit2.dcu and Unit3.dcu. One final thing: TItems is friendlier for both typing and reading!
  22. Mahdi Safsafi

    Class Constructor in Delphi 10.4

     Indeed! The official D documentation is also super good. Perhaps Andrei was behind it too.
  23. Mahdi Safsafi

    Class Constructor in Delphi 10.4

    If it was true ... then we're definitely using Elphi 😉
  24. Mahdi Safsafi

    Class Constructor in Delphi 10.4

     You're absolutely right, Stefan. In particular, the awesome D language has initialization and finalization sections, but it implements them in a very sexy way: static constructors (initialization) are executed to initialize a module (unit)'s state, and static destructors (finalization) terminate a module's state. A module may have multiple static constructors and static destructors. The static constructors are run in lexical order; the static destructors are run in reverse lexical order. Non-shared static constructors and destructors are run whenever threads are created or destroyed, including the main thread. Shared static constructors are run once before main() is called, and shared static destructors are run after the main() function returns. But as you said, it can be a source of a lot of bugs.
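
     For comparison, Delphi gives you at most one such pair per unit; as a reminder, here is a minimal sketch (the unit and variable names are mine):

         unit ModuleState;

         interface

         uses
           System.Classes;

         var
           SharedList: TStringList; // module (unit) state

         implementation

         initialization
           // Runs once when the unit is loaded, before the program body,
           // in unit-dependency order; only one such section per unit.
           SharedList := TStringList.Create;

         finalization
           // Runs once at shutdown, in reverse order of the initializations.
           SharedList.Free;

         end.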
  25. Mahdi Safsafi

    Class Constructor in Delphi 10.4

     Class constructors are executed in the lexical order in which they are implemented (not declared). Class destructors are executed in the reverse of the order in which they are implemented. NOTE: many languages follow the same principle.
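
     A minimal sketch of what I mean (the unit and class names are mine); TSecond is deliberately implemented before TFirst, so following the rule above its class constructor should run first, even though TFirst is declared first:

         unit ClassCtorOrder;

         interface

         type
           TFirst = class  // declared first
             class constructor Create;
           end;

           TSecond = class // declared second
             class constructor Create;
           end;

         implementation

         { TSecond is implemented BEFORE TFirst on purpose }

         class constructor TSecond.Create;
         begin
           Writeln('TSecond class constructor');
         end;

         class constructor TFirst.Create;
         begin
           Writeln('TFirst class constructor');
         end;

         // Expected output order (implementation order, not declaration order):
         //   TSecond class constructor
         //   TFirst class constructor

         end.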