Primož Gabrijelčič

Members
  • Content Count

    246
  • Days Won

    11

Everything posted by Primož Gabrijelčič

  1. When writing libraries you sometimes want to provide users (that is, programmers) with a flexible API. If a specific part of your library can be used in different ways, you may want to provide multiple overloaded methods accepting different combinations of parameters.

    For example, the IOmniPipeline interface from OmniThreadLibrary implements three overloaded Stage functions.

        function Stage(pipelineStage: TPipelineSimpleStageDelegate; taskConfig: IOmniTaskConfig = nil): IOmniPipeline; overload;
        function Stage(pipelineStage: TPipelineStageDelegate; taskConfig: IOmniTaskConfig = nil): IOmniPipeline; overload;
        function Stage(pipelineStage: TPipelineStageDelegateEx; taskConfig: IOmniTaskConfig = nil): IOmniPipeline; overload;

    Delphi’s own System.Threading is even worse. In class TParallel, for example, there are 32 overloads of the &For class function. Thirty-two! Not only is it hard to select the appropriate function; it is also hard to decode something useful from the code completion tip. Check the image below - can you tell which overloaded version I’m trying to call? Me neither!

    Because of all that, it is usually good to minimize the number of overloaded methods. We can do some work by adding default parameters, but sometimes this doesn’t help.

    Today I’d like to present an alternative solution - configuration records and operator overloading. To simplify things, I’ll present a mostly made-up problem. You can download it from GitHub.
    An example

        type
          TConnector = class
          public
            procedure SetupBridge(const url1, url2: string); overload;
            procedure SetupBridge(const url1, proto2, host2, path2: string); overload;
            procedure SetupBridge(const proto1, host1, path1, proto2, host2, path2: string); overload;
            // procedure SetupBridge(const proto1, host1, path1, url2: string); overload;
          end;

    This class expects two URL parameters but allows the user to provide them in different forms - either as a full URL (for example, ‘http://www.thedelphigeek.com/index.html’) or as (protocol, host, path) triplets (for example, ‘http’, ‘www.thedelphigeek.com’, ‘index.html’).

    Besides the obvious problem of writing - and maintaining - four overloads, this code exhibits another problem. We simply cannot provide all four alternatives to the user!

    The problem lies in the fact that the second and the fourth (commented out) overload both take four string parameters. Delphi doesn’t allow that - and for a good reason! If we could define both at the same time, the compiler would have absolutely no idea which method to call if we write SetupBridge('1', '2', '3', '4'). Both versions would be equally valid candidates!

    So - strike one. We cannot even write the API that we would like to provide.

    Even worse - the user may get confused, expect that we did provide the fourth version, and try to use it. Like this:

        conn := TConnector.Create;
        try
          conn.SetupBridge('http://www.thedelphigeek.com/index.html', 'http://bad.horse/');
          conn.SetupBridge('http://www.thedelphigeek.com/index.html', 'http', 'bad.horse', '');
          conn.SetupBridge('http', 'www.thedelphigeek.com', 'index.html', 'http', 'bad.horse', '');
          // this compiles, ouch:
          conn.SetupBridge('http', 'www.thedelphigeek.com', 'index.html', 'http://bad.horse/');
        finally
          FreeAndNil(conn);
        end;

    Although the last call to SetupBridge compiles, it does something that the user doesn’t expect.
    The code calls the second SetupBridge overload and sets URL 1 to ‘http’ and URL 2 to (‘www.thedelphigeek.com’, ‘index.html’, ‘http://bad.horse/’). Strike two.

    The output of the program proves that (all ‘1:’ lines should be equal, as should be all ‘2:’ lines):

    Last but not least - the API is not very good. When we need to pass lots of configuration to a method, it is better to pack the configuration into meaningful units.

    So - strike three and out. Let’s rewrite the code!

    A solution

    Records are a good solution for packing configuration into meaningful units. Let’s try to rewrite the API to use record-based configuration.

        TURL = record
        end;

        TConnector2 = class
        public
          procedure SetupBridge(const url1, url2: TURL);
        end;

    Much better. Just one overload! Still, there’s the problem of putting information inside the TURL record. I could add a bunch of properties and write:

        url1.Proto := 'http';
        url1.Host := 'www.thedelphigeek.com';
        url1.Path := 'index.html';
        url2.URL := 'http://bad.horse/';
        conn2.SetupBridge(url1, url2);

    Clumsy. I have to declare two variables and type lots of code. No.

    I could also create two constructors and write:

        conn2.SetupBridge(TURL.Create('http', 'www.thedelphigeek.com', 'index.html'),
                          TURL.Create('http://bad.horse/'));
        conn2.SetupBridge(TURL.Create('http://www.thedelphigeek.com/index.html'),
                          TURL.Create('http://bad.horse/'));

    That looks better, but still - in the second SetupBridge call both TURL.Create calls look completely out of place.

    Do I have to pull back and rewrite my API like this?

        TConnector = class
        public
          procedure SetupBridge(const url1, url2: string); overload;
          procedure SetupBridge(const url1: string; const url2: TURL); overload;
          procedure SetupBridge(const url1, url2: TURL); overload;
          procedure SetupBridge(const url1: TURL; const url2: string); overload;
        end;

    Well, yes, this is a possibility. It solves the problem of supporting all four combinations and it nicely puts related information into one unit. Still, we can do better.
    Operators to the rescue!

    I’m quite happy with the Create approach for providing an information triplet. It is the other variant - the one with just a single URL parameter - that I would like to simplify. I would just like to provide a simple string when the URL is in one piece.

    To support that, we only have to add an Implicit operator which converts a string into a TURL record. (Another one converting TURL into a string is also helpful as it simplifies the use of TURL inside the TConnector class.)

    Here is the full implementation for TURL:

        TURL = record
        strict private
          FURL: string;
        public
          constructor Create(const proto, host, path: string);
          class operator Implicit(const url: string): TURL;
          class operator Implicit(const url: TURL): string;
        end;

        constructor TURL.Create(const proto, host, path: string);
        begin
          FURL := proto + '://' + host + '/' + path;
        end;

        class operator TURL.Implicit(const url: string): TURL;
        begin
          Result.FURL := url;
        end;

        class operator TURL.Implicit(const url: TURL): string;
        begin
          Result := url.FURL;
        end;

    Simple, isn’t it? The implementation uses the fact that TConnector has no need to access separate URL components. It is quite happy with the concatenated version, created in TURL.Create.

    This allows us to provide parameters in a way that is - at least for me - a good compromise. It allows for a (relatively) simple use and the implementation is also (relatively) simple:

        conn2 := TConnector2.Create;
        try
          conn2.SetupBridge('http://www.thedelphigeek.com/index.html', 'http://bad.horse/');
          conn2.SetupBridge('http://www.thedelphigeek.com/index.html', TURL.Create('http', 'bad.horse', ''));
          conn2.SetupBridge(TURL.Create('http', 'www.thedelphigeek.com', 'index.html'), TURL.Create('http', 'bad.horse', ''));
          // this works as expected:
          conn2.SetupBridge(TURL.Create('http', 'www.thedelphigeek.com', 'index.html'), 'http://bad.horse/');
        finally
          FreeAndNil(conn2);
        end;

    The output from the program shows that everything is OK now:
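To make the role of the second Implicit operator (TURL to string) concrete, here is a minimal sketch of how SetupBridge could consume its parameters. TURL is taken from the post; the body of TConnector2.SetupBridge and the program name are assumptions made for illustration, since the post does not show them.

```pascal
program UrlRecordSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // TURL exactly as presented in the post.
  TURL = record
  strict private
    FURL: string;
  public
    constructor Create(const proto, host, path: string);
    class operator Implicit(const url: string): TURL;
    class operator Implicit(const url: TURL): string;
  end;

  TConnector2 = class
  public
    procedure SetupBridge(const url1, url2: TURL);
  end;

constructor TURL.Create(const proto, host, path: string);
begin
  FURL := proto + '://' + host + '/' + path;
end;

class operator TURL.Implicit(const url: string): TURL;
begin
  Result.FURL := url;
end;

class operator TURL.Implicit(const url: TURL): string;
begin
  Result := url.FURL;
end;

// Hypothetical body: the original implementation is not shown in the post.
procedure TConnector2.SetupBridge(const url1, url2: TURL);
var
  s1, s2: string;
begin
  // Assigning a TURL to a string invokes Implicit(TURL): string.
  s1 := url1;
  s2 := url2;
  Writeln('1: ', s1);
  Writeln('2: ', s2);
end;

var
  conn2: TConnector2;
begin
  conn2 := TConnector2.Create;
  try
    // The plain string argument is converted by Implicit(string): TURL.
    conn2.SetupBridge(TURL.Create('http', 'www.thedelphigeek.com', 'index.html'),
      'http://bad.horse/');
  finally
    FreeAndNil(conn2);
  end;
end.
```

Because both conversions are implicit, neither the caller nor the connector has to know whether a URL arrived as a string or as a triplet.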
  2. Primož Gabrijelčič

    Aligned and atomic read/write

    More data, some old, some new.

    Firstly, two very old CPUs (the oldest I could find in the company):

    This pattern repeats very consistently every 64 bytes (size of a cache line):

    A very interesting pattern, but the worst thing is the terrible slowdown when memory access crosses the cache line.

    Similar data can be seen in a Xeon of a similar age:

    For a moment I thought I used the wrong data files - that's how similar both results are!

    And now a surprise! A very modern & fast AMD Ryzen Threadripper:

    Wow! The cache line is only 32 bytes and memory access across that line is still slow! Interestingly, accessing 4-aligned 8-byte data in 64-bit works great even when straddling a cache line.

    MemAtomic proves that the cache line is only 32 bytes:

        1:
        2: 15 31 47 63 79 95 111 127
        4: 13 14 15 29 30 31 45 46 47 61 62 63 77 78 79 93 94 95 109 110 111 125 126 127
        8: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

    No wonder Intel is still a king for non-optimized software!
  3. Primož Gabrijelčič

    Aligned and atomic read/write

    Yes, of course. My question was related to one thread reading and one thread writing.
  4. Primož Gabrijelčič

    Aligned and atomic read/write

    To answer myself - the writer doesn't need to use 'lock'. The proof is here: https://github.com/gabr42/GpDelphiCode/tree/master/MemAtomic

    The code runs tests with 1/2/4/8 byte data on offsets from 0 to 127 (relative to a well-aligned memory block). It writes out all offsets where reads/writes were not atomic.

    32-bit

        1:
        2: 63 127
        4: 61 62 63 125 126 127
        8: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127

    2/4 bytes: access is not atomic when straddling a cache line
    8 bytes: access is never atomic

    64-bit

        1:
        2: 63 127
        4: 61 62 63 125 126 127
        8: 57 58 59 60 61 62 63 121 122 123 124 125 126 127

    2/4/8 bytes: access is not atomic when straddling a cache line
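For readers who want to see the idea without opening the repository, here is a minimal sketch of how such a torn-access check can be built for 4-byte data. It is not the MemAtomic code; TWriter, IsTorn32 and CReads are hypothetical names, and the sketch assumes that the Delphi compiler performs every pointer read and write in the loops as written. A writer thread alternates two patterns with plain (non-locked) stores while the main thread reads and looks for a value that is neither pattern.

```pascal
program TornReadSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Classes;

const
  CReads = 5000000; // reads per offset; an arbitrary choice for this sketch

type
  // Hypothetical helper: keeps writing two patterns with plain stores.
  TWriter = class(TThread)
  strict private
    FTarget: PCardinal;
  protected
    procedure Execute; override;
  public
    constructor Create(target: PCardinal);
  end;

constructor TWriter.Create(target: PCardinal);
begin
  FTarget := target;
  inherited Create(False); // start running immediately
end;

procedure TWriter.Execute;
begin
  while not Terminated do begin
    FTarget^ := $00000000;
    FTarget^ := $FFFFFFFF;
  end;
end;

// Returns True if a read at block + offset saw a mix of both patterns.
function IsTorn32(block: PByte; offset, reads: Integer): Boolean;
var
  target: PCardinal;
  value: Cardinal;
  writer: TWriter;
  i: Integer;
begin
  Result := False;
  target := PCardinal(block + offset);
  target^ := 0;
  writer := TWriter.Create(target);
  try
    for i := 1 to reads do begin
      value := target^;
      if (value <> 0) and (value <> $FFFFFFFF) then
        Exit(True);
    end;
  finally
    writer.Terminate;
    writer.WaitFor;
    writer.Free;
  end;
end;

var
  block: PByte;
  offset: Integer;
begin
  // VirtualAlloc returns a page-aligned block, so offsets are well defined.
  block := VirtualAlloc(nil, 4096, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
  if block = nil then
    RaiseLastOSError;
  try
    Write('4:');
    for offset := 0 to 127 do
      if IsTorn32(block, offset, CReads) then
        Write(' ', offset);
    Writeln;
  finally
    VirtualFree(block, 0, MEM_RELEASE);
  end;
end.
```

A test like this can only prove that an access is not atomic. An offset that never shows a torn value within the chosen number of reads is merely not caught, so more reads give more confidence.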
  5. Primož Gabrijelčič

    Aligned and atomic read/write

    Can you please specify what exactly you mean by this statement? If nobody is writing to the memory then the statement is obviously true. It doesn't matter whether reads are atomic or not - the data will always be correct. If there is a writer - does it have to be writing with the 'lock' prefix or not?
  6. Primož Gabrijelčič

    Aligned and atomic read/write

    Measurements from my i7: Basically the same as on the Xeon. A bit larger speed difference between 32-bit DWORD and 64-bit QWORD. Remember - smaller is faster.
  7. Primož Gabrijelčič

    Aligned and atomic read/write

    I put together a simple test measuring aligned vs. unaligned access speed. You can find it here: https://github.com/gabr42/GpDelphiCode/tree/master/MemSpeed

    The code simply runs in a tight loop and does the following 1 million times:

        pData^ := {$IFDEF X64}$F0F0F0F0F0F0F0F0{$ELSE}$F0F0F0F0{$ENDIF};
        pData^ := {$IFDEF X64}$0F0F0F0F0F0F0F0F{$ELSE}$0F0F0F0F{$ENDIF};

    All this is repeated for each offset in a 1024-byte buffer. All of the above is repeated ten times. Memory is allocated with VirtualAlloc, which gives back nicely aligned blocks.

    At the end, the shortest time for each offset is logged into a file of your choosing. I was only interested in relative differences so I did nothing to convert the data to a "real" time unit.

    Warning: The code needs more than a minute to run on my slow Xeon.

    If I graph the result in Excel, I get this:

    There's basically no difference (besides the noise - all the junk I have installed on Windows was running alongside the test program). There's no difference over the whole 1024-byte range. QWORD access in 64-bit is slightly faster than DWORD access in 32-bit and that's that.

    Any contribution to the code will be welcome.
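The measurement loop described above can be sketched roughly as follows. This is not the MemSpeed source; TimeWrites, the constants and the console output are assumptions made for illustration, timing uses TStopwatch ticks rather than whatever the original uses, and the sketch relies on the compiler's standard CPUX64 symbol instead of the X64 symbol shown in the post.

```pascal
program AlignSpeedSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Diagnostics;

const
  CBufferSize = 1024;  // offsets 0..1023 are tested
  CWrites = 1000000;   // write pairs per measurement
  CRepeats = 10;       // full sweeps; the shortest time per offset is kept

// Hypothetical helper: times CWrites pairs of writes to one address.
function TimeWrites(pData: PNativeUInt): Int64;
var
  i: Integer;
  sw: TStopwatch;
begin
  sw := TStopwatch.StartNew;
  for i := 1 to CWrites do begin
    pData^ := {$IFDEF CPUX64}$F0F0F0F0F0F0F0F0{$ELSE}$F0F0F0F0{$ENDIF};
    pData^ := {$IFDEF CPUX64}$0F0F0F0F0F0F0F0F{$ELSE}$0F0F0F0F{$ENDIF};
  end;
  Result := sw.ElapsedTicks;
end;

var
  buffer: PByte;
  best: array [0..CBufferSize - 1] of Int64;
  offset, rep: Integer;
  ticks: Int64;
begin
  // One 4096-byte page leaves room for the write at the last offset.
  buffer := VirtualAlloc(nil, 4096, MEM_COMMIT or MEM_RESERVE, PAGE_READWRITE);
  if buffer = nil then
    RaiseLastOSError;
  try
    for offset := 0 to CBufferSize - 1 do
      best[offset] := High(Int64);
    for rep := 1 to CRepeats do
      for offset := 0 to CBufferSize - 1 do begin
        ticks := TimeWrites(PNativeUInt(buffer + offset));
        if ticks < best[offset] then
          best[offset] := ticks;
      end;
    // Tab-separated output that can be pasted into a spreadsheet.
    for offset := 0 to CBufferSize - 1 do
      Writeln(offset, #9, best[offset]);
  finally
    VirtualFree(buffer, 0, MEM_RELEASE);
  end;
end.
```

Keeping the minimum over several sweeps, rather than the average, filters out most of the noise caused by other processes running during the test.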
  8. Primož Gabrijelčič

    Aligned and atomic read/write

    "Can you please give me some references that prove your statement?"

    I can confirm (from experience) that this is indeed true. I cannot find any definitive document about that, but it looks like since 2011/12 unaligned access doesn't hurt very much (at least on the Intel platform). Some 3rd party posts that confirm my finding:

    https://lemire.me/blog/2012/05/31/data-alignment-for-speed-myth-or-reality/
    https://www.reddit.com/r/programming/comments/2la6qc/data_alignment_for_speed_myth_or_reality/
  9. Primož Gabrijelčič

    How to combine a byte and a word as a hotkey word?

    Indeed, I got the last line wrong. Glad that you fixed it!
  10. Primož Gabrijelčič

    How to combine a byte and a word as a hotkey word?

    HotKeyMod := ( TextToShortcut( edHotKey.Text ) );
    if cbALT.Checked then
      HotKeyMod := ( HotKeyMod or HOTKEYF_ALT );
    if cbCTRL.Checked then
      HotKeyMod := ( HotKeyMod or HOTKEYF_CONTROL );
    if cbSHIFT.Checked then
      HotKeyMod := ( HotKeyMod or HOTKEYF_SHIFT );
    SLI.HotKey := (HotKeyMod SHL 8) OR (SLI.HotKey AND $FF);
  11. Primož Gabrijelčič

    Set Tab Order expert

    I always execute the Set Tab Order expert by clicking on the GExperts menu and then clicking on the expert. No keyboard, no popup.
  12. Primož Gabrijelčič

    Set Tab Order expert

    I'm using your experimental version 1.3.9.59 in Berlin, if that's of any help. As I was using quite old GExperts, I now tested with 1.3.10.63 experimental and with 1.3.11.64 experimental. Works fine in both. I have recorded a short video showing how Set Tab Order works for me (and how it had always worked). https://www.dropbox.com/s/t4p06tm58q4pj92/settab.mp4?dl=0 IIRC, this way of setting tab order was actually the primary mode for this expert in the beginning. Delphi had its clumsy Edit, Tab order and GExperts added the "select all, activate expert" expert. IIRC in the beginning there was no dialog - tab order got set and that was that. (I could be mistaken, of course.)
  13. Primož Gabrijelčič

    Set Tab Order expert

    I don't understand what "reverse order" you are talking about. Click on the first control, shift-click on the second control, shift-click on the third control. GExperts, Set Tab Order. Controls are listed in the order I clicked them: 1, 2, 3.
  14. Primož Gabrijelčič

    Time bomb

    What are you actually asking? Defuse what?
  15. Primož Gabrijelčič

    Set Tab Order expert

    I almost exclusively use the "select controls in the desired order, activate the expert" mode. Been using it since who-knows-when.
  16. Primož Gabrijelčič

    Directions for ARC Memory Management

    Then the initialization process is called and it uses RTTI and it is slow, indeed. But it makes no difference in almost any use-case on this planet. (Except for the few cases where it makes a big difference. 🙂 )
  17. Primož Gabrijelčič

    Directions for ARC Memory Management

    uses
      Spring;

    var ms := Shared.New(TStringList.Create);
    ...
  18. Primož Gabrijelčič

    How to start?

    This is mentioned in the book: http://www.omnithreadlibrary.com/book/chap10.html#howto-com P.S. Make sure that you don't call any VCL code from your background threads!
  19. Primož Gabrijelčič

    How to start?

    Run it in the debugger and check where this exception pops up.
  20. Primož Gabrijelčič

    How to start?

    @KodeZwerg Fixed, thank you!