Zacherl


Posts posted by Zacherl


  1. 1 hour ago, Primož Gabrijelčič said:

    If there is a writer - does it have to be writing with 'lock' prefix or no? 

    You will need a `LOCK` prefix only if you have concurrent threads reading, modifying (e.g. incrementing by one), and writing values. It makes sure the value does not get changed by another thread in an interleaving like this:

    T1: read 0

    T2: read 0

    T2: inc

    T2: write 1

    T1: inc

    T1: write 1

     

    With the operation locked, read + increment + write is always performed atomically:

    T1: read 0

    T1: inc

    T1: write 1

    T2: read 1

    T2: inc

    T2: write 2

     

    For further explanation, read this:

    6 hours ago, Zacherl said:

    For multi threaded read-modify-write access, read this (TLDR: you will need to use `TInterlocked.XXX()`):

    https://stackoverflow.com/a/5421844/9241044

     

    If you only want to write values without reading and modifying of the previous value, no `LOCK` prefix is needed.


  2. 51 minutes ago, Primož Gabrijelčič said:

    I can confirm (from experience) that this is indeed true. I cannot find any definitive document about that, but it looks like since 2011/12 unaligned access doesn't hurt very much (at least on Intel platform).

    Well, okay, seems like the Intel SDM needs to be updated. Did anybody test the same scenario with multi-threaded atomic write operations (`lock xchg`, `lock add`, and so on)? I could imagine different results in terms of performance here.

     

    Anyway, I guess the original question is answered. To summarize:

    • 1-byte reads are always atomic
    • 2/4/8-byte reads are atomic if executed on a P6+ and the value fits in a single cache line (any sane compiler should choose correct alignments by itself)

    For multi threaded read-modify-write access, read this (TLDR: you will need to use `TInterlocked.XXX()`):

    https://stackoverflow.com/a/5421844/9241044


  3. 4 hours ago, David Heffernan said:

    Actually both of these statements are wrong. Reading unaligned memory is not atomic for reads that straddle cache lines.

    Sorry, but did you actually read my post? 

     

    That's exactly what I quoted from the latest Intel SDM:

    13 hours ago, Zacherl said:

     • Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

     

    4 hours ago, David Heffernan said:

    And unaligned memory access is not slow on modern processors. 

    Can you please give me some references that prove your statement? The Intel SDM says (you might be correct for normal data access, but using the `LOCK` prefix is something else):

    13 hours ago, Zacherl said:

    nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

     


  4. Reading unaligned values from memory is slow but should still be atomic (on a P6 and newer). This behavior is described in the Intel SDM:

    Quote

    8.1.1 Guaranteed Atomic Operations

    The Intel486 processor (and newer processors since) guarantees that the following basic memory operations will
    always be carried out atomically:
     • Reading or writing a byte
     • Reading or writing a word aligned on a 16-bit boundary
     • Reading or writing a doubleword aligned on a 32-bit boundary
    The Pentium processor (and newer processors since) guarantees that the following additional memory operations
    will always be carried out atomically:
     • Reading or writing a quadword aligned on a 64-bit boundary
     • 16-bit accesses to uncached memory locations that fit within a 32-bit data bus
    The P6 family processors (and newer processors since) guarantee that the following additional memory operation
    will always be carried out atomically:
     • Unaligned 16-, 32-, and 64-bit accesses to cached memory that fit within a cache line

    The above is only valid for single-core CPUs. For multi-core CPUs you will need to utilize the "bus control signals":

    Quote

    Accesses to cacheable memory that are split across cache lines and page boundaries are not guaranteed to be atomic by the Intel Core 2 Duo, Intel® Atom™, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors. The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, and P6 family processors provide bus control signals that permit external memory subsystems to make split accesses atomic; however, nonaligned data accesses will seriously impact the performance of the processor and should be avoided.

     

    Delphi implements the System.SyncObjs.TInterlocked class, which provides functions for atomic access (e.g. `TInterlocked.Exchange()` for an atomic exchange operation). You should always use these functions to make sure your application does not run into race conditions on multi-core systems.

     

    BTW: I released a unit with a few atomic type wrappers some time ago (limited functionality compared to the `std::atomic<T>` C++ types, as Delphi does not allow overloading of the assignment operator):

    https://github.com/flobernd/delphi-utils/blob/master/Utils.AtomicTypes.pas


  5. 14 hours ago, Kryvich said:

    The best way to fix a bug is to identify it as early as possible. I use asserts to validate the input parameters of subroutines, to verify the result of a function, and sometimes inside a subroutine if needed. Asserts are always enabled during development and at the alpha and beta testing stages. After release, they can be disabled if they noticeably affect the speed of the application.

    You are of course correct, but assertions should only be used to detect programming errors (at least they should be ^^); they also enable advanced tests like fuzzing. They should never be used as a replacement for runtime exceptions. If a runtime exception occurs on a production system, you need real logging functionality built into your product to be able to reproduce the bug (independently of the assertions, which are always good to have 🙂 ).

     

    There are tools like madExcept that will provide you with call stacks in case of a runtime exception, but sometimes even that is not sufficient (e.g. when code fails silently without an exception).


  6. I remember a thread that discussed a similar topic .. it was something like "How to detect if I can append data to a file". The solution was: try it and catch errors. There are just too many variables that you would have to check (file existence, access privileges, exclusive access, ...).

     

    If you need cross-platform support, I can just quote myself:

    3 hours ago, Zacherl said:

    Just trying to open the file is already the correct solution IMHO. You should catch the exception and retry until it works.

     

    There are other approaches if cross-platform support is not needed, but none of them are easy (on Windows, for example, you could enumerate all open handles to the file, or work with global `CloseHandle` hooks).


  7. Isn't the optimizer smart enough to remove empty function calls? Well ... I know ... it's Delphi ... and Delphi is not really known for perfectly optimized code, but detecting empty functions is a really easy task. Using `IFDEF`s inside the logging function itself would be a better solution in this case.

     

    Edit:

    2 minutes ago, Zacherl said:

    Isn't the optimizer smart enough to remove empty function calls?

    Well, I tested it, and ... of course it does not eliminate calls to empty functions :classic_dry: It generates a `CALL` to a single `RET` instruction instead. This should not happen in 2018.


  8. Assert is fine, but sometimes you want to have logging code in your production build. Asserting is not a good idea here, as it will raise an exception. Besides that, only one event can be logged at a time (the application either crashes after the first assertion or executes the exception handler, which prevents execution of successive assertions).


  9. 5 hours ago, Kryvich said:

    As a side note:

    
    ...
    mov [ebp-$04],eax
    mov eax,[ebp-$04]    // <--- ???
    call @UStrAddRef
    ...

    Compiler optimization is on. There is a place for optimizations. 😉 

    This shitty code has been generated for ages now. I don't think they will ever fix it.


  10. 1 minute ago, Markus Kinzler said:

    The JavaScript part seems much more complex. The difference between vBulletin (German) and IPS (here) is higher on less powerful machines.

    As I said: I only measured the pure network load times, not the CPU/Render/JS stuff. Ok, some content might be loaded by JS, but my machine is not slow at all. i9-7900X @ 4.6GHz, 64GiB RAM, Samsung SSD 950 Pro. I'm going to upload some screenshots.


  11. 4 minutes ago, Markus Kinzler said:

    It depend also on the local machine.

    Sure, but both times were measured on the same machine. The German DP has an average load time of 350 ms for me, while the English DP takes 1300 ms to load. This is what the Chrome "Network" graph tells me (pure network load time, without any render/CPU-related delays).
