Jump to content

Leaderboard


Popular Content

Showing content with the highest reputation on 04/27/19 in all areas

  1. Earlier this week a long-time customer asked me why FastMM allocates large memory blocks from the top of the memory and if that option could safely be turned off. Their reasoning was that such allocations are much slower than normal ones. The question surprised me as I didn’t know of any such difference in speed so I did what I usually do–I wrote a benchmark application and measured it. TL;DR Yes, allocating from the top is slower. No, the difference is not big and in most cases you’ll not even notice it. There were, however, other interesting results that my simple benchmark pointed out. More on that in a moment, but first… Allocating from bottom and top In Windows, the code can ask for a memory block by calling VirtualAlloc with flag MEM_COMMIT and Windows will give you a suitable chunk of memory. This memory will usually be allocated from the start of the virtual memory visible to the program. The code can, however, call VirtualAlloc with flag MEM_COMMIT OR MEM_TOP_DOWN and Windows will return a block from the end of virtual memory available to the process. In a typical 32-bit Delphi program first such memory block will have address close to $7FF00000 (but a bit lower). You may want to allocate memory “from the top” if your program has two very distinct modes of allocating memory and you don’t want to mix them. For example, a frequently reallocated memory could come “from the bottom” and large blocks that are used for long periods of time “from the top”. This can reduce memory fragmentation, but the potential advantages are specific to each program. In other words – maybe it will help, maybe it will hurt. Another good scenario for MEM_TOP_BOTTOM is testing 64-bit code ported from 32-bits. For example, a typical “from the top” allocated block in a 64-bit program will have an address like this: $7FF4FDE30000. If your code at some point stores pointers into 4-byte integers, part of the address will be lost and as soon as that integer is converted back into a pointer and the code accesses that pointer, you’ll quite probably get an access violation. If a memory comes “from the bottom”, such problems would not be so easily detected. FastMM4 allocates large blocks (with sizes greater or equal to 258.5 KB) “from the top”. If I recall correctly, this was done to prevent memory fragmentation. Additionally, it can allocate all other block “from the top” if you define conditional symbol AlwaysAllocateTopDown and rebuild. (You have to use FastMM4 from github instead of the built-in Delphi version to use this functionality.) You can use this mode to test 32-bit programs ported to 64-bit code. MEM_TOP_DOWN is slower? The article my customer pointed to claimed that allocating from the top works much slower than allocating from the bottom. Even worse, the allocation algorithm was supposed to work in O(n^2) time so each additional allocation needs more time to execute. To top that off, the official documentation for the MEM_TOP_DOWN flag mentions: This can be slower than regular allocations, especially when there are many allocations. To verify that claim, I wrote a trivial benchmarking app (download it here). It allocates from 1 to 6000 blocks of size 264752 and measures the time needed. Block size 264752 was picked because at that size FastMM4 starts allocating memory “from the top”. 6000 blocks can safely be allocated in a 32-bit application (6000 * 264752 = 1.5 GB). In my tests I could allocate 6105 such blocks before I ran “out of memory” but just to be on the safe side I reduced the number in the released application. Results, measured on my fresh new notebook with a i7-8750 processor, were much closer to my expectations than to some O(n^2) algorithm. The “Top” algorithm is slightly slower (needs more time to execute) but the difference is not drastically large. What’s going on then? Is MEM_TOP_DOWN slow or not? As it turned out, the article I was referring to was written in 2011 and Windows have improved a lot since then. I don’t know which Windows version has fixed the “top allocation” problem, but it definitely doesn’t appear in Windows 7 and 10. Another interesting result is that the first 200 MB (approximately) are almost “free”. Somewhere around that number, the execution time jumps from around 3 ms to 50 ms and then continues to grow in more-or-less linear fashion. The benchmarking program measures each test only once and is therefore very susceptible to measurement errors but the result clearly shows an O(n) algorithm. Why are allocations smaller than 200 MB so fast? I’m guessing that Windows maps such amount of physical memory into the process’ virtual space when the process is started. When you exceed that limit, the allocator needs more time to allocate physical memory and map it into the process’ virtual space. That’s, however, just a guess. If you know better, please let me know in the comments. How fast are YOUR allocations? Just for the sake of completeness I rerun tests on my main PC (HP z820 with two E5 Xeons) and the results completely surprised me. The shape of the curve is almost the same–but notice the difference in speed! On the laptop, 4000 allocations execute in 250 ms. On the Xeon machine, over 1000 ms is needed for the same job. This machine is quite old (around 4 years IIRC) and it obviously contains a much slower memory. I know that computers can have faster or slower memory chips, but I never expected to see such a big difference in VirtualAllocspeed. (And yes, both machines are running latest Windows 10.) Now the whole shebang started to interest me even more, and I did some further tests on a few PCs used by fellow programmers. All of them were running Windows 10. As you can see below, there is some difference between them but none are so slow than my main computer 😞 Maybe the time has come to upgrade… If you want to download raw data and compare it to your own results, you can access it here. MEM_TOP_DOWN or not? The difference in speed is not that big–and most programs will not notice it–but I have to agree with the customer. The time has come to remove hard-coded MEM_TOP_DOWN from FastMM4 and replace it with a conditional {$IFDEF AllocateLargeBlocksTopDown}MEM_TOP_DOWN{$ENDIF}. I have created pull request for that change: https://github.com/pleriche/FastMM4/pull/75 (Original blog post: https://www.thedelphigeek.com/2019/04/fastmm4-large-memory.html)
  2. David Heffernan

    Forked VSCode for Delphi

    There's absolutely no reason why you should not do this.
  3. Remy Lebeau

    On-demand ARC feature discussed

    Many C++ compilers do offer that as an extension. C++Builder has 'try/__finally'. VC++ and CLang have '__try/__finally', etc. But proper use of RAII trumps the need for try/finally at all. But if you want, try/finally can be emulated in pure C++ using lambdas and pre-compiler macros (see this, for example).
  4. Stefan Glienke

    Forked VSCode for Delphi

    I guess you are a bit naive in your assumption of what it takes to get from something like VSCode to a fully featured IDE like RAD Studio or Visual Studio... Because otherwise why is there still that thing called Visual Studio that MS is putting significant resources into it if they could slap a few extensions onto VSCode?
  5. Uwe Raabe

    New VCL Style from DelphiStyles.com

    Indeed - and DelphiStyles has shown to be a competent and reliable partner for that. Highly recommended!
  6. dummzeuch

    GExperts Select Components Expert

    Does anybody know whether GExperts ever included a "Select Components" expert? It looks like this: or like this in expanded mode: I just found that there is source code for such an expert in source\experts\GX_SelectComonents but it has not been added to the project(s), so it does not show up in the experts list. It seems that it was originally written by Rossen Assenov who put in on CodeCentral with the following comment: The source was already part of the GExperts sources when they were moved to the SVN repository on SourceForge in 2007 but they were not part of the projects even then (so it wasn't me who accidentally deleted them). It also seems to work, even though it lacks the flexibility of its CnPack counterpart "Component Selector":
  7. Antonello

    MARS Linux apache module

    Great! now works fine!
  8. Angus Robertson

    ICS V8.61 announced

    ICS V8.61 has been released at: http://wiki.overbyte.eu/wiki/index.php/ICS_Download ICS is a free internet component library for Delphi 7, 2006 to 2010, XE to XE8, 10 Seattle, 10.1 Berlin, 10.2 Tokyo and 10.3 Rio, and C++Builder 2006 to XE3, 10.2 Tokyo and 10.3 Rio. ICS supports VCL and FMX, Win32, Win64 and MacOS targets. The distribution zip includes the latest OpenSSL 1.1.1 win32, with other versions of OpenSSL being available from the download page. Changes in ICS V8.61 include: 1 - Added two new components using the new HTTPS REST component, which are both useful and illustrate how simply they can created, TIcsSms and TDnsQueryHttps, both in the OverbyteIcsSslHttpRest.pas unit with demos in OverbyteIcsHttpRestTst. 2 - The new TIcsSms component sends SMS text messages via an HTTP bureau, you will need an account. Initially supporting https://www.kapow.co.uk/ from where you set-up an account for £6.50 (about $9) which gives 100 message credits. Other similar bureaus can be added, provided there is an account for testing. The component has three methods, SendSMS sends an SMS to a mobile number and returns an ID, CheckSMS checks if the SMS with a specific ID has been delivered, pending or failed and CheckCredit returns remaining credit for the account. Messages longer than 140 characters should be sent as multiple messages, if supported by the network. 3 - The new TDnsQueryHttps component makes DNS queries over HTTPS (DOH), to ensure integrity and privacy from interception by ISPs or proxies. It includes a list of public DOH servers from Cloudfare, Google, Quad9 and others, and will make all common DNS queries, including all which does the seven most common queries together. The original TDnsQuery component has also been updated to support all the common queries and return them in using a single AnswerRecord array, rather than an array per query type, but remains backward compatible for existing queries. It now also returns alternate responses. Supports IPv6. The OverbyteIcsNsLookup sample uses TDnsQuery while the OverbyteIcsHttpRestTst sample uses TDnsQueryHttps. The latter sample also illustrates DNS over HTTPS using Json as a REST demo. 4 - Improved HTTP client and server NTLM authentication by adding Single Sign On with NTLM Session on Windows Domain to get credentials without needing them specified in code. 5 - Improvements in the HTTPS REST component to prevent TSslHttpCli events being overwritten by TSslHttpRest events. ResponseXX properties are now available in both OnRequestDone and OnRestRequestDone event handler. IcsHtmlToStr returns javascript content as well as XML and Json and does not ignore very short content. 6 - Improvements in the HTTP client, added more header response properties: RespDateDT, RespLastModDT, RespExpires and RespCacheControl. NoCache now sends Cache-Control: no-cache for HTTP/1.1. 7 - Fixed SSL certificate ValidateCertChain to check certificate start and expiry dates in UTC time instead of local time. Previously certificates issued in North America with UTC/GMT time stamps may have been seen as not yet valid. 8 - The FTP client now accepts badly formatted FEAT PROT responses. 9 - The Browser Demo sample using HtmlViewer now correctly supports authentication methods where a site requires a login, and has an improved log window that no longer slows down display of complex pages. Angus
  9. Lars Fosdal

    New VCL Style from DelphiStyles.com

    We sure do, because this makes me cringe. Too much clutter. That said - I also hate the currently popular naked "flat UI" design philosophy which takes away most UI guidance and makes you guess if something is a link, a button, or just text - and you have to hover and/or click in the right places to find out.
×