
Anders Melander

Members
  • Content Count: 1147
  • Days Won: 51

Everything posted by Anders Melander

  1. Anders Melander

    64bit Debugger Not Handling Memory Problems

    http://docwiki.embarcadero.com/RADStudio/Sydney/en/Using_CodeGuard
  2. Anders Melander

    Delphi 10.4.2 Professional

    The feature nobody ever asked for strikes again. I wish they would realize their mistake and make it opt-in instead of opt-out-if-you-know-how-to. Think of all the people who don't even know about this and just assume that Delphi now sucks balls and that they'll have to learn to live with it.
  3. Anders Melander

    MAP2PDB - Profiling with VTune

    As it seems that the most recent version of VTune also suffers from the performance problem I mentioned earlier, I've now added a note about the problem to the repository readme and uploaded the files that can be used to fix it: https://bitbucket.org/anders_melander/map2pdb/src/master/#markdown-header-performance-problems-with-intel-vtune
  4. Anders Melander

    MAP2PDB - Profiling with VTune

    It took me a bit longer than expected to get here, but I believe I've finally reached the goal. The following shows VTune profiling a Delphi application, with symbol, line number and source code resolution:

    [Screenshot: VTune profiling a Delphi application]

    Get the source here: https://bitbucket.org/anders_melander/map2pdb/
    And a precompiled exe here: https://bitbucket.org/anders_melander/map2pdb/downloads/

    The source has only been tested with Delphi 10.3. It uses inline vars, so it will not compile with older versions.

    Usage

        map2pdb - Copyright (c) 2021 Anders Melander
        Version 2.0

        Parses the map file produced by Delphi and writes a PDB file.

        Usage: map2pdb [options] <map-filename>

        Options:
          -v                         Verbose output
          -pdb[:<output-filename>]   Writes a PDB (default)
          -yaml[:<output-filename>]  Writes an YAML file that can be used with llvm-pdbutil
          -bind[:<exe-filename>]     Patches a Delphi compiled exe file to include a reference to the pdb file
          -test                      Works on test data. Ignores the input file

    Example

      1. Configure your project linker options to output a Detailed map file.
      2. Compile the project.
      3. Execute map2pdb <map-filename> -bind
      4. Profile the application with VTune (or whatever).

    Known issues

      • The -bind switch must occur after the filename, contrary to the usage instructions.
      • PDB files larger than 16 MB are not valid. This is currently by design.
      • 64-bit PE files are not yet supported by the -bind option.

    As should be evident, I decided not to go the DWARF route after all. After spending a few days reading the DWARF specification and examining the FPC source, I decided that it would be easier to leverage the PDB knowledge I had already acquired. Not that this has been easy. Even though I've been able to use the LLVM PDB implementation and Microsoft's PDB source as a reference, LLVM's implementation is incomplete and buggy, and the LLVM source is "modern C++", which means that it's close to unreadable in places. Microsoft's source, while written in clean C and guaranteed to be correct, doesn't compile and is poorly commented. Luckily it was nothing a few all-nighters with a disassembler and a hex editor couldn't solve.

    Enjoy!
  5. Anders Melander

    MAP2PDB - Profiling with VTune

    Yes. I'm on Windows 7 too.
  6. Anders Melander

    MAP2PDB - Profiling with VTune

    Sure. Send me the map file by PM.
  7. Anders Melander

    Delphi 10.4.2 Professional

    Not really. Many of them are cheaper than a quality Cherry MX keyboard. And in case of a zombie apocalypse you are well prepared. https://i.imgur.com/g2Yo3.gif (no inline GIFs...? 😕)
  8. Anders Melander

    Delphi 10.4.2 Professional

    AFAIK Unicomp bought the rights to the key design, but unfortunately they missed the fact that it's just as much the quality that made this keyboard the legend it is. From what I've read, the money is better spent on a used original IBM (or Lexmark) Model M. Nice!
  9. Anders Melander

    Delphi 10.4.2 Professional

    Well, according to the label under my keyboard it's from 1995, so it's seen some stuff. I've got a spare that's even older. You can still get a Model M in good (or refurbished) condition on eBay. The main problem will be the keyboard connector on the mainboard. I have one keyboard with the "new" 6-pin mini-DIN PS/2 connector (the one I use) and one with the old 5-pin DIN connector. The cable is detachable, so the cable with the PS/2 connector fits both keyboards. Anyway, even if your mainboard has a PS/2 connector, it might not be able to supply enough power through it. The Model M needs a bit of power when it "boots", so sometimes I have to reboot the PC a few times before there's life in the keyboard. Usually not a problem, as I always go to standby. Also be aware that most USB-PS/2 adapters don't supply enough power. I also have a backup keyboard with Cherry MX Blue (tactile, clicky) switches, but it's just not the same.
  10. Anders Melander

    Delphi 10.4.2 Professional

    @Dany Marmur My own desktop system is built around a 9-year-old Asus mainboard in a 16-year-old Lian Li PC-V2100B Plus II case. AFAIR I upgraded the CPU 5 years ago. It was built from parts, so fairly cheap, and it works fine. I've never had much success (performance-wise) with the prebuilt systems my various employers have forced on me, no matter how much money they spent on them. My laptop, though, is a Lenovo ThinkPad X1 Extreme, but I seldom use it. I need a man's keyboard 🙂 Wow. Looking at the picture, I just noticed the gunk between the keys 🤮. I usually don't look at the keyboard.
  11. Anders Melander

    Delphi 10.4.2 Professional

    How does a transparent case and fans with LED lighting help with that?
  12. Anders Melander

    Delphi 10.4.2 Professional

    Have you disabled LiveBindings?
  13. Anders Melander

    MAP2PDB - Profiling with VTune

    It turned out that the culprit was the version of msdia140.dll that came bundled with the version of VTune I'm using. There's a bug in it that causes exponential slowdown on large pdbs. Replacing the DLL with a newer version fixed the problem: the symbol resolve time of my test project fell from hours/days to ~10 minutes. The old msdia140.dll was version 14.10.25017.0; the new one is 14.28.29913.0. AFAIK any version from VS2019 or later should do.

    A side effect of trying to solve this performance problem was that I added segment/section filters. You can now specify which segments to include in or exclude from the pdb. For example, since almost all code is in segment 0001, you can exclude all modules and symbols that reside in other segments. This can significantly reduce the size of the pdb. Try this:

        map2pdb -v -include:0001 foobar.map

    or try with the -debug switch to get all the details. I'm considering just making this 0001 filter the default.

    I've uploaded a new version (2.6) with all the latest changes (there aren't that many): https://bitbucket.org/anders_melander/map2pdb/downloads/

    Also, the repository finally has a readme.md.
  14. Anders Melander

    MAP2PDB - Profiling with VTune

    Works for me, so there was probably something wrong with the pdb at that time. I've tried with both a small and a very large application. On the positive side, uProf resolved a lot faster than VTune, but I'm a bit surprised by how basic the uProf feature set is, and I can't really imagine what I would use it for. Also, it has pie charts... WTF?
  15. Anders Melander

    Delphi 5 Printing

    TPrinter/TCanvas is GDI printing. It's likely that there are bugs in TPrinter in Delphi 5 that have since been fixed. I seem to recall that there were quite a lot of them: buffer overflows and whatnot. What has probably happened is that your application has been using GDI in a way that was invalid but that Windows worked around, and now Windows has stopped working around it.
  16. Anders Melander

    Is a "bare-minimum" EurekaLog possible?

    One thing to be aware of with EurekaLog is that, in my experience, it makes the link stage unbearably slow for large projects. This alone has made me replace it with madExcept in a few projects. I have my small grievances with madExcept too, though; in particular the fact that it pumps the message queue, for no good reason, when processing silent exceptions.
  17. Anders Melander

    Is a "bare-minimum" EurekaLog possible?

    I agree. I believe it's been discussed with Mathias several times but for some reason he's not seen the light.
  18. Anders Melander

    Determining why Delphi App Hangs

    You're in good company; we've all been there.
  19. Unfortunately inline vars are not always equivalent to "with". I'm currently working on a project that has, um... let's be polite and say, "liberal" use of record arrays. For example:

        type
          TFoo = record
            WhatEver: Integer;
            // Lots of stuff here
          end;

          TBar = record
            Foo: array of TFoo;
            // Even more stuff
          end;

          TFooBar = array of TBar;

        var
          FooBar: TFooBar;
          i, j: Integer;
        begin
          with FooBar[i].Foo[j] do
            WhatEver := 42;     // Modifies the element in the array

          var Foo := FooBar[i].Foo[j];
          Foo.WhatEver := 42;   // Nope. Modifies a local copy
        end;

    Using an inline var to access an inner record will create a copy of the record, while using "with" will use a reference to the record. The only way around that is to use record pointers, but the code is horrible enough as it is.
  20. It's the size in dwords (i.e. 32-bit RGBA). AFAIR it should work with the 64-bit compiler too.
  21. Yes, it's relatively costly to create a thread, but if you use a thread pool then the threads will only have to be created once. I don't think I follow you. I can't see why the intermediate buffer would need to be a bitmap; it's just a chunk of memory. Also, the transpose is faster than you'd think. After all, it's much faster to do two row-by-row passes and two transpositions than one row-by-row pass and one column-by-column pass. One might then think that it would be smart to do the transposition as part of the row-by-row pass (after all, you already have the value that needs to be transposed), but that isn't so, as writing at the transposed location will flush the cache.

    Anyway, here's the aptly named SuperDuperTranspose32 (I also have a FastTranspose (MMX) and a SuperTranspose). I've been using it in an IIR Gaussian blur filter. Zuuuuper fast.

        // MatrixTranspose by AW
        // http://masm32.com/board/index.php?topic=6140.msg65145#msg65145
        // 4x4 matrix transpose by Siekmanski
        // http://masm32.com/board/index.php?topic=6127.msg65026#msg65026
        // Ported to Delphi by Anders Melander
        procedure SuperDuperTranspose32(Src, Dst: Pointer; W, Height: cardinal); register;
        type
          dword = cardinal;
        // Parameters:
        //   EAX <- Source
        //   EDX <- Destination
        //   ECX <- Width
        //   Stack[0] <- Height
        // Preserves: EDI, ESI, EBX
        var
          Source, Destination: Pointer;
          Width: dword;
          X4x4Required: dword;
          Y4x4Required: dword;
          remainderX: dword;
          remainderY: dword;
          destRowSize: dword;
          sourceRowSize: dword;
          savedDest: dword;
        asm
          push edi
          push esi
          push ebx

          mov Destination, Dst
          mov Source, Src
          mov Width, W

          // How many cols % 4?
          mov eax, Width
          mov ebx, 4
          mov edx, 0
          div ebx
          mov X4x4Required, eax
          mov remainderX, edx

          // How many rows % 4?
          mov eax, Height
          mov ebx, 4
          mov edx, 0
          div ebx
          mov Y4x4Required, eax
          mov remainderY, edx

          mov eax, Height
          shl eax, 2
          mov destRowSize, eax

          mov eax, Width
          shl eax, 2
          mov sourceRowSize, eax

          mov ebx, 0
        @@loop1outer:
          cmp ebx, Y4x4Required   // while ebx < Y4x4Required
          jae @@loop1outer_exit

          // find starting point for source
          mov eax, ebx
          mul sourceRowSize
          shl eax, 2
          mov esi, Source
          add esi, eax
          mov ecx, esi            // save

          // find starting point for destination
          mov eax, ebx
          shl eax, 4
          mov edi, Destination
          add edi, eax
          mov savedDest, edi      // save

          push ebx
          mov ebx, 0
        @@loop1inner:
          cmp ebx, X4x4Required   // while ebx < X4x4Required
          jae @@loop1inner_exit

          mov eax, ebx
          shl eax, 4
          mov esi, ecx
          add esi, eax
          movups xmm0, [esi]
          add esi, sourceRowSize
          movups xmm1, [esi]
          add esi, sourceRowSize
          movups xmm2, [esi]
          add esi, sourceRowSize
          movups xmm3, [esi]

          movaps xmm4, xmm0
          movaps xmm5, xmm2
          unpcklps xmm4, xmm1
          unpcklps xmm5, xmm3
          unpckhps xmm0, xmm1
          unpckhps xmm2, xmm3
          movaps xmm1, xmm4
          movaps xmm6, xmm0
          movlhps xmm4, xmm5
          movlhps xmm6, xmm2
          movhlps xmm5, xmm1
          movhlps xmm2, xmm0

          mov eax, destRowSize
          shl eax, 2
          mul ebx
          mov edi, savedDest
          add edi, eax
          movups [edi], xmm4
          add edi, destRowSize
          movups [edi], xmm5
          add edi, destRowSize
          movups [edi], xmm6
          add edi, destRowSize
          movups [edi], xmm2

          inc ebx
          jmp @@loop1inner
        @@loop1inner_exit:
          pop ebx
          inc ebx
          jmp @@loop1outer
        @@loop1outer_exit:

          // deal with Width (columns) not multiple of 4
          cmp remainderX, 1       // if remainderX >= 1
          jb @@no_extra_x
          mov eax, X4x4Required
          shl eax, 4
          mov esi, Source
          add esi, eax
          mov eax, X4x4Required
          shl eax, 2
          mul destRowSize
          mov edi, Destination
          add edi, eax
          mov edx, 0
        @@extra_x:
          cmp edx, remainderX     // while edx < remainderX
          jae @@extra_x_exit
          mov ecx, 0
          mov eax, 0
        @@extra_x_y:
          cmp ecx, Height         // while ecx < Height
          jae @@extra_x_y_exit
          mov ebx, dword ptr [esi+eax]
          mov dword ptr [edi+4*ecx], ebx
          add eax, sourceRowSize
          inc ecx
          jmp @@extra_x_y
        @@extra_x_y_exit:
          add esi, 4
          add edi, destRowSize
          inc edx
          jmp @@extra_x
        @@extra_x_exit:
        @@no_extra_x:

          // deal with Height (rows) not multiple of 4
          cmp remainderY, 1       // if remainderY >= 1
          jb @@no_extra_y
          mov eax, Y4x4Required
          shl eax, 2
          mul sourceRowSize
          mov esi, Source
          add esi, eax
          mov eax, Y4x4Required
          shl eax, 4
          mov edi, Destination
          add edi, eax
          mov edx, 0
        @@extra_y:
          cmp edx, remainderY     // while edx < remainderY
          jae @@extra_y_exit
          mov ecx, 0
          mov eax, 0
        @@extra_y_x:
          cmp ecx, Width          // while ecx < Width
          jae @@extra_y_x_exit
          mov ebx, dword ptr [esi+4*ecx]
          mov dword ptr [edi+eax], ebx
          add eax, destRowSize
          inc ecx
          jmp @@extra_y_x
        @@extra_y_x_exit:
          add esi, sourceRowSize
          add edi, 4
          inc edx
          jmp @@extra_y
        @@extra_y_exit:
        @@no_extra_y:

          pop ebx
          pop esi
          pop edi
        end;
  22. While this isn't related to your threading problem, it seems you are processing the bitmap by column instead of by row. This is very bad for performance, since every step down a column is likely to start with a cache miss. I think you will find that if you process all rows, transpose (so columns become rows), process all rows, and transpose again (rows back to columns), the performance will be significantly better. I have a fast 32-bit (i.e. RGBA) blocked transpose if you need one.

    Another thing to be aware of when multiple threads read or write the same memory: if two threads read and write two different locations, but those two locations are within the same cache line, then you will generally get a decrease in performance as the cores fight over the cache line (false sharing).
  23. Instead of just posting your source and letting us figure out what you're doing, it would be nice if you described exactly what you're doing, i.e. what the overall job does (e.g. it resamples a bitmap), how it does that (describe the algorithm), how you are dividing the job, what your individual tasks do, etc. Describe it as if we didn't have the source. This is basically also what your source comments should do.
  24. Anders Melander

    MAP2PDB - Profiling with VTune

    New version (2.5) uploaded. Changes since the last upload:

      • Include/exclude modules/units from the pdb. This helps keep the size of the pdb down and thus reduces the symbol resolve time in VTune.
      • You no longer need to link your projects with debug info. map2pdb will reuse the existing debug section in the exe/dll/bpl if there is one; otherwise it will create a new one.

    https://bitbucket.org/anders_melander/map2pdb/downloads/

    What's next:

      • Refactoring of the logging code. The current logging is basically just some functions that call WriteLn. This should be replaced with a pluggable log framework so the whole logging mechanism can be replaced. The end goal is to enable integration of the map2pdb core into other projects.
      • A jdbg reader. Embarcadero does not supply map files for the RTL/VCL run-time packages. Instead they ship jdbg files that can be read with the JEDI debug functions. The jdbg files are built from map files, so supposedly they contain much, if not all, of the information we need. The task here is to write a reader for the jdbg file format so we can produce pdb files from them.
      • Figure out why VTune is so slow. A never-ending task, it seems.