Leaderboard


Popular Content

Showing content with the highest reputation on 04/12/21 in all areas

  1. While this isn't related to your threading problem, it seems you are processing the bitmap by column instead of by row. This is very bad for performance, since each row of each column will start with a cache miss. I think you will find that if you process all rows, transpose (so columns become rows), process all rows again, and transpose back (rows become columns again), the performance will be significantly better. I have a fast 32-bit (i.e. RGBA) blocked transpose if you need one. Another thing to be aware of when multiple threads read or write the same memory: if two threads read and write two different locations, but those locations are within the same cache line, you will generally get a decrease in performance as the cores fight over the cache line.
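    The difference between the two traversal orders can be sketched with a small console program. This is only an illustration under assumptions: the bitmap is taken to be a flat array of 32-bit pixels stored row by row, and `W`, `H`, `SumByRow` and `SumByColumn` are hypothetical names, not code from the thread.

    ```pascal
    program RowVsColumn;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Diagnostics;

    const
      W = 4096; // assumed bitmap width in pixels
      H = 4096; // assumed bitmap height in pixels

    var
      Pixels: TArray<Cardinal>; // one 32-bit value per pixel, rows stored back to back

    // Inner loop walks along a row: consecutive addresses, cache friendly.
    function SumByRow: UInt64;
    var
      X, Y: Integer;
    begin
      Result := 0;
      for Y := 0 to H - 1 do
        for X := 0 to W - 1 do
          Inc(Result, Pixels[Y * W + X]);
    end;

    // Inner loop walks down a column: every access lands W * 4 bytes after
    // the previous one, so it tends to touch a new cache line each time.
    function SumByColumn: UInt64;
    var
      X, Y: Integer;
    begin
      Result := 0;
      for X := 0 to W - 1 do
        for Y := 0 to H - 1 do
          Inc(Result, Pixels[Y * W + X]);
    end;

    var
      SW: TStopwatch;
      Total: UInt64;
    begin
      SetLength(Pixels, W * H);

      SW := TStopwatch.StartNew;
      Total := SumByRow;
      Writeln('by row:    ', SW.ElapsedMilliseconds, ' ms, sum = ', Total);

      SW := TStopwatch.StartNew;
      Total := SumByColumn;
      Writeln('by column: ', SW.ElapsedMilliseconds, ' ms, sum = ', Total);
    end.
    ```

    Both functions compute the same sum; only the memory access pattern differs, so any timing gap on a given machine comes from the traversal order.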
  2. OK, the long timeout was a bit confusing and I could not determine how long each operation should last. As far as I can see, there is no obvious reason why your approach fails with a timeout. The code is rather long, so I am not excluding the possibility that I missed an error in it, or that there is some memory corruption, but that would commonly show up much sooner. Using TTask and reusing threads through its thread pool is definitely a faster and more correct approach compared to creating anonymous threads. Anonymous threads were a "fallback" suggestion in case the overall time spent in calculation is long enough that the thread creation overhead pays off. If TTask is really the issue here (which is also hard to say), another option would be to try some other library. One thing you can probably confirm by using anonymous threads is whether TTask is the culprit here (even though you can never say that with 100% certainty).
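    To compare the two approaches side by side, something along these lines could be used. It is a minimal sketch, not the original poster's code: `MakeJob`, `RunWithTasks`, `RunWithThreads` and the `Completed` counter are hypothetical, and the `Sleep` call stands in for the real calculation.

    ```pascal
    program TaskVsThread;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Classes, System.SyncObjs, System.Threading;

    var
      Completed: Integer = 0; // number of finished jobs, updated atomically

    // Returns a closure that captures its own copy of Index. Creating the
    // anonymous method directly inside a for loop would capture the shared
    // loop variable instead.
    function MakeJob(Index: Integer): TProc;
    begin
      Result :=
        procedure
        begin
          Sleep(10 + Index); // placeholder for the real work
          TInterlocked.Increment(Completed);
        end;
    end;

    // Variant 1: TTask, threads come from the shared thread pool.
    procedure RunWithTasks;
    var
      Tasks: array[0..3] of ITask;
      I: Integer;
    begin
      for I := Low(Tasks) to High(Tasks) do
        Tasks[I] := TTask.Run(MakeJob(I));
      TTask.WaitForAll(Tasks);
    end;

    // Variant 2: one freshly created anonymous thread per job.
    procedure RunWithThreads;
    var
      Threads: array[0..3] of TThread;
      I: Integer;
    begin
      for I := Low(Threads) to High(Threads) do
      begin
        Threads[I] := TThread.CreateAnonymousThread(MakeJob(I));
        Threads[I].FreeOnTerminate := False; // we wait for and free it ourselves
        Threads[I].Start;
      end;
      for I := Low(Threads) to High(Threads) do
      begin
        Threads[I].WaitFor;
        Threads[I].Free;
      end;
    end;

    begin
      RunWithTasks;
      Writeln('after tasks:   ', Completed, ' jobs done');
      RunWithThreads;
      Writeln('after threads: ', Completed, ' jobs done');
    end.
    ```

    If the timeout only appears in the TTask variant, that points towards the thread pool, although, as noted above, it does not prove it.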
  3. Yes, it's relatively costly to create a thread, but if you use a thread pool then the threads will only have to be created once.

    I don't think I follow you. I can't see why the intermediate buffer would need to be a bitmap; it's just a chunk of memory. Also, the transpose is faster than you'd think. After all, it's much faster to do two row-by-row passes and two transpositions than one row-by-row pass and one column-by-column pass. One might then think that it would be smart to do the transposition in place while doing the row-by-row pass, since you already have the value that needs to be transposed, but that isn't so, as writing at the transposed location will flush the cache.

    Anyway, here's the aptly named SuperDuperTranspose32 (I also have a FastTranspose (MMX) and a SuperTranspose). I've been using it in an IIR gaussian blur filter. Zuuuuper fast.

        // MatrixTranspose by AW
        // http://masm32.com/board/index.php?topic=6140.msg65145#msg65145
        // 4x4 matrix transpose by Siekmanski
        // http://masm32.com/board/index.php?topic=6127.msg65026#msg65026
        // Ported to Delphi by Anders Melander
        procedure SuperDuperTranspose32(Src, Dst: Pointer; W, Height: cardinal); register;
        type
          dword = cardinal;
          // Parameters:
          // EAX <- Source
          // EDX <- Destination
          // ECX <- Width
          // Stack[0] <- Height
          // Preserves: EDI, ESI, EBX
        var
          Source, Destination: Pointer;
          Width: dword;
          X4x4Required: dword;
          Y4x4Required: dword;
          remainderX: dword;
          remainderY: dword;
          destRowSize: dword;
          sourceRowSize: dword;
          savedDest: dword;
        asm
          push edi
          push esi
          push ebx

          mov Destination, Dst
          mov Source, Src
          mov Width, W

          // How many cols % 4?
          mov eax, Width
          mov ebx, 4
          mov edx, 0
          div ebx
          mov X4x4Required, eax
          mov remainderX, edx

          // How many rows % 4?
          mov eax, Height
          mov ebx, 4
          mov edx, 0
          div ebx
          mov Y4x4Required, eax
          mov remainderY, edx

          mov eax, Height
          shl eax, 2
          mov destRowSize, eax

          mov eax, Width
          shl eax, 2
          mov sourceRowSize, eax

          mov ebx, 0
        @@loop1outer:
          cmp ebx, Y4x4Required // while ebx<Y4x4Required // Height % 4
          jae @@loop1outer_exit

          // find starting point for source
          mov eax, ebx
          mul sourceRowSize
          shl eax, 2

          mov esi, Source
          add esi, eax
          mov ecx, esi // save

          // find starting point for destination
          mov eax, ebx
          shl eax, 4
          mov edi, Destination
          add edi, eax
          mov savedDest, edi // save
          push ebx

          mov ebx, 0
        @@loop1inner:
          cmp ebx, X4x4Required // while ebx<X4x4Required
          jae @@loop1inner_exit

          mov eax, ebx
          shl eax, 4
          mov esi, ecx
          add esi, eax

          // load a 4x4 block, one source row per register
          movups xmm0, [esi]
          add esi, sourceRowSize
          movups xmm1, [esi]
          add esi, sourceRowSize
          movups xmm2, [esi]
          add esi, sourceRowSize
          movups xmm3, [esi]

          // transpose the 4x4 block in registers
          movaps xmm4, xmm0
          movaps xmm5, xmm2
          unpcklps xmm4, xmm1
          unpcklps xmm5, xmm3
          unpckhps xmm0, xmm1
          unpckhps xmm2, xmm3
          movaps xmm1, xmm4
          movaps xmm6, xmm0
          movlhps xmm4, xmm5
          movlhps xmm6, xmm2
          movhlps xmm5, xmm1
          movhlps xmm2, xmm0

          mov eax, destRowSize
          shl eax, 2
          mul ebx
          mov edi, savedDest
          add edi, eax

          // store the transposed block, one destination row per register
          movups [edi], xmm4
          add edi, destRowSize
          movups [edi], xmm5
          add edi, destRowSize
          movups [edi], xmm6
          add edi, destRowSize
          movups [edi], xmm2

          inc ebx
          jmp @@loop1inner
        @@loop1inner_exit:
          pop ebx
          inc ebx
          jmp @@loop1outer
        @@loop1outer_exit:

          // deal with Width not a multiple of 4 (leftover source columns)
          cmp remainderX, 1 // if remainderX >= 1
          jb @@no_extra_x
          mov eax, X4x4Required
          shl eax, 4
          mov esi, Source
          add esi, eax
          mov eax, X4x4Required
          shl eax, 2
          mul destRowSize
          mov edi, Destination
          add edi, eax
          mov edx, 0
        @@extra_x:
          cmp edx, remainderX // while edx < remainderX
          jae @@extra_x_exit
          mov ecx, 0
          mov eax, 0
        @@extra_x_y:
          cmp ecx, Height // while ecx < Height
          jae @@extra_x_y_exit
          mov ebx, dword ptr [esi+eax]
          mov dword ptr [edi+4*ecx], ebx
          add eax, sourceRowSize
          inc ecx
          jmp @@extra_x_y
        @@extra_x_y_exit:
          add esi, 4
          add edi, destRowSize
          inc edx
          jmp @@extra_x
        @@extra_x_exit:
        @@no_extra_x:

          // deal with Height not a multiple of 4 (leftover source rows)
          cmp remainderY, 1 // if remainderY >= 1
          jb @@no_extra_y
          mov eax, Y4x4Required
          shl eax, 2
          mul sourceRowSize
          mov esi, Source
          add esi, eax
          mov eax, Y4x4Required
          shl eax, 4
          mov edi, Destination
          add edi, eax
          mov edx, 0
        @@extra_y:
          cmp edx, remainderY // while edx < remainderY
          jae @@extra_y_exit
          mov ecx, 0
          mov eax, 0
        @@extra_y_x:
          cmp ecx, Width // while ecx < Width
          jae @@extra_y_x_exit
          mov ebx, dword ptr [esi+4*ecx]
          mov dword ptr [edi+eax], ebx
          add eax, destRowSize
          inc ecx
          jmp @@extra_y_x
        @@extra_y_x_exit:
          add esi, sourceRowSize
          add edi, 4
          inc edx
          jmp @@extra_y
        @@extra_y_exit:
        @@no_extra_y:

          pop ebx
          pop esi
          pop edi
        end;
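    For readers who want to see the two-pass scheme without the assembler, here is a plain Pascal sketch. It is not a port of SuperDuperTranspose32: `TransposeBlocked32` is a hypothetical, much slower stand-in that tiles the copy in `BlockSize` squares, and `ProcessRows` is a placeholder filter (a running sum along each row) chosen only so the result is easy to check.

    ```pascal
    program TwoPassDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, System.Math;

    const
      BlockSize = 16; // assumed tile edge; tune for the target CPU
      ImgW = 5;       // demo image width
      ImgH = 3;       // demo image height

    // Dst[X * H + Y] := Src[Y * W + X], visited in BlockSize x BlockSize tiles
    // so that both the reads and the writes stay within a small memory window.
    // Src and Dst must be different arrays.
    procedure TransposeBlocked32(const Src: TArray<Cardinal>;
      var Dst: TArray<Cardinal>; W, H: Integer);
    var
      BX, BY, X, Y: Integer;
    begin
      SetLength(Dst, W * H);
      BY := 0;
      while BY < H do
      begin
        BX := 0;
        while BX < W do
        begin
          for Y := BY to Min(BY + BlockSize, H) - 1 do
            for X := BX to Min(BX + BlockSize, W) - 1 do
              Dst[X * H + Y] := Src[Y * W + X];
          Inc(BX, BlockSize);
        end;
        Inc(BY, BlockSize);
      end;
    end;

    // Placeholder row filter: running sum along each row.
    procedure ProcessRows(var Data: TArray<Cardinal>; W, H: Integer);
    var
      X, Y: Integer;
    begin
      for Y := 0 to H - 1 do
        for X := 1 to W - 1 do
          Inc(Data[Y * W + X], Data[Y * W + X - 1]);
    end;

    var
      Img, Tmp: TArray<Cardinal>;
      I: Integer;
    begin
      SetLength(Img, ImgW * ImgH);
      for I := 0 to High(Img) do
        Img[I] := 1;

      ProcessRows(Img, ImgW, ImgH);             // horizontal pass
      TransposeBlocked32(Img, Tmp, ImgW, ImgH); // Tmp is ImgH wide, ImgW tall
      ProcessRows(Tmp, ImgH, ImgW);             // vertical pass, done as rows
      TransposeBlocked32(Tmp, Img, ImgH, ImgW); // back to ImgW x ImgH

      Writeln(Img[High(Img)]); // bottom-right value: ImgW * ImgH = 15
    end.
    ```

    Note that the intermediate buffer `Tmp` is just an array of 32-bit values with the width and height swapped; nothing about it needs to be a bitmap.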
  4. Bill Meyer

    Delphi IDE on AMD cpu?

    I am very happy with my AMD in a system where it is not unusual to have multiple VMs running.
  5. Lajos Juhász

    Timer game delphi 7

    Here you are calculating the total seconds and the total minutes. Try: seconds := t div 1000; minutes := seconds div 60; seconds := seconds - (minutes * 60);
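    Wrapped into a small routine, the same calculation could look like this. `SplitElapsed` is a hypothetical helper name, and the elapsed time is assumed to be in milliseconds, as the division by 1000 suggests.

    ```pascal
    program ElapsedDemo;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils;

    // Splits an elapsed time in milliseconds into whole minutes and the
    // seconds left over within the current minute (0..59).
    procedure SplitElapsed(ElapsedMs: Integer; out Minutes, Seconds: Integer);
    begin
      Seconds := ElapsedMs div 1000; // total seconds
      Minutes := Seconds div 60;     // whole minutes
      Seconds := Seconds mod 60;     // same as Seconds - (Minutes * 60)
    end;

    var
      M, S: Integer;
    begin
      SplitElapsed(125000, M, S);
      Writeln(Format('%.2d:%.2d', [M, S])); // 125 s -> 02:05
    end.
    ```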
  6. I got a bug report for GExperts and Delphi 10.4 that’s really curious: When a secondary editor window is open in the IDE and the FMX form designer is active, trying to insert a component from the clipboard into the form inserts the textual description of that component into the editor windows instead. I could immediately reproduce this but finding the culprit took quite a bit longer. (Read on in the blog post) TLDR: The workaround is to disable the "Goto Previous / Next Modification" editor experts.
  7. It seems that disabling two editor experts (and restarting the IDE) fixes this problem: "Goto Previous Modification" and "Goto Next Modification". No idea yet what causes it. Could you please confirm this @Diego Simonini ? I guess it's the way these experts add themselves to the editor popup menu. This probably makes Ctrl+V always call the editor popup menu's Paste entry (or the associated action) even if the editor window does not have the focus.
  8. Anders Melander

    MAP2PDB - Profiling with VTune

    New version (2.5) uploaded.

    Changes since last upload:
    - Include/exclude modules/units from pdb. This helps keep the size of the pdb down and thus reduces the symbol resolve time in VTune.
    - You no longer need to link your projects with debug info. map2pdb will reuse the existing debug section in the exe/dll/bpl if there is one. Otherwise it will create a new one.

    https://bitbucket.org/anders_melander/map2pdb/downloads/

    What's next:
    - Refactoring of the logging code. The current logging is basically just some functions that call WriteLn. This should be replaced with a pluggable log framework so the whole logging mechanism can be replaced. The end goal is to enable integration of the map2pdb core into other projects.
    - A jdbg reader. Embarcadero does not supply map files for the RTL/VCL run-time packages. Instead they ship jdbg files that can be read with the JEDI debug functions. The jdbg files are built from map files, so supposedly they contain much, if not all, of the information we need. The task here is to write a reader for the jdbg file format so we can produce pdb files from them.
    - Figure out why VTune is so slow. A never-ending task, it seems.
  9. Remy Lebeau

    Where I can find the SSL DLLs for Indy?

    What does Indy's WhichFailedToLoad() function report after the error occurs? Are you using 1.0.2u for BOTH DLLs? Do you have other versions of OpenSSL on your system? Try using SysInternals Process Monitor to make sure your app is actually attempting to load the correct DLLs you are expecting, and not some other DLLs. It should work, yes.
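    A quick way to get that report is a small console test like the one below. It is a sketch that assumes Indy 10 with its OpenSSL 1.0.2 support units (`IdSSLOpenSSL`, `IdSSLOpenSSLHeaders`) on the search path; the commented-out library path line is optional and only an example.

    ```pascal
    program CheckOpenSSL;

    {$APPTYPE CONSOLE}

    uses
      System.SysUtils, IdSSLOpenSSL, IdSSLOpenSSLHeaders;

    begin
      // Optionally point Indy at a specific folder before loading, e.g. the
      // application folder, to rule out other OpenSSL copies on the system:
      // IdOpenSSLSetLibPath(ExtractFilePath(ParamStr(0)));

      if LoadOpenSSLLibrary then
        Writeln('OpenSSL loaded: ', OpenSSLVersion)
      else
        // Names the DLL(s) or exported functions that could not be loaded.
        Writeln('Failed to load: ', WhichFailedToLoad);
    end.
    ```

    If the output names one of the DLLs itself, the file was not found or could not be loaded (for example a bitness mismatch); if it lists function names, a DLL was found but is probably a different OpenSSL version than Indy expects.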
  10. Hi, I have several TEdit in an Android App developed with Delphi Rio. With a virtual keyboard that covers almost 50% of the display area, several TEdit are hidden under the virtual keyboard when you want to enter data. Does anyone know of an approach to avoid this behaviour? Thank you in advance
  11. sgcWebSockets is a complete package providing access to the HTML5 WebSockets API (WebSocket is a web technology providing bi-directional, full-duplex communication channels over a single Transmission Control Protocol (TCP) socket). It lets you create WebSocket servers and WebSocket clients in VCL, Lazarus and Firemonkey applications.

    What's new in the latest versions:
    - New Telegram API component for Windows, Android, OSX and Linux.
    - Improved MQTT client component: support for 3.1.1 and 5.0.
    - Improved Indy Server + IOCP.
    - Improved Binance and Kraken APIs, which now support the full WebSockets and REST protocols.
    - Fixed some bugs when using OpenSSL 1.1.1 and TLS 1.3.
    - Several performance and stability improvements.

    Main features:
    - WebSocket and HTTP support: sgcWebSockets includes client and server-side implementations of the WebSocket protocol (RFC 6455). HTTP/s is also fully supported. Support for plain TCP is also included.
    - SSL/TLS for security: your messages are secure using our SSL/TLS implementation. Widest compatibility via support for modern TLS 1.3, TLS 1.2, TLS 1.1 and TLS 1.0.
    - Protocols and APIs: several protocols are supported: MQTT (3.1.1 and 5.0), STOMP, WEBRTC, SIGNALR CORE, WAMP... Built-in protocols support transactions, datasets, QoS, big file transfers and more. APIs are supported for third parties like Pusher, Bitfinex, Huobi, CEX...
    - Cross-platform: share your code using our WebSockets library for your Delphi VCL, Firemonkey, Intraweb, Javascript and C# projects. Includes server, clients and several protocols for building and connecting to WebSocket applications.
    - High-performance WebSocket server based on Microsoft HTTP Framework and IOCP.

    Trial Version: https://www.esegece.com/websockets/download
    Compiled Demos: http://www.esegece.com/download/sgcWebSockets_bin.zip
    More Info: http://www.esegece.com/websockets
  12. esegece

    ANN: sgcWebSockets 4.4.1

    Hi, all other components support iOS, as well as Android, Windows, OSX, Linux, Lazarus... The only one that doesn't support it is the Telegram client component. Thanks for letting me know. Kind regards, Sergio
  13. esegece

    sgcWebSockets 4.3.2

    sgcWebSockets is a complete package providing access to the HTML5 WebSockets API (WebSocket is a web technology providing bi-directional, full-duplex communication channels over a single Transmission Control Protocol (TCP) socket). It lets you create WebSocket servers and WebSocket clients in VCL, Lazarus and Firemonkey applications.

    What's new in 4.3.2:
    - Added support for Android 64 bits in Rad Studio 10.3.3 Rio.
    - Added support for OpenSSL 1.1.1 for Indy-based components. *Requires custom Indy version (Beta) (the Trial doesn't include this version).
    - Added support for ALPN (Application-Layer Protocol Negotiation) for server and client components based on Indy. *Requires custom Indy version (Beta) (the Trial doesn't include this version).
    - Some performance and stability improvements.

    Main features:
    - WebSocket and HTTP support: sgcWebSockets includes client and server-side implementations of the WebSocket protocol (RFC 6455). HTTP/s is also fully supported. Support for plain TCP is also included.
    - SSL/TLS for security: your messages are secure using our SSL/TLS implementation. Widest compatibility via support for modern TLS 1.3, TLS 1.2, TLS 1.1 and TLS 1.0.
    - Protocols and APIs: several protocols are supported: MQTT (3.1.1 and 5.0), STOMP, WEBRTC, SIGNALR CORE, WAMP... Built-in protocols support transactions, datasets, QoS, big file transfers and more. APIs are supported for third parties like Pusher, Bitfinex, Huobi, CEX...
    - Cross-platform: share your code using our WebSockets library for your Delphi VCL, Firemonkey, Intraweb, Javascript and C# projects. Includes server, clients and several protocols for building and connecting to WebSocket applications.
    - High-performance WebSocket server based on Microsoft HTTP Framework and IOCP.

    Trial Version: http://www.esegece.com/download/sgcWebSockets.zip
    Compiled Demos: http://www.esegece.com/download/sgcWebSockets_bin.zip
    The Demo Chat has been updated to show how OpenSSL 1.1 works (server and client component).
    More Info: http://www.esegece.com/websockets