Maxidonkey

EdgeAudio: real-time audio pipeline for Delphi/VCL with WebView2


Hi all,

 

I have just uploaded the EdgeAudio project, which integrates mic capture, audio playback, a high‑pass filter, VAD, hysteresis, and a “Talkover” mode from Delphi, orchestrated through a bidirectional JS bridge inside TEdgeBrowser.

The architecture centers on TEdgeAudioControl and TAudioSettings, applied in real time on the WebAudio side, with clean VCL integration (virtual host, typed events).

 

Key points

  • Clear architecture: capture (TEdgeAudioCapture), playback (TEdgeAudioPlayer with VAD/Talkover), filtering (THighPassFilter), and a WebView2 bridge; TEdgeAudioControl exposes settings and ready‑to‑use events.
  • Extensible event engine: aggregates/routes JSON events (audio_play, audio_pause, audio_segment, etc.) via TEventEngineManager and IAudioEventHandler.
  • Capabilities: tunable VAD (threshold/silenceMs/timeslice), Talkover with cooldown/ratios to avoid “ping‑pong,” playback/streaming (play/pause/seek/stop), setSinkId, volume boost, built‑in notifications/animations, and optional auto‑blocking of capture during playback.
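
To make the VAD parameters above more concrete, here is a minimal sketch of how a threshold plus a silence hold time can turn per-frame levels into speech segments. It is written in TypeScript for illustration only; the `VadSettings` shape, the `detectSpeech` function, and the event names are assumptions, not the project's actual code.

```typescript
// Hypothetical settings: the names echo threshold/silenceMs/timeslice from the
// post, but the real TAudioSettings fields may be named or scaled differently.
interface VadSettings {
  threshold: number;   // RMS level (0..1) at or above which a frame counts as speech
  silenceMs: number;   // how long the level must stay low before speech ends
  timesliceMs: number; // duration of one analysis frame
}

type VadEvent = { type: "speech_start" | "speech_end"; atMs: number };

// Root-mean-square level of one frame of samples in [-1, 1].
function rms(frame: number[]): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (const s of frame) sum += s * s;
  return Math.sqrt(sum / frame.length);
}

// Walks a sequence of per-frame levels and emits start/end events.
// Speech starts on the first loud frame and ends only after the level
// has stayed below the threshold for silenceMs in a row.
function detectSpeech(levels: number[], cfg: VadSettings): VadEvent[] {
  const events: VadEvent[] = [];
  let speaking = false;
  let quietMs = 0;
  levels.forEach((level, i) => {
    const atMs = i * cfg.timesliceMs;
    if (level >= cfg.threshold) {
      quietMs = 0; // any loud frame resets the silence counter
      if (!speaking) {
        speaking = true;
        events.push({ type: "speech_start", atMs });
      }
    } else if (speaking) {
      quietMs += cfg.timesliceMs;
      if (quietMs >= cfg.silenceMs) {
        speaking = false;
        quietMs = 0;
        events.push({ type: "speech_end", atMs });
      }
    }
  });
  return events;
}
```

With a 100 ms timeslice, a 0.05 threshold, and 300 ms of required silence, a one-frame dip in the middle of an utterance does not end the segment; only three consecutive quiet frames do.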

 

Quick start

  • Install the EdgeAudioDesign.dproj package to register TEdgeAudioControl in the Palette.
  • Two paths:
      (1) already have TEdgeBrowser → use the Edge.Audio unit;
      (2) drop TEdgeAudioControl. Copy the web/tools folders into your project and place WebView2Loader.dll (x86/x64) next to the executable.
  • Sample projects (AudioEdgeTest1/2.zip) are provided; add “EDGEAUDIO\SOURCE” (and “OPENAI\SOURCE”) to your project search paths.

  • Dependencies: Delphi 12+, WebView2 Runtime, and ffmpeg if you need audio conversion (configurable ffmpegPath).

 

Learn more

Diagrams, event flow, and extension points are detailed in the “Dev note – Architecture & Mechanics” sections and the deep‑dive in the repo.

 

 

Preview

 

Preview.gif

Edited by Maxidonkey

Hi,

 

Thank you for sharing. I can't compile or test this project, as I don't have Delphi 12 and have never used Edge, but I saw this in the readme:

Quote

Things You Should Know

  • Consider enabling autoBlockCaptureDuringPlayback to avoid echo while playing; tune Talkover cooldown/thresholds on the fly via JS commands.
  • WebView2 navigation uses a local “virtual host” to serve assets and avoid CORS.

This prompted me to ask a few questions, and perhaps to point you to a path you didn't know of. If you have already tried or researched it, please share your results with us; I am very interested in your findings.

 

1) Edge supports WebRTC, and WebRTC has Acoustic Echo Cancellation (AEC). It works fine and removes the need to block capture while playing. Switching media handling from EdgeAudio to WebRTC might not be a small adjustment, and is by no means a trivial task, yet a small part is feasible, such as audio capture and playback. What is your experience with that? Have you tried it? If so, why did you drop it?

What issues did you face with WebRTC audio capture and playback?
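
For reference, echo cancellation in the browser is requested through the audio constraints passed to `getUserMedia` (`echoCancellation`, `noiseSuppression`, and `autoGainControl` are standard constraint names). A rough TypeScript sketch of what that could look like next to an auto-block option follows; `buildCaptureConstraints` and `shouldBlockCapture` are hypothetical helpers, and the blocking policy is only my assumption of how the two features might interact.

```typescript
// Local type so the sketch compiles without the DOM typings.
interface AudioCaptureConstraints {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// Builds the constraints object that would be handed to getUserMedia.
function buildCaptureConstraints(
  echoCancellation: boolean
): { audio: AudioCaptureConstraints; video: boolean } {
  return {
    audio: { echoCancellation, noiseSuppression: true, autoGainControl: true },
    video: false,
  };
}

// Assumed policy: capture is blocked during playback only when auto-blocking
// is enabled AND echo cancellation is not active on the capture track.
function shouldBlockCapture(
  playing: boolean,
  autoBlock: boolean,
  aecActive: boolean
): boolean {
  return playing && autoBlock && !aecActive;
}

// In the page (not executed here):
//   const stream = await navigator.mediaDevices.getUserMedia(buildCaptureConstraints(true));
//   const aecActive = stream.getAudioTracks()[0].getSettings().echoCancellation === true;
```

Reading the track settings back matters because a constraint is a request, not a guarantee; the browser reports what it actually applied.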

 

2) CORS is a pain, that we know, but what about injecting/loading the app directly, without needing to navigate again after navigating to an empty page? There are other means too, such as "NavigateToString": https://learn.microsoft.com/en-us/microsoft-edge/webview2/reference/win32/icorewebview2?view=webview2-1.0.2210.55#navigatetostring

At these lines https://github.com/MaxiDonkey/EdgeAudio/blob/main/source/Edge.Audio.pas#L596-L602 I see that virtualhostfolder is set, yet it is followed by Navigate; I expected it to be followed by NavigateToString.

That method allows loading the content from memory, removing the need for the virtual host.

Have you tried it? (I mean feeding all the content from memory, even if it lives in files on disk.)

Can JSBridge (the really nice and impressive bridge you made) be used with it?

Would it simplify the structure as a whole?

If it didn't work, please share the "why" with us (your findings about feeding the data/content from memory).

What issues did you face?


Hi, thanks again for your insightful feedback and questions!

 

Delphi Version:

  • All my GitHub projects (including EdgeAudio) are developed and tested with Delphi Community Edition (CE), currently 12.1, which is freely available. So you don’t need Delphi 12 Pro/Enterprise to try it.

 

Why this technical choice / Why VCL?

  • My main goal was to learn WebView2 and Edge. VCL was the only practical option, since Embarcadero hasn’t provided an FMX wrapper for WebView2 yet.
     

WebRTC & AEC:

  • I haven’t integrated WebRTC/AEC yet, but it’s next on my roadmap, especially as I plan to experiment with OpenAI’s realtime API (https://platform.openai.com/docs/api-reference/realtime).
    Your questions are very relevant. I’ll report back once I explore those aspects.

 

JSBridge compatibility & asset loading:

  • Yes, EdgeAudio’s JSBridge is designed to control WebAudio in WebView2 via ExecuteScript and to receive JSON events via OnWebMessageReceived. That’s its native mode.
    For the UI, EdgeAudio expects an index HTML and all assets from a local WebPath. The recommended approach is to use NavigateToIndex, mapping a virtual host for a secure context and proper CORS handling.
    Using NavigateToString (i.e., injecting everything from memory) is technically possible but not aligned with the current architecture. Without the virtual host, you lose the “secure context” and CORS protection; and if assets aren’t served via WebPath, the audio UI doesn’t function as intended.
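
To illustrate the event direction of the bridge: on the page side, WebView2 exposes `window.chrome.webview.postMessage(...)`, and the host receives the message in OnWebMessageReceived. The sketch below shows the general idea of routing such JSON messages by event name. It is in TypeScript purely for illustration; the `EventRouter` class and the `event` field name are assumptions and do not reproduce the actual TEventEngineManager or IAudioEventHandler interfaces.

```typescript
// Assumed message shape: a JSON object with an "event" name plus free-form fields.
type AudioEvent = { event: string; [key: string]: unknown };
type Handler = (e: AudioEvent) => void;

class EventRouter {
  // Prototype-free object so event names never collide with built-in properties.
  private handlers: Record<string, Handler[]> = Object.create(null);

  // Registers a handler for one event name (e.g. "audio_play").
  on(name: string, handler: Handler): void {
    const list = this.handlers[name] || [];
    list.push(handler);
    this.handlers[name] = list;
  }

  // Parses a raw JSON message and calls the matching handlers.
  // Returns true only when at least one handler was invoked.
  dispatch(raw: string): boolean {
    let msg: unknown;
    try {
      msg = JSON.parse(raw);
    } catch (err) {
      return false; // not valid JSON: ignore the message
    }
    if (typeof msg !== "object" || msg === null) return false;
    const name = (msg as { event?: unknown }).event;
    if (typeof name !== "string") return false;
    const list = this.handlers[name];
    if (!list || list.length === 0) return false;
    for (const h of list) h(msg as AudioEvent);
    return true;
  }
}
```

Unknown events and malformed payloads are dropped rather than raised, which keeps one bad message from the page from breaking the host-side loop.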

 

Many of your questions are the same ones I’ll be tackling soon as I move forward with EdgeAudio. Your feedback is a great help, and I’ll be sure to share findings and updates as I continue development!

 

Thanks again!


 

3 hours ago, Maxidonkey said:

For the UI, EdgeAudio expects an index HTML and all assets from a local WebPath. The recommended approach is to use NavigateToIndex, mapping a virtual host for a secure context and proper CORS handling.
Using NavigateToString (i.e., injecting everything from memory) is technically possible but not aligned with the current architecture. Without the virtual host, you lose the “secure context” and CORS protection; and if assets aren’t served via WebPath, the audio UI doesn’t function as intended.

I see it now: content loaded with NavigateToString doesn't have an origin, hence it has no secure context, and no secure context means no media access, as those APIs are protected.
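
For anyone who wants to check this in their own setup, a small page-side probe could report what the loaded document actually sees. `window.isSecureContext`, `window.origin`, and `navigator.mediaDevices.getUserMedia` are standard browser properties; the `PageEnv` type and `captureReadiness` function below are just a hypothetical sketch, and I have not verified how WebView2 classifies a NavigateToString page.

```typescript
// Minimal view of the page environment, so the check can run outside a browser too.
interface PageEnv {
  isSecureContext: boolean;
  origin: string; // opaque origins serialize as the string "null"
  mediaDevices?: { getUserMedia?: unknown };
}

// Reports whether microphone capture can even be attempted from this page.
function captureReadiness(env: PageEnv): { ok: boolean; reason: string } {
  if (!env.isSecureContext) {
    return { ok: false, reason: `insecure context (origin: ${env.origin})` };
  }
  if (!env.mediaDevices || typeof env.mediaDevices.getUserMedia !== "function") {
    return { ok: false, reason: "mediaDevices.getUserMedia unavailable" };
  }
  return { ok: true, reason: "secure context with getUserMedia" };
}

// In the page (not executed here):
//   captureReadiness({
//     isSecureContext: window.isSecureContext,
//     origin: window.origin,
//     mediaDevices: navigator.mediaDevices,
//   });
```

Running the same probe once after Navigate with the virtual host and once after NavigateToString would show the difference directly.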

 

Thank you and good luck!
