EdgeAudio: real-time audio pipeline for Delphi/VCL with WebView2

Maxidonkey · 2025-08-25T01:31:58Z

Hi all,

I have just uploaded the EdgeAudio project. It integrates mic capture, audio playback, a high‑pass filter, VAD, hysteresis, and a “Talkover” mode, all driven from Delphi and orchestrated through a bidirectional JS bridge inside TEdgeBrowser.

The architecture centers on TEdgeAudioControl and TAudioSettings, applied in real time on the WebAudio side, with clean VCL integration (virtual host, typed events).
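For readers unfamiliar with the high‑pass stage mentioned above, here is a minimal sketch of a first‑order (RC) high‑pass filter in TypeScript. It is not EdgeAudio's THighPassFilter; the function name and parameters are assumptions chosen for illustration, and in a WebAudio graph a built‑in BiquadFilterNode of type "highpass" would normally do this job.

```typescript
// First-order (RC) high-pass filter over a block of samples.
// Hypothetical helper for illustration; not part of EdgeAudio.
function highPass(samples: number[], cutoffHz: number, sampleRate: number): number[] {
  const rc = 1 / (2 * Math.PI * cutoffHz); // time constant for the cutoff
  const dt = 1 / sampleRate;               // duration of one sample
  const alpha = rc / (rc + dt);
  const out: number[] = [];
  let prevIn = 0;
  let prevOut = 0;
  for (const x of samples) {
    // y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    const y = alpha * (prevOut + x - prevIn);
    out.push(y);
    prevIn = x;
    prevOut = y;
  }
  return out;
}
```

A constant (DC) input decays toward zero, while rapidly alternating input passes almost unchanged, which is why such a filter is commonly placed before a VAD to remove rumble and offset.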

Key points

Clear architecture: capture (TEdgeAudioCapture), playback (TEdgeAudioPlayer with VAD/Talkover), filtering (THighPassFilter), and a WebView2 bridge; TEdgeAudioControl exposes settings and ready‑to‑use events.
Extensible event engine: aggregates/routes JSON events (audio_play, audio_pause, audio_segment, etc.) via TEventEngineManager and IAudioEventHandler.
Capabilities: tunable VAD (threshold/silenceMs/timeslice), Talkover with cooldown/ratios to avoid “ping‑pong,” playback/streaming (play/pause/seek/stop), setSinkId, volume boost, built‑in notifications/animations, and optional auto‑blocking of capture during playback.
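
To illustrate how a VAD threshold, hysteresis, and silenceMs can interact, here is a minimal TypeScript sketch. It is an assumption about the general technique, not EdgeAudio's code: the two‑threshold scheme (openThreshold/closeThreshold) and the class name are hypothetical, and only silenceMs is a parameter name taken from the list above.

```typescript
// Hypothetical configuration; only silenceMs is named in the post.
interface VadConfig {
  openThreshold: number;  // level at or above which speech starts
  closeThreshold: number; // lower level below which silence is counted (hysteresis)
  silenceMs: number;      // how long the level must stay low before speech ends
}

class HysteresisVad {
  private cfg: VadConfig;
  private speaking = false;
  private silentForMs = 0;

  constructor(cfg: VadConfig) {
    this.cfg = cfg;
  }

  // Feed one frame's level (for example RMS in 0..1) and its duration in ms.
  // Returns true while the detector considers speech active.
  update(level: number, frameMs: number): boolean {
    if (!this.speaking) {
      if (level >= this.cfg.openThreshold) {
        this.speaking = true;
        this.silentForMs = 0;
      }
    } else if (level < this.cfg.closeThreshold) {
      this.silentForMs += frameMs;
      if (this.silentForMs >= this.cfg.silenceMs) {
        this.speaking = false;
        this.silentForMs = 0;
      }
    } else {
      // Level is still above the close threshold: reset the silence timer.
      this.silentForMs = 0;
    }
    return this.speaking;
  }
}
```

The gap between the two thresholds keeps the detector from flapping when the level hovers around a single cut‑off, and silenceMs keeps short pauses between words from ending a segment.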

Quick start

Install the EdgeAudioDesign.dproj package to register TEdgeAudioControl in the Palette.
Two paths:

(1) if you already have a TEdgeBrowser, use the Edge.Audio unit;

(2) otherwise, drop TEdgeAudioControl on a form.

Copy the web/tools folders into your project and place WebView2Loader.dll (x86/x64) next to the executable.

Sample projects (AudioEdgeTest1/2.zip) are provided; add “EDGEAUDIO\SOURCE” (and “OPENAI\SOURCE”) to your project search paths.

Dependencies: Delphi 12+, WebView2 Runtime, and ffmpeg if you need audio conversion (configurable ffmpegPath).

Learn more: diagrams, event flow, and extension points are detailed in the “Dev note – Architecture & Mechanics” sections and the deep‑dive in the repo.


Kas Ob. · 2025-08-25T08:25:46Z

Hi,

Thank you for sharing. I can't compile or test this project, as I don't have Delphi 12 and have never used Edge, but I saw this in the readme:

Quote

Things You Should Know

Consider enabling autoBlockCaptureDuringPlayback to avoid echo while playing; tune Talkover cooldown/thresholds on the fly via JS commands.

WebView2 navigation uses a local “virtual host” to serve assets and avoid CORS.

This prompted me to ask a few questions, and maybe to point you to a path you didn't know of. If you have already tried or researched it, then please share your results with us; I am very interested in your findings.

1) Edge does support WebRTC, and WebRTC has Acoustic Echo Cancellation (AEC), which works fine and removes the need to block capture while playing. Switching the media layer from EdgeAudio to WebRTC might not be a small adjustment, and it is by no means a trivial task, yet a small part is feasible, such as audio capture and playback. What is your experience with that? Have you tried it? If yes, why did you ditch it?

What issues did you face with WebRTC audio capture and playback?

2) CORS is a pain in the back, that we know, but what about injecting/loading the app directly, without the need to navigate again after first navigating to an empty page? Or there are other means, such as "NavigateToString": https://learn.microsoft.com/en-us/microsoft-edge/webview2/reference/win32/icorewebview2?view=webview2-1.0.2210.55#navigatetostring

At these lines https://github.com/MaxiDonkey/EdgeAudio/blob/main/source/Edge.Audio.pas#L596-L602 I see virtualhostfolder is set, yet it is followed by Navigate; I expected it to be followed by NavigateToString.

NavigateToString allows loading the content from memory, removing the need for the virtual host.

Have you tried it? (I mean feeding all the content from memory, even if it comes from files on disk.)

Can JSBridge (the really nice and impressive bridge you made) be used with it?

Will it simplify the structure as a whole?

If it didn't work, then please share the "why" with us (your findings about feeding the data/content from memory).

What issues did you face?

Maxidonkey · 2025-08-25T11:34:28Z

Hi, thanks again for your insightful feedback and questions!

Delphi Version:

All my GitHub projects (including EdgeAudio) are developed and tested with Delphi Community Edition (CE), currently 12.1, which is freely available. So you don’t need Delphi 12 Pro/Enterprise to try it.

Why this technical choice / Why VCL?

My main goal was to learn WebView2 and Edge. VCL was the only practical option, since Embarcadero hasn’t provided an FMX wrapper for WebView2 yet.

WebRTC & AEC:

I haven’t integrated WebRTC/AEC yet, but it’s next on my roadmap, especially as I plan to experiment with OpenAI’s realtime API (https://platform.openai.com/docs/api-reference/realtime).
Your questions are very relevant. I’ll report back once I explore those aspects.
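
As a starting point for that exploration, browser echo cancellation can be requested through the standard getUserMedia audio constraints, without setting up a full peer connection. The sketch below only builds the constraints object; the helper name and options interface are assumptions for illustration, and whether this would remove the need for autoBlockCaptureDuringPlayback in EdgeAudio is untested here.

```typescript
// Hypothetical options wrapper; the three flags are standard
// MediaTrackConstraints properties.
interface CaptureOptions {
  echoCancellation: boolean;
  noiseSuppression: boolean;
  autoGainControl: boolean;
}

// Build the constraints object for navigator.mediaDevices.getUserMedia.
function buildAudioConstraints(opts: CaptureOptions) {
  return {
    audio: {
      echoCancellation: opts.echoCancellation,
      noiseSuppression: opts.noiseSuppression,
      autoGainControl: opts.autoGainControl,
    },
    video: false,
  };
}

// Inside a page running in WebView2 this would be used roughly as:
//   const stream = await navigator.mediaDevices.getUserMedia(
//     buildAudioConstraints({ echoCancellation: true, noiseSuppression: true, autoGainControl: false }));
//   const applied = stream.getAudioTracks()[0].getSettings().echoCancellation;
```

Reading back getSettings() on the track is a simple way to see whether the browser actually applied the requested echo cancellation.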

JSBridge compatibility & asset loading:

Yes, EdgeAudio’s JSBridge is designed to control WebAudio in WebView2 via ExecuteScript and to receive JSON events via OnWebMessageReceived. That’s its native mode.
For the UI, EdgeAudio expects an index HTML and all assets from a local WebPath. The recommended approach is to use NavigateToIndex, mapping a virtual host for a secure context and proper CORS handling.
Using NavigateToString (i.e., injecting everything from memory) is technically possible but not aligned with the current architecture. Without the virtual host, you lose the “secure context” and CORS protection; and if assets aren’t served via WebPath, the audio UI doesn’t function as intended.
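
To make the event side of the bridge more concrete, here is a minimal TypeScript sketch of routing JSON events by name. The real routing in EdgeAudio happens on the Delphi side; this only illustrates the idea. The message shape (an "event" field holding names such as audio_play) and the EventRouter class are assumptions, not the project's actual protocol.

```typescript
// Assumed message shape: { "event": "audio_play", ...payload }.
type AudioEvent = { event: string; [key: string]: unknown };
type Handler = (evt: AudioEvent) => void;

// Hypothetical router for illustration; not part of EdgeAudio.
class EventRouter {
  private handlers = new Map<string, Handler[]>();

  // Register a handler for one event name.
  on(name: string, handler: Handler): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  // Parse one raw JSON message and call matching handlers.
  // Returns the number of handlers invoked (0 for bad or unknown messages).
  dispatch(rawJson: string): number {
    let parsed: unknown;
    try {
      parsed = JSON.parse(rawJson);
    } catch {
      return 0;
    }
    if (typeof parsed !== "object" || parsed === null) return 0;
    const evt = parsed as AudioEvent;
    if (typeof evt.event !== "string") return 0;
    const list = this.handlers.get(evt.event) ?? [];
    for (const h of list) h(evt);
    return list.length;
  }
}
```

Malformed or unknown messages are ignored rather than raising, so one bad message from the page cannot break the host's event loop.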

Many of your questions are the same ones I’ll be tackling soon as I move forward with EdgeAudio. Your feedback is a great help, and I’ll be sure to share findings and updates as I continue development!

Thanks again!

Kas Ob. · 2025-08-25T14:47:34Z

3 hours ago, Maxidonkey said:

For the UI, EdgeAudio expects an index HTML and all assets from a local WebPath. The recommended approach is to use NavigateToIndex, mapping a virtual host for a secure context and proper CORS handling.
Using NavigateToString (i.e., injecting everything from memory) is technically possible but not aligned with the current architecture. Without the virtual host, you lose the “secure context” and CORS protection; and if assets aren’t served via WebPath, the audio UI doesn’t function as intended.

I see it now: NavigateToString doesn't have an origin, hence no secure context, and no secure context means no media access, as these are protected.
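
For anyone who wants to verify this in their own setup, a page can check whether it is running in a secure context and whether the capture API is exposed. The sketch below takes the relevant values as a plain object so it can be exercised outside a browser; the PageEnv interface and the canCaptureAudio name are assumptions for illustration.

```typescript
// Minimal view of the page environment; hypothetical type for illustration.
interface PageEnv {
  isSecureContext: boolean;
  mediaDevices?: { getUserMedia?: unknown };
}

// True only when the page is a secure context and getUserMedia is available.
function canCaptureAudio(env: PageEnv): boolean {
  return env.isSecureContext && typeof env.mediaDevices?.getUserMedia === "function";
}

// Inside a page this would be called roughly as:
//   canCaptureAudio({ isSecureContext: window.isSecureContext, mediaDevices: navigator.mediaDevices });
```

Running this check once after Navigate and once after NavigateToString would show directly whether the two loading modes differ for media access.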

Thank you and good luck !
