Renate Schaaf

Bitmaps to Video for Media Foundation


I've just uploaded an update to my project 

 

https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation

 

What it does:
Contains a VCL class which encodes a series of bitmaps and video clips together with an audio file to video.

The result is an .mp4 file with H264 or H265 compression and AAC audio.

It uses Windows Media Foundation, which is usually included in Windows. Hardware encoding is supported if your graphics card can do it.

 

Requires:
Headers for Media Foundation from FactoryXCode: https://github.com/FactoryXCode/MfPack
Windows 10 or higher
An encoder (MF transform) for H264/H265; these usually come with the graphics driver
Delphi XE7 or higher, if I haven't messed it up again; I've only got the CE and Delphi 2006
(Win32 and Win64 should be working, but Win64 recently crashes for me with "The session was disconnected".)

 

The demo project shows some uses:
    Record a series of canvas drawings to video
    Make a slideshow from image files (.bmp, .jpg, .png, .gif) with music (.wav, .mp3, .wmv, ...) and two kinds of transitions
    Insert a video clip into a slideshow (anything that Windows can decode should work)
    Transcode a video file, including the first audio stream.

 

Improvements:
I think I now understand better how to feed frames to the encoder. With the right settings it makes stutter-free videos with good audio-video synchronization. It's now usable in my "big" project, and I no longer need to rely on the ffmpeg DLLs.

 

More info in changes.txt.

 

Just try it if you're interested; I'd be glad.

Renate

 



Hi @Renate Schaaf ,

 

A friendly reminder about something I pointed out in the past: the audio duration handling has a hidden problem. It might not be visible (OK, audible) now, but it makes the library not future-proof, and any change in the way the codec API works will cause either errors or desynced audio/video.

 

So here are my thoughts on this part,

about TBitmapEncoderWMF.WriteAudio: https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation/blob/main/Source/uBitmaps2VideoWMF.pas#L1523-L1618

1) Ditch "goto Done;" and use try..finally; it is safer, there is no need for goto here, and the loop is not complex, it is just an exit.

2) https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation/blob/main/Source/uBitmaps2VideoWMF.pas#L1685 doesn't check, fail, or warn about an audio failure.

3) After reading samples with pSourceReader.ReadSample (https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation/blob/main/Source/uBitmaps2VideoWMF.pas#L1556-L1562) you should check the returned byte size: is it aligned with the requested audio format?

This is what brought up the last discussion, if my memory doesn't fail me: the audio duration should be aligned. In other words,

AudioBlock := nChannels * wBitsPerSample / 8 gives the exact amount in bytes that can't be divided any further, so any audio data passed around should be a multiple of AudioBlock. Since we almost always deal with integers,

AudioBlock := (nChannels * wBitsPerSample) div 8; should do.

Now, to do the extra check for AudioDuration, you can have it like this (in 100-ns units, Media Foundation's timebase):

AudioDuration := (BufferSize / AudioBlock) * (10000000 / nSamplesPerSec);
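The alignment rule and the duration check can be sketched like this (Python for illustration; the function names are mine, the parameter names mirror the WAVEFORMATEX fields):

```python
def audio_block_align(n_channels, bits_per_sample):
    # one block = one sample for every channel, in whole bytes
    return (n_channels * bits_per_sample) // 8

def audio_duration_100ns(buffer_size, n_channels, bits_per_sample, samples_per_sec):
    # duration of a buffer in 100-ns units, Media Foundation's timebase;
    # a buffer that is not block-aligned is rejected outright
    block = audio_block_align(n_channels, bits_per_sample)
    if buffer_size % block != 0:
        raise ValueError("audio buffer is not block-aligned")
    return (buffer_size // block) * 10_000_000 // samples_per_sec
```

For example, 2 channels at 16 bits give 4-byte blocks and 6 channels at 24 bits give 18-byte blocks, matching the sizes quoted further down in the thread; one second of 16-bit stereo at 48000 Hz (192000 bytes) comes out as exactly 10,000,000 units.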

 

The difference between audio and video I am sure you know a lot about, but maybe you haven't experienced or witnessed what happens when a codec starts to:

1) fail with errors,

2) desync the audio and video due to dropping the less-than-one-block remainder,

3) corrupt the quality with sound artefacts due to internal padding of the samples on its own.

Each of these is a bug that could appear in any codec; they all evolve and change as their implementations keep being optimized and worked on.

 

I remember this very clearly; it was a pain in the back with ASF and WMV: sometimes it works, and the next day it doesn't on the same Windows. The root cause was the block alignment. Even if the other codec handling the audio decoding made the mistake and returned a wrong size, you should hold on to the leftover and feed it later. Example: for 2 channels with 16-bit samples the block size is 4 bytes; for 6 channels and 24 bits it is 18 bytes. You can test different audio files, like 5.1 and 7.1 (6 channels and 8 channels), using samples from https://www.jensign.com/bdp95/7dot1voiced/index.html

 

Hope that helps.

 

PS: this part

    // fAudioDuration can be false!
    // if fAudioTime >= fAudioDuration then
    // fAudioDone := true;
    if fAudioDone then
      hr := pSinkWriter.NotifyEndOfSegment(fSinkStreamIndexAudio);
    // The following should not be necessary in Delphi,
    // since interfaces are automatically released,
    // but it fixes a memory leak when reading .mkv-files.
    SafeRelease(pAudioSample);

is disturbing:

1) The commented-out "if fAudioTime >= fAudioDuration then" is right and should be used, but "fAudioDuration can be false!"? I would love to hear how that happens.

2) "but it fixes a memory leak when reading .mkv-files." brings us back to (1) above: using try..finally is best and will prevent the memory leak. But such a case for .mkv files is strange and should be investigated more deeply, as it could be a serious problem and might cause a huge leak in the loop itself, depleting OS memory, especially for 64-bit.


Hi Kas,

Good to see you again, and sorry for the long time of inactivity on my part. Thank you for the detailed input, which I need to digest first. Since you have already invested so much thought, wouldn't you like to be a contributor? When I incorporate the changes you mention, I wouldn't even know how else to list you as a contributor. The issues you mention definitely need to be looked into. For the audio part I was just glad it worked, and haven't put much thought into it lately. The wrong audio duration was returned by some .vobs, which aren't really supported in the first place. The missing SafeRelease(pAudioSample) has caused memory leaks for me in a totally different context too, when I tried to write some code which simply plays an audio file through the default device.

 

Renate

1 hour ago, Kas Ob. said:

1) Ditch "goto Done;" and use try..finally; it is safer, there is no need for goto here, and the loop is not complex, it is just an exit.

There's not even a need for try..finally, since there are no resources to protect. Get rid of hr, assign the results to Result directly, and just exit.

 

Also, instead of:

raise Exception.Create('Fail in call nr. ' + IntToStr(Count) + ' of ' +
  ProcName + ' with result $' + IntToHex(hr));

I would use:

raise Exception.CreateFmt('Fail in call no. %d of %s with result %x', [Count, ProcName, hr]);

for readability.

Share this post


Link to post

Hi Anders,

Thanks for that. I hate format strings because I can never remember the codes for the placeholders. I had already thought before that I should get used to them, though. Now I also see that I forgot to use IntToHex(hr, 8) 🙂

1 hour ago, Renate Schaaf said:

I hate the format-strings, because I can never remember the code for the place-holders.

If only there were some magic key you could press in the editor to display help about various relevant topics... 🙂

I only ever use %s, %d, %n and %x, and I use those a lot, so that helps, but I sometimes need to consult that magic key when it comes to the precision or index specifiers.

4 hours ago, Renate Schaaf said:

Hi Kas,

Good to see you again, and sorry for the long time of inactivity on my part. Thank you for the detailed input, which I need to digest first. Since you already invested so much thought, wouldn't you like to be a contributor? When I incorporate the changes you mention, I wouldn't even know how to list you as contributor. The issues you mention definitely need to be looked into. For the audio-part I was just glad it worked, and haven't put much thought into it lately. The wrong audio-duration was returned by some .vobs, which aren't really supported in the first place. The missing SafeRelease(pAudioSample) has caused memory leaks for me in a totally different context too, when I tried to write some code which simply plays an audio file through the default-device.

 

Renate

I didn't contribute anything, and thank you very much for the offer, but don't worry about this.

 

One more thing about the whole duration issue. To explain it, I want to go back many decades, to when the standard for the highest audio quality chose 44100 Hz as CD quality. This is a strange number at first; when you know how they came up with it, things get clearer. Read this: https://en.wikipedia.org/wiki/44,100_Hz#Recording_on_video_equipment

So, to interleave the audio and video (things were very different back then, and storing or buffering the audio was expensive and complicated with the simple circuits available), they needed a number that could divide evenly and support both 50 fps and 60 fps with 3 samples per line, so they could interleave the samples with the line data of the video.

Fast forward decades, and we no longer have only the PAL and NTSC systems; we have many combinations of frame rates and sizes, but the most used standard frame rates are still 23.976 and 29.97 (among the less used 24, 25, 30, 50, 60, ...). Strange? That question has to do with the old systems, and the internet has many resources answering it. Still, web broadcasting and multiplexed streams have changed things a lot, so we can't depend on these alone, and you rarely see 44100 used the way it was back then.

 

Anyway, in the past they changed the audio duration based on the video to be compliant. In modern times, with better sample rates (at least 48k) and technology that buffers ahead instead of outputting directly, the whole thing still needs syncing. To make sure the audio and video stay synced, they should follow a rule: the audio should be aligned to whole samples per second, and the video should accommodate this, unlike what happened in the past.

 

So even the video duration should be a multiple of the audio sample duration. Consider this if you need your encoding 100% synced, or as close as it can get; in other words, correct the video duration too. Since the audio sample rate is a high number, you have flexibility in the frame rate: the difference could be between 40 and 40.00002, and if your frame duration is 40.00002, I have never seen a player show it as anything but 40. This small difference will prevent desyncing.
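The rule above can be sketched in a few lines (Python for convenience; the function name is mine): snap each video frame duration to a whole number of audio samples.

```python
def snap_to_sample_grid(frame_duration_s, sample_rate):
    # express the frame duration as a whole number of audio samples,
    # then convert back: the result is a multiple of the sample duration
    samples = round(frame_duration_s * sample_rate)
    return samples / sample_rate
```

At 48000 Hz a nominal 25 fps frame (0.04 s) is exactly 1920 samples, so it snaps to itself, and a slightly-off duration such as 0.0400002 s snaps back to 0.04: exactly the kind of invisible correction described above.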

 

Hope that was clear, and thank you again for the work on this and for the offer.

5 hours ago, Kas Ob. said:

So even the video duration should be a multiple of the audio sample duration. Consider this if you need your encoding 100% synced, or as close as it can get; in other words, correct the video duration too. Since the audio sample rate is a high number, you have flexibility in the frame rate: the difference could be between 40 and 40.00002, and if your frame duration is 40.00002, I have never seen a player show it as anything but 40. This small difference will prevent desyncing.

Hope that was clear, and thank you again for the work on this and for the offer.

If I understand this right, I should match the video timestamp and duration to the closest block-align boundary of the audio? If the difference in frame rate is really that negligible, that should be doable, if I can get the math right :). Talk about not contributing: you just forced a new way of seeing things down my throat, not a small achievement.

9 hours ago, Renate Schaaf said:

Talk about not contributing, you just forced a new way of seeing things down my throat, not a small achievement.

Oh dear, the last thing I want is to waste your time or complicate things.

 

9 hours ago, Renate Schaaf said:

If the difference in frame rate is really that negligible, that should be doable,

OK, here is a little misunderstanding: the difference is always so small that it might be negligible, but it might not be. To get to this, I want you to spend a few minutes on the page below. But before you read it, a caveat:

This is about ffmpeg's playing style and how a player tries to fix the syncing or enforce the best result. The tutorial is nice, but its relevance to this case is limited; it is a nice read, yet it might confuse the reader, so I will point to the needed paragraphs:

http://dranger.com/ffmpeg/tutorial05.html

 

Quote

First is the issue of knowing when the next PTS will be. Now, you might think that we can just add the video rate to the current PTS — and you'd be mostly right. However, some kinds of video call for frames to be repeated. This means that we're supposed to repeat the current frame a certain number of times. This could cause the program to display the next frame too soon. So we need to account for that.

The second issue is that as the program stands now, the video and the audio chugging away happily, not bothering to sync at all. We wouldn't have to worry about that if everything worked perfectly. But your computer isn't perfect, and a lot of video files aren't, either. So we have three choices: sync the audio to the video, sync the video to the audio, or sync both to an external clock (like your computer). For now, we're going to sync the video to the audio.

That is from the player's point of view. All players have limited ways to fix it at runtime, so the better the timing from the encoder, the better the player will play. By "player" I mean the decoder and any player out there, advanced or stupid. In other words, if the encoder does the right thing, then the decoder will do the right thing and give the right result (synced audio and video).

 

The main issue, as I pointed out with PAL/NTSC earlier, is the difference between audio and video. Just a reminder: video can have an arbitrary duration, can be delayed or sped up by a fraction of a second, and human eyes most likely won't notice. Audio, on the other hand, we can't stop playing at intervals: skipping 1/1000 of a second every 1/20 of a second will generate acoustic artefacts and will be audible. And if we oversupply samples, the player will happily play them and cause desynced audio and video. With that in mind, let's see how and what to adjust.

 

9 hours ago, Renate Schaaf said:

If I understand this right, I should match the video timestamp and duration to the closest block-align boundary of the audio?

Right, that is it.

 

I think you have already gone down the wrong path and are overengineering the whole thing. This is not a rewrite; in fact (I think) it will be less than 100 lines of changes here and there, small ones, including altering some existing lines.

 

I went again to try to compile the demo on my XE8, and there are small bugs:

1) LogicalCompare doesn't compile, as StrCmpLogicalW is not found.

2) IntToHex must have a second parameter.

3) Removed TArray<string>. This one is very strange: it causes an access violation, as the array is not initialized, and the AV appears in the caller because it corrupts the stack. Or maybe I am living under a rock and the modern Delphi compiler does initialize it.

procedure TDirectoryTree.GetAllFiles(
  const aStringList: TStringlist;
  const aFileMask:   string);
..
{$IF COMPILERVERSION < 30.0}
  i: integer;
  //Strings: TArray<string>;
  ClassicStrings: TStringDynArray;
{$ENDIF}
..
    // Copy all fields to the new array
    for i := Low(ClassicStrings) to High(ClassicStrings) do
      aStringList.Add(ClassicStrings[i]);   // <--
      //Strings[i] := ClassicStrings[i];    // <--

    //aStringList.AddStrings(Strings);      // <--

 

Now it does compile. The steps I did:

1) Changed nothing in the settings, just selected SlideShow..

2) I have one icon in that path and it is selected by default

3) Checked "Display dialog to add audio..."

4) Checked "Adjust presentation time to audio time"

5) Clicked "Make slideshow"

6) It asked for an audio file, so I selected "Nums_5dot1_24_48000.wav" from the earlier post, the file with 6 channels (5.1)

7) A message popped up with "Calculated image time: 7049 ms"; this is strange, as the audio file is reported as 9 seconds by my MPC-HC player

Clicking Yes reports:

Slideshow time: 9049 ms (00:00:09 [h:m:s])
Output video duration: 9066 ms (00:00:09 [h:m:s])
Audio duration: 9049 ms
File size: 0.23 MB

and the result is this video file: Example_H264.mp4 Example_H264.zip (I had to compress it to prevent the forum from re-encoding or messing with it)

 

It looks nice, but there is no way to check for problems without deeper debugging!

So I downloaded ffmpeg-git-full.7z from https://www.gyan.dev/ffmpeg/builds/

 

Then I ran this command to see the frames:

Quote

ffprobe -show_frames Example_H264.mp4 > AVframes.txt

Here is the output file: AVframes.txt

 

The frames are interleaved as they should be, some video frames followed by audio frames; the pattern is correct and very similar to any other video file, BUT...

looking at the last audio frame and comparing it with the last video frame shows this:

Quote

[FRAME]
media_type=video
stream_index=0
key_frame=0
pts=271000
pts_time=9.033333
pkt_dts=271000
pkt_dts_time=9.033333
best_effort_timestamp=271000
best_effort_timestamp_time=9.033333
duration=1000
duration_time=0.033333
pkt_pos=233240
pkt_size=238
width=1280
height=720
crop_top=0
crop_bottom=0
crop_left=0
crop_right=0
pix_fmt=yuv420p
sample_aspect_ratio=1:1
pict_type=P
interlaced_frame=0
top_field_first=0
lossless=0
repeat_pict=0
color_range=unknown
color_space=unknown
color_primaries=unknown
color_transfer=unknown
chroma_location=left
[/FRAME]
[FRAME]
media_type=audio
stream_index=1
key_frame=1
pts=434176
pts_time=9.045333
pkt_dts=434176
pkt_dts_time=9.045333
best_effort_timestamp=434176
best_effort_timestamp_time=9.045333
duration=1024
duration_time=0.021333
pkt_pos=235043
pkt_size=313
sample_fmt=fltp
nb_samples=1024
channels=2
channel_layout=stereo
[/FRAME]

This is clearly desyncing. Small or big is an argument, but let's not forget that the whole video is 9 seconds and it has already drifted by 12 ms. With the same settings and input but an hour-long video, the drift will be (with fast math) around 4.8 seconds!
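The fast math can be sanity-checked (Python for convenience; the 12 ms is the gap between the last video and audio pts_time values in the ffprobe dump above, and the extrapolation assumes the drift accumulates linearly):

```python
last_video_pts = 9.033333  # pts_time of the last video frame, from ffprobe
last_audio_pts = 9.045333  # pts_time of the last audio frame, from ffprobe

drift_s = last_audio_pts - last_video_pts   # ~0.012 s after 9 s of video
drift_per_hour_s = drift_s / 9.0 * 3600     # linear extrapolation to 1 hour
```

That gives roughly 4.8 seconds of drift per hour, as stated.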

 

I will submit this reply before any blackout that would make me cry; out of laziness and frustration I might ditch the whole subject, as I received an SMS informing me of a planned blackout.


FWIW & FYI:

 

54 minutes ago, Kas Ob. said:

IntToHex must have a second parameter

Newer versions of Delphi have IntToHex overloads for NativeInt to match the size of a pointer. I think they were introduced in Delphi 10.

 

52 minutes ago, Kas Ob. said:

Removed TArray<string>. This one is very strange: it causes an access violation, as the array is not initialized, and the AV appears in the caller because it corrupts the stack. Or maybe I am living under a rock and the modern Delphi compiler does initialize it.

TArray<T>, being a dynamic array, is a managed type, and string is too, so TArray<string> should be initialized automatically. Probably a compiler bug.


Now I have shown that there is a problem with syncing, confirming my theoretical doubts from looking at how the audio frame duration versus the video frame duration was calculated.

And to reiterate how important this is: if you take a long video, a fully correct and synced one, split/extract the audio and video streams into files, and then use the same/similar code to generate an mkv again, the result will not be the same as the original; they will be desynced. (There is a chance the result comes out synced, but that would be pure luck based on the parameters.)

 

Now to the math reason why this happens in my generated demo video:

Video: MPEG4 Video (H264) 1280x720 30fps 59kbps [V: h264 high L3.1, yuv420p, 1280x720, 59 kb/s]
Audio: AAC 48000Hz stereo 148kbps [A: aac lc, 48000 Hz, stereo, 148 kb/s]

1) The audio format of the input is irrelevant here, as it has clearly been re-encoded from 6 channels to 2 channels.

The audio is 48000 Hz, so let's get the duration of one audio sample:

  AudioSampleDuration := 1.0 / 48000;   // 20.833 microseconds (1 microsecond = 1/1000000 of a second)
  VideoFps := 30.0;
  VideoFrameDuration := 1.0 / VideoFps; // 0.033333 s

Now we need to decide how many audio samples are needed to accommodate the VideoFrameDuration. We will interleave them after all, so we can put in video frames and compensate with a wider interleave; in other words, we don't need to add audio frames after every single video frame.

  AudioSamplesPerVideoF := Round(VideoFrameDuration / AudioSampleDuration); // ~1600 samples per video frame
  AudioFrameDuration := AudioSamplesPerVideoF * AudioSampleDuration;        // 0.033333 s
  AdjustedVideoFPS := 1.0 / AudioFrameDuration;                             // ~30.0003 fps
  // so the frame rate ratio (MF_MT_FRAME_RATE) should be
  fpsNum := 300003;
  fpsDen := 10000;

See: if we use 0.03333 instead of 0.033333, the result will be 30.003 fps. It just happens that for these exact parameters the difference is so small.

Anyway, this shows the small desync, and the way audio frames are requested and written causes the little 12 ms drift, as we have 30*9 = 270 frames. The math above doesn't explain the whole 12 ms, but it (I hope) clears up the math I talked about earlier.
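For what it's worth, the sensitivity to truncated decimals can be checked directly (a small check in Python; with exact arithmetic 48000 Hz happens to divide evenly by 30 fps, so the ~30.0003 figure comes from the truncated 0.033333 value, as noted above):

```python
# exact arithmetic: 48000 Hz / 30 fps is a whole number of samples per frame
samples_per_frame = round((1 / 30) / (1 / 48000))  # 1600 samples, exact
fps_exact = 48000 / samples_per_frame              # exactly 30.0

# the same computation with truncated decimal durations
fps_from_6_digits = 1 / 0.033333                   # ~30.0003
fps_from_5_digits = 1 / 0.03333                    # ~30.003
```

So for other frame rates (23.976, 29.97, ...) the ratio does not divide evenly, and that is where the alignment work really matters.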

 

Now, to solve and prevent this we have two ways, but first we must keep in mind two essential things:

1) Video frames and their durations are very forgiving, so increasing one by 0.01 ms or 0.001 ms is OK. As we saw, ffmpeg runs its own algorithm to correct things, as long as the increase is momentary and not accumulated on every frame.

2) Audio is not forgiving. If we are short (under-buffer) by even one sample, it will be heard: with audio you can't just stop playing, or the speaker stops for a fraction of a second and corrupts the waveform; in short skips it might be hissing, buzzing, or all sorts of acoustic artefacts. And if we over-buffer, the audio will be desynced; the player can compensate only so much, then it gives up. Under-buffering will also desync, just without the artefacts.

 

The solution is either:

1) Adjust the video FPS as above to something like 30.0003, but even this is not enough.

2) Feed audio frames with dynamic length/size; this might also allow us to not touch the video FPS.

But HOW?

The answer is to refactor your loops. We calculated the real, synced video frame duration, and we have the audio frame duration, but we must account for the audio sample duration on top.

In other words, this can be solved within the audio writing loop: if we are under the exact duration, we add a sample or more, but we keep a record of t, the time by which we overflow. It is better to always overflow by at least one sample; then on the next pass of the writing loop we use that record and write less by t, and then rinse and repeat.

In other words, the audio frames will not and should not have a fixed sample count; it will be more, followed by fewer, then more, then fewer...
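The rinse-and-repeat idea can be sketched as a carry loop (Python for brevity; the names are mine, not from the library). Each video frame gets however many audio samples bring the running audio time back level with the running video time:

```python
def audio_chunk_sizes(fps_num, fps_den, sample_rate, n_frames):
    # samples to write per video frame so that the running audio time
    # tracks the running video time; the remainder is carried forward
    sizes = []
    written = 0  # total samples written so far
    for frame in range(1, n_frames + 1):
        # exact video time after this frame is frame * fps_den / fps_num seconds
        target = round(frame * fps_den * sample_rate / fps_num)
        sizes.append(target - written)
        written = target
    return sizes
```

At 29.97 fps (30000/1001) and 48000 Hz each frame spans 1601.6 samples, so the chunk sizes alternate 1602, 1601, 1602, ... and the total never drifts; at an even 30 fps they are a constant 1600.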

 

And that is it, this should ensure the output video is synced.

 

Hope that is clear enough.

 

Again, don't overthink it, because I think you went in that direction. Just close your eyes and imagine a TV signal, PAL/NTSC, and the interleaving. Back then there was one format; now we have different numbers, so we compensate by creating a dancing equilibrium.

And again, the correction should be on the video side. But I think this very suggestion of dancing or twitching audio frame sizes should be logically and factually correct, unless some codec doesn't accept arbitrary audio frames and will only accept a constant frame duration; in that case the video frame duration must be corrected, as it is the forgiving one. This is codec hell.

From my understanding of the current Media Foundation, it is forgiving about the audio frame duration.

 

PS:

1) The speed of video seeking in a player is highly affected by this overflow and underflow, so the more accurate and closer to correct we are, the faster any player will be.

2) There is nothing you can add to these loops that will affect the speed of encoding or playing. More math lines, even long ones, have zero impact on the whole process; decoding video and audio takes hundreds of millions of CPU cycles, so there is no sense in cutting corners in the feeding algorithm, no matter how slow it is or how much math and variable storing is involved.

PS2: Just don't feed or accept audio buffers (in bytes) in any direction (in or out) if they are not aligned to the bytes per block; it will ruin everything. I can't emphasize enough how important this is, so more checks are needed, and the best you can do is refuse to proceed.

2 minutes ago, Anders Melander said:

TArray<T>, being a dynamic array, is a managed type, and string is too, so TArray<string> should be initialized automatically. Probably a compiler bug.

Even without SetLength?

Does the following work?!

procedure Test;
var
  Strings: TArray<string>;
begin
  Strings[5] := 'Hi';
end;

Because it is used there similarly to this.

Just now, Kas Ob. said:

Does the following work ?!


procedure Test;
var
  Strings: TArray<string>;
begin
  Strings[5] := 'Hi';
end;

Because it is used there similar to this.

🙂

No, of course that doesn't work.

A dynamic array is initialized to an empty array, so there needs to be a SetLength first.

1 minute ago, Anders Melander said:

No, of course that doesn't work

A dynamic array is initialized to an empty array so there needs to be a SetLength first.

Well, I am a bad writer. I said strange because this is exactly how it is used: https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation/blob/main/Utilities/uDirectoryTree.pas#L313-L314

And how it didn't raise an AV in Renate's debugger, that is the strange thing.

7 minutes ago, Kas Ob. said:

Well, I am a bad writer. I said strange because this is exactly how it is used: https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation/blob/main/Utilities/uDirectoryTree.pas#L313-L314

And how it didn't raise an AV in Renate's debugger, that is the strange thing.

Likely because Renate is using a newer version of Delphi. As far as I can tell, the faulty code was contributed by someone else.


Thanks, everybody. Now I have a lot to think about, a great chance to expand my horizon at the age of 74 :). I'll fix the code. But then I need a bit of time to think. The info is great.

1 minute ago, Kas Ob. said:

And how it didn't raise an AV in Renate's debugger, that is the strange thing.

Because my poor debugger didn't run the code, because I didn't tell it to do so. I pasted that compatibility code in without checking, and probably missed another piece. A mistake I won't make again. So I need to disable LogicalCompare for more compiler versions, or write a header for StrCmpLogicalW.

1 hour ago, Kas Ob. said:

A message popped up with "Calculated image time: 7049 ms"; this is strange, as the audio file is reported as 9 seconds by my MPC-HC player

presentation time = image time + effect time (2000).

On 6/25/2025 at 10:47 AM, Kas Ob. said:

I don't think that's quite true: if it fails, the rest of WriteOneFrame isn't executed, and in line 1713 an exception is raised with error code hr. I could translate it into an EAudioFormatException, though, at the spot you indicate.

 

On 6/25/2025 at 10:47 AM, Kas Ob. said:

The commented "if fAudioTime >= fAudioDuration then" is right and should be used

It was meant as an extra safety check, since the code already checks for EndOfStream, and that hasn't failed so far. But I've put it back in.

 

 

1 minute ago, Renate Schaaf said:

I don't think that's quite true,

Yes, my mistake, I saw it as being inside the loop. Stupid hasty reading.

 

2 minutes ago, Renate Schaaf said:
On 6/25/2025 at 11:47 AM, Kas Ob. said:

The commented "if fAudioTime >= fAudioDuration then" is right and should be used

It was meant as an extra safety check, since the code already checks for EndOfStream, and that hasn't failed so far. But I've put it back in

If you are going to adjust the audio frame duration, then it will be used, but you must save/remember the difference for the next frame to subtract/reduce; in its current usage it is redundant.

Switching to dynamic audio frames will be way better and more accurate, and will prevent desyncing.

10 hours ago, Kas Ob. said:

And to reiterate how important this is: if you take a long video, a fully correct and synced one, split/extract the audio and video streams into files, and then use the same/similar code to generate an mkv again, the result will not be the same as the original; they will be desynced. (There is a chance the result comes out synced, but that would be pure luck based on the parameters.)

With a little change you can perform that test from within the demo, I think. Just put a little change into TBitmapEncoderWMF.AddVideo:

 

procedure TBitmapEncoderWMF.AddVideo(
  const
  VideoFile: string;
  TransitionTime: integer = 0;
  crop:           boolean = false;
  stretch:        boolean = false);
var
  VT: TVideoTransformer;
  bm: TBitmap;
  TimeStamp, Duration, VideoStart: int64;
begin
  if not fInitialized then
    exit;
  VT := TVideoTransformer.Create(
    VideoFile,
    fVideoHeight,
    fFrameRate);
  try
    bm := TBitmap.Create;
    try
      if not VT.NextValidSampleToBitmap(bm, TimeStamp, Duration) then
        exit;
      if TransitionTime > 0 then
        CrossFadeTo(
          bm,
          TransitionTime,
          crop,
          stretch);
      VideoStart := fWriteStart;
      // fill gap at beginning of video stream
      if TimeStamp > 0 then
        AddStillImage(
          bm,
          Trunc(TimeStamp / 10000),
          crop,
          stretch);
      while (not VT.EndOfFile) and fInitialized do
      begin
        BitmapToRGBA(
          bm,
          fBmRGBA,
          crop,
          stretch);
        bmRGBAToSampleBuffer(fBmRGBA);
        // !!!!! Change is here for extra hard sync-check:
        // WriteOneFrame(
        // VideoStart + TimeStamp,
        // Duration);

        // Write the decoded video stream in exactly the same way as AddFrame would.
        // I.e. with the same timestamps, not taking any timestamps from the
        // video-input
        WriteOneFrame(
          fWriteStart,
          fSampleDuration);
        if not VT.NextValidSampleToBitmap(bm, TimeStamp, Duration) then
          Break;
      end;
      // FrameCount*FrameTime > Video-end? (shouldn't differ by much)
      // if fWriteStart > VideoStart + TimeStamp + Duration then
      // Freeze((fWriteStart - VideoStart - TimeStamp - Duration) div 10000);
    finally
      bm.Free;
    end;
  finally
    VT.Free;
  end;
end;

Then transcode a movie on the demo tab "Use TBitmapEncoderWMF as a transcoder". It uses the procedure TranscodeVideoFile, treating the video and audio streams of an input video as totally independent inputs. AddVideo decodes the video stream into a stream of bitmaps, and the input video is used again as the audio file. I encoded 40 minutes of "Fellowship of the Ring" this way and did not see any desyncing. You'll probably say that's no proof, and you'd be right, but it might be an indication that the problem isn't as severe.

Or the video player is just very good at making something usable out of the input.

On 6/25/2025 at 1:02 PM, Anders Melander said:

I would use:


raise Exception.CreateFmt('Fail in call no. %d of %s with result %x', [Count, ProcName, hr]);

for readability.

Hi, Anders,

CreateFmt internally uses

 

constructor Exception.CreateFmt(const Msg: string;
  const Args: array of const);
begin
  FMessage := Format(Msg, Args);
end;

and the help says that this version of Format isn't thread-safe, since it uses the locale for the decimal separator. Now, I'm not using decimal separators here, and I guess once the exception is raised in a thread, thread-safety doesn't really matter anymore?

 

Another thing: Is %x.8 doing the same as IntToHex(hr,8)?

 

Renate

4 minutes ago, Renate Schaaf said:

help says that this version of Format isn't threadsafe

"Not thread-safe" in this case doesn't mean crash and burn. It just means that if one thread modifies the global FormatSettings then it will affect all other threads also using it.

Hardly a problem - even if you did output floating point values in the exception message.

