

Renate Schaaf
Members
Content Count: 139
Days Won: 6
Renate Schaaf last won the day on July 7
Renate Schaaf had the most liked content!
Community Reputation: 74 (Excellent)
About Renate Schaaf
- Birthday 07/06/1951
Technical Information
Delphi-Version: Delphi Community Edition
-
I've been wondering about the same; maybe the setup isn't right. But I find it so hard to even figure out which settings you can specify.
You lost me here. What's 10m, and what's gps?
Indeed, that looks interesting.
Thanks, your links have already helped me a lot to understand things better.
Renate
-
I think I solved the audio-syncing ... kind of.

First observation: Audio and video are perfectly synced if the audio comes from a .wav-file. You can check this using the optimal frame-rates 46.875 or 31.25. So for optimal syncing, compressed audio should be converted to .wav first. I have added a routine to uTransformer.pas which does this. In the demo there are some checkboxes to try this out.

Second observation: For compressed input, the phase-shift in audio happens exactly at the boundaries of the IMFSamples read in. So this is what I think happens: the encoder doesn't like the buffer-size of these samples and throws away some bytes at the end. This causes a gap in the audio-stream and a phase-shift in the timing. I have a notorious video where you can actually hear these gaps after re-encoding. If I transform the audio to .wav first, the gaps are gone.

One could try to keep the thrown-away bytes and prepend them to the next sample, fixing up the time-stamps... Is that what you were suggesting, @Kas Ob.? Well, I don't think I could do it anyway :). So right now, first transforming the audio-input to .wav is the best I can come up with. For what I use this for, it's fine, because I mix all the audio into one big .wav before encoding.
Renate
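The idea of keeping the thrown-away bytes and prepending them to the next sample can be sketched as a carry-over buffer. This is a hypothetical helper, not code from uTransformer.pas or Bitmaps2Video, and it is untested with the sinkwriter. It assumes 48000 Hz, 16-bit stereo PCM (BlockAlign 4) and re-slices the input into chunks of 1024 PCMSamples, the size of one AAC audio-frame; that the encoder stops dropping bytes when fed such chunks is an assumption. Timestamps are recomputed from the byte count, so they can't drift.

```pascal
program PcmRechunkSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical helper: re-slices PCM buffers into fixed-size chunks and
  // carries leftover bytes over to the next call instead of dropping them.
  TPcmRechunker = class
  private
    FPending: TBytes;      // leftover bytes, always fewer than FChunkBytes
    FChunkBytes: Integer;  // chunk size in bytes, e.g. 1024 * BlockAlign
    FBlockAlign: Integer;  // bytes per PCMSample (2 channels * 16 bit = 4)
    FSampleRate: Integer;  // PCMSamples per second
    FBytesEmitted: Int64;  // bytes handed out so far
  public
    constructor Create(ChunkBytes, BlockAlign, SampleRate: Integer);
    // Appends Data, then adds every complete chunk to Chunks and its start
    // time (100-ns units, as used by Media Foundation) to Stamps.
    procedure Push(const Data: TBytes; Chunks: TList<TBytes>;
      Stamps: TList<Int64>);
    function PendingBytes: Integer;
  end;

constructor TPcmRechunker.Create(ChunkBytes, BlockAlign, SampleRate: Integer);
begin
  inherited Create;
  FChunkBytes := ChunkBytes;
  FBlockAlign := BlockAlign;
  FSampleRate := SampleRate;
end;

procedure TPcmRechunker.Push(const Data: TBytes; Chunks: TList<TBytes>;
  Stamps: TList<Int64>);
var
  Buf: TBytes;
  Offset: Integer;
begin
  // Leftover bytes from the previous call go in front of the new data.
  SetLength(Buf, Length(FPending) + Length(Data));
  if Length(FPending) > 0 then
    Move(FPending[0], Buf[0], Length(FPending));
  if Length(Data) > 0 then
    Move(Data[0], Buf[Length(FPending)], Length(Data));
  Offset := 0;
  while Length(Buf) - Offset >= FChunkBytes do
  begin
    Chunks.Add(Copy(Buf, Offset, FChunkBytes));
    // Timestamp derived from the number of PCMSamples emitted so far
    Stamps.Add((FBytesEmitted div FBlockAlign) * 10000000 div FSampleRate);
    Inc(FBytesEmitted, FChunkBytes);
    Inc(Offset, FChunkBytes);
  end;
  // Keep the remainder for the next call instead of discarding it.
  FPending := Copy(Buf, Offset, Length(Buf) - Offset);
end;

function TPcmRechunker.PendingBytes: Integer;
begin
  Result := Length(FPending);
end;

var
  R: TPcmRechunker;
  Chunks: TList<TBytes>;
  Stamps: TList<Int64>;
  Input: TBytes;
  i: Integer;
begin
  R := TPcmRechunker.Create(1024 * 4, 4, 48000); // 1 AAC frame = 1024 PCMSamples
  Chunks := TList<TBytes>.Create;
  Stamps := TList<Int64>.Create;
  try
    SetLength(Input, 35108); // IMFSample buffer size seen for an .mp3-file
    for i := 1 to 3 do
      R.Push(Input, Chunks, Stamps);
    Writeln('chunks: ', Chunks.Count, ', pending bytes: ', R.PendingBytes); // 25, 2924
    Writeln('start of 2nd chunk (100 ns): ', Stamps[1]);                    // 213333
  finally
    Stamps.Free;
    Chunks.Free;
    R.Free;
  end;
end.
```

The 2924 pending bytes are not lost; they would be the start of the next chunk once more input arrives.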
-
There is a new version at https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation.
New stuff:
Some rewrite of audio, making sure that gaps at the beginning of a stream are filled with silence.
2 optimized frame-rates for audio-syncing, see below.
Most importantly: One can now run @Kas Ob.'s frame analysis from within the demo, if one enables the hidden tab "Analysis". I made the output lines a bit shorter, since the rest just repeated the same values for everything I tested, as far as I could see. The file ffprobe.exe needs to be in the same directory as DemoWMF.exe. ffprobe is part of ffmpeg-git-essentials.7z on https://www.gyan.dev/ffmpeg/builds/.

I spent a good amount of time trying to figure out what I can and what I cannot control about audio-syncing, tracing into the relevant code and running the analysis. Results on audio-syncing follow (beware, it's long):

The math is for an audio-sample-rate of 48000, and the time units are all s. Audio-blockalign is always 4 bytes for what I do. There are at least 2 different meanings of "sample":

PCMSample: as in samples per second. ByteSize: Channels*BitsPerSample/8 = 2*16/8 = 4 bytes. Time: 1/48000 s.

IMFSample: Chunk of audio returned by IMFSourceReader.ReadSample. It contains a buffer holding a certain amount of uncompressed PCMSamples, and info like timestamp, duration, flags ... The size of these samples varies a lot with the type of input. Some observed values:
.mp3-file 1: Buffersize = 96768 bytes, Duration = 0.504 (96768 bytes = 96768/4 PCMSamples = 96768/4/48000 s, OK)
.mp3-file 2: Buffersize = 35108 bytes, Duration = 0.1828532 (35108/4/48000 = 0.182854166..., not OK)
.wmv-file: Buffersize = 17832 bytes, Duration = 0.092875 (17832/4/48000 = 0.092875, OK)
Except for the first sample read, the values don't differ from sample to sample. Those are the samples I can write to the sinkwriter for encoding. Breaking them up seems like a bad idea; I have to trust MF to handle the writing correctly.
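The duration arithmetic for these buffers can also be done in Media Foundation's 100-ns units, which makes the mismatch for .mp3-file 2 easy to see. A minimal sketch, assuming 48000 Hz and BlockAlign 4 as above; BufferHns and DurationMatches are hypothetical helpers, not functions from the repo, and the tolerance of one 100-ns tick is an arbitrary choice.

```pascal
program BufferDurationSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  SampleRate = 48000;       // PCMSamples per second
  BlockAlign = 4;           // Channels * BitsPerSample / 8 = 2 * 16 / 8
  HnsPerSecond = 10000000;  // Media Foundation time unit: 100 ns

// Duration of an uncompressed PCM buffer in 100-ns units (truncated)
function BufferHns(BufferBytes: Integer): Int64;
begin
  Result := Int64(BufferBytes div BlockAlign) * HnsPerSecond div SampleRate;
end;

// True if the duration reported with an IMFSample agrees with its buffer size
function DurationMatches(BufferBytes: Integer; ReportedHns: Int64): Boolean;
begin
  Result := Abs(BufferHns(BufferBytes) - ReportedHns) <= 1;
end;

begin
  Writeln(BufferHns(96768)); // .mp3-file 1: 5040000 (0.504 s)
  Writeln(BufferHns(35108)); // .mp3-file 2: 1828541, but 1828532 is reported
  Writeln(BufferHns(17832)); // .wmv-file:   928750 (0.092875 s)
end.
```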
The buffers seem to always be block-aligned. I've added some redundant variables in TBitmapEncoderWMF.WriteAudio so these values can be examined in the debugger.

A related quantity is the audio-frame. Similarly to the video-stream, the audio-stream of a compressed video consists of audio-frames. 1 audio-frame contains the compressed equivalent of 1024 PCMSamples. So:
AudioFrameDuration = 1024/48000
AudioFrameRate = 48000/1024

I can only control the writing of the video by feeding the IMFSamples of video and audio to the sinkwriter in good order. The samples I write to the sinkwriter are collected in a "Leaky-Bucket"-buffer. The encoder pulls out what it needs to write the next chunk of video. It hopefully waits until there are enough samples to write something meaningful. Problems arise if the bucket overflows. There need to be enough video- and audio-samples to correctly write both streams.

So here is the workflow, roughly (can be checked by stepping into TBitmapEncoderWMF.WriteOneFrame):
Check if the audio-time written so far is less than the timestamp of the next video-frame.
Yes: Pull audio-samples out of the sourcereader and write them to the sinkwriter until audio-time >= video-timestamp. Looking at the durations above, one sample might already achieve this.
Write the next video-frame.
Repeat.
In the case of mp3-file 1, the reading and writing of 1 audio-sample would be followed by the writing of several video-samples.

The encoder now breaks the bucket-buffer up into frames, compresses them and writes them to file. It does that following its own rules, which I have no control over. Frame-analysis can show the result: A group of video-frames is followed by a group of audio-frames, which should cover the same time-interval as the video-frames. In the output I have seen so far, the audio-frame-period is always 15 audio-frames. For video-framerate 30, the video-frame-period is 9 or 10 frames (15 audio-frames last 15*1024/48000 = 0.32 s, which is 9.6 video-frames at framerate 30). Why doesn't it make the audio- and video-periods smaller? No idea.
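The write loop described above can be simulated without Media Foundation. This sketch only models the timing decision, assuming a fixed IMFSample duration of 0.504 s (as for .mp3-file 1) and framerate 30; the comments mark where the real IMFSourceReader.ReadSample and IMFSinkWriter.WriteSample calls would go. It shows that each audio-sample is followed by about 15 video-frames.

```pascal
program InterleaveSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  AudioSampleHns = 5040000; // one IMFSample of .mp3-file 1: 0.504 s
  VideoFrameHns = 333333;   // frame time at 30 fps in 100-ns units (truncated)
  TotalFrames = 60;         // 2 s of video

var
  AudioWritten: Int64;  // audio-time written so far
  VideoTime: Int64;     // timestamp of the next video-frame
  AudioSamples, Frame: Integer;
begin
  AudioWritten := 0;
  AudioSamples := 0;
  for Frame := 0 to TotalFrames - 1 do
  begin
    VideoTime := Int64(Frame) * VideoFrameHns;
    // Pull audio until audio-time >= video-timestamp
    while AudioWritten < VideoTime do
    begin
      // real code: ReadSample from the source reader, WriteSample to the sink writer
      Writeln('audio sample ', AudioSamples, ' before video frame ', Frame);
      Inc(AudioWritten, AudioSampleHns);
      Inc(AudioSamples);
    end;
    // real code: write the video-frame with timestamp VideoTime
  end;
  Writeln('audio samples written: ', AudioSamples);
end.
```

Running it prints audio writes before video-frames 1, 16, 31 and 46, i.e. 4 audio-samples for 60 video-frames.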
As for why the periods aren't smaller: I guess that's the amount of info the players can handle nowadays, and these periods are a compromise between optimal phase-locking of audio- and video-periods and the buffer-size the player can handle. Theoretically, at framerate 30, 16 video-frames should phase-lock with 25 audio-frames (16/30 s = 25*1024/48000 s = 0.5333 s).

Here is one of those video-audio-groups. Video-framerate is 30.
  video stream_index=0 key_frame=0 pts=39000 pts_time=1.300000 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=40000 pts_time=1.333333 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=41000 pts_time=1.366667 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=42000 pts_time=1.400000 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=43000 pts_time=1.433333 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=44000 pts_time=1.466667 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=45000 pts_time=1.500000 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=46000 pts_time=1.533333 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=47000 pts_time=1.566667 duration_time=0.033333
  video stream_index=0 key_frame=0 pts=48000 pts_time=1.600000 duration_time=0.033333
  audio stream_index=1 key_frame=1 pts=62992 pts_time=1.312333 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=64016 pts_time=1.333667 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=65040 pts_time=1.355000 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=66064 pts_time=1.376333 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=67088 pts_time=1.397667 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=68112 pts_time=1.419000 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=69136 pts_time=1.440333 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=70160 pts_time=1.461667 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=71184 pts_time=1.483000 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=72208 pts_time=1.504333 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=73232 pts_time=1.525667 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=74256 pts_time=1.547000 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=75280 pts_time=1.568333 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=76304 pts_time=1.589667 duration_time=0.021333
  audio stream_index=1 key_frame=1 pts=77328 pts_time=1.611000 duration_time=0.021333

pts stands for "presentation time stamp", and pts_time is of interest.
Video-time-interval: from 1.300000 to 1.600000+0.033333 = 1.633333
Audio-time-interval: from 1.312333 to 1.611000+0.021333 = 1.632333
Audio is a bit ahead at the beginning and a tiny bit behind at the end.
pts should be multiples of 1024, but they aren't, hmm. The difference is still 1024, but they are phase-shifted. Phase-shift is 62992 mod 1024 = 528 (or -496).

An interval from further on:
Video: from 8.066667 to 8.366667+0.033333 = 8.400000
Audio: from 8.053667 to 8.352333+0.021333 = 8.373666
pts-phase-shift: still 528 (-496). Audio is lagging behind.

To really see what is happening, I will have to implement better statistics than just looking at things 🙂

One further test: I tried to phase-lock audio and video optimally:
VideoFrameRate: f. AudioFrameRate: 48000/1024, so f = 48000/1024 = 46.875. I've added this frame-rate to the demo.
Result: Perfect sync for the first audio-video group. In the middle of the second group the pts-phase-shift is again 528, and audio lags behind. For the rest of the groups the lag doesn't get bigger; it is always corrected to some degree. But the file should have identical audio and video timestamps in the first place!

There is another new frame-rate, which is the result of trying to phase-lock 2 video-frames to 3 audio-frames:
2/f = 3*1024/48000 results in f = 2*48000/(3*1024) = 31.25

I will try to find out what causes the phase-shift in audio by parsing the ffprobe-output a bit more (sigh).
Maybe generate a log-file for the samples written, too (sigh). No, so far it's still fun. For those who made it up to here: Thanks for your patience.
Renate
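The phase-locking arithmetic in this post can be checked for any frame rate: n video-frames at rate f last as long as m audio-frames when n/f = m*1024/48000. Writing f as a fraction Num/Den, the smallest n and m follow from a greatest common divisor. A small sketch (the helper names are hypothetical) that reproduces the 16:25 ratio for framerate 30, 1:1 for 46.875 and 2:3 for 31.25, plus the pts phase of 528 from the ffprobe output:

```pascal
program PhaseLockSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  SampleRate = 48000;
  SamplesPerAudioFrame = 1024; // PCMSamples per AAC audio-frame

function Gcd(A, B: Int64): Int64;
var
  T: Int64;
begin
  while B <> 0 do
  begin
    T := A mod B;
    A := B;
    B := T;
  end;
  Result := A;
end;

// Smallest numbers N of video-frames and M of audio-frames spanning the same
// time, for a video frame rate given as the fraction Num/Den:
//   N * Den / Num = M * 1024 / 48000  <=>  N * Den * 48000 = M * 1024 * Num
procedure PhaseLock(Num, Den: Int64; out N, M: Int64);
var
  G: Int64;
begin
  G := Gcd(Den * SampleRate, SamplesPerAudioFrame * Num);
  N := SamplesPerAudioFrame * Num div G;
  M := Den * SampleRate div G;
end;

// Offset of an audio pts (time base 1/48000) from the 1024-sample grid
function PtsPhase(Pts: Int64): Int64;
begin
  Result := Pts mod SamplesPerAudioFrame;
end;

var
  N, M: Int64;
begin
  PhaseLock(30, 1, N, M);
  Writeln('30 fps:     ', N, ' video-frames = ', M, ' audio-frames'); // 16 = 25
  PhaseLock(375, 8, N, M);
  Writeln('46.875 fps: ', N, ' video-frames = ', M, ' audio-frames'); // 1 = 1
  PhaseLock(125, 4, N, M);
  Writeln('31.25 fps:  ', N, ' video-frames = ', M, ' audio-frames'); // 2 = 3
  Writeln('phase of pts 62992: ', PtsPhase(62992));                    // 528
end.
```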
-
First of all, thanks. I'll be back as soon as I understand better what the analysis is showing, and then I might be able to do the "little code changes" you mention :). Just keep in mind that the sinkwriter doesn't give you any control over dts, pts, or how video and audio are interleaved.
Renate
-
Hi, Anders,
internally, CreateFmt uses

constructor Exception.CreateFmt(const Msg: string; const Args: array of const);
begin
  FMessage := Format(Msg, Args);
end;

and the help says that this version of Format isn't thread-safe, since it uses the locale for the decimal separator. Now I'm not using decimal-separators here, and I guess once the exception is raised in a thread, thread-safety doesn't really matter anymore?
Another thing: Is %x.8 doing the same as IntToHex(hr,8)?
Renate
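On the last question: as far as I know from the Delphi documentation, the precision in a format specifier sits between "%" and the type character, so the specifier that matches IntToHex(hr, 8) is "%.8x", while "%x.8" is "%x" followed by the literal text ".8". The sketch below shows both, plus the Format overload that takes an explicit TFormatSettings and so doesn't depend on the shared global settings. The HRESULT value is arbitrary and only for illustration.

```pascal
program FormatHexSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  hr: HRESULT;
  FS: TFormatSettings;
begin
  hr := HRESULT($C00D36B4); // arbitrary negative HRESULT for illustration
  // Precision goes between '%' and the type character:
  // '%.8x' pads to at least 8 hex digits, like IntToHex(hr, 8).
  Writeln(Format('%.8x', [hr]));
  Writeln(IntToHex(hr, 8));
  // '%x.8' is just '%x' followed by the literal text '.8'.
  Writeln(Format('%x.8', [hr]));
  // Thread-safe overload: a private TFormatSettings instead of the
  // global FormatSettings variable.
  FS := TFormatSettings.Create;
  Writeln(Format('Sinkwriter failed: $%.8x', [hr], FS));
end.
```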
-
With a little change you can perform that test from within the demo, I think. Just put a little change into TBitmapEncoderWMF.AddVideo:

procedure TBitmapEncoderWMF.AddVideo(
  const VideoFile: string;
  TransitionTime: integer = 0;
  crop: boolean = false;
  stretch: boolean = false);
var
  VT: TVideoTransformer;
  bm: TBitmap;
  TimeStamp, Duration, VideoStart: int64;
begin
  if not fInitialized then
    exit;
  VT := TVideoTransformer.Create(
    VideoFile,
    fVideoHeight,
    fFrameRate);
  try
    bm := TBitmap.Create;
    try
      if not VT.NextValidSampleToBitmap(bm, TimeStamp, Duration) then
        exit;
      if TransitionTime > 0 then
        CrossFadeTo(
          bm,
          TransitionTime,
          crop,
          stretch);
      VideoStart := fWriteStart;
      // fill gap at beginning of video stream (TimeStamp is in 100-ns units)
      if TimeStamp > 0 then
        AddStillImage(
          bm,
          Trunc(TimeStamp / 10000),
          crop,
          stretch);
      while (not VT.EndOfFile) and fInitialized do
      begin
        BitmapToRGBA(
          bm,
          fBmRGBA,
          crop,
          stretch);
        bmRGBAToSampleBuffer(fBmRGBA);

        // !!!!! Change is here for extra hard sync-check:
        // WriteOneFrame(
        //   VideoStart + TimeStamp,
        //   Duration);

        // Write the decoded video stream in exactly the same way as AddFrame would.
        // I.e. with the same timestamps, not taking any timestamps from the
        // video-input
        WriteOneFrame(
          fWriteStart,
          fSampleDuration);

        if not VT.NextValidSampleToBitmap(bm, TimeStamp, Duration) then
          Break;
      end;
      // FrameCount*FrameTime > Video-end? (shouldn't differ by much)
      // if fWriteStart > VideoStart + TimeStamp + Duration then
      //   Freeze((fWriteStart - VideoStart - TimeStamp - Duration) div 10000);
    finally
      bm.Free;
    end;
  finally
    VT.Free;
  end;
end;

Then transcode a movie on the Demo-Tab "Use TBitmapEncoderWMF as a transcoder". It uses the procedure TranscodeVideoFile, treating the video- and audio-stream of an input-video as totally independent inputs. AddVideo decodes the video-stream into a stream of bitmaps, and the input-video is used again as audiofile. I encoded 40 minutes of "Fellowship of the Ring" this way, and did not see any desynching.
You'll probably say that's no proof, and you'd be right, but it might be an indication that the problem isn't that severe. Or the video player is just very good at making something usable out of the input.
-
I don't think that's quite true: if it fails, the rest of WriteOneFrame isn't executed, and in line 1713 an exception is raised with error code hr. I could translate it into an EAudioFormatException at the spot you indicate, though.
It was meant as an extra safety check, since the code already checks for EndOfStream, and that hasn't failed so far. But I've put it back in.
-
presentation time = image time + effect time (2000).
-
Thanks everybody. Now I have a lot to think about, a great chance to expand my horizon at the age of 74 :). I'll fix the code. But then I need a bit of time to think. The info is great.
Because my poor debugger didn't run the code: I didn't tell it to do so. I pasted that compatibility code in without checking and probably missed another piece. A mistake I won't make again. So I need to disable LogicalCompare for more compiler versions, or write a header for StrCmpLogicalW.
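A header for StrCmpLogicalW is short, since the function is exported by shlwapi.dll. Here is a minimal sketch, assuming a Unicode Delphi (2009 or later, where string is UnicodeString; Delphi 2006 would need a WideString conversion first) and assuming LogicalCompare is meant as a TStringList.CustomSort callback. The names in the sketch are illustrative, not the repo's code.

```pascal
program LogicalCompareSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Import from shlwapi.dll (available since Windows XP); only needed if the
// Delphi version at hand doesn't already declare it.
function StrCmpLogicalW(psz1, psz2: PWideChar): Integer; stdcall;
  external 'shlwapi.dll' name 'StrCmpLogicalW';

// Hypothetical TStringList sort callback with Explorer-style ordering
function LogicalCompare(List: TStringList; Index1, Index2: Integer): Integer;
var
  S1, S2: string;
begin
  S1 := List[Index1];
  S2 := List[Index2];
  Result := StrCmpLogicalW(PWideChar(S1), PWideChar(S2));
end;

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add('Image10.jpg');
    SL.Add('Image2.jpg');
    SL.Add('Image1.jpg');
    SL.CustomSort(LogicalCompare);
    Writeln(SL.CommaText); // Image1.jpg,Image2.jpg,Image10.jpg
  finally
    SL.Free;
  end;
end.
```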
-
If I understand this right, I should match video-timestamp and duration to the closest blockalign-boundary of audio? If the difference in frame rate is really that negligible, that should be doable, if I can get the math right :).
Talk about not contributing: you just forced a new way of seeing things down my throat, not a small achievement.
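Matching a video timestamp to the closest audio blockalign-boundary amounts to rounding it to the nearest PCMSample. One PCMSample at 48000 Hz lasts 10^7/48000, about 208.33 units of 100 ns, so the boundaries don't fall on whole 100-ns ticks and the conversion back has to round again. A minimal sketch in integer arithmetic, assuming non-negative timestamps in Media Foundation's 100-ns units and a 48 kHz stream; the function names are hypothetical. 30000/1001 fps is used only as an illustration because its frames don't start on sample boundaries (at 30 fps every frame is exactly 1600 PCMSamples).

```pascal
program SnapToSampleSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  SampleRate = 48000;
  HnsPerSecond = 10000000; // Media Foundation time unit: 100 ns

// Index of the PCMSample boundary nearest to a timestamp (Hns >= 0)
function NearestSample(Hns: Int64): Int64;
begin
  Result := (Hns * SampleRate + HnsPerSecond div 2) div HnsPerSecond;
end;

// Timestamp of a PCMSample boundary, rounded to the nearest 100 ns
function SampleToHns(Sample: Int64): Int64;
begin
  Result := (Sample * HnsPerSecond + SampleRate div 2) div SampleRate;
end;

var
  Frame: Integer;
  Ts: Int64;
begin
  // Illustration only: 30000/1001 fps, whose frames don't start on samples
  for Frame := 0 to 3 do
  begin
    Ts := Int64(Frame) * HnsPerSecond * 1001 div 30000;
    Writeln('frame ', Frame, ': ', Ts, ' -> ', SampleToHns(NearestSample(Ts)),
      ' (sample ', NearestSample(Ts), ')');
  end;
end.
```

The durations would then be taken as differences of consecutive snapped timestamps, so the rounding errors don't accumulate.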
-
Hi Anders,
Thanks for that. I hate the format-strings, because I can never remember the codes for the placeholders. I had already thought before that I should get used to them, though. Now I also see that I forgot to use IntToHex(hr,8) 🙂
-
Hi Kas,
Good to see you again, and sorry for the long time of inactivity on my part. Thank you for the detailed input, which I need to digest first. Since you already invested so much thought, wouldn't you like to be a contributor? When I incorporate the changes you mention, I wouldn't even know how to list you as a contributor.
The issues you mention definitely need to be looked into. For the audio-part I was just glad it worked, and haven't put much thought into it lately. The wrong audio-duration was returned by some .vobs, which aren't really supported in the first place. The missing SafeRelease(pAudioSample) has caused memory leaks for me in a totally different context too, when I tried to write some code which simply plays an audio file through the default device.
Renate
-
I've just uploaded an update to my project https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation

What it does:
Contains a VCL-class which encodes a series of bitmaps and video-clips together with an audio-file to video. The result is an .mp4-file with H264 or H265 compression together with AAC-audio. It uses Windows Media Foundation, which is usually contained in Windows. Hardware-encoding is supported, if your graphics-card can do it.

Requires:
- Headers for Media Foundation from FactoryXCode: https://github.com/FactoryXCode/MfPack
- Windows 10 or higher
- Encoder (MF-Transform) for H264/H265, which usually comes with the graphics-driver
- Delphi XE7 or higher, if I haven't messed it up again; I've only got the CE and Delphi 2006 (Win32 and Win64 should be working, but Win64 recently crashes for me with "The session was disconnected".)

The demo-project shows some uses:
- Record a series of canvas-drawings to video
- Make a slideshow from image-files (.bmp, .jpg, .png, .gif) with music (.wav, .mp3, .wmv, ...) and 2 kinds of transitions
- Insert a videoclip into a slideshow (anything that Windows can decode should work)
- Transcode a video-file including the first audio-stream

Improvements:
I think I now better understand how to feed frames to the encoder. With the right settings it makes stutter-free videos with good audio-video-synchronization. It's now usable for me in my "big" project, and I no longer need to rely on ffmpeg-dlls. More info in changes.txt.

Just try it if you're interested; I'd be glad.
Renate
-
I have updated the repo on GitHub: https://github.com/rmesch/Parallel-Bitmap-Resampler
Changes made to the "modern" VCL- and FMX-versions in the folder BitmapScaling:
- New resampling filters: Mitchell, Robidoux, RobidouxSharp, RobidouxSoft.
- Simplified and corrected MakeGaussContributors in uScaleCommon.pas. @Anders Melander: It will pass the uniform color tests now. But it will fail the Gauss-RMS, since I changed RadiusToSigma back.
- Tried to make Gamma-correction a little more precise.

I tried nonetheless. You already spent so much time digging through that ancient attachment, so give the repo a shot. I also added the option in DemoScale.dpr to use a test-bitmap similar to yours. I can't see any of the color-artefacts you describe, though.
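For readers wondering what the new filters are: Mitchell and the Robidoux variants are usually defined as members of the Mitchell-Netravali family of cubic filters, parameterized by (B, C). The sketch below uses that textbook definition and the (B, C) values published for ImageMagick; it is an assumption that uScaleCommon.pas uses the same constants, and the names TBC, BCKernel and Contributors are hypothetical. It also shows why normalizing the contributor weights matters for the uniform color test: if the weights sum to 1, a uniform color stays uniform.

```pascal
program BCSplineSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Math;

type
  TBC = record
    B, C: Double;
  end;

const
  // Commonly published (B, C) values (e.g. ImageMagick); assumed, not taken
  // from uScaleCommon.pas
  Mitchell: TBC = (B: 1 / 3; C: 1 / 3);
  Robidoux: TBC = (B: 0.37821575509399867; C: 0.31089212245300067);
  RobidouxSharp: TBC = (B: 0.2620145123990142; C: 0.3689927438004929);
  RobidouxSoft: TBC = (B: 0.67962275898295921; C: 0.1601886205085204);

// Mitchell-Netravali cubic filter, support [-2, 2]
function BCKernel(x: Double; const P: TBC): Double;
begin
  x := Abs(x);
  if x < 1 then
    Result := ((12 - 9 * P.B - 6 * P.C) * x * x * x
      + (-18 + 12 * P.B + 6 * P.C) * x * x + (6 - 2 * P.B)) / 6
  else if x < 2 then
    Result := ((-P.B - 6 * P.C) * x * x * x + (6 * P.B + 30 * P.C) * x * x
      + (-12 * P.B - 48 * P.C) * x + (8 * P.B + 24 * P.C)) / 6
  else
    Result := 0;
end;

// Weights of source pixels First..First+High(Result) around Center, with the
// kernel stretched by Radius (>= 1 when downsampling), normalized to sum 1
function Contributors(Center, Radius: Double; const P: TBC;
  out First: Integer): TArray<Double>;
var
  i: Integer;
  Sum: Double;
begin
  First := Floor(Center - 2 * Radius);
  SetLength(Result, Ceil(Center + 2 * Radius) - First + 1);
  Sum := 0;
  for i := 0 to High(Result) do
  begin
    Result[i] := BCKernel((First + i - Center) / Radius, P);
    Sum := Sum + Result[i];
  end;
  // Normalizing keeps a uniform color uniform after resampling
  for i := 0 to High(Result) do
    Result[i] := Result[i] / Sum;
end;

var
  W: TArray<Double>;
  First, i: Integer;
  Total: Double;
begin
  W := Contributors(10.3, 2.5, Robidoux, First);
  Total := 0;
  for i := 0 to High(W) do
    Total := Total + W[i];
  Writeln('first pixel: ', First, ', count: ', Length(W), ', sum: ', Total:0:6);
  Writeln('Mitchell K(0) = ', BCKernel(0, Mitchell):0:6); // 0.888889
end.
```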
-
Right. You want to add 1 frame of your animation at a time, but you use bme.AddStillImage, which is meant for adding the same image for multiple frames. So it will only work (roughly) correctly if the ShowTime is much larger than the frame time of the movie. Try to use bme.AddFrame instead.
Odd frame sizes just won't work; it's a codec limitation. You have to use at least even numbers, and for some codecs the sizes might even have to be multiples of 4. I would stick to multiples of 4 to be on the safe side.
Another thing you might consider is to shorten the chain from animation to movie. Showing the animation and making screenshots seems a bit roundabout to me; there must be a shorter way.
There must be, but I haven't yet bothered to look at it 🙂, maybe I will.
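Keeping the frame size at multiples of 4 comes down to masking off the two lowest bits. A minimal sketch; the function names are made up for illustration, and whether to crop the bitmap (round down) or pad it (round up) is up to you.

```pascal
program FrameSizeSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Round a frame dimension down to a multiple of 4 (which is also even)
function AlignDown4(Value: Integer): Integer;
begin
  Result := Value and not 3;
end;

// Round up instead, e.g. to pad the bitmap rather than crop it
function AlignUp4(Value: Integer): Integer;
begin
  Result := (Value + 3) and not 3;
end;

begin
  Writeln(AlignDown4(641), 'x', AlignDown4(481)); // 640x480
  Writeln(AlignUp4(641), 'x', AlignUp4(481));     // 644x484
end.
```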