

Renate Schaaf
Everything posted by Renate Schaaf
-
I've just uploaded an update to my project https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation

What it does: Contains a VCL class which encodes a series of bitmaps and video clips together with an audio file to video. The result is an .mp4 file with H264 or H265 compression together with AAC audio. It uses Windows Media Foundation, which is usually contained in Windows. Hardware encoding is supported, if your graphics card can do it.

Requires:
- Headers for Media Foundation from FactoryXCode: https://github.com/FactoryXCode/MfPack
- Windows 10 or higher
- Encoders (MF-Transforms) for H264/H265, which usually come with the graphics driver
- Delphi XE7 or higher, if I haven't messed it up again; I've only got the CE and Delphi 2006. (Win32 and Win64 should be working, but Win64 recently crashes for me with "The session was disconnected".)

The demo project shows some uses:
- Record a series of canvas drawings to video
- Make a slideshow from image files (.bmp, .jpg, .png, .gif) with music (.wav, .mp3, .wmv, ...) and 2 kinds of transitions
- Insert a video clip into a slideshow (anything that Windows can decode should work)
- Transcode a video file including the first audio stream

Improvements: I think I now better understand how to feed frames to the encoder. With the right settings it makes stutter-free videos with good audio-video synchronization. It's now usable for me in my "big" project, and I no longer need to rely on ffmpeg DLLs. More info in changes.txt.

Just try it, if you're interested, I'd be glad. Renate
-
There is a new version at https://github.com/rmesch/Bitmaps2Video-for-Media-Foundation.

New stuff:
- Some rewrite of audio, making sure that gaps at the beginning of a stream are filled with silence.
- 2 optimized frame rates for audio synching, see below.
- Most importantly: One can now run @Kas Ob.'s frame analysis from within the demo, if one enables the hidden tab "Analysis". I just made the lines a bit shorter, as the rest was just repeating the same values for all I tested, as far as I could see. The file ffprobe.exe needs to be in the same directory as DemoWMF.exe. ffprobe is part of ffmpeg-git-essentials.7z on https://www.gyan.dev/ffmpeg/builds/.

I spent a good amount of time trying to figure out what I can and what I cannot control about audio synching, tracing into the relevant code and running the analysis. Results of audio-resynching follow (beware, it's long):

The math is for an audio sample rate of 48000, and the time units are all seconds. Audio block-align is always 4 bytes for what I do. There are at least 2 different meanings of "sample":

PCMSample: as in samples per second. ByteSize: Channels*BitsPerSample/8 = 2*16/8 = 4 bytes. Time: 1/48000 s.

IMFSample: Chunk of audio returned by IMFSourceReader.ReadSample. It contains a buffer holding a certain amount of uncompressed PCM samples, and info like timestamp, duration, flags... The size of these samples varies a lot with the type of input. Some observed values:

.mp3-file 1: Buffersize = 96 768 bytes, Duration = 0.504 (96768 bytes = 96768/4 PCMSamples = 96768/4/48000 s, OK)
.mp3-file 2: Buffersize = 35 108 bytes, Duration = 0.1828532 (35108/4/48000 = 0.182854166... not OK)
.wmv-file: Buffersize = 17 832 bytes, Duration = 0.092875 (17832/4/48000 = 0.092875, OK)

Except for the first sample read, the values don't differ from sample to sample. Those are the samples I can write to the sinkwriter for encoding. Breaking them up seems like a bad idea. I have to trust MF to handle the writing correctly.
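A quick sanity check of the duration arithmetic above (Python used here just as a calculator; the buffer sizes are the values observed in the debugger):

```python
# Check the observed IMFSample buffer sizes against their reported durations.
SAMPLE_RATE = 48000   # PCM samples per second
BLOCK_ALIGN = 4       # bytes per PCM sample (2 channels * 16 bit / 8)

def buffer_duration(buffer_bytes: int) -> float:
    """Duration in seconds of an uncompressed PCM buffer of the given size."""
    return buffer_bytes / BLOCK_ALIGN / SAMPLE_RATE

print(buffer_duration(96768))  # mp3-file 1: 0.504, matches the reported duration
print(buffer_duration(35108))  # mp3-file 2: ~0.18285417, reported was 0.1828532
print(buffer_duration(17832))  # wmv-file:   0.092875, matches
```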
The buffers seem to always be block-aligned. I've added some redundant variables in TBitmapEncoderWMF.WriteAudio so these values can be examined in the debugger.

A related quantity are audio frames. Similarly to the video stream, the audio stream of a compressed video consists of audio frames. 1 audio frame contains the compressed equivalent of 1024 PCM samples. So:

AudioFrameDuration = 1024/48000
AudioFrameRate = 48000/1024

I can only control the writing of the video by feeding the IMFSamples of video and audio to the sinkwriter in good order. The samples I write to the sinkwriter are collected in a "leaky bucket" buffer. The encoder pulls out what it needs to write the next chunk of video. It hopefully waits until there are enough samples to write something meaningful. Problems arise if the bucket overflows. There need to be enough video and audio samples to correctly write both streams.

So here is the workflow, roughly (can be checked by stepping into TBitmapEncoderWMF.WriteOneFrame):
- Check if the audio time written so far is less than the timestamp of the next video frame.
- Yes: Pull audio samples out of the sourcereader and write them to the sinkwriter until audio-time >= video-timestamp. Looking at the durations above, one sample might already achieve this.
- Write the next video frame.
- Repeat.

In the case of mp3-file 1 the reading and writing of 1 audio sample would be followed by the writing of several video samples. The encoder now breaks the bucket buffer up into frames, compresses them and writes them to file. It does that following its own rules, which I have no control over. Frame analysis can show the result: A group of video frames is followed by a group of audio frames, which should cover the same time interval as the video frames. In the output I have seen so far, the audio-frame period is always 15 audio frames. For video framerate 30, the video-frame period is 9 or 10 frames. Why doesn't it make the audio and video periods smaller? No idea.
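The workflow above can be sketched as a tiny simulation (this is an illustration in Python, not the actual TBitmapEncoderWMF code; the audio sample duration is the observed mp3-file-1 value):

```python
# Sketch of the interleaving loop described above (not the actual Delphi code).
# Audio samples of 0.504 s (mp3-file 1) are interleaved with 30 fps video frames.
AUDIO_SAMPLE_DURATION = 0.504   # observed IMFSample duration, mp3-file 1
VIDEO_FRAME_DURATION = 1 / 30

def interleave(total_time: float) -> list[str]:
    """Return the order in which samples would be fed to the sinkwriter."""
    order = []
    audio_time = 0.0   # audio written so far
    video_time = 0.0   # timestamp of the next video frame
    while video_time < total_time:
        # write audio until it covers the next video timestamp
        while audio_time < video_time:
            order.append('A')
            audio_time += AUDIO_SAMPLE_DURATION
        order.append('V')
        video_time += VIDEO_FRAME_DURATION
    return order

# One audio sample is followed by a run of video frames, as described above.
print(''.join(interleave(1.0)))
```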
Guess that's the amount of info the players can handle nowadays, and these periods are a compromise between optimal phase-locking of the audio and video periods and the buffer size the player can handle. Theoretically, at framerate 30, 16 video frames should phase-lock with 25 audio frames.

Here is one of those video-audio groups. Video framerate is 30.

video stream_index=0 key_frame=0 pts=39000 pts_time=1.300000 duration_time=0.033333
video stream_index=0 key_frame=0 pts=40000 pts_time=1.333333 duration_time=0.033333
video stream_index=0 key_frame=0 pts=41000 pts_time=1.366667 duration_time=0.033333
video stream_index=0 key_frame=0 pts=42000 pts_time=1.400000 duration_time=0.033333
video stream_index=0 key_frame=0 pts=43000 pts_time=1.433333 duration_time=0.033333
video stream_index=0 key_frame=0 pts=44000 pts_time=1.466667 duration_time=0.033333
video stream_index=0 key_frame=0 pts=45000 pts_time=1.500000 duration_time=0.033333
video stream_index=0 key_frame=0 pts=46000 pts_time=1.533333 duration_time=0.033333
video stream_index=0 key_frame=0 pts=47000 pts_time=1.566667 duration_time=0.033333
video stream_index=0 key_frame=0 pts=48000 pts_time=1.600000 duration_time=0.033333
audio stream_index=1 key_frame=1 pts=62992 pts_time=1.312333 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=64016 pts_time=1.333667 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=65040 pts_time=1.355000 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=66064 pts_time=1.376333 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=67088 pts_time=1.397667 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=68112 pts_time=1.419000 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=69136 pts_time=1.440333 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=70160 pts_time=1.461667 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=71184 pts_time=1.483000 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=72208 pts_time=1.504333 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=73232 pts_time=1.525667 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=74256 pts_time=1.547000 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=75280 pts_time=1.568333 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=76304 pts_time=1.589667 duration_time=0.021333
audio stream_index=1 key_frame=1 pts=77328 pts_time=1.611000 duration_time=0.021333

pts stands for "presentation time stamp", and pts_time is of interest.

Video time interval: from 1.300000 to 1.600000 + 0.033333 = 1.633333
Audio time interval: from 1.312333 to 1.611000 + 0.021333 = 1.632333

Audio is a bit ahead at the beginning and a tiny bit behind at the end. The pts should be multiples of 1024, but they aren't, hmm. The difference is still 1024, but they are phase-shifted. The phase-shift is 62992 mod 1024 = 528 (or -496). The interval from a bit further ahead:

Video: from 8.066667 to 8.366667 + 0.033333 = 8.400000
Audio: from 8.053667 to 8.352333 + 0.021333 = 8.373666

pts-phase-shift: still 528 (-496). Audio is lagging behind. To really see what is happening I will have to implement better statistics than just looking at things 🙂

One further test: I tried to phase-lock audio and video optimally: VideoFrameRate f = AudioFrameRate = 48000/1024 = 46.875. I've added this frame rate to the demo. Result: Perfect sync for the first audio-video group. In the middle of the second group the pts-phase-shift is again 528, and audio lags behind. For the rest of the groups the lag doesn't get bigger; it is always corrected to some degree. But the file should have identical audio and video timestamps in the first place!

There is another new frame rate, which is the result of trying to phase-lock 2 video frames to 3 audio frames: 2/f = 3*1024/48000 results in f = 2*48000/3/1024 = 31.25.

I will try to find out what causes the phase-shift in audio by parsing the ffprobe output a bit more (sigh).
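The phase-shift and the two special frame rates above can be checked quickly (Python as a calculator; the pts value is the first audio pts from the ffprobe listing):

```python
# Verify the pts phase-shift and the two "phase-locked" frame rates.
AUDIO_FRAME_SAMPLES = 1024
SAMPLE_RATE = 48000

# Audio pts should be multiples of 1024, but the first one in the group is 62992:
phase_shift = 62992 % AUDIO_FRAME_SAMPLES
print(phase_shift)   # 528, i.e. -496 relative to the next frame boundary

# Frame rate that locks 1 video frame to 1 audio frame:
f1 = SAMPLE_RATE / AUDIO_FRAME_SAMPLES            # 46.875
# Frame rate that locks 2 video frames to 3 audio frames: 2/f = 3*1024/48000
f2 = 2 * SAMPLE_RATE / (3 * AUDIO_FRAME_SAMPLES)  # 31.25
print(f1, f2)
```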
Maybe generate a log file for the samples written, too (sigh). No, so far it's still fun. For those who made it up to here: Thanks for your patience. Renate
-
First of all, thanks. I'll be back as soon as I understand better what the analysis is showing, and then I might be able to do the "little code changes" you mention :). Just keep in mind that the sinkwriter isn't giving you any control over dts, pts, or how video and audio are interleaved. Renate
-
Hi, Anders,

CreateFmt internally uses

  constructor Exception.CreateFmt(const Msg: string;
    const Args: array of const);
  begin
    FMessage := Format(Msg, Args);
  end;

and help says that this version of Format isn't threadsafe, since it uses the locale for the decimal separator. Now I'm not using decimal separators here, and I guess once the exception is raised in a thread, thread-safety doesn't really matter anymore? Another thing: Is %x.8 doing the same as IntToHex(hr,8)? Renate
-
With a little change you can perform that test from within the demo, I think. Just put a little change into TBitmapEncoderWMF.AddVideo:

  procedure TBitmapEncoderWMF.AddVideo(const VideoFile: string;
    TransitionTime: integer = 0; crop: boolean = false;
    stretch: boolean = false);
  var
    VT: TVideoTransformer;
    bm: TBitmap;
    TimeStamp, Duration, VideoStart: int64;
  begin
    if not fInitialized then
      exit;
    VT := TVideoTransformer.Create(VideoFile, fVideoHeight, fFrameRate);
    try
      bm := TBitmap.Create;
      try
        if not VT.NextValidSampleToBitmap(bm, TimeStamp, Duration) then
          exit;
        if TransitionTime > 0 then
          CrossFadeTo(bm, TransitionTime, crop, stretch);
        VideoStart := fWriteStart;
        // fill gap at beginning of video stream
        if TimeStamp > 0 then
          AddStillImage(bm, Trunc(TimeStamp / 10000), crop, stretch);
        while (not VT.EndOfFile) and fInitialized do
        begin
          BitmapToRGBA(bm, fBmRGBA, crop, stretch);
          bmRGBAToSampleBuffer(fBmRGBA);
          // !!!!! Change is here for extra hard sync-check:
          // WriteOneFrame(VideoStart + TimeStamp, Duration);
          // Write the decoded video stream in exactly the same way as AddFrame
          // would, i.e. with the same timestamps, not taking any timestamps
          // from the video-input.
          WriteOneFrame(fWriteStart, fSampleDuration);
          if not VT.NextValidSampleToBitmap(bm, TimeStamp, Duration) then
            Break;
        end;
        // FrameCount*FrameTime > Video-end? (shouldn't differ by much)
        // if fWriteStart > VideoStart + TimeStamp + Duration then
        //   Freeze((fWriteStart - VideoStart - TimeStamp - Duration) div 10000);
      finally
        bm.Free;
      end;
    finally
      VT.Free;
    end;
  end;

Then transcode a movie on the demo tab "Use TBitmapEncoderWMF as a transcoder". It uses the procedure TranscodeVideoFile, treating the video and audio stream of an input video as totally independent inputs. AddVideo decodes the video stream into a stream of bitmaps, and the input video is used again as the audio file. I encoded 40 minutes of "Fellowship of the Ring" this way, and did not see any desynching.
You'll probably say that's no proof, and you'd be right, but it might be an indication that the problem isn't as severe. Or the video player is just very good at making something usable out of the input.
-
I don't think that's quite true: if it fails, the rest of WriteOneFrame isn't executed, and in line 1713 an exception is raised with error code hr. I could translate it into an EAudioFormatException, though, at the spot you indicate. It was meant as an extra safety check, since the code already checks for EndOfStream, and that hasn't failed so far. But I've put it back in.
-
presentation time = image time + effect time (2000).
-
Thanks everybody. Now I have a lot to think about, a great chance to expand my horizon at the age of 74 :). I'll fix the code. But then I need a bit of time to think. The info is great. Because my poor debugger didn't run the code, because I didn't tell it to do so. I pasted that compatibility code in without checking, and probably missed another piece. A mistake I won't make again. So I need to disable LogicalCompare for more compiler versions, or write a header for StrCmpLogicalW.
-
If I understand this right, I should match the video timestamp and duration to the closest blockalign boundary of the audio? If the difference in frame rate is really that negligible, that should be doable, if I can get the math right :). Talk about not contributing, you just forced a new way of seeing things down my throat, not a small achievement.
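To make the idea concrete, here is a hypothetical helper (not from the repo, just a sketch of the snapping, working in seconds with the 1024-sample audio frames discussed earlier):

```python
# Hypothetical helper: snap a video timestamp to the nearest audio-frame
# boundary (1 audio frame = 1024 PCM samples at 48000 Hz).
AUDIO_FRAME_SAMPLES = 1024
SAMPLE_RATE = 48000
AUDIO_FRAME_DURATION = AUDIO_FRAME_SAMPLES / SAMPLE_RATE  # ~0.021333 s

def snap_to_audio_frame(t: float) -> float:
    """Round a time t (in seconds) to the nearest audio-frame boundary."""
    return round(t / AUDIO_FRAME_DURATION) * AUDIO_FRAME_DURATION

# 1.3 s lies between audio frames 60 and 61; frame 61 is closer:
print(snap_to_audio_frame(1.300000))  # -> 1.3013333... (61 audio frames)
```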
-
Hi Anders, Thanks for that. I hate format strings, because I can never remember the codes for the placeholders. I had already thought before that I should get used to them, though. Now I also see that I forgot to use IntToHex(hr,8) 🙂
-
Hi Kas, Good to see you again, and sorry for the long time of inactivity on my part. Thank you for the detailed input, which I need to digest first. Since you already invested so much thought, wouldn't you like to be a contributor? When I incorporate the changes you mention, I wouldn't even know how else to credit you. The issues you mention definitely need to be looked into. For the audio part I was just glad it worked, and haven't put much thought into it lately. The wrong audio duration was returned by some .vobs, which aren't really supported in the first place. The missing SafeRelease(pAudioSample) has caused memory leaks for me in a totally different context too, when I tried to write some code which simply plays an audio file through the default device. Renate
-
I have now made my parallel resampling of bitmaps as fast as I can get it. Now I would find it interesting to know how the algorithm performs on other systems, and it would be super to get suggestions for improvement. The procedures can be found in the unit uScale under Algorithms in the attached zip.

I have tested against Windows StretchBlt with HALFTONE, WICImage and Graphics32. On my PC (AMD Ryzen 7 1700, 8-core) I see a substantial improvement in speed. The threads are based on TThread rather than TTask or TParallel, because I had failures using the latter two, whereas the old-fashioned threads haven't failed me ever in gazillions of runs.

If you want to compile and run the comparison project, you need to include the source folder of Graphics32 (latest version) in your search path. For convenience it is included in the zip under Algorithms. I couldn't find a way to divvy out only the units needed for resampling. The test against Graphics32 might be slightly unfair, because of bitmaps being assigned to TBitmap32 before the processing.

Right now, the procedure itself cannot be run in concurrent threads, because the array of TThreads is a global variable; I need to change the design (ideas welcome). There might still be room for improvement by minimizing the cache misses, but my head can't handle it at the moment.

Hope you find it useful. Renate

Bitmap Scaling.zip
-
I have worked on a port of my Bitmaps2Video encoder to using Windows Media Foundation instead of ffmpeg, since I wanted to get rid of having to use all those DLLs. Now, before posting it on GitHub, I'd like to run it by the community because of my limited testing possibilities. I also hope that there are people out there having more experience with MF who could give some suggestions on the problems remaining (see below). Learning how to use Media Foundation certainly almost drove me nuts several times, because of the poor quality of the documentation and the lack of examples.

What it does:
- Encodes a series of bitmaps to video with the user interface only requiring basic knowledge about videos.
- Can do 2 kinds of transitions between bitmaps as an example of how to add more.
- Supports file format .mp4 with encoders H264 or H265, or .wmv with encoder WMV3.
- Does hardware encoding, if your GPU supports it, and falls back to software encoding otherwise.
- Uses parallel routines wherever that makes sense.
- Experimental routine to mux in mp3 audio. Only works for H264 and WMV3 right now.

Requirements:
- VCL-based.
- Needs the excellent MF headers available at https://github.com/FactoryXCode/MfPack. Add the src folder of MfPack to the library path, no need to install a package.
- Needs to run on Windows 10 or higher to make use of all features.
- Not sure about the Delphi version required; my guess is XE3 and up.

Problems remaining:
- I'm not too thrilled about the encoding quality. Might be a problem with my nVidia card.
- The audio muxer should work for H265, because it works when I use ffmpeg. But with my present routine the result just plays the audio and shows no video.
- I haven't yet figured out how to insert video clips. The major problem I see is adjusting the frame rate.

Renate

Bitmaps2VideoWMF.zip
-
This is my video project on GitHub: https://github.com/rmesch/Bitmaps2Video

I am presenting it here because it is useful as it is, but could use some ideas for improvement.

Features:
- A Delphi class to support encoding of a series of bitmaps and video clips to a video file
- Requires the ffmpeg library and is intended as an easy-to-use interface to this library
- Versions for Win32/Win64 and a cross-platform version currently supporting Win32/Win64/Android32/Android64
- Most popular file formats and codecs supported; more codecs contained in FFmpeg can be registered
- Rudimentary support for adding an audio stream
- Demos for both versions, set up to compile and run "out of the box", as library files and support for their deployment to Android are included

There are some problem areas though, the most important one in my opinion being threading issues with TBitmap under Android. For more see the readme and the demos.

Critique, ideas, bug reports most welcome; maybe someone would even like to contribute, that would be delightful. There have been valuable contributions so far, but there are some areas which could use the input of an expert.

Thanks for reading, Renate
-
I have updated the repo on GitHub: https://github.com/rmesch/Parallel-Bitmap-Resampler

Changes made to the "modern" VCL and FMX version in the folder BitmapScaling:
- New resampling filters: Mitchell, Robidoux, RobidouxSharp, RobidouxSoft.
- Simplified and corrected MakeGaussContributors in uScaleCommon.pas. @Anders Melander: It will pass the uniform-color tests now. But it will fail the Gauss-RMS, since I changed RadiusToSigma back.
- Tried to make gamma-correction a little more precise.

I tried nonetheless. You already spent so much time digging through that ancient attachment, give the repo a shot. I also added the option in DemoScale.dpr to use a test bitmap similar to yours. I can't see any of the color artefacts you describe, though.
-
Right. You want to add 1 frame of your animation at a time, but you use bme.AddStillImage, which is meant for adding the same image for multiple frames. So it will only work (roughly) correctly if the ShowTime is much larger than the frame time of the movie. Try to use bme.AddFrame instead.

That just won't work, it's a codec limitation. You have to use at least even numbers; for some codecs the sizes might even have to be multiples of 4. I would stick to multiples of 4 to be on the safe side.

Another thing you might consider is to shorten the chain from animation to movie. To show the animation and make screenshots seems a bit roundabout to me, there must be a shorter way. There must be, but I haven't yet bothered to look at it 🙂, maybe I will.
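The "multiples of 4" advice amounts to a one-liner; here is a hypothetical helper (not part of the library) showing the idea:

```python
# Hypothetical helper (not from Bitmaps2Video): round a video dimension down to
# the nearest multiple of 4, which is safe for the codecs mentioned above.
def safe_dimension(n: int, multiple: int = 4) -> int:
    """Largest multiple of `multiple` that is <= n (at least `multiple`)."""
    return max(multiple, (n // multiple) * multiple)

print(safe_dimension(1921), safe_dimension(1080))  # -> 1920 1080
```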
-
OK, Maple computed the following simplified filters; to implement them was just a matter of extending the TFilter enum. I'll update my repo some time tomorrow, the new filters need to be implemented in the demos. Right now I feel more like surviving a few more days on The Long Dark.

  // The following filters are based on the Mitchell-Netravali filters with
  // restricting the parameters B and C to the "good" line B + 2*C = 1.
  // We have eliminated B this way and scaled the filter to [-1,1].
  // See https://en.wikipedia.org/wiki/Mitchell%E2%80%93Netravali_filters

  const
    C_M = 1 / 3; // Mitchell filter used by ImageMagick

  function Mitchell(x: double): double; inline;
  begin
    x := abs(x);
    if x < 0.5 then
      Result := (8 + 32 * C_M) * x * x * x - (8 + 24 * C_M) * x * x +
        4 / 3 + 4 / 3 * C_M
    else if x < 1 then
      Result := -(8 / 3 + 32 / 3 * C_M) * x * x * x + (8 + 24 * C_M) * x * x -
        (8 + 16 * C_M) * x + 8 / 3 + 8 / 3 * C_M
    else
      Result := 0;
  end;

  const
    C_R = 0.3109; // Robidoux filter

  function Robidoux(x: double): double; inline;
  begin
    x := abs(x);
    if x < 0.5 then
      Result := (8 + 32 * C_R) * x * x * x - (8 + 24 * C_R) * x * x +
        4 / 3 + 4 / 3 * C_R
    else if x < 1 then
      Result := -(8 / 3 + 32 / 3 * C_R) * x * x * x + (8 + 24 * C_R) * x * x -
        (8 + 16 * C_R) * x + 8 / 3 + 8 / 3 * C_R
    else
      Result := 0;
  end;

... and so on. Just one function with different constants.
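As a cross-check of the algebra, the rescaled kernel should still integrate to 1 over [-1,1]. Here is the same family written generically in Python, with C as the free Mitchell-Netravali parameter on the line B + 2*C = 1, and a numerical check of the integral:

```python
# Generic form of the rescaled Mitchell-Netravali kernels above (support [-1,1],
# B eliminated via B + 2*C = 1). c = 1/3 gives Mitchell, c = 0.3109 Robidoux.
def mn_kernel(x: float, c: float) -> float:
    x = abs(x)
    if x < 0.5:
        return (8 + 32*c)*x**3 - (8 + 24*c)*x**2 + 4/3 + 4/3*c
    if x < 1:
        return -(8/3 + 32/3*c)*x**3 + (8 + 24*c)*x**2 - (8 + 16*c)*x + 8/3 + 8/3*c
    return 0.0

# The kernel should integrate to 1 over [-1,1]; check numerically for Mitchell.
N = 200000
h = 2 / N
integral = sum(mn_kernel(-1 + i*h, 1/3) for i in range(N + 1)) * h
print(round(integral, 6))  # -> 1.0
```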
-
I know! I just missed the B and C values for the Robidoux in the table image you posted. And then I just have to rescale the functions to have support in [-1,1] and make sure the integral is 1. Bang. It plugs right in. Wish I could edit my original post, delete the attachment (it's ancient now), and include a link to my GitHub repo. The AntiNLanczos is a spline approximating the antiderivative of Lanczos; all that stuff isn't needed anymore.
-
They are in System.Math: function Min(const A, B: Integer): Integer; overload; inline; I had coded it with ifs in a previous version, but I changed that after I noticed the inlining; looks a bit less stupid. Oh, W is the weight you compute, and param is the x of the kernel. So you *did* post the kernel code, I was just too dense to see it. I think Maple and me can take it from there. I tried to find something in their source code, but gave up. Looks like you had a bit more stamina :).
-
What's the function these parameters need to be plugged into? All I can gather is that it might be some kind of cubic spline, and I don't feel like reading all of this guy's papers :). Would you mind posting the formula for the kernel?
-
MadExcept can check for frozen main thread, with or without debugger. Just surprised nobody mentioned it.
-
It doesn't only depend on the pitch, but also on the pixel format of the source. If that is BGR or BGRA, the following pseudo-code based on what you post should be usable. If the color order is different, like RGB or RGBA, you need to copy the single color bytes, best done by defining a record for the pixel.

  var
    // Pointers to pixels in source/target to be copied
    pPixS, pPixT: pByte;
    // Bytes per pixel for the source texture:
    // would be 3 for BGR, 4 for BGRA.
    // If the channels have a different order, like RGB,
    // then the single color bytes need to be copied.
    BytesPerPixel: integer;
    // Common width and height of source and target
    Width, Height: integer;

  for I := 0 to Height - 1 do
  begin
    pPixS := @FTexture.pData[FTexture.RowPitch * I];
    pPixT := FBitmap.Scanline[I];
    for j := 0 to Width - 1 do
    begin
      Move(pPixS^, pPixT^, BytesPerPixel);
      inc(pPixS, BytesPerPixel);
      inc(pPixT, 4); // target is 32-bit
    end;
  end;
-
It's probably not. Your blur is slightly slower than mine for small radii. For large radii, yours is much faster. I'll mail you my benchmark-unit, then you can see for yourself.
-
Thanks very much for the input, I hadn't looked at those filters more closely before; should be easy to test them out. Thank you! Here is a first result for radius 10. It only passed the Gauss-RMS test after I changed the sigma-to-radius ratio to the same as yours. Need to give that a closer look. For other radii my routine failed some of the uniform-color tests (and edge detection as a consequence), so it's back to the drawing board for that.
-
I managed to get it, source or not. For the same amount of "blurriness" my parallel version needs about 1.5 times the time of yours. The source would still be nice; I'm sure we'd learn something. Renate