dummzeuch 1506 Posted July 12, 2022

I have to handle huge files which store floating point numbers between 30 and 200 with a precision of 3 decimal places. Currently the file format uses the floating point type Single for these numbers, each requiring 4 bytes. Is there any way to reduce that memory requirement without sacrificing (or even improving) precision?

My first idea was to:

* subtract 30, giving me numbers between 0 and 170
* squash that into 2 bytes using the largest possible base-10 multiplier

Unfortunately that gives me only two decimal places with a multiplier of 100, because 17000 < 65535 < 170000. Alternatively I could use a multiplier of 200, which would fit into 2 bytes as 34000, but I would still lose some precision. I could use 3 bytes rather than 2, which would give me the required precision (170000 < 16777215) but would be awkward to handle. I could even reduce that to 18 bits, with even more awkward handling involved. Any alternative ideas?

Background: These files store the data of macro texture laser measurements in road condition surveys. There are 2x200 values for every 25 cm surveyed, giving me 2x200x4 values per metre, plus some additional info for each dataset. While that isn't much compared to the videos, this data will frequently be transferred via mobile internet (which in Germany means bl***y slower than in the average 3rd world country), while videos won't.

I see two options here:

1. Make the file format use less space without sacrificing precision
2. Always compress and uncompress the files with some standard algorithm

I'm not quite sure which way to go yet.
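A minimal sketch of the 3-byte variant above (offset 30, scale factor 1000 for 3 decimal places); all names here are illustrative, not from any existing format:

const
  cOffset = 30.0;
  cScale  = 1000;   // 3 decimal places

procedure EncodeValue(Value: Single; out B0, B1, B2: Byte);
var
  Raw: Cardinal;    // 0..170000, fits in 3 bytes (max 16777215)
begin
  Raw := Round((Value - cOffset) * cScale);
  B0 := Raw and $FF;
  B1 := (Raw shr 8) and $FF;
  B2 := (Raw shr 16) and $FF;
end;

function DecodeValue(B0, B1, B2: Byte): Single;
begin
  Result := (B0 or (B1 shl 8) or (B2 shl 16)) / cScale + cOffset;
end;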
Fritzew 51 Posted July 12, 2022

42 minutes ago, dummzeuch said: Always compress and uncompress the files with some standard algorithm

I would go this way. You can add it in little time; the other way needs more time, I would think.
Uwe Raabe 2059 Posted July 12, 2022

If the values in each block don't use the full value range, you can store a base value and only the differences for the others. In other words: instead of always subtracting 30, find the minimum value of that block (Vmin), store that, followed by the other values with the minimum subtracted (V - Vmin). Of course that would only help if Vmax - Vmin <= 65535.
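A minimal sketch of this per-block scheme, assuming the values have already been quantized to integers (e.g. Round(V * 1000)); the helper names are illustrative:

procedure EncodeBlock(const Quantized: TArray<Cardinal>;
  out BaseValue: Cardinal; out Deltas: TArray<Word>);
var
  i: Integer;
begin
  Assert(Length(Quantized) > 0);
  BaseValue := Quantized[0];
  for i := 1 to High(Quantized) do
    if Quantized[i] < BaseValue then
      BaseValue := Quantized[i];               // Vmin of the block

  SetLength(Deltas, Length(Quantized));
  for i := 0 to High(Quantized) do
  begin
    Assert(Quantized[i] - BaseValue <= 65535); // block must fit in a Word
    Deltas[i] := Quantized[i] - BaseValue;     // V - Vmin, 2 bytes each
  end;
end;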
Vandrovnik 214 Posted July 12, 2022

Are these numbers randomly distributed? If not, it might be more space efficient to store differences (after converting them to integers). If the differences are small, you could use variable-length storage (7 bits of data, 1 bit as an indicator that another byte follows). And finally, compression.
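A minimal sketch of the 7+1 bit idea (LEB128-style, as used by e.g. Protocol Buffers), assuming signed differences are first mapped to unsigned integers so that small deltas of either sign stay small; names are illustrative:

uses System.Classes;

function ZigZag(Delta: Integer): Cardinal;
begin
  // maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
  if Delta >= 0 then
    Result := Cardinal(Delta) * 2
  else
    Result := Cardinal(-Delta) * 2 - 1;
end;

procedure WriteVarUInt(Stream: TStream; Value: Cardinal);
var
  B: Byte;
begin
  repeat
    B := Value and $7F;          // low 7 bits of data
    Value := Value shr 7;
    if Value <> 0 then
      B := B or $80;             // indicator bit: another byte follows
    Stream.WriteBuffer(B, 1);
  until Value = 0;
end;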
Lajos Juhász 293 Posted July 12, 2022

You can compress the file before sending; that should give you the best result.
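A minimal sketch using the System.ZLib unit that ships with Delphi (the helper name is mine):

uses System.Classes, System.ZLib;

procedure CompressFile(const SrcName, DstName: string);
var
  Src, Dst: TFileStream;
  Z: TZCompressionStream;
begin
  Src := TFileStream.Create(SrcName, fmOpenRead or fmShareDenyWrite);
  try
    Dst := TFileStream.Create(DstName, fmCreate);
    try
      Z := TZCompressionStream.Create(Dst, zcMax);
      try
        Z.CopyFrom(Src, 0);   // 0 = copy the entire source stream
      finally
        Z.Free;
      end;
    finally
      Dst.Free;
    end;
  finally
    Src.Free;
  end;
end;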
Brian Evans 108 Posted July 13, 2022

You can map floating point values in a small range onto an integer range evenly:

AnInteger = (N - RangeStart) * (IntegerRange / (RangeEnd - RangeStart))
N = AnInteger / (IntegerRange / (RangeEnd - RangeStart)) + RangeStart

For two-byte unsigned integers this only gives ~385 steps between the whole numbers in your 30-200 range, so not quite 3 decimal places.
A.M. Hoornweg 144 Posted July 13, 2022

If these are measurements of a road, do you really need 3 fractional decimals at all? I mean, a single car driving over it would probably change the fractional part already. Are the readings a 2-dimensional map of deformation/wear/potholes of the road, like an image? If so, then a lossy data reduction algorithm similar to JPG would shrink the data enormously whilst keeping the essence intact.
dummzeuch 1506 Posted July 13, 2022

21 minutes ago, A.M. Hoornweg said: If these are measurements of a road, then do you really need 3 fractional decimals at all? I mean, a single car driving over it would probably change the fractional part already.

These values are used for measuring the "macro texture". In the end they are combined into two numbers, the Mean Profile Depth (MPD) and, based on that, the Estimated Texture Depth (ETD). There is an ISO standard for the measurement and the (rather complex) calculation. Those numbers describe a part of what constitutes the skid resistance of a road, and that's about structures that are nearly too small to see. The way to measure it requires 200 values measured in a line, no more than 1 mm and no less than 0.5 mm apart. For the calculation I need the full resolution of the laser, so any lossy compression is out. (One could argue that the algorithm loses most of that resolution anyway and that there is already a lot of noise in the data, but I have to adhere to the ISO standard.)

The linked Wikipedia article describes quite a lot of what we do in road condition surveys, just in case anybody is curious. We survey all autobahns and federal roads in Germany every 4 years, about 1/4 each year. On top of that there are the state roads and lesser roads, which should be surveyed once every 5 years. There are only a handful of companies that offer this service. We are the market leader in Germany, but we are also active in Switzerland, Austria and France.
A.M. Hoornweg 144 Posted July 13, 2022

Cool! One learns something new every day.
Fr0sT.Brutal 900 Posted July 13, 2022

I'd first try to cope with it with little effort and check compression. Depending on the distribution of the values, that could give you the largest gain - or almost nothing if the entropy is too high. Next, you can rearrange the order of the numbers; for example, columns-then-rows could be more compressible than rows-then-columns. After these simple and quick checks, you can dive into the rabbit hole of low-level packing. Btw, there's nothing frightening about packing the data into non-modulo-8 bit chunks - this format is widely used in radio transmission. It's just a question of adding a reader and a writer, and Delphi already has TBits in the RTL (haven't used it though). If you decide to go this way, I can lend you the TBitReader class that I use in my projects.
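A minimal sketch of packing non-modulo-8 chunks, here 18 bits per value, into a plain byte array (assuming the values were already quantized to 0..262143, i.e. 2^18 - 1; names are illustrative):

procedure Pack18(const Values: TArray<Cardinal>; out Bytes: TArray<Byte>);
var
  BitPos, i, b: Integer;
begin
  // SetLength zero-initializes the dynamic array, so OR-ing bits in is enough
  SetLength(Bytes, (Length(Values) * 18 + 7) div 8);
  BitPos := 0;
  for i := 0 to High(Values) do
  begin
    Assert(Values[i] < 1 shl 18);
    for b := 0 to 17 do
      if (Values[i] shr b) and 1 <> 0 then
        Bytes[(BitPos + b) div 8] :=
          Bytes[(BitPos + b) div 8] or (1 shl ((BitPos + b) mod 8));
    Inc(BitPos, 18);
  end;
end;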
mikerabat 20 Posted July 13, 2022

You could do something like CBOR encoding does. Basically, it encodes values based on the space actually NEEDED: an integer is encoded in an Int8 up to an Int16, whatever fits. You could also check whether the encoding would fit into a 16-bit float (half a Single). In addition, it seems that a differential approach for each block could be feasible ... does the data change much? And of course, at the end, just compress the stream 😉
Fr0sT.Brutal 900 Posted July 13, 2022

12 minutes ago, mikerabat said: And of course at the end just compress the stream 😉

As my friend used to say, 7zip(rar(zip(something))) 🙂 Alas, things don't work that way, otherwise data of any size could be compressed into a single byte after N iterations 😄
A.M. Hoornweg 144 Posted July 14, 2022

My recommendation: treat your data as integers, using an implicit multiplier of 1000. Your highest value, 200000, is hexadecimal $30D40, and as you can see by the number of digits, that will fit in 5 nibbles (two and a half bytes). Now all you need is a class that will correctly manage a TArray<Byte> to read and write the individual float values.
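A minimal sketch of such a class's read path, assuming the implicit multiplier of 1000 and a low-nibble-first layout (TPackedValues is a hypothetical name; the writer would mirror GetValue):

type
  TPackedValues = class
  private
    FData: TArray<Byte>;
    function GetNibble(Idx: Integer): Byte;
  public
    function GetValue(Index: Integer): Single;
  end;

function TPackedValues.GetNibble(Idx: Integer): Byte;
begin
  if Idx and 1 = 0 then
    Result := FData[Idx div 2] and $0F   // even index: low nibble
  else
    Result := FData[Idx div 2] shr 4;    // odd index: high nibble
end;

function TPackedValues.GetValue(Index: Integer): Single;
var
  n: Integer;
  Raw: Cardinal;
begin
  Raw := 0;
  for n := 0 to 4 do                     // 5 nibbles = 20 bits per value
    Raw := Raw or (Cardinal(GetNibble(Index * 5 + n)) shl (4 * n));
  Result := Raw / 1000;                  // undo the implicit multiplier
end;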