Reduce storage space for floating point range

dummzeuch · July 12, 2022

I have got to handle huge files which store floating point numbers between 30 and 200 with a precision of 3 decimal places.

Currently the file format uses the floating point type Single for these numbers. Each requiring 4 bytes.

Is there any way to reduce that memory requirement without sacrificing (or even improving) precision?

My first idea was to:

subtract 30, giving me numbers between 0 and 170
squash that into 2 bytes using the largest possible 10 based divisor

Unfortunately that gives me only two decimal places with a divisor of 100 because 17000 < 65535 < 170000

Alternatively I could use a divisor of 200 which would fit into 2 bytes as 34000 but I would still lose some precision.

I could use 3 bytes rather than 2 which would give me the required precision 170000 < 16777215 but would be awkward to handle.

I could even reduce that to 18 bits with even more awkward handling involved.

Any alternative ideas?

Background:

These files store the data of macro texture laser measurements in road condition surveys. There are 2x200 values for every 25 cm surveyed, giving me 2x200x4 values per metre plus some additional info for each dataset. While that isn't much when compared to the videos, this data will frequently be transferred via mobile internet (which in Germany means bl***y slower than in the average 3rd world country), while videos won't.

I see two options here:

Make the file format use less space without sacrificing precision
Always compress and uncompress the files with some standard algorithm

I'm not quite sure which way to go yet.

Edited July 12, 2022 by dummzeuch

Fritzew · July 12, 2022

42 minutes ago, dummzeuch said:

Always compress and uncompress the files with some standard algorithm

I would go this way.... You can add this in low time, the other way needs more time I would think

Uwe Raabe · July 12, 2022

If the values for each block don't use the full value range, you can store a base value and only the difference for the others. In other words: Instead of always subtract 30, find the min value of that block (Vmin), store that followed by the other values with the min value subtracted (V - VMin). Of course that would only help if Vmax - Vmin <= 65535.

Vandrovnik · July 12, 2022

These numbers are randomly distributed? When not, it might be more space efficient to store differences (after converting them to integers). If differences are small, you could use variable-length storing (7 bit data, 1 bit as indicator, that another byte is used).

And finally a compression.

Lajos Juhász · July 12, 2022

You can compress the file before sending. Should give you the best result.

Brian Evans · July 13, 2022

You can compress floating values in a small range into an integer evenly.

AnInteger = N - RangeStart * (IntegerRange / (RangeEnd - RangeStart))

N = (AnInteger / (IntegerRange / (RangeEnd - RangeStart))) + RangeStart

For two byte unsigned integers this only gives ~344 divisions between the whole numbers in your 30-220 range so not quite 3 decimal places.

Edited July 13, 2022 by Brian Evans

A.M. Hoornweg · July 13, 2022

If these are measurements of a road, then do you really need 3 fractional decimals at all? I mean, a single car driving over it would probably change the fractional part already.

Are the readings a 2-dimensional map of deformation/wear/potholes of the road, like an image? If so, then a lossy data reduction algorithm similar to JPG would shrink the data enormously whilst keeping the essence intact.

dummzeuch · July 13, 2022

21 minutes ago, A.M. Hoornweg said:

If these are measurements of a road, then do you really need 3 fractional decimals at all? I mean, a single car driving over it would probably change the fractional part already.

These values are used for measuring the "macro texture". In the end they are combined into two numbers, the Mean Profile Depth (MPD) and based on that the Estimated Texture Depth (ETD). There is an ISO standard for the measurement and the (rather complex) calculation. Those numbers describe a part of what constitutes skid resistance of a road. And that's about structures that are nearly too small to see.

The way to measure it requires 200 values measured in a line, no more than 1 mm and no less than 0.5 mm apart. For the calculation I need the full resolution of the laser, so any lossy compression is out.

(One could argue that the algorithm loses most of that resolution anyway and that there is already a lot of noise in the data, but I have to adhere to the ISO standard.)

The linked Wikipedia article describes quite a lot of what we do in road condition survey, just in case anybody is curious. We survey all autobahns and federal roads in Germany every 4 years, about 1/4 each year. On top of that there are the state roads and lesser roads, that should be surveyed once every 5 years. There are only a hand full of companies that offer this service. We are the market leader in Germany, but we are also active in Switzerland, Austria and France.

Edited July 13, 2022 by dummzeuch

A.M. Hoornweg · July 13, 2022

Cool ! One learns something new every day

Fr0sT.Brutal · July 13, 2022

I'd first try to cope with it with little efforts and check compression. Depending on values distribution, that could give you the largest gain - or almost nothing if entropy is too high. Next, you can prepare the order of numbers. F.ex., cols-then-rows could be more compressable than rows-then-cols. After these simple and quick checks, you can dive into rabbit hole of low-level packing. Btw, there's nothing frightening in packing the data into non-modulo-8 bit chunks - this format is widely used in radio transmission. It's just the question of adding a reader and writer and Delphi already has TBits in RTL (haven't used it though). If you decide to go this way, I can borrow TBitReader class that I use in my projects

Edited July 13, 2022 by Fr0sT.Brutal

mikerabat · July 13, 2022

You could do something like CBOR encoding does. Basically there they encode values based on the NEEDED space.

If integer then encode it in an int8 - to int16. You can also check if the encoding would fit into a 16bit float (half a single)

In addition it seems that a differential approach for each block could be feasable ... does it change much??

And of course at the end just compress the stream 😉

Edited July 13, 2022 by mikerabat

Fr0sT.Brutal · July 13, 2022

12 minutes ago, mikerabat said:

And of course at the end just compress the stream 😉

As my friend used to say, 7zip(rar(zip(something))) 🙂 alas, things don't work in such a way, otherwise data of any size could be compressed into a single byte after N iterations 😄

A.M. Hoornweg · July 14, 2022

My recommendation: treat your data as integers, using an implit ~~divisor~~ multiplier of 1000.

Your highest value 200000 is hexadecimal $30D40 and as you can see by the number of digits, that will fit in 5 nibbles ( two and a half bytes).

Now all you need is a class that will correctly manage a tArray<byte> to read and write individual float values.

Edited July 14, 2022 by A.M. Hoornweg

Sign In

Reduce storage space for floating point range

Recommended Posts

dummzeuch 1656

Share this post

Link to post

Fritzew 51

Share this post

Link to post

Uwe Raabe 2159

Share this post

Link to post

Vandrovnik 223

Share this post

Link to post

Lajos Juhász 322

Share this post

Link to post

Brian Evans 124

Share this post

Link to post

A.M. Hoornweg 153

Share this post

Link to post

dummzeuch 1656

Share this post

Link to post

A.M. Hoornweg 153

Share this post

Link to post

Fr0sT.Brutal 902

Share this post

Link to post

mikerabat 27

Share this post

Link to post

Fr0sT.Brutal 902

Share this post

Link to post

A.M. Hoornweg 153

Share this post

Link to post

Create an account or sign in to comment

Create an account

Sign in

Browse

Activity