Steve Maughan

Any Known Issues with ZCompressStream?

Recommended Posts

My application saves data as a compressed JSON file. I use the standard System.ZLib routines ZCompressStream and ZDecompressStream to handle the compression. I thought these were just a wrapper around standard ZLib routines.

 

Earlier today someone sent me an application file that couldn't be opened. It turns out it couldn't be decompressed. It could have been corrupted after being saved. The decompression routine went into some sort of endless loop when it attempted to open the file.

 

My question is: does anyone know of any problems compressing data using ZCompressStream? I'm using Delphi 10.4.1.

 

Are there any other third-party, ZLib-compatible compression and decompression routines?
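
In case it's relevant, my save and load paths are essentially the stock System.ZLib pattern, something like this (simplified; the stream handling here is illustrative):

uses System.Classes, System.ZLib;

procedure SaveCompressed(Data: TStream; const FileName: string);
var
  F: TFileStream;
begin
  F := TFileStream.Create(FileName, fmCreate);
  try
    Data.Position := 0;
    ZCompressStream(Data, F); // deflate the JSON stream into the file
  finally
    F.Free;
  end;
end;

procedure LoadCompressed(Data: TStream; const FileName: string);
var
  F: TFileStream;
begin
  F := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    ZDecompressStream(F, Data); // this is the call that never returns on the bad file
  finally
    F.Free;
  end;
end;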

 

Steve


If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications; it works like a charm.
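
Something along these lines (the file and entry names are illustrative):

uses System.SysUtils, System.Classes, System.Zip;

procedure SaveAsZip(Json: TStream; const FileName: string);
var
  Zip: TZipFile;
begin
  Zip := TZipFile.Create;
  try
    Zip.Open(FileName, zmWrite);
    Json.Position := 0;
    Zip.Add(Json, 'data.json'); // ZIP entries carry a CRC32 of the uncompressed data
    Zip.Close;
  finally
    Zip.Free;
  end;
end;

procedure LoadFromZip(const FileName: string; out Bytes: TBytes);
var
  Zip: TZipFile;
begin
  Zip := TZipFile.Create;
  try
    Zip.Open(FileName, zmRead);
    Zip.Read('data.json', Bytes);
    Zip.Close;
  finally
    Zip.Free;
  end;
end;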

Just now, aehimself said:

If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications; it works like a charm.

How did you diagnose that the defect was in ZCompressStream or ZDecompressStream?

2 hours ago, Steve Maughan said:

I thought these were just a wrapper around standard ZLib routines

They are. 

 

2 hours ago, Steve Maughan said:

It could have been corrupted after being saved.

That's a very plausible explanation. File corruption is something that does happen.

 

You'll want to reproduce the issue before trying to solve the problem. And if it is file corruption, then the solution is somebody else's problem.
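
A quick way to start is to take known-good data, corrupt the compressed stream deliberately, and see how ZDecompressStream reacts. A minimal console-app sketch (the payload and corruption offset are arbitrary):

uses System.Classes, System.SysUtils, System.ZLib;

procedure TryRoundTrip(CorruptOffset: Integer);
var
  Plain: TStringStream;
  Z, Back: TMemoryStream;
begin
  Plain := TStringStream.Create(StringOfChar('x', 100000));
  Z := TMemoryStream.Create;
  Back := TMemoryStream.Create;
  try
    ZCompressStream(Plain, Z);
    if (CorruptOffset >= 0) and (CorruptOffset < Z.Size) then
      PByte(Z.Memory)[CorruptOffset] := PByte(Z.Memory)[CorruptOffset] xor $FF; // flip one byte
    Z.Position := 0;
    try
      ZDecompressStream(Z, Back); // does it raise, or hang as in the report?
      WriteLn('Decompressed OK: ', Back.Size, ' bytes');
    except
      on E: Exception do
        WriteLn(E.ClassName, ': ', E.Message);
    end;
  finally
    Back.Free;
    Z.Free;
    Plain.Free;
  end;
end;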

2 minutes ago, David Heffernan said:

How did you diagnose that the defect was in ZCompressStream or ZDecompressStream?

Ummm... I did not? I just offered a free-to-use alternative.

1 minute ago, aehimself said:

Ummm... I did not? I just offered a free-to-use alternative.

It's pretty bad advice: changing algorithm and implementation without any justification or rationale. It seems like you are advocating trying libraries at random. If every time you encounter an issue you replace the library, after a while you'll have run out of libraries.

Just now, David Heffernan said:

It's pretty bad advice: changing algorithm and implementation without any justification or rationale. It seems like you are advocating trying libraries at random. If every time you encounter an issue you replace the library, after a while you'll have run out of libraries.

Well, it's free, and it isn't necessarily a change of algorithm (only if you force the compression to the added LZMA). There is no justification or rationale, just experience: as I mentioned, I've been using this for years now without any major issues. That could be a coincidence, I agree; that's why I did not say it's error-free and guaranteed to bring world peace. Maybe it will work for the OP too, maybe not. The decision to try it is his; I just offered a potential candidate I happen to know about.

Change libraries until I run out? I fixed a memory leak, submitted a pull request that wasn't accepted, and learned to live with its limitations - there was no need to look for another one.

 

But let's not hijack the topic.


This was not advice, but the exact answer to the questions asked.

12 hours ago, aehimself said:

If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications; it works like a charm.

But it uses the built-in ZLib anyway.

17 hours ago, aehimself said:

If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications; it works like a charm.

Thanks — I wasn't aware of System.Zip2. I'll take a look.

 

- Steve

17 hours ago, David Heffernan said:

They are. 

 

That's a very plausible explanation. File corruption is something that does happen.

 

You'll want to reproduce the issue before trying to solve the problem. And if it is file corruption, then the solution is somebody else's problem.

Thanks David — all good advice. I was really looking to see if there was a list of known bugs in ZCompressStream. It seems file corruption after saving is probably the most likely explanation.

1 minute ago, Attila Kovacs said:

It is no coincidence that checksums were invented.

That was a question I had — why wasn't there a checksum error when it tried to decompress?

1 hour ago, Steve Maughan said:

That was a question I had — why wasn't there a checksum error when it tried to decompress?

I don't know whether ZLib writes a checksum or not; maybe you should append one yourself at the end of the compressed stream, and first verify that checksum before trying to decompress.


Didn't anyone Google "zlib checksum"?

According to the ZLib specification there is a checksum: https://tools.ietf.org/html/rfc1950

Quote

ADLER32 (Adler-32 checksum)
This contains a checksum value of the uncompressed data (excluding any dictionary data) computed according to the Adler-32 algorithm. This algorithm is a 32-bit extension and improvement of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073 standard. (See references [4] and [5] in Chapter 3, below.)

Adler-32 is composed of two sums accumulated per byte: s1 is the sum of all bytes, s2 is the sum of all s1 values. Both sums are done modulo 65521. s1 is initialized to 1, s2 to zero. The Adler-32 checksum is stored as s2*65536 + s1 in most-significant-byte first (network) order.
Quote

A compliant compressor must produce streams with correct CMF, FLG and ADLER32, but need not support preset dictionaries. [...]

A compliant decompressor must check CMF, FLG, and ADLER32, and provide an error indication if any of these have incorrect values.
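
The algorithm quoted above is only a few lines; a direct (untested) Delphi transcription of that definition:

uses System.SysUtils;

function Adler32(const Data: TBytes): Cardinal;
const
  MOD_ADLER = 65521; // the largest prime smaller than 65536
var
  s1, s2: Cardinal;
  B: Byte;
begin
  s1 := 1; // s1 is initialized to 1
  s2 := 0; // s2 to zero
  for B in Data do
  begin
    s1 := (s1 + B) mod MOD_ADLER;  // sum of all bytes
    s2 := (s2 + s1) mod MOD_ADLER; // sum of all s1 values
  end;
  Result := (s2 shl 16) or s1; // stored as s2*65536 + s1
end;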

 

3 hours ago, Steve Maughan said:

Thanks — I wasn't aware of System.Zip2. I'll take a look.

I can't understand why you would. Aren't you likely just to end up changing your code for no reason, given that the defect is almost certainly not in your compression library?

15 hours ago, Anders Melander said:

Didn't anyone Google "zlib checksum"?

According to the ZLib specification there is a checksum: https://tools.ietf.org/html/rfc1950

 

"contains a checksum value of the uncompressed data"

 

This isn't foolproof, because the decompressor can only verify that checksum after decompression. What if the data is corrupted in such a way that it causes the decompressor to crash (to quote the OP: "endless loop") during decompression?

 

One should also append a record containing the size and checksum of the compressed data at the end of the stream. That way one can check the integrity of the file before decompressing the data. 
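
For illustration, a sketch of that idea. The trailer layout and the ChecksumOf helper are hypothetical (here it wraps THashBobJenkins from System.Hash purely as an example; a CRC32 over the compressed bytes would work equally well):

uses System.SysUtils, System.Classes, System.Hash, System.ZLib;

type
  TTrailer = packed record
    CompressedSize: Int64; // size of the compressed payload
    Checksum: Cardinal;    // checksum of the *compressed* bytes
  end;

function ChecksumOf(Buffer: Pointer; Size: Integer): Cardinal;
begin
  // hypothetical helper: any checksum over a raw buffer will do
  Result := Cardinal(THashBobJenkins.GetHashValue(Buffer^, Size));
end;

procedure SaveWithTrailer(Source: TStream; const FileName: string);
var
  Z: TMemoryStream;
  F: TFileStream;
  Trailer: TTrailer;
begin
  Z := TMemoryStream.Create;
  try
    Source.Position := 0;
    ZCompressStream(Source, Z);
    Trailer.CompressedSize := Z.Size;
    Trailer.Checksum := ChecksumOf(Z.Memory, Z.Size);
    F := TFileStream.Create(FileName, fmCreate);
    try
      Z.Position := 0;
      F.CopyFrom(Z, Z.Size);
      F.WriteBuffer(Trailer, SizeOf(Trailer)); // trailer goes at the very end
    finally
      F.Free;
    end;
  finally
    Z.Free;
  end;
end;

procedure LoadWithTrailer(Dest: TStream; const FileName: string);
var
  F: TFileStream;
  Z: TMemoryStream;
  Trailer: TTrailer;
begin
  F := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    if F.Size < SizeOf(Trailer) then
      raise Exception.Create('File truncated');
    F.Seek(-SizeOf(Trailer), soEnd);
    F.ReadBuffer(Trailer, SizeOf(Trailer));
    if Trailer.CompressedSize <> F.Size - SizeOf(Trailer) then
      raise Exception.Create('File truncated or corrupt');
    Z := TMemoryStream.Create;
    try
      F.Position := 0;
      Z.CopyFrom(F, Trailer.CompressedSize);
      if ChecksumOf(Z.Memory, Z.Size) <> Trailer.Checksum then
        raise Exception.Create('Checksum mismatch - refusing to decompress');
      Z.Position := 0;
      ZDecompressStream(Z, Dest); // only reached with verified input
    finally
      Z.Free;
    end;
  finally
    F.Free;
  end;
end;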

44 minutes ago, A.M. Hoornweg said:

This isn't foolproof, because the decompressor can only verify that checksum after decompression.

Yes, you're right, and it makes sense. The integrity of the data within a container is the responsibility of the container, so the ZLib decompressor can assume that it's being given valid data and use its own checksum only to verify that it is producing the correct output.

38 minutes ago, Anders Melander said:

... the ZLib decompressor can assume that it's being given valid data and use its own checksum only to verify that it is producing the correct output.

I think the decompressor should somehow handle wrong data too (raise an exception, for example). I have used ZDecompressStream, and on wrong data it stays in an endless loop (tested on 32-bit Android). I have not tested whether it happens with all wrong data, or whether I was just lucky and hit something special...

2 minutes ago, Vandrovnik said:

I think the decompressor should somehow handle wrong data too

Yes, if it's documented as doing so. It's perfectly acceptable to have an implementation that requires the input data to be valid. For example, if you have already verified elsewhere that the input data is valid, then you'd probably want the decompressor not to be slowed down by validating the data once more. At some point you have to assume that the input you're given is valid.

1 hour ago, Anders Melander said:

Yes, you're right, and it makes sense. The integrity of the data within a container is the responsibility of the container, so the ZLib decompressor can assume that it's being given valid data and use its own checksum only to verify that it is producing the correct output.

If that were true, a non-matching checksum would mean a broken algorithm and not necessarily broken data. It would tell us exactly nothing.

19 minutes ago, A.M. Hoornweg said:

If that were true, a non-matching checksum would mean a broken algorithm and not necessarily broken data. It would tell us exactly nothing.

I think the checksum is there to guard against a broken implementation. You can't validate the algorithm, only the output it produces.

 

I'm not really sure what it is you're disputing.


There are a bunch of "endless loop" hits on zlib from 2015, and the one in the RTL dates from 2014, so maybe you should re-compress the original data and see if it happens again. That would also satisfy the others when you switch to another lib.

 

Alternatively, try to decompress it with a more recent zlib version: https://unix.stackexchange.com/questions/22834/how-to-uncompress-zlib-data-in-unix

It would be cool to know the results.

7 minutes ago, Anders Melander said:

I think the checksum is there to guard against a broken implementation. You can't validate the algorithm, only the output it produces.

I'm not really sure what it is you're disputing.

If the data can only be verified after expansion and the expansion algorithm crashes, then we still don't know if the data or the implementation is broken.

13 minutes ago, A.M. Hoornweg said:

we still don't know if the data or the implementation is broken

That is not the purpose of the checksum. The purpose is to guarantee that the output is correct.

