Steve Maughan 26 Posted September 28, 2020 My application saves its data as a compressed JSON file. I use the standard System.ZLib routines ZCompressStream and ZDecompressStream to handle the compression; I thought these were just a wrapper around standard ZLib routines. Earlier today someone sent me an application file that couldn't be opened. It turns out it couldn't be decompressed. It could have been corrupted after being saved. The decompression routine goes into some sort of endless loop when it attempts to open the file. My question is: does anyone know of any problems compressing data using ZCompressStream? I'm using Delphi 10.4.1. Are there any other third-party ZLib-compatible compression and decompression routines? Steve
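For reference, the save/load pattern being described presumably boils down to something like the following (a minimal sketch with made-up procedure and variable names; only the ZCompressStream/ZDecompressStream calls are the stock System.ZLib API):

uses
  System.Classes, System.SysUtils, System.ZLib;

// Save: deflate a JSON string straight into a file.
procedure SaveCompressedJson(const JsonText, FileName: string);
var
  Source: TStringStream;
  Dest: TFileStream;
begin
  Source := TStringStream.Create(JsonText, TEncoding.UTF8);
  try
    Dest := TFileStream.Create(FileName, fmCreate);
    try
      ZCompressStream(Source, Dest); // RTL wrapper around ZLib deflate
    finally
      Dest.Free;
    end;
  finally
    Source.Free;
  end;
end;

// Load: inflate the file back into a JSON string.
function LoadCompressedJson(const FileName: string): string;
var
  Source: TFileStream;
  Dest: TStringStream;
begin
  Source := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Dest := TStringStream.Create('', TEncoding.UTF8);
    try
      ZDecompressStream(Source, Dest); // the call that hangs on the bad file
      Result := Dest.DataString;
    finally
      Dest.Free;
    end;
  finally
    Source.Free;
  end;
end;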
aehimself 396 Posted September 28, 2020 If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications and it works like a charm.
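If it helps, basic usage of the stock System.Zip unit looks roughly like this (a minimal sketch; the archive and entry names are made up, and System.Zip2 is a drop-in for the same API):

uses
  System.SysUtils, System.Zip;

// Store a byte payload as a single deflate-compressed entry in a zip archive.
procedure SaveToZip(const Data: TBytes; const ZipName: string);
var
  Zip: TZipFile;
begin
  Zip := TZipFile.Create;
  try
    Zip.Open(ZipName, zmWrite);
    Zip.Add(Data, 'data.json');
  finally
    Zip.Free;
  end;
end;

// Read the entry back; TZipFile raises an exception when the archive headers don't parse.
function LoadFromZip(const ZipName: string): TBytes;
var
  Zip: TZipFile;
begin
  Zip := TZipFile.Create;
  try
    Zip.Open(ZipName, zmRead);
    Zip.Read('data.json', Result);
  finally
    Zip.Free;
  end;
end;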
David Heffernan 2345 Posted September 28, 2020 Just now, aehimself said: If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications and it works like a charm. How did you diagnose that the defect was in ZCompressStream or ZDecompressStream?
David Heffernan 2345 Posted September 28, 2020 (edited) 2 hours ago, Steve Maughan said: I thought these were just a wrapper around standard ZLib routines They are. 2 hours ago, Steve Maughan said: It could have been corrupted after being saved. That's a very plausible explanation. File corruption is something that does happen. You'll want to reproduce the issue before trying to solve the problem. And if it is file corruption, then the solution is somebody else's problem.
aehimself 396 Posted September 28, 2020 2 minutes ago, David Heffernan said: How did you diagnose that the defect was in ZCompressStream or ZDecompressStream? Ummm... I did not? I just offered a free-to-use alternative.
David Heffernan 2345 Posted September 28, 2020 1 minute ago, aehimself said: Ummm... I did not? I just offered a free-to-use alternative. It's pretty bad advice, changing the algorithm and implementation without any justification or rationale. Seems like you are advocating trying libraries at random. If every time you encounter an issue you replace the library, after a while you'll have run out of libraries.
aehimself 396 Posted September 28, 2020 Just now, David Heffernan said: It's pretty bad advice, changing the algorithm and implementation without any justification or rationale. Seems like you are advocating trying libraries at random. If every time you encounter an issue you replace the library, after a while you'll have run out of libraries. Well, it's free, and it's not necessarily a change of algorithm (only if you force the added LZMA compression). There is no justification or rationale, just experience: as I mentioned, I've been using this for years now without any major issues. That could be a coincidence, I agree; that's why I did not say it's error-free and guaranteed to bring world peace. Maybe it will work for the OP too, maybe not; the decision to try it is his to make. I just offered a potential candidate I happen to know about. Change libraries until I run out? I fixed a memory leak, applied a not-yet-accepted pull request and learned to live with its limitations - there was no need to look for another one. But let's not hijack the topic.
Stano 143 Posted September 29, 2020 This was not advice, but the exact answer to the questions asked.
Fr0sT.Brutal 900 Posted September 29, 2020 12 hours ago, aehimself said: If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications and it works like a charm. But it uses the built-in ZLib anyway.
Steve Maughan 26 Posted September 29, 2020 17 hours ago, aehimself said: If you don't need anything fancy, you can use System.Zip (or the updated System.Zip2, which is a drop-in replacement offering some extras). I'm using Zip2 with minor modifications and it works like a charm. Thanks — I wasn't aware of System.Zip2. I'll take a look. - Steve
Steve Maughan 26 Posted September 29, 2020 17 hours ago, David Heffernan said: They are. That's a very plausible explanation. File corruption is something that does happen. You'll want to reproduce the issue before trying to solve the problem. And if it is file corruption, then the solution is somebody else's problem. Thanks David — all good advice. I was really looking to see whether there was a list of known bugs in ZCompressStream. It seems file corruption after saving is probably the most likely explanation.
Attila Kovacs 629 Posted September 29, 2020 (edited) It is no coincidence that checksums were invented.
Steve Maughan 26 Posted September 29, 2020 1 minute ago, Attila Kovacs said: It is no coincidence that checksums were invented. That was a question I had — why wasn't there a checksum error when it tried to decompress?
A.M. Hoornweg 144 Posted September 29, 2020 1 hour ago, Steve Maughan said: That was a question I had — why wasn't there a checksum error when it tried to decompress? I don't know whether ZLib writes a checksum or not; maybe you should append one yourself at the end of the compressed stream, and then verify that checksum before trying to decompress.
Anders Melander 1783 Posted September 29, 2020 Didn't anyone Google "zlib checksum"? According to the ZLib specification there is a checksum: https://tools.ietf.org/html/rfc1950 Quote ADLER32 (Adler-32 checksum) This contains a checksum value of the uncompressed data (excluding any dictionary data) computed according to Adler-32 algorithm. This algorithm is a 32-bit extension and improvement of the Fletcher algorithm, used in the ITU-T X.224 / ISO 8073 standard. See references [4] and [5] in Chapter 3, below. Adler-32 is composed of two sums accumulated per byte: s1 is the sum of all bytes, s2 is the sum of all s1 values. Both sums are done modulo 65521. s1 is initialized to 1, s2 to zero. The Adler-32 checksum is stored as s2*65536 + s1 in most-significant-byte first (network) order. Quote A compliant compressor must produce streams with correct CMF, FLG and ADLER32, but need not support preset dictionaries. [...] A compliant decompressor must check CMF, FLG, and ADLER32, and provide an error indication if any of these have incorrect values.
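The quoted paragraph translates almost line for line into code; here is an illustrative Delphi version of Adler-32 (written from the RFC text above, not taken from the RTL):

// Adler-32 exactly as described in RFC 1950: s1 starts at 1, s2 at 0,
// both sums taken modulo 65521, result packed as s2*65536 + s1.
function Adler32(const Buf: TBytes): Cardinal;
const
  MOD_ADLER = 65521;
var
  S1, S2: Cardinal;
  I: Integer;
begin
  S1 := 1;
  S2 := 0;
  for I := 0 to High(Buf) do
  begin
    S1 := (S1 + Buf[I]) mod MOD_ADLER; // sum of all bytes
    S2 := (S2 + S1) mod MOD_ADLER;     // sum of all s1 values
  end;
  Result := (S2 shl 16) or S1;
end;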
David Heffernan 2345 Posted September 29, 2020 3 hours ago, Steve Maughan said: Thanks — I wasn't aware of System.Zip2. I'll take a look I can't understand why you would. Aren't you likely just to end up changing your code for no reason, given that the defect is almost certainly not in your compression library?
A.M. Hoornweg 144 Posted September 30, 2020 15 hours ago, Anders Melander said: Didn't anyone Google "zlib checksum"? According to the ZLib specification there is a checksum: https://tools.ietf.org/html/rfc1950 "contains a checksum value of the uncompressed data" This isn't foolproof, because the decompressor can only verify that checksum after decompression. What if the data is corrupted in such a way that it causes the decompressor to crash (to quote the OP: an "endless loop") during decompression? One should also append a record containing the size and checksum of the compressed data at the end of the stream. That way one can check the integrity of the file before decompressing the data, as sketched below.
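A sketch of that idea, assuming a made-up trailer layout of [compressed payload][Int64 size][Cardinal checksum] and reusing the illustrative Adler32 function from the earlier post (not production code):

uses
  System.Classes, System.SysUtils, System.ZLib;

// Append the trailer after the compressed payload has been written.
procedure WriteTrailer(Stream: TStream; PayloadSize: Int64; Checksum: Cardinal);
begin
  Stream.Seek(0, soEnd);
  Stream.WriteBuffer(PayloadSize, SizeOf(PayloadSize));
  Stream.WriteBuffer(Checksum, SizeOf(Checksum));
end;

// Verify size and checksum of the compressed bytes before ZLib ever sees them.
procedure DecompressVerified(Source, Dest: TStream);
var
  Size: Int64;
  Stored: Cardinal;
  Payload: TBytes;
  Mem: TBytesStream;
begin
  Source.Seek(-(SizeOf(Size) + SizeOf(Stored)), soEnd);
  Source.ReadBuffer(Size, SizeOf(Size));
  Source.ReadBuffer(Stored, SizeOf(Stored));
  if Size <> Source.Size - SizeOf(Size) - SizeOf(Stored) then
    raise EStreamError.Create('Compressed payload truncated or padded');
  SetLength(Payload, Size);
  Source.Position := 0;
  Source.ReadBuffer(Payload, Length(Payload));
  if Adler32(Payload) <> Stored then // Adler32 from the sketch above
    raise EStreamError.Create('Compressed payload checksum mismatch');
  Mem := TBytesStream.Create(Payload);
  try
    ZDecompressStream(Mem, Dest); // the data has been verified at this point
  finally
    Mem.Free;
  end;
end;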
Anders Melander 1783 Posted September 30, 2020 44 minutes ago, A.M. Hoornweg said: This isn't foolproof, because the decompressor can only verify that checksum after decompression. Yes, you're right, and it makes sense. The integrity of the data within a container is the responsibility of the container, so the ZLib decompressor can assume that it's being given valid data and only uses its own checksum to verify that it is producing the correct output.
Vandrovnik 214 Posted September 30, 2020 38 minutes ago, Anders Melander said: ... the ZLib decompressor can assume that it's being given valid data and only uses its own checksum to verify that it is producing the correct output. I think the decompressor should somehow handle wrong data too (raise an exception, for example). I have used ZDecompressStream, and on wrong data it stays in an endless loop (tested on 32-bit Android). I have not tested whether it happens with all wrong data, or whether I just got lucky and hit something special...
Anders Melander 1783 Posted September 30, 2020 2 minutes ago, Vandrovnik said: I think the decompressor should somehow handle wrong data too Yes, if it's documented as doing so. It's perfectly acceptable to have an implementation that requires the input data to be valid. For example, if you have already verified elsewhere that the input data is valid, then you'd probably want the decompressor not to be slowed down by validating the data once more. At some point you have to assume that the input you're given is valid.
A.M. Hoornweg 144 Posted September 30, 2020 1 hour ago, Anders Melander said: Yes, you're right, and it makes sense. The integrity of the data within a container is the responsibility of the container, so the ZLib decompressor can assume that it's being given valid data and only uses its own checksum to verify that it is producing the correct output. If that were true, a non-matching checksum would mean a broken algorithm and not necessarily broken data. It would tell us exactly nothing.
Anders Melander 1783 Posted September 30, 2020 19 minutes ago, A.M. Hoornweg said: If that were true, a non-matching checksum would mean a broken algorithm and not necessarily broken data. It would tell us exactly nothing. I think the checksum is there to guard against a broken implementation. You can't validate the algorithm, only the output it produces. I'm not really sure what it is you're disputing.
Attila Kovacs 629 Posted September 30, 2020 (edited) There are a bunch of "endless loop" hits on zlib from 2015, and the zlib in the RTL is dated 2014, so maybe you should re-compress the original data and see whether it happens again. That would also make switching to another lib easier to justify. Alternatively, try to decompress the file with a more recent zlib version: https://unix.stackexchange.com/questions/22834/how-to-uncompress-zlib-data-in-unix It would be cool to know the results.
A.M. Hoornweg 144 Posted September 30, 2020 7 minutes ago, Anders Melander said: I think the checksum is there to guard against a broken implementation. You can't validate the algorithm, only the output it produces. I'm not really sure what it is you're disputing. If the data can only be verified after expansion and the expansion algorithm crashes, then we still don't know whether the data or the implementation is broken.
Anders Melander 1783 Posted September 30, 2020 13 minutes ago, A.M. Hoornweg said: we still don't know whether the data or the implementation is broken That is not the purpose of the checksum. The purpose is to guarantee that the output is correct.