Sherlock

How to quickly hash growing files

I was tasked with finding a way to detect whether files have been altered. Alterations can only be caused by a system failure or by a semi-malicious person, since the computer is not connected to a network and the gain for that person is... minimal at best. In both cases the altered files would have to be recreated instead of reused, which would slow down a different process a bit. In short, these are cache files. And of course the best way to know whether a file has been altered without permission is a hash, since Windows permissions are sketchy at best. One of these files grows by roughly 2600 bytes every 5 seconds, so it would have to be rehashed after each such interval. The system generates up to 20 of these files "simultaneously", each covering about 6 hours on average and up to 24 hours at peak. In bytes, that is about 11 MB per file on average, peaking at about 45 MB.
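
The size figures follow directly from the growth rate. A quick back-of-the-envelope check (Python is used here only for illustration; the constants are the ones given in the post) also shows why rehashing every file from scratch every 5 seconds gets expensive at peak, while hashing only the newly appended bytes stays tiny:

```python
# Growth figures from the post: about 2600 bytes appended every 5 seconds.
BYTES_PER_TICK = 2600
TICK_SECONDS = 5
MAX_FILES = 20

def file_size_after(hours: float) -> int:
    """Approximate size in bytes of one cache file after `hours` of growth."""
    ticks = int(hours * 3600) // TICK_SECONDS
    return ticks * BYTES_PER_TICK

average = file_size_after(6)   # 4320 ticks of 2600 bytes
peak = file_size_after(24)     # 17280 ticks of 2600 bytes
print(f"average file: {average:,} bytes, peak file: {peak:,} bytes")

# Work per 5-second interval at peak, for all files together:
print(f"full rehash of every file: {MAX_FILES * peak:,} bytes")
print(f"hashing only new bytes:    {MAX_FILES * BYTES_PER_TICK:,} bytes")
```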

Here's the question: what would be the best way to do this without too much impact on performance? I could use Delphi's built-in System.Hash.THashSHA1. It is pretty straightforward, but before I dive into the shallow end of this pool and break my neck... perhaps one of you has a better suggestion.
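
Because the files only grow, one option is to keep a running hash and feed it just the bytes appended since the last check, instead of rehashing the whole file every 5 seconds. A minimal sketch with Python's hashlib, assuming the files are strictly append-only (the class name `GrowingFileHasher` is hypothetical):

```python
import hashlib

class GrowingFileHasher:
    """Running hash over an append-only file (hypothetical helper).

    Assumption: bytes that were already hashed never change in legitimate
    use, so each call only needs to read what was appended since the last one.
    """

    def __init__(self, path: str, algorithm: str = "sha256") -> None:
        self.path = path
        self._hash = hashlib.new(algorithm)
        self._offset = 0  # number of bytes already fed into the hash

    def update(self) -> str:
        """Hash the newly appended bytes and return the digest of the whole file."""
        with open(self.path, "rb") as f:
            f.seek(self._offset)
            while chunk := f.read(64 * 1024):
                self._hash.update(chunk)
                self._offset += len(chunk)
        # hexdigest() does not finalize a hashlib object, so later calls
        # can keep feeding new bytes into the same running state.
        return self._hash.hexdigest()
```

The running digest is what you store; actually detecting tampering later still means re-reading the whole file once and comparing against it. In Delphi the same idea would presumably mean keeping one System.Hash hasher alive and calling its Update method with only the new bytes; check the RTL documentation for the exact overloads.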

 

Why not keep the file locked while writing to it? Then you only have to hash it once the file is done.
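
A minimal sketch of that idea, with one variation: the bytes are hashed as they are written, so the digest is ready the moment the file is closed and nothing has to be re-read. The class name and the `<file>.sha256` sidecar naming are hypothetical, and the exclusive lock itself is OS-specific and not shown (on Windows it is the share mode used when opening the file, e.g. `fmShareDenyWrite` with Delphi's `TFileStream`):

```python
import hashlib

class HashOnCloseWriter:
    """Append-only writer (hypothetical) that hashes data as it is written
    and stores the final digest in a "<file>.sha256" sidecar on close."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._file = open(path, "wb")  # locking/share mode not handled here
        self._hash = hashlib.sha256()

    def write(self, data: bytes) -> None:
        self._file.write(data)
        self._hash.update(data)  # hash the same bytes, no re-read needed

    def close(self) -> str:
        self._file.close()
        digest = self._hash.hexdigest()
        with open(self.path + ".sha256", "w", encoding="ascii") as side:
            side.write(digest)
        return digest
```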

OK, but my application might crash, and then the file might not have been hashed, or even fully written. I guess I could require a "complete reload" after a crash, but it's not nice...
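
One way to make the crash case less painful, assuming a hash-on-close scheme with a hypothetical `<file>.sha256` sidecar: on startup, only the files that have no sidecar (the writer never closed them) or whose digest no longer matches are rebuilt, rather than reloading everything. The `*.cache` pattern and the function name are also assumptions:

```python
import hashlib
from pathlib import Path

def files_needing_rebuild(cache_dir: str, pattern: str = "*.cache") -> list[str]:
    """Return cache files to recreate: no sidecar digest (the writer crashed
    before closing) or a digest mismatch (the file was altered)."""
    stale = []
    for path in sorted(Path(cache_dir).glob(pattern)):
        sidecar = path.with_name(path.name + ".sha256")
        if not sidecar.exists():
            stale.append(path.name)
            continue
        h = hashlib.sha256()
        with path.open("rb") as f:
            while chunk := f.read(64 * 1024):
                h.update(chunk)
        if h.hexdigest() != sidecar.read_text(encoding="ascii").strip():
            stale.append(path.name)
    return stale
```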

As Lajos said, if they are only cache files that are not needed after a restart and not shared with other processes, put them in the system temp folder and don't share write access. As for hashing, 44 MB is nothing to worry about, but you could also build a table of hashes over fixed-size parts, e.g. every 64 KB. That way you only check a part when you read (need) it.
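
A sketch of the block-table suggestion: hash the file in fixed 64 KB blocks and verify only the block you are about to read. The use of SHA-256 and the function names are assumptions for illustration:

```python
import hashlib

BLOCK_SIZE = 64 * 1024  # block size suggested in the post

def build_block_table(path: str) -> list[bytes]:
    """Hash the file in 64 KB blocks; entry i covers bytes [i*64K, (i+1)*64K)."""
    table = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            table.append(hashlib.sha256(block).digest())
    return table

def read_block_checked(path: str, index: int, table: list[bytes]) -> bytes:
    """Read one block and verify it against the table before using it."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK_SIZE)
        block = f.read(BLOCK_SIZE)
    if hashlib.sha256(block).digest() != table[index]:
        raise ValueError(f"block {index} of {path} has been altered")
    return block
```

For a growing file, every full block is final once written, so at each 5-second interval only the entry for the last, partial block needs to be recomputed.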

SHA-1 is dead (practical collisions have been demonstrated). I suggest using SHA-2 or SHA-3, even if it is "just" for a file checksum. If you want performance-optimized implementations, I would suggest mORMot 2:

 

uses
  mormot.core.buffers,
  mormot.crypt.secure;

...

  HashFile(myFileName, THashAlgo.hfSHA256);  

Fun fact: mormot2 SHA256 is faster than RTL SHA1.

59 minutes ago, Stefan Glienke said:

Fun fact: mormot2 SHA256 is faster than RTL SHA1.

Also, SHA-3 and SHA-512 are at least 50% faster than SHA-256.
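
Speed claims like these depend on the implementation and the CPU, so they are worth measuring on the target machine. A rough timing sketch follows; note that it measures Python's hashlib (usually backed by OpenSSL), not mORMot 2 or the Delphi RTL, so it only indicates relative cost on your hardware. The 44 MiB buffer roughly matches one file at its 24-hour peak:

```python
import hashlib
import os
import time

def throughput_mb_s(algorithm: str, size: int = 44 * 1024 * 1024, rounds: int = 3) -> float:
    """Best-of-`rounds` hashing throughput in MB/s for one algorithm on random data."""
    data = os.urandom(size)
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return size / best / 1e6

for name in ("sha1", "sha256", "sha512", "sha3_256"):
    print(f"{name:>9}: {throughput_mb_s(name):8.1f} MB/s")
```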

For basic error/tamper detection a CRC would be simpler and a lot faster, since you can feed additional bytes into the calculation as the file grows. You can also keep a few (length, CRC) pairs around to re-check parts of the file as desired: the CRC value of the first 16 MB could be used either to check the first 16 MB of the file, or as the starting value to check the range from 16 MB up to another stored CRC at 20 MB, for example.
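
A minimal sketch of the running CRC with (length, CRC) checkpoints, using Python's `zlib.crc32`, whose second argument continues a CRC from a previous value. That continuation is what lets a stored checkpoint seed the check of a later range (e.g. 16 MB to 20 MB). The class and method names are hypothetical:

```python
import zlib

class CrcCheckpoints:
    """Running CRC-32 over an append-only file, with (length, crc) checkpoints.

    zlib.crc32(data, value) continues a CRC from `value`, so an earlier
    checkpoint can seed the check of any later range of the file.
    """

    def __init__(self) -> None:
        self.length = 0
        self.crc = 0  # CRC-32 of zero bytes
        self.checkpoints: list[tuple[int, int]] = [(0, 0)]

    def append(self, data: bytes) -> None:
        """Feed newly appended bytes; the file never has to be re-read."""
        self.crc = zlib.crc32(data, self.crc)
        self.length += len(data)

    def checkpoint(self) -> None:
        """Remember the current (length, crc) pair."""
        self.checkpoints.append((self.length, self.crc))

    def verify_range(self, path: str, start_idx: int, end_idx: int) -> bool:
        """Re-check the bytes between two checkpoints, seeded with the earlier CRC."""
        (start, crc), (end, expected) = self.checkpoints[start_idx], self.checkpoints[end_idx]
        with open(path, "rb") as f:
            f.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = f.read(min(remaining, 64 * 1024))
                if not chunk:
                    return False  # file shorter than recorded: truncated
                crc = zlib.crc32(chunk, crc)
                remaining -= len(chunk)
        return crc == expected
```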
