dummzeuch

Caching oddity


Posted (edited)

I have got two programs which execute the same code as the first step in some processing. It's about checking huge files (as in 100 gigabytes) for consistency.

 

These files are basically a container for large binary data (blobs) of different sizes. Each blob has some small metadata (a few 100 bytes) stored just in front of it. The data is stored in chunks of n blobs. Each chunk is preceded by a descriptor which contains a pointer to the next descriptor, plus the pointers to and sizes of the n blobs stored in this chunk. The structure is like this:

 

  • file header
  • chunkA
    • descriptorA
    • data1
      • metadata1
      • blob1
    • data2
      • metadata2
      • blob2
    • data3
      • metadata3
      • blob3
    • ...
    • dataN
      • metadataN
      • blobN
  • chunkB
    • descriptorB
    • dataN+1
      • metadataN+1
      • blobN+1
    • dataN+2
      • metadataN+2
      • blobN+2
    • dataN+3
      • metadataN+3
      • blobN+3
    • ...
    • data2*N
      • metadata2*N
      • blob2*N
  • ...

 

These files are created incrementally on a different computer by a different program. Every time a blob is added, it gets appended to the last chunk until that chunk is full, at which point a new chunk is appended. Then these files are transferred to the computer that runs the programs I am talking about here. There they are available from a local hard drive (so there is no network access involved when reading them).

 

Both programs read the metadata from the blobs and check them for consistency. In order to do that they first read all the descriptors and put them into an array. Then they go through these descriptors and read the corresponding metadata for each blob. This means that they read small parts from all over the huge file, skipping the actual binary content in the process.
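To make the access pattern concrete, here is a minimal sketch of such a two-pass scan. It is not the actual checking code: the record layout, `BlobsPerChunk`, `HeaderSize` and `MetadataSize` are all hypothetical stand-ins, since the real file format is not given here, and `TFileStream` from the RTL is used instead of the file class the programs actually use.

```pascal
program TwoPassScanSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  BlobsPerChunk = 100; // hypothetical value for n
  HeaderSize = 64;     // hypothetical size of the file header
  MetadataSize = 256;  // hypothetical; the post only says "a few 100 bytes"

type
  // Hypothetical on-disk layout, for illustration only.
  TBlobRef = packed record
    Offset: Int64; // position of the metadata stored in front of the blob
    Size: Int64;   // size of the blob itself
  end;

  TDescriptor = packed record
    NextDescriptor: Int64; // assumed: 0 marks the last chunk
    Blobs: array[0..BlobsPerChunk - 1] of TBlobRef;
  end;

var
  fs: TFileStream;
  Descriptors: array of TDescriptor;
  Metadata: array[0..MetadataSize - 1] of Byte;
  Offset, FileSize: Int64;
  i, j: Integer;
begin
  if ParamCount < 1 then begin
    WriteLn('Usage: TwoPassScanSketch <filename>');
    Halt(1);
  end;
  fs := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    FileSize := fs.Size;

    // Pass 1: follow the chain of descriptors and collect them in an array.
    Offset := HeaderSize;
    while (Offset > 0) and (Offset + SizeOf(TDescriptor) <= FileSize) do begin
      fs.Position := Offset;
      SetLength(Descriptors, Length(Descriptors) + 1);
      fs.ReadBuffer(Descriptors[High(Descriptors)], SizeOf(TDescriptor));
      // Stop on a pointer that does not move forward, to avoid an endless loop.
      if Descriptors[High(Descriptors)].NextDescriptor <= Offset then
        Break;
      Offset := Descriptors[High(Descriptors)].NextDescriptor;
    end;

    // Pass 2: read only the small metadata block in front of each blob,
    // skipping the blob content itself.
    for i := 0 to High(Descriptors) do
      for j := 0 to BlobsPerChunk - 1 do begin
        Offset := Descriptors[i].Blobs[j].Offset;
        if (Offset <= 0) or (Offset + MetadataSize > FileSize) then
          Continue; // unused slot or invalid pointer
        fs.Position := Offset;
        fs.ReadBuffer(Metadata, MetadataSize);
        // The consistency check on Metadata would go here.
      end;

    WriteLn(Length(Descriptors), ' descriptors read');
  finally
    fs.Free;
  end;
end.
```

The point of the sketch is only that each read is small and the seeks jump over large stretches of the file, so on a hard disk the first run is dominated by seek time rather than by the amount of data read.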

 

The first time this is done in Program1, it takes several minutes. When I repeat it (even after exiting the program and starting it again), the full process takes only a few seconds.

 

So I thought that Windows 10 does some really efficient caching which is great.

 

But then I started Program2, which as its first step also does the same consistency check (same source code) on the same file. Program1 was still running but had already finished the processing. Program2 then took several minutes to complete that check. When I abort the check, close Program2 and then restart it, the check reaches the entry it last processed within a second and then again takes much longer to process the rest. Once it's done, exiting it and starting it again shows the same effect as with Program1: The consistency check takes only seconds.

 

To summarize:

  1. Program1 takes several minutes for the check on the first run.
  2. Program1 takes only a few seconds for the check on subsequent runs.
  3. Program2 takes several minutes for the check on the first run even though Program1 had just done the same check on the same file and was still active.
  4. When I abort Program2 and restart it again, it only takes seconds to reach the position where I aborted the last run and then slows down again.
  5. Once Program2 has finished processing the file, running it again takes only seconds.

 

Can anybody explain this phenomenon to me? If it's Windows caching I would have expected that Program2 would only take seconds because it would take advantage of the data in the file cache put there by Program1. But that doesn't happen.

 

Just in case it matters: Both programs are Delphi 2007 programs (so 32 bit) and they are running on Windows 10 64 bit. The checking code is exactly the same in both.

Edited by dummzeuch

1 hour ago, Lars Fosdal said:

How do you open the files with regards to sharing?

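  st := TdzFile.Create(_Filename);
  try
    st.AccessMode := [faRead];   // read-only access
    st.ShareMode := [fsRead];    // other processes may read the file at the same time
    st.CreateDisposition := fcOpenFailIfNotExists;
    st.Open;
    // ... (the consistency check follows here)
  finally
    st.Free;
  end;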
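
For readers who don't know `TdzFile`: assuming these settings amount to read-only access with read sharing, a rough RTL equivalent would be `TFileStream` with `fmOpenRead or fmShareDenyWrite`. The following small sketch (not the code from the programs, and the mapping to `TdzFile` is an assumption) shows that two read-only handles on the same file can be open at the same time with this mode, so the sharing mode by itself should not get in the way of a second reader:

```pascal
program SharedReadSketch;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  First, Second: TFileStream;
begin
  if ParamCount < 1 then begin
    WriteLn('Usage: SharedReadSketch <filename>');
    Halt(1);
  end;
  // fmShareDenyWrite allows other readers but no writers.
  First := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    // A second read-only open succeeds while the first handle is still open.
    Second := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
    try
      WriteLn('Both handles open, file size: ', Second.Size);
    finally
      Second.Free;
    end;
  finally
    First.Free;
  end;
end.
```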


I don't have an explanation, but I will think out loud (in writing).

 

1) An OS like Windows will not leave memory unused, so even if it is not allocated, or not reported as allocated (paged or non-paged), it will use all the pages for one thing or another.

2) Windows does in fact buffer files in unused memory; this is unreported and known only to its drivers.

3) Windows will do whatever it takes to guarantee the security of the system and will track these buffers and their location; security comes before speed here.

4) So in theory, when a process reads a file in user A's realm, the OS will/might keep the data somewhere in memory, but it will not allow user B to even know that the file exists if it should not. The same applies to the content of the file.

5) In my opinion, the problem you described might be rooted in WOW64 and a security descriptor for each process, instead of being aligned by users and privileges.

6) I have seen something only loosely related to what you observed. In my case it was the access time between 32-bit and 64-bit applications. While benchmarking and observing the fast response of cached data (I was reading dates and sizes in folders), there was something I couldn't figure out between two 32-bit applications on a 32-bit OS, the same two as 64-bit on a 64-bit OS, and the case where one was 32-bit under WOW64. I didn't give it much thought, as I assumed WOW64 took its toll; now I think it might have been something different.

 

Can you repeat or simulate these big reads with 64-bit builds, or test it on a 32-bit OS? Also check whether the security settings of these applications are identical.

 


Have you used Task Manager to watch the cache counter in the Memory tab? It is a good place to start before perfmon.

 

While that disk might look local to you, it is likely on some storage array which could have some block caching. Consider how much RAM is on that server to cache with and how busy is that server when you are executing your programs.

 

Perfmon with some counters set around physical disk (% idle time, read queue length and reads per second) is a good place to start to see when it is actually hitting the physical disk and whether it is waiting on disk. Compare your reads per second between run 1 and run 2. Also grab the cache counters and look to see if any of them spike on a second run; this will give you a look at the OS cache.

Posted (edited)
1 hour ago, SwiftExpat said:

Have you used Task Manager to watch the cache counter in the Memory tab? It is a good place to start before perfmon.

I looked at the performance tab of Task Manager and saw that during the fast access (presumably from cache) no disk access was shown ("Active time" was 0). During the slow access (presumably not from cache), disk access was shown ("Active time" was high).

 

By "cache counter", do you mean the number shown at the "cached" label (it says 3.9 GB and doesn't really change much)? I can't see anything called "cache counter" there.

 

I also tried the resource monitor but it didn't show anything enlightening.

 

1 hour ago, SwiftExpat said:

While that disk might look local to you, it is likely on some storage array which could have some block caching. Consider how much RAM is on that server to cache with and how busy is that server when you are executing your programs.

The disk is local to me. I can see it in the computer sitting below my desk. It's attached via SATA to the on-board controller, and it's a single hard disk (no SSD, no RAID). There is no server involved at all.

 

I didn't try perfmon (somehow I forgot that it exists).

 

Edited by dummzeuch

55 minutes ago, Pat Foley said:

I suspect it might be OLE for files at work! In short leaving everything on disk, reading only what's needed. How to share the "loaded" or streamed file is a good question.
https://en.wikipedia.org/wiki/COM_Structured_Storage

No COM or OLE in sight. This is plain Delphi code, using stream IO.

1 hour ago, Kas Ob. said:

5) In my opinion, the problem you described might be rooted in WOW64 and a security descriptor for each process, instead of being aligned by users and privileges.

That sounds as if it might be a likely explanation.

 

1 hour ago, Kas Ob. said:

Can you repeat or simulate these big reads with 64-bit builds, or test it on a 32-bit OS? Also check whether the security settings of these applications are identical.

Unfortunately I haven't got a 32 bit OS available to me any more (the last one was Windows XP and went out about half a year ago) and I can't simply port this code to 64 bits either. Maybe I can recreate the effect with some simpler code though.
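
Such simpler code might look roughly like the sketch below. It is only an assumption of what a reproduction could be, not the real check: it reads a small block at regular intervals across a file passed on the command line and reports the elapsed time. `BlockSize` and `Stride` are made-up values that imitate "small metadata, large blobs skipped", and `GetTickCount` is used for timing because it is available in Delphi 2007.

```pascal
program ScatteredReadTimer;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

const
  BlockSize = 256;          // assumed metadata size ("a few 100 bytes")
  Stride = 4 * 1024 * 1024; // assumed distance between reads, skipping the "blob" part

var
  fs: TFileStream;
  Buffer: array[0..BlockSize - 1] of Byte;
  Offset, FileSize: Int64;
  StartTick, Elapsed: Cardinal;
  Reads: Integer;
begin
  if ParamCount < 1 then begin
    WriteLn('Usage: ScatteredReadTimer <filename>');
    Halt(1);
  end;
  fs := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    FileSize := fs.Size;
    Reads := 0;
    Offset := 0;
    StartTick := GetTickCount;
    // Read one small block every Stride bytes until the end of the file.
    while Offset + BlockSize <= FileSize do begin
      fs.Position := Offset;
      fs.ReadBuffer(Buffer, BlockSize);
      Inc(Reads);
      Inc(Offset, Stride);
    end;
    Elapsed := GetTickCount - StartTick;
    WriteLn(Reads, ' reads in ', Elapsed, ' ms');
  finally
    fs.Free;
  end;
end.
```

Compiling this once and copying the executable under a second name would give two separate programs running identical code, which could then be run against the same large file in the same order as Program1 and Program2 to see whether the second one also starts out slow.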

55 minutes ago, dummzeuch said:

I looked at the performance tab of Task Manager and saw that during the fast access (presumably from cache) no disk access was shown ("Active time" was 0). During the slow access (presumably not from cache), disk access was shown ("Active time" was high).

For me this confirms it is cached. I would not waste time in perfmon.

I would go with @Kas Ob. and his number 5.  I have never read into any of that material but it sounds like a logical boundary that the OS would not allow you to cross.

