robertjohns

Find, Replace and Save


My procedure works as expected!

I tested it with a file of more than 5 GBytes (the RAD 11.2 ISO file) and compared the copy against the original with a professional comparison tool.

The file hash returns equal values for "my copy" and the Embarcadero ISO!

 

On MSWindows 10 64bit, you need to pay attention to the buffer size, and not use too large a value.

Well, a buffer larger than 16 MBytes didn't improve efficiency during the file copy, because of the way MSWindows caches data during the copy! Most likely MSWindows uses much more advanced techniques than my "simple code" presented here!

 

After all, MSWindows is made by engineers, and I'm not one!

59 minutes ago, programmerdelphi2k said:

My procedure works as expected!

I tested it with a file of more than 5 GBytes (the RAD 11.2 ISO file) and compared the copy against the original with a professional comparison tool.

The file hash returns equal values for "my copy" and the Embarcadero ISO!

On MSWindows 10 64bit, you need to pay attention to the buffer size, and not use too large a value.

Well, a buffer larger than 16 MBytes didn't improve efficiency during the file copy, because of the way MSWindows caches data during the copy! Most likely MSWindows uses much more advanced techniques than my "simple code" presented here!

 

After all, MSWindows is made by engineers, and I'm not one!

The procedure works well on a 2GB file, but if the file size is above 2GB it gives the error "Out of memory while expanding memory stream".

I am using MSWindows 10 64bit and Delphi 10.2


@robertjohns

Really, I don't know what to tell you...

My PC has 16 GBytes of RAM, an Intel i7 4770K CPU, a 1 TByte HDD, a live antivirus, RAD 11.2 Embarcadero, Chrome (hungry for memory)...

The only real problem is "the time"... because this technique is not the best suited for the task! (Just compare with the MSWindows file copy: it is much faster than my code.)

Another problem: my code is not "multi-task / multi-processor", so of course it is slow for 5 GBytes... but that really does not matter to me, because I don't use it for anything here! It was just to show you something...

Edited by programmerdelphi2k


I am sorry if I am alarming you, but I am testing on an Intel(R) Core(TM) i7-7700HQ with 16GB of RAM and a 2TB SSD, running genuine Windows 10 64bit. I am really surprised: if it works on a file of more than 5GB, why does my PC give "Out of memory while expanding memory stream" on a 2.5GB file?

Is there any way to change TMemoryStream to TFileStream in your procedure? The procedure works as I expect; the only problem is the bigger file size.

23 hours ago, robertjohns said:

It is 32 bit Platform and the file size is 3GB

You should have given us this info in your first post since it is very relevant to the way this needs to be approached. If the whole file does not fit into available process memory you have to process it in chunks, and this opens a whole new can of worms (only a chicken would be happy about that :classic_cool:). You have to deal with source byte sequences exceeding the end of a chunk, for example. And do you really want to do the replacement directly on the original file, potentially corrupting it if something goes wrong?

Anyway, I don't have time today to whip up any coding example, so be patient.

 

P.S.: No PM unless explicitly requested! This forum is also intended as an information source for future problems from other people.


Here are my screenshots to prove that my code works:

  1. The copies were done on the same HDD, now using the Embarcadero RAD 11.2 ISO (more than 6 GBytes):
    1. MSWindows file copy (via Explorer), time: about 4 to 5 minutes
    2. My (slow) procedure using TFileStream, time: about 10 minutes or more (I forgot to check the clock at the start 🙂; let's say 15 minutes to be on the safe side)
      1. CPU usage: < 10% at 3.8 GHz with all the MSWindows apps in memory: antivirus, Chrome, RAD 11.2, etc.
      2. MEM usage: < 4GB, at times even less

 

 

Sem título.png

Edited by programmerdelphi2k


Try using these types: change "Integer" to "Int64" (a 32-bit Integer cannot hold sizes above 2 GBytes).

function MyMinValue(const A, B: int64): int64;
begin
  result := A;
  //
  if (A > B) then
    result := B;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  MyFileStreamSource: TFileStream;
  MyFileStreamTarget: TFileStream;
  MyFileSourceSize  : int64;
  MyBuffer          : TBytes; // array of byte;
  MyBufferSize      : int64;
  MyBlockSize       : int64;  // 4096 (4K... or others)
begin

 

52 minutes ago, PeterBelow said:

do you really want to do the replacement directly on the original file, potentially corrupting it if something goes wrong?
 

No, not a direct replacement in the original file, but creating a copy of it.

21 hours ago, PeterBelow said:

You should have given us this info in your first post since it is very relevant to the way this needs to be approached. If the whole file does not fit into available process memory you have to process it in chunks, and this opens a whole new can of worms (only a chicken would be happy about that :classic_cool:). You have to deal with source byte sequences exceeding the end of a chunk, for example. And do you really want to do the replacement directly on the original file, potentially corrupting it if something goes wrong?

Anyway, I don't have time today to whip up any coding example, so be patient.

 

P.S.: No PM unless explicitly requested! This forum is also intended as an information source for future problems from other people.

I am still waiting for your help to solve the big file issue, please.

2 hours ago, programmerdelphi2k said:

@robertjohns

 

Post or ZIP your unit, your "full code" (complete, total) used to do it! Then it will be possible to see where the error in your code is, otherwise...

The working procedure was already provided by @PeterBelow. It works like a charm and meets my expectations; the only problem is that it gives an error if the file size is more than 2GB: "Out of memory while expanding memory stream".

 

procedure ReplaceBytesInFile(aStream: TMemoryStream; const
    aSearchBytes, aReplaceBytes: TBytes);
var
  LNumBytes: Integer;
  LPos, LEnd: PByte;
begin
  LNumBytes := Length(aSearchBytes);
  Assert(LNumBytes = Length(aReplaceBytes), 'Arrays have to be of the same length!');
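  // Nothing to do for an empty pattern or a stream shorter than the pattern
  if (LNumBytes = 0) or (aStream.Size < LNumBytes) then
    Exit;

  LPos := aStream.Memory;
  LEnd := LPos;
  Inc(LEnd, aStream.Size - LNumBytes ); // last position where a full match can start
  while LPos <= LEnd do begin // <= so a match at the very end of the stream is found too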
    if LPos^ = aSearchBytes[0] then begin
      if CompareMem(LPos, @aSearchBytes[0], LNumBytes) then begin
        CopyMemory(LPos, @aReplaceBytes[0], LNumBytes);
        // or
        // Move( aReplaceBytes[0], LPos^, LNumBytes );
        Inc(LPos, LNumBytes);
      end
      else
        Inc(LPos);
    end
    else
      Inc(LPos);
  end; {while}
end;

type
  TDataRec = record SearchFor, ReplaceBy: TBytes end;

const
  Data: array [0..1] of TDataRec = (
    (
     SearchFor: [$74, $00, $6f, $00, $7, $00, $73];
     ReplaceBy: [$68, $00, $6f, $00, $7, $00, $73]),
    (
     SearchFor: [$76, $00, $61, $00, $6F, $00, $70, $00, $5F, $00, $00];
     ReplaceBy: [$00, $00, $61, $00, $6F, $00, $70, $00, $5F, $00, $00])
     // add more data as needed, adjust array upper bound accordingly
    );
                   
procedure TForm2.Test;
var
  I: Integer;
  theFile: string;
  LBuffer: TMemoryStream;
begin
  theFile := 'pathname here';
  LBuffer := TMemoryStream.Create();
  try
    LBuffer.LoadFromFile(theFile);
    for I := Low(Data) to High(Data) do
      ReplaceBytesInFile(LBuffer,  Data[I].SearchFor, Data[I].ReplaceBy);

    LBuffer.SaveToFile(theFile);
  finally
    LBuffer.Free;
  end;
end;

 


Look, if you mix two different pieces of code, you can get different results!

With this procedure using TMemoryStream, you are loading the entire file content into memory!!!

With my procedure using TFileStream, the "buffer" holds just "n" bytes at a time! In my case, I showed you 4096 bytes at a time on a file of more than 6 GBytes!!!

What is the difference between them?

-- more than 2GB in memory, or 4K in memory?
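To make the contrast concrete, here is a minimal sketch of a block-wise copy with TFileStream. It is not the poster's actual procedure, which is not shown in full in this thread; the name CopyFileInChunks and the temp-file names are hypothetical, and the 4096-byte default only mirrors the block size mentioned above. Only one buffer's worth of data is in memory at any time, whatever the file size.

```pascal
program ChunkedCopyDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Math, System.IOUtils;

// Hypothetical helper: copies ASource to ATarget through a small fixed-size
// buffer, so memory use stays at ABufferSize bytes regardless of file size.
procedure CopyFileInChunks(const ASource, ATarget: string;
  ABufferSize: Integer = 4096);
var
  LSource, LTarget: TFileStream;
  LBuffer: TBytes;
  LRemaining: Int64; // Int64: a 32-bit Integer cannot hold sizes above 2 GB
  LCount: Integer;
begin
  SetLength(LBuffer, ABufferSize);
  LSource := TFileStream.Create(ASource, fmOpenRead or fmShareDenyWrite);
  try
    LTarget := TFileStream.Create(ATarget, fmCreate);
    try
      LRemaining := LSource.Size;
      while LRemaining > 0 do
      begin
        // The last block is usually shorter than the buffer
        LCount := Integer(Min(LRemaining, Int64(ABufferSize)));
        LSource.ReadBuffer(LBuffer[0], LCount);
        LTarget.WriteBuffer(LBuffer[0], LCount);
        Dec(LRemaining, LCount);
      end;
    finally
      LTarget.Free;
    end;
  finally
    LSource.Free;
  end;
end;

var
  LSrcName, LDstName: string;
  LData: TBytes;
  I: Integer;
begin
  // Hypothetical temp-file names, used only for this demo
  LSrcName := TPath.Combine(TPath.GetTempPath, 'chunk_demo_src.bin');
  LDstName := TPath.Combine(TPath.GetTempPath, 'chunk_demo_dst.bin');

  // 10000 bytes is not a multiple of 4096, so the last block is partial
  SetLength(LData, 10000);
  for I := 0 to High(LData) do
    LData[I] := I mod 256;
  TFile.WriteAllBytes(LSrcName, LData);

  CopyFileInChunks(LSrcName, LDstName);
  Writeln('Copied bytes: ', Length(TFile.ReadAllBytes(LDstName)));
end.
```

A TMemoryStream version would need the whole file to fit into the address space of the process, which a 3GB file cannot do in a 32-bit build.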


Second, use a procedure that "changes just the necessary bytes in the buffer that was read".

-- Here you can hit a problem: the bytes may not be found in the "current buffer data", BUT the real value can continue in the next bytes... you see?

Buffer "X" read: 1, 2, 3, 4, 5

I need to search for "5, 6"

Then I need to read the next buffer "X": 6, 7, 8, 9....

So I need to know where the "beginning" is and where the "end" is -- you see?

--- How to solve it?

------ Just try to find the "beginning" value (byte); if it is found, then try to find the rest!

Take a look at my procedure.

If you load the whole file content, you solve this problem, BUT you get your "out of memory" error!

Edited by programmerdelphi2k


- Use base TStream

- Read chunk by chunk, replace, write it back

- To handle the case of search pattern partially sitting at the end of chunk - the simplest solution:

- - Move piece of data with length = Length(SearchPattern)-1 from the end of the chunk to the beginning of buffer

- - Read from stream to buffer starting right after that moved piece

Of course, the search pattern must be shorter than the buffer.
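The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the attached project or from any poster: the name ReplaceBytesInStream is hypothetical, the replacement is assumed to have the same length as the search pattern (as in the TMemoryStream version earlier in the thread), and the result is written to a second stream instead of back into the source. One small deviation from the list: the tail carried over starts at the first unscanned byte rather than at a fixed Length(SearchPattern)-1, so bytes that were just replaced are never scanned again.

```pascal
program ChunkedReplaceDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: streams ASource to ATarget in chunks, replacing every
// occurrence of ASearch by AReplace (same length), including occurrences
// that straddle a chunk boundary.
procedure ReplaceBytesInStream(ASource, ATarget: TStream;
  const ASearch, AReplace: TBytes; ABufferSize: Integer = 65536);
var
  LBuffer: TBytes;
  LPatLen, LKeep, LRead, LFilled, LPos, LFlush: Integer;
begin
  LPatLen := Length(ASearch);
  Assert(LPatLen = Length(AReplace), 'Arrays have to be of the same length!');
  Assert((LPatLen > 0) and (LPatLen < ABufferSize),
    'Pattern must be shorter than the buffer');
  SetLength(LBuffer, ABufferSize);
  LKeep := 0; // bytes carried over from the previous chunk
  repeat
    // Read the next chunk right after the carried-over tail
    LRead := ASource.Read(LBuffer[LKeep], ABufferSize - LKeep);
    LFilled := LKeep + LRead;

    // Scan every position where a complete pattern still fits
    LPos := 0;
    while LPos <= LFilled - LPatLen do
      if CompareMem(@LBuffer[LPos], @ASearch[0], LPatLen) then
      begin
        Move(AReplace[0], LBuffer[LPos], LPatLen);
        Inc(LPos, LPatLen); // skip the bytes just replaced
      end
      else
        Inc(LPos);

    // At end of input flush everything; otherwise hold back the unscanned
    // tail, which may be the start of a pattern completed by the next chunk
    if LRead = 0 then
      LFlush := LFilled
    else
      LFlush := LPos;
    if LFlush > 0 then
      ATarget.WriteBuffer(LBuffer[0], LFlush);
    LKeep := LFilled - LFlush;
    if LKeep > 0 then
      Move(LBuffer[LFlush], LBuffer[0], LKeep);
  until LRead = 0;
end;

var
  LSource: TBytesStream;
  LTarget: TMemoryStream;
  LOut: TBytes;
  I: Integer;
begin
  // Buffer of 5 bytes: the pattern (5, 6) straddles the first chunk boundary
  LSource := TBytesStream.Create(TBytes.Create(1, 2, 3, 4, 5, 6, 7, 8, 9));
  LTarget := TMemoryStream.Create;
  try
    ReplaceBytesInStream(LSource, LTarget,
      TBytes.Create(5, 6), TBytes.Create(50, 60), 5);
    SetLength(LOut, LTarget.Size);
    LTarget.Position := 0;
    LTarget.ReadBuffer(LOut[0], Length(LOut));
    for I := 0 to High(LOut) do
      Write(LOut[I], ' ');
    Writeln; // prints: 1 2 3 4 50 60 7 8 9
  finally
    LTarget.Free;
    LSource.Free;
  end;
end.
```

With two TFileStream instances in place of the in-memory streams, the same routine would process a file of any size while holding only ABufferSize bytes in memory.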


I use a simpler way: no "juggling" needed... like "am I at the beginning or the end of the buffer?" No, no, no! None of that is needed!

  • Just read the file sequentially and compare what was "read" with your "pattern"...  <-- here is the magic
    • The "pattern" can have any size (as long as it is smaller than the file size); of course, if "pattern > bufferRead" then skip it, otherwise just check whether "A is in B"!
    • To compare, you can simply compare each element of the arrays... if ar=br then...
  • To write (to a new file, to avoid any problem with the source file), you can do it as soon as possible, or in "blocks" to avoid many disk writes
Edited by programmerdelphi2k

On 11/27/2022 at 3:47 PM, robertjohns said:

It is 32 bit Platform and the file size is 3GB

OK, I'm done with the example code. Find a small test project attached as a zip file. It contains a unit named PB.ByteSearchAndReplaceU that implements a class TByteSearchAndReplace that encapsulates all the code for this task. It compiles and runs, but I have not checked that it actually does the replacements properly, due to the lack of test data. I leave that to you, good luck. :classic_cool:

ByteS&R.zip


hi @PeterBelow

 

I think that much code is not necessary for this task!

You just need to:

  • open the file and scan the buffer "sequentially" (with the size you want)
    • for the sequential reads: here I'm going to use "FileOpen(...)" directly (MSWindows) to avoid using any unnecessary stream class!
      • FileOpen(...), FileSeek(...), and at the end FileClose(...): with these functions, your task (reading) is done!
      • Once you've found your pattern by comparing each buffer read, you can update your buffer and save it in another file for this purpose!
        • for the "compare" procedure, you can use any one you like, or write your own;
          • if ByteX[ i ] = ByteY [ i ] then ...
    • to avoid problems with the original file, just write the changes to another file!!!
  • compare this buffer with your pattern
  • assign the new values to the part of the buffer that was found
  • write the new buffer content to your temp file, for example
  • close the files opened and created!

My code has just a few lines:

  • about 16 lines for the sequential reads, regardless of file size!
  • 21 + 17 lines to compare bytes, divided into 2 procedures
Edited by programmerdelphi2k

17 hours ago, programmerdelphi2k said:

hi @PeterBelow

 

I think that much code is not necessary for this task!

You just need to:

 

This approach only works if you have one single search/replace pair.

2 hours ago, PeterBelow said:

This approach only works if you have one single search/replace pair.

no, no, no!

I can do any number of searches, no matter what size pattern is used!
If there are 1000 occurrences within the file, I can find them all and replace them!
It doesn't matter the size of the "pattern" I'm looking for! The only restriction is: the pattern size must be smaller than the file size!

21 minutes ago, programmerdelphi2k said:

no, no, no!

I can do any number of searches, no matter what size pattern is used!
If there are 1000 occurrences within the file, I can find them all and replace them!
It doesn't matter the size of the "pattern" I'm looking for! The only restriction is: the pattern size must be smaller than the file size!

If you do multiple passes over the file you can end up replacing bytes in a part you replaced in a previous pass, which is usually not what you want.
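A tiny in-memory illustration of this effect, as a sketch only: the helper name ReplaceAllPairsOnePass and the one-byte pairs ($01 to $02, $02 to $03) are made up for the demo, and TDataRec is borrowed from the code earlier in the thread. Running one pass per pair turns the original $01 into $03, because the second pass hits the byte the first pass just wrote. Testing all pairs at each position in a single pass, and skipping the bytes just replaced, avoids that.

```pascal
program MultiPairReplaceDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  TDataRec = record
    SearchFor, ReplaceBy: TBytes;
  end;

// Hypothetical helper: one pass over AData; at each position every pair is
// tried, and replaced bytes are skipped so they are never matched again.
procedure ReplaceAllPairsOnePass(var AData: TBytes;
  const APairs: array of TDataRec);
var
  LPos, I, LLen: Integer;
  LMatched: Boolean;
begin
  LPos := 0;
  while LPos < Length(AData) do
  begin
    LMatched := False;
    for I := Low(APairs) to High(APairs) do
    begin
      LLen := Length(APairs[I].SearchFor);
      Assert(LLen = Length(APairs[I].ReplaceBy),
        'Arrays have to be of the same length!');
      if (LLen > 0) and (LPos + LLen <= Length(AData)) and
        CompareMem(@AData[LPos], @APairs[I].SearchFor[0], LLen) then
      begin
        Move(APairs[I].ReplaceBy[0], AData[LPos], LLen);
        Inc(LPos, LLen);
        LMatched := True;
        Break;
      end;
    end;
    if not LMatched then
      Inc(LPos);
  end;
end;

var
  LPairs: array [0..1] of TDataRec;
  LSingle: array [0..0] of TDataRec;
  LMulti, LOnce: TBytes;
  I: Integer;
begin
  LPairs[0].SearchFor := TBytes.Create($01);
  LPairs[0].ReplaceBy := TBytes.Create($02);
  LPairs[1].SearchFor := TBytes.Create($02);
  LPairs[1].ReplaceBy := TBytes.Create($03);

  // Multiple passes: one full pass over the data per pair
  LMulti := TBytes.Create($01, $02);
  for I := Low(LPairs) to High(LPairs) do
  begin
    LSingle[0] := LPairs[I];
    ReplaceAllPairsOnePass(LMulti, LSingle);
  end;

  // Single pass: all pairs are tried at each position
  LOnce := TBytes.Create($01, $02);
  ReplaceAllPairsOnePass(LOnce, LPairs);

  Writeln('multi-pass : ', LMulti[0], ' ', LMulti[1]); // prints: multi-pass : 3 3
  Writeln('single pass: ', LOnce[0], ' ', LOnce[1]);   // prints: single pass: 2 3
end.
```

Whether the first matching pair should win at a given position, as here, or some other priority rule applies, is a design decision the thread does not settle.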

2 hours ago, programmerdelphi2k said:

no, no, no!

I can do any number of searches, no matter what size pattern is used!
If there are 1000 occurrences within the file, I can find them all and replace them!
It doesn't matter the size of the "pattern" I'm looking for! The only restriction is: the pattern size must be smaller than the file size!

Thanks. Any plans to provide a working example with multiple search and replacement using a faster method?

4 hours ago, PeterBelow said:

If you do multiple passes over the file you can end up replacing bytes in a part you replaced in a previous pass, which is usually not what you want.

Again! I can tell you this:

  • my code finds "ANY NUMBER OF OCCURRENCES" (expressed in BYTES), and I can make the CHANGES for ANY PATTERN (in BYTES), provided that:
  1. THE BYTES FOUND HAVE THE SAME SIZE AS THE PATTERN
  2. THE PATTERN SIZE IS SMALLER THAN THE FILE SIZE!

THE FILE SIZE DOES NOT MATTER!

  • JUST 1 PASS!!!
  • I don't load it into memory!!!
    • just the bytes (buffer) needed to read it! This buffer can have any size!
  • my "changes" are made in the SAME FILE!!! -- of course, to avoid any "problems", make a backup first!
    • why I don't do it: because the file can be 1 Byte or 1 TeraByte... that is why I don't do the backup in code!
      • but you CAN DO IT!

 

image.thumb.png.9986b72de44bcd36e52f885b79f0ecf6.png   image.thumb.png.d1cd40a7231522a0a4742661ef934081.png

Edited by programmerdelphi2k
