Jud

Integer overflow in tStringList.SaveToFile


Guest

Try opening the file like this:

 

 


 

...
 s.Create('A', 50);
  //
  // number := 20637552; // OK
  //
  number := 21000000; // no longer works!
  //
  AssignFile(myFile, '.\myFileWithStrings.txt');
...

 

Edited by Guest


That is a work-around, or using streams with a block size that is small enough to not cause problems.

Guest
4 hours ago, Jud said:

That is a work-around, or using streams with a block size that is small enough to not cause problems.

As you saw in my example, I used a separate variable of only 50 characters to write each line to the text file. That variable served as a "buffer": it can receive other values and so be reused for every line.

Of course, you could use more than 50 characters, as long as you stay within the limits of your system (or application), which you have to determine by testing.

 

Look, the amount of memory on the computer is just one factor in this story!

 

Let's look at it from another angle!

 

You have a processor which, in theory, we might think can do everything. But it can't! It is limited as well, like everything in life.

 

Returning to your case, of course, any language will be limited at some point!

 

In the Embarcadero help system, you will find the following: (for current edition, Sydney, and previous ones)

  • Strings: ~ 2 ^ 30 characters, Unicode characters, 8-bit (ANSI) characters, multiuser servers and multilanguage applications
  • UnicodeString is the default string type. => Strings = UnicodeString;
    • (in another part) The UnicodeString type is the default string type and represents a dynamically allocated Unicode string whose maximum length is limited only by available memory.
    • available memory = for your app or your system?

 

However, this does not mean that any system will hand over all of that address space at once!

You should check how much space is actually made available to your application, that is, where this memory will be allocated.

Having 32 GB of RAM does not mean your application will get to use all of it. If it did, what would be left for the other applications, services, and the rest of the system?

You must tell the operating system how this is to be done, through language directives or by the appropriate operating-system means.

 

Delphi, like other programming languages, has its own memory manager (used for managed types such as strings and interfaces, among others), just as the operating system has its own memory manager.

 

You can also track the memory occupied or allocated by your application in "Task Manager" or another more suitable application.

 

You may find a more complete answer on the following Embarcadero pages, although factors outside the language can still change the picture. If you respect the limits and look for conservative alternatives, you will be closer to success. Otherwise, you can go your own way and see what happens.

 

http://docwiki.embarcadero.com/RADStudio/Sydney/en/String_Types_(Delphi)

Increasing the Memory Address Space

Go Up to Managing Memory Index

This section describes how to extend the address space of the Memory Manager beyond 2 GB, on Win32.

Note: The default size of the user mode address space for a Win32 application is 2GB, but this can optionally be increased to 3GB on 32-bit Windows and 4GB on 64-bit Windows. The address space is always somewhat fragmented, so it is unlikely that a GetMem request for a single contiguous block much larger than 1GB will succeed - even with a 4GB address space.

To enable and use a larger address space

  1. Make sure the operating system supports a larger address space:
  2. Set the appropriate linker directive. The operating system must be informed through a flag in the executable file header that the application supports a user mode address space larger than 2GB, otherwise it will be provided with only 2GB. To set this flag, specify {$SetPEFlags IMAGE_FILE_LARGE_ADDRESS_AWARE} in the .dpr file of the application.
  3. Make sure that all libraries and third party components support the larger address space. With a 2GB address space the high bit of all pointers is always 0, so a larger address space may expose pointer arithmetic bugs that did not previously show any symptoms. Such bugs are typically caused when pointers are typecast to integers instead of cardinals when doing pointer arithmetic or comparisons.
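Steps 2 and 3 above could look roughly like this in a project file. This is only a sketch: the program name and the little pointer comparison are made up for illustration, and the choice of NativeUInt as the unsigned pointer-sized type is mine (the quoted text only says "cardinals").

```pascal
program LargeAddressSketch; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  Winapi.Windows,   // declares IMAGE_FILE_LARGE_ADDRESS_AWARE
  System.SysUtils;

// Step 2: ask the linker to set the "large address aware" flag in the
// executable header. The directive comes after the uses clause so that
// the constant is already known.
{$SetPEFlags IMAGE_FILE_LARGE_ADDRESS_AWARE}

var
  Buf: array [0..15] of Byte;
  P, Q: PByte;
begin
  P := @Buf[0];
  Q := @Buf[8];
  // Step 3: compare and subtract addresses through an unsigned,
  // pointer-sized type. A cast to Integer would turn any address at or
  // above 2 GB into a negative number and break the comparison.
  if NativeUInt(Q) > NativeUInt(P) then
    Writeln('Q is ', NativeUInt(Q) - NativeUInt(P), ' bytes after P');
end.
```

The flag only matters for Win32 builds; it does not change the string limit discussed in this thread.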

Note: Memory allocated through the Memory Manager is guaranteed to be aligned to at least 8-byte boundaries. 16-byte alignment is useful when memory blocks will be manipulated using SSE instructions, but may increase the memory usage overhead. The guaranteed minimum block alignment for future allocations can be set with SetMinimumBlockAlignment.

http://docwiki.embarcadero.com/RADStudio/Sydney/en/Increasing_the_Memory_Address_Space

Edited by Guest


Yes, but it isn't as neat or easy as being able to use tStringList.SaveToFile for its intended purpose.  I think it is clear that there is a variable in there somewhere that is to hold the number of bytes and it is an integer instead of an int64.

Guest

Try this sample on your system:

 

program prjTestando;

{$APPTYPE CONSOLE}
{$R *.res}

uses
  System.SysUtils,
  WinAPI.Windows;

var
  lText                 : string;
  lTextSize             : UInt64;
  lMinBlckAlig          : TMinimumBlockAlignment;
  SysInfo               : TSystemInfo;
  lMntSpcAddrrInProccess: UInt64;

begin
  ReportMemoryLeaksOnShutdown := true;
  //
  GetSystemInfo(SysInfo); // get info about System, like memory on "current process"
  //
  lMntSpcAddrrInProccess := (UInt64(SysInfo.lpMaximumApplicationAddress) - UInt64(SysInfo.lpMinimumApplicationAddress));
  //
  try
    try
      lText     := '';
      lTextSize := 0;
      //
      while true do
      begin
        lText     := lText + 'A';
        lTextSize := lTextSize + 1;
      end;
    except
      on E: Exception do
        Writeln(E.ClassName, ': ', E.Message);
    end;
  finally
    //
    (* To enable a 3GB address space in 32-bit editions of supported versions of Windows, run
      bcdedit /set {ID } increaseuserva 3072

      NOTE: See 4-Gigabyte Tuning: BCDEdit and Boot.ini, MSDN Documentation at:
      https://msdn.microsoft.com/en-us/library/bb613473
    *)
    //
    { 32-bit console app on 64-bit Windows, no changes to Windows settings via BCDEdit

      D:\RADRIOTests\yyyyyyyyyyyyyyyyy\Win32\Release>prjTestando.exe
      EOutOfMemory: Out of memory
      lTextSize = 735084527
      lMntSpcAddrrInProccess = 2147352575
    }
    //
    { 64-bit console app on 64-bit Windows, no changes to Windows settings; the memory leak report below appears after the access violation

      D:\RADRIOTests\yyyyyyyyyyyyyyyyy\Win64\Release>prjTestando.exe
      EAccessViolation: Access violation at address 000000000040BB50 in module 'prjTestando.exe'. Write of address 000000000000000C
      lTextSize = 1073741814
      lMntSpcAddrrInProccess = 140737488224255

      Unexpected Memory Leak
      An unexpected memory leak has occurred. The unexpected small block leaks are:

      57 - 72 bytes: EAccessViolation x 1
      217 - 232 bytes: UnicodeString x 1
    }
    //
    Writeln('lTextSize = ' + lTextSize.ToString);
    Writeln('lMntSpcAddrrInProccess = ' + lMntSpcAddrrInProccess.ToString);
    ReadLn;
  end;

end.

hug

Edited by Guest

Guest
20 minutes ago, Jud said:

I think it is clear that there is a variable in there somewhere that is to hold the number of bytes and it is an integer instead of an int64.

// In the 32-bit build, the error occurs at:
// System.Classes.pas, line 6815 =    SetString(Result, nil, Size); // Out of memory

Sets the contents and length of the given string.

In Delphi code, SetString sets the contents and length of the given string variable to the block of characters given by the Buffer and Length parameters.

  • For a short string variable, SetString sets the length indicator character (the character at S[0]) to the value given by Length and then, if the Buffer parameter is not nil, copies Length characters from Buffer into the string, starting at S[1]. For a short string variable, the Length parameter must be a value from 0 through 255.
  • -->> For a long string variable, SetString sets S to reference a newly allocated string of the given length. If the Buffer parameter is not nil, SetString then copies Length characters from Buffer into the string; otherwise, the content of the new string is left uninitialized. If there is not enough memory available to create the string, an EOutOfMemory exception is raised. Following a call to SetString, S is guaranteed to reference a unique string (a string with a reference count of one).
Edited by Guest

Guest

A string variable itself may sit on the stack (if it is a local variable), but the character data it refers to is allocated on the heap, inside your application's memory!

 

Quote

You will get to the point where you will read, in the Help, something like "Local variables (declared within procedures and functions) reside in an application's stack." and also Classes are reference types, so they are not copied on assignment, they are passed by reference, and they are allocated on the heap.

 

5 hours ago, Jud said:

Yes, but it isn't as neat or easy as being able to use tStringList.SaveToFile for its intended purpose.  I think it is clear that there is a variable in there somewhere that is to hold the number of bytes and it is an integer instead of an int64.

This has been answered above. It's the use of an intermediate string variable that holds the entire text which is then saved. And strings have 32 bit length. 

 

I'm not sure what @emailx45 is getting at but the problem you face is not due to a shortage of memory or address space. It's just that the string length is stored in a 32-bit variable. And a string is used by SaveToFile to save the text.
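For what it's worth, the figure from the Win64 test run posted earlier in the thread (lTextSize = 1073741814) is consistent with this. UnicodeString characters are 2 bytes each (UTF-16 code units), so the character data alone comes to just under 2^31 bytes:

```latex
\[
2 \times 1\,073\,741\,814 = 2\,147\,483\,628 = 2^{31} - 20
\]
```

The largest value a signed 32-bit integer can hold is 2^31 - 1 = 2,147,483,647, so the string stopped growing a few bytes short of that. My guess, and it is only a guess, is that the remaining bytes are taken by the string header and the terminating null.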

 

FWIW, it seems wasteful to me to make an in memory copy of GBs of text just to save it. I'd be looking for a writer based approach. Just not using the raw RTL classes because of their dire performance. 
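A minimal sketch of what a writer-based approach could look like, assuming the lines already live in a TStringList. SaveLinesToFile is a made-up helper name (not an RTL routine), and the 1 MB buffer size and UTF-8 encoding are arbitrary choices. It uses the RTL's TStreamWriter, which the post above warns can be slow, so treat it as an illustration of the idea rather than a tuned solution.

```pascal
program SaveLinesSketch; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

// Hypothetical helper: writes the list one line at a time, so no single
// string ever has to hold the whole text.
procedure SaveLinesToFile(const Lines: TStrings; const FileName: string);
var
  Writer: TStreamWriter;
  I: Integer;
begin
  // Create (or overwrite) the file; 1 MB buffer chosen for illustration.
  Writer := TStreamWriter.Create(FileName, False, TEncoding.UTF8, 1024 * 1024);
  try
    for I := 0 to Lines.Count - 1 do
      Writer.WriteLine(Lines[I]);
  finally
    Writer.Free; // flushes the buffer and closes the file
  end;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.Add('first line');
    List.Add('second line');
    SaveLinesToFile(List, 'myFileWithStrings.txt');
  finally
    List.Free;
  end;
end.
```

Only one line plus the writer's buffer is in flight at any time, so the 32-bit string length never comes into play, however large the list is.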

Edited by David Heffernan

Guest
11 hours ago, David Heffernan said:

This has been answered above. ...

FWIW, it seems wasteful to me to make an in memory copy of GBs of text just to save it. I'd be looking for a writer based approach. Just not using the raw RTL classes because of their dire performance. 

Thanks for helping him.

My English is poor, and the way I express myself sometimes doesn't help.

This is what I was trying to say when I used the word "buffer": rework how the string is assigned.

Finally, my knowledge is not as great as yours.

Unfortunately, I started programming late and with few resources to learn the fundamentals of computing.

But I try when I can.

 

Main question:

Why keep 2 GB of text in memory and then duplicate it during processing, as many people sometimes do?

 

hug

Edited by Guest

11 minutes ago, emailx45 said:

Why keep 2 GB of text in memory and then duplicate it during processing, as many people sometimes do?

Generally it's wise to try not to do this. 

Guest

Yes, for sure. But he keeps trying.


Hi, just to give an example: a single ZBrush file can be 3 GB, with millions of vertices and faces. I created the Meshmolder software and get the same error when loading such files into StringLists. It is really hard to solve.


Thank you. Yes, as far as I can tell, the total number of bytes has to be less than 2^31. There are several places where they have not converted to 64-bit integers where they should have. Files are bigger now and memories are bigger now.


Uwe Raabe wrote in

https://translate.google.com/translate?hl=en&sl=de&u=https://www.delphipraxis.net/186612-32-bit-tstringlist-textdatei-mit-30mio-zeilen-3.html&prev=search&pto=aue

 

The storage space for a string is limited to 2 GB even under 64-bit, which corresponds to approx. 2^30 characters. Since the entire content is handled as a single string when loading, as well as when accessing .Text, a natural limit is reached here. The StringList could probably hold even more data under 64-bit, but then it must be filled line by line with Add. Anything that treats the content as a single string (Load, Save, Text, ???) must then not be used.
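To illustrate the "line by line with Add" suggestion, here is a sketch that reads a file with TStreamReader and calls Add for each line, so the whole content is never held in one string. LoadLinesFromFile and the file name sample.txt are hypothetical, and the UTF-8 encoding and 1 MB buffer are assumptions on my part.

```pascal
program LoadLinesSketch; // hypothetical project name

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes;

// Hypothetical helper: fills the list one line at a time instead of
// going through LoadFromFile / Text, which build a single huge string.
procedure LoadLinesFromFile(const Lines: TStrings; const FileName: string);
var
  Reader: TStreamReader;
begin
  // DetectBOM = True lets the reader skip a byte-order mark if present.
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, True, 1024 * 1024);
  try
    Lines.BeginUpdate;
    try
      while not Reader.EndOfStream do
        Lines.Add(Reader.ReadLine);
    finally
      Lines.EndUpdate;
    end;
  finally
    Reader.Free;
  end;
end;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    // Create a tiny sample file so the sketch can run on its own.
    List.Add('first line');
    List.Add('second line');
    List.SaveToFile('sample.txt', TEncoding.UTF8);
    List.Clear;

    LoadLinesFromFile(List, 'sample.txt');
    Writeln('Lines loaded: ', List.Count);
  finally
    List.Free;
  end;
end.
```

Each individual line still has to fit in a string, and the list as a whole still has to fit in memory, so this only sidesteps the single-string limit.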

 

See also this link:

https://stackoverflow.com/questions/27007904/tstringlist-loadfromfile-exceptions-with-large-text-files

Edited by kybio


Is this 2GB limit a limitation in Windows (even 64-bit)??


But that is a limit in Delphi. It could use 64-bit integers and not have that limit. I think this is an oversight in 64-bit Delphi. Dynamic arrays don't have that tiny 2GB limit.

 

Edited by Jud

Guest

About C# strings: https://stackoverflow.com/questions/49541801/how-many-bytes-does-a-string-take-up-in-x64

Quote

As Hans Passant suggested, there is an extra field added at the end of string object which is 4 bytes (in x64, it might require another 4 bytes extra, for padding).

So in the end we have :

= 8 (sync) + 8 (type) + 4 (length) + 4 (extra field) + 2 (null terminator) + 2 * length = 26 + 2 * length = 36,893,488,147,419,103,258 bytes ≈ 36,893.488 petabytes ≈ 32,768 pebibytes

So Jon Skeet's blog post was right (how could it be wrong ?)

 

https://codeblog.jonskeet.uk/2011/04/05/of-memory-and-strings/

Quote

In particular, the questioner wanted to store hundreds of thousands – possibly millions – of strings in memory, and knowing (or assuming) that they all consisted of ASCII characters, he wanted to avoid the waste of space that comes from storing each character in a .NET string as a UTF-16 code unit.

...

 

https://docs.microsoft.com/en-us/office/vba/language/reference/user-interface-help/data-type-summary

 

What is the maximum Heap Size of 32 bit or 64-bit JVM in Windows and Linux?

https://javarevisited.blogspot.com/2013/04/what-is-maximum-heap-size-for-32-bit-64-JVM-Java-memory.html

 

Read this post on Stack Overflow:

https://stackoverflow.com/questions/312118/why-the-excess-memory-for-strings-in-delphi

 

Read this article on ThoughtCo by Zarko Gajic:

https://www.thoughtco.com/understanding-memory-allocation-in-delphi-1058464

 

See if that helps!

Edited by Guest

1 hour ago, Jud said:

But that is a limit in Delphi. It could use 64-bit integers and not have that limit. I think this is an oversight in 64-bit Delphi. Dynamic arrays don't have that tiny 2GB limit.

 

It is a design choice for strings. Stop using strings to hold huge text files. If you want to write a program that works then that's your only choice. 


So "it isn't a bug - it is a feature"?  That 2GB limit comes up in other places too.  If it is a design choice, they could choose to design it for modern computers.

 

I remember when Stony Brook Pascal went from 16-bit to 32-bit, there were a lot of places where they forgot to change a 16-bit integer to 32 bits.

Guest

In Swift 2.0 what is the maximum length of a string?

https://stackoverflow.com/questions/37234778/in-swift-2-0-what-is-the-maximum-length-of-a-string#:~:text=What is the maximum length of an NSString object%3F,little over 4.2 billion characters.


Quote

Following the official Apple documentation:

String is bridged to Objective-C as NSString, and a String that originated in Objective-C may store its characters in an NSString.

Since all devices that were capable of running iOS were 32-bit, this means NSUIntegerMax is 2^32 - 1.

According to the Swift open-source GitHub repo, it would seem that its value is 2^64 - 1 = 18,446,744,073,709,551,615 (hexadecimal 0xFFFFFFFFFFFFFFFF) on 64-bit devices, following this code:


#if __LP64__ || TARGET_OS_EMBEDDED || TARGET_OS_IPHONE || TARGET_OS_WIN32 || NS_BUILD_32_LIKE_64
typedef long NSInteger;
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;
typedef unsigned int NSUInteger;
#endif

// + (instancetype)
    //     stringWithCharacters:(const unichar *)chars length:(NSUInteger)length
...
maxLength:(NSUInteger)maxBufferCount
...

 

https://www.tutorialspoint.com/What-is-the-max-length-of-a-Python-string

Quote

With a 64-bit Python installation, and 64 GB of memory, a Python 2 string of around 63 GB should be quite feasible. If you can upgrade your memory much beyond that, your maximum feasible strings should get proportionally longer. But this comes with a hit to the runtimes.

With a typical 32-bit Python installation, of course, the total memory you can use in your application is limited to something like 2 or 3 GB (depending on OS and configuration), so the longest strings you can use will be much smaller than in 64-bit installations with very high amounts of RAM.

 

https://stackoverflow.com/questions/816142/strings-maximum-length-in-java-calling-length-method

 

and so...

 

then, "Give it up, baby"!

Edited by Guest


I'm just a little curious:

Who really relies on an in-memory string >= 2 GB, at the risk of killing the whole OS?

Even if my machine had enough memory, I would always try to split that into chunks, or use memory-mapped files instead.

Is this 2 GB limit a rather theoretical limitation, or is it something that is widely hit in real-world applications?

Of course there may be some special cases on dedicated machines, like math, search engines, physics simulations or the like, that have such high requirements, but I think a normal app, even with a heavy database or 3D load, would never reach that limit.

Maybe you can prove me wrong with some examples, since I don't see any use case at the moment.

 

Edited by Rollo62


On the one hand, I agree 2 GB strings are close to nonsense. On the other hand, I see no strong reason why the limit couldn't be at least 4 GB (using signed integers for length, seriously?) or even 2^64, just like for dynamic arrays.

Edited by Fr0sT.Brutal
