wuwuxin

What is the best (fastest) way of checking if a string is a number?


I know I can use TryStrToFloat, but is there any other (better and faster) way of checking if a string can be converted to a number?


It depends on what you mean by "can be converted to number". If your definition is that TryStrToFloat returns True, then the answer is no. If you're willing to give up some of the features of TryStrToFloat then the answer is yes, but you are going to have to implement it yourself - in assembler.

 

But why do you need it to be faster? Is it a bottleneck for you?


@Anders Melander  Thank you for the inputs.

 

I need a fast way to just find out if a string is "convertible" to number, I don't need to see the actual number converted.

 

I can use TryStrToFloat, then ignore the converted number, while just looking at the returned Boolean result.

 

The reason I need this to be as fast as possible is that I am processing a large text file and need to generate some statistics on how much of it can be converted to numbers. I don't need to actually convert the numbers; I just need to know the percentage of the file that can be converted to numbers (integer or floating point).
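For reference, a minimal sketch of the TryStrToFloat approach described above. The names IsNumeric and PercentNumeric are made up for illustration, and it assumes the file has already been split into tokens and uses '.' as the decimal separator:

```pascal
program CountNumericDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Returns True when TryStrToFloat accepts S; the converted value is discarded.
function IsNumeric(const S: string; const FS: TFormatSettings): Boolean;
var
  Ignored: Double;
begin
  Result := TryStrToFloat(S, Ignored, FS);
end;

// Percentage of the given tokens that TryStrToFloat accepts.
function PercentNumeric(const Tokens: array of string;
  const FS: TFormatSettings): Double;
var
  I, Hits: Integer;
begin
  Hits := 0;
  for I := Low(Tokens) to High(Tokens) do
    if IsNumeric(Tokens[I], FS) then
      Inc(Hits);
  if Length(Tokens) = 0 then
    Result := 0
  else
    Result := 100.0 * Hits / Length(Tokens);
end;

var
  FS: TFormatSettings;
begin
  FS := FormatSettings;        // copy of the current locale settings
  FS.DecimalSeparator := '.';  // assumption: the file uses '.' for decimals
  Writeln(PercentNumeric(['12', '3.5', 'abc', '1e3'], FS):0:1);
end.
```

Passing the format settings explicitly keeps the result independent of the machine's locale.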

 

Any advice?


Do you need to know whether or not the value is in the valid range of your target data types? 

 

Also, if you need to do this fast then you probably won't be using a string for each line, because that involves heap allocation.

Edited by David Heffernan

22 minutes ago, David Heffernan said:

Do you need to know whether or not the value is in the valid range of your target data types? 

No need for the range check. Just whether it is a number string.


The only built-in way to determine if a string can be converted by TryStrToFloat is to use TryStrToFloat. You could reverse engineer TryStrToFloat to make a version that only parsed the string and didn't actually do the conversion but my guess is that the majority of time used by TryStrToFloat is used by the parser, not by the conversion.

 

But if you are reading from a text file then it's likely that the I/O will be your real bottleneck and as David hints at, you should profile first and optimize next.
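A rough way to check this is to time the parsing on its own and compare it with the time it takes to read the file. A minimal sketch, assuming TStopwatch from System.Diagnostics is available; the TimeParse name and the sample string are made up for illustration:

```pascal
program TimeTryStrToFloat;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Diagnostics;

// Times Iterations calls of TryStrToFloat on one sample string, in milliseconds.
function TimeParse(const Sample: string; Iterations: Integer): Int64;
var
  SW: TStopwatch;
  Ignored: Double;
  I: Integer;
begin
  SW := TStopwatch.StartNew;
  for I := 1 to Iterations do
    TryStrToFloat(Sample, Ignored);  // only the elapsed time matters here
  Result := SW.ElapsedMilliseconds;
end;

begin
  Writeln('10 million calls took ', TimeParse('12345.678', 10000000), ' ms');
end.
```

If that number is small compared to the time spent reading the file, a hand-written parser will not buy much.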


Possibly loop a pointer through the string until a character that is neither a digit nor a comma is found, allowing no more than one comma; maybe that could be faster.

If the string ends without any break, it should be a convertible string (if the length is also limited to the maximum number of digits that could be there).

But I haven't checked yet (too late) what TryStrToFloat is doing; maybe it's already there.

 

If I assume that most numbers are in the normal range (not beyond the min/max limits), then this pre-check for non-digits could be faster, as a kind of pre-selection.

But when you finally convert, you will have to consider the min/max again, to make it complete.
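A minimal sketch of such a pre-check, untested against TryStrToFloat for speed. The name LooksLikeNumber is made up, and the sketch deliberately handles no sign, no exponent and no length or range limit:

```pascal
program DigitScanDemo;
{$APPTYPE CONSOLE}

// Pre-check: only digits, with at most one separator character.
// This is a filter under the assumptions above, not a full parser.
function LooksLikeNumber(const S: string; Separator: Char): Boolean;
var
  P: PChar;
  SeenSeparator, SeenDigit: Boolean;
begin
  Result := False;
  SeenSeparator := False;
  SeenDigit := False;
  P := PChar(S);                 // points at a #0 terminator when S is empty
  while P^ <> #0 do
  begin
    if (P^ >= '0') and (P^ <= '9') then
      SeenDigit := True
    else if (P^ = Separator) and not SeenSeparator then
      SeenSeparator := True
    else
      Exit;                      // other character or second separator
    Inc(P);
  end;
  Result := SeenDigit;           // '' and a lone separator are rejected
end;

begin
  Writeln(LooksLikeNumber('123,4', ','));
  Writeln(LooksLikeNumber('12,3,4', ','));
  Writeln(LooksLikeNumber('123a', ','));
end.
```

Strings that pass could still be handed to TryStrToFloat afterwards if the range matters.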

Edited by Rollo62
7 minutes ago, pyscripter said:

Have a look at the routines in SVGIconImageList/SVGCommon.pas at master · EtheaDev/SVGIconImageList (github.com)

They were "stolen" and adapted from synopse/mORMot: Synopse mORMot ORM/SOA/MVC framework (github.com) and they did make a big difference when parsing SVG files.

Yes, I was thinking of something like this, with pointer math.

But not sure if this might be faster than System functions, which should include similar stuff.

 

Edited by Rollo62


Have you tried to use val ?

 

ex:

      val('1287653.3', yourFloat , InvalidCharIndex );

 

Val converts a string to a numeric value. The Result argument can be an integer, Int64, or floating-point variable. If the conversion is successful, InvalidCharIndex is zero. Otherwise, the value of InvalidCharIndex is the string position where Val first detected a format error. Val is not a real procedure.
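Wrapped as a check, that could look like the sketch below. The name IsNumberByVal is made up for illustration, and note that Val always expects '.' as the decimal separator, regardless of locale:

```pascal
program ValCheckDemo;
{$APPTYPE CONSOLE}

// True when Val parses the whole string as a floating-point value.
function IsNumberByVal(const S: string): Boolean;
var
  Ignored: Double;
  Code: Integer;
begin
  Val(S, Ignored, Code);  // Code = 0 means no format error was found
  Result := Code = 0;
end;

begin
  Writeln(IsNumberByVal('1287653.3'));
  Writeln(IsNumberByVal('12a'));
end.
```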

 

 

Edited by Clément


Should this support any number format which TryStrToFloat supports? E.g. '5.4E7', '543,543.647' or ' $0abc6'? Or are there restrictions? What about decimal and thousands separators? What about negative numbers? Negative exponents? The more flexible it must be, the slower it will become.

The easiest case would be:

Only decimal digits and only a decimal point, no negative numbers. No range restrictions. No exponents. That would be fairly simple and fast, because it's a state machine with only a few states.
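A minimal sketch of that easiest case as an explicit state machine. The state names and the IsPlainDecimal function are made up for illustration, and whether '12.' and '.5' should count as numbers is an assumption (this version accepts both but rejects a lone '.'):

```pascal
program StateMachineDemo;
{$APPTYPE CONSOLE}

type
  TScanState = (ssStart, ssInt, ssLeadingPoint, ssFrac);

// Accepts decimal digits with at most one '.', and at least one digit.
// Takes a buffer plus length, so no separate string per line is needed.
function IsPlainDecimal(P: PChar; Len: Integer): Boolean;
var
  State: TScanState;
  I: Integer;
begin
  Result := False;
  State := ssStart;
  for I := 0 to Len - 1 do
    case P[I] of
      '0'..'9':
        if State = ssStart then
          State := ssInt
        else if State = ssLeadingPoint then
          State := ssFrac;
      '.':
        if State = ssStart then
          State := ssLeadingPoint   // point before any digit
        else if State = ssInt then
          State := ssFrac           // point after integer digits
        else
          Exit;                     // second decimal point
    else
      Exit;                         // any other character
    end;
  Result := (State = ssInt) or (State = ssFrac);
end;

var
  S: string;
begin
  S := '12.5';
  Writeln(IsPlainDecimal(PChar(S), Length(S)));
  S := '1.2.3';
  Writeln(IsPlainDecimal(PChar(S), Length(S)));
end.
```

Signs, exponents or thousands separators would each add states and transitions, which is where the slowdown comes from.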

Edited by dummzeuch
1 hour ago, dummzeuch said:

Should this support any number format which TryStrToFloat supports?

Exactly. It's pretty pointless to suggest different ways of doing conversion if the conversion will be done with TryStrToFloat, StrToFloat or the like.

 

And it's even more pointless when this is in all likelihood premature optimization.

11 hours ago, Clément said:

Have you tried to use val ?

Interesting, I had overlooked and forgotten this little, ancient piece for years.

I thought this was only Integer, but it seems to work with Real = Double as well.

The only problem is that it ignores the decimal separator, so the result of '123.4'  will be an integer 1234;

Which doesn't make sense only in a few corner cases, for handling with Real.

 

But good to know that it exists anyway, and to point into that direction again.

 

Edited by Rollo62

26 minutes ago, Rollo62 said:

Interesting, I had overlooked and forgotten this little, ancient piece for years.

I thought this was only Integer, but it seems to work with Real = Double as well.

The only problem is that it ignores the decimal separator, so the result of '123.4'  will be an integer 1234;

Which doesn't make sense only in a few corner cases, for handling with Real.

It doesn't ignore the decimal separator as such, but will always assume it to be '.' rather than the value of DecimalSeparator. So converting '123.4' will result in 123.4, but converting '123,4' with DecimalSeparator = ',' will result in 123 and an error at position 4.


@dummzeuch and @Rollo62

 

No matter if you use '123.4' or '123,4'. It gives error in position 4 with a result of 123.

 

Oops, my bad.

 

Indeed when a float is passed, '123.4' results in 123.4 with error = 0, while '123,4' results in 123 with error = 4.

 

There must be an error in the docs:

Quote

Other than the optional sign at the beginning, all characters must be digits; decimal or thousands separators are not supported.

Though one sentence later it says:

Quote

V is an integer-type or real-type variable. If V is an integer-type variable, S must form a whole number.

Which implies that for a real type, S need not be a whole number.

 

Edited by Leif Uneus


I have created some small Val() tests (for positive numbers) to check the behaviour under Rx11; it probably behaves differently in older versions.

It has so many corner cases that you had better know exactly how and when to use it.

See for yourself, and maybe find the right story for your use case.

 

procedure Test_Val;
var
    LTest : String;
    LResD : Double;
    LResI : Integer;
    LPos  : Integer;
begin
    LTest := '123';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes 123     LPos 0     OK
    Val( LTest, LResD, LPos );  // LRes 123.0   LPos 0     OK

    LTest := '123.4';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes 123     LPos 4     MAYBE acceptable as TRUNC, if LPos < length ?
    Val( LTest, LResD, LPos );  // LRes 123.4   LPos 0     OK

    LTest := '123,4';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes 123     LPos 4     MAYBE acceptable as TRUNC, if LPos < length ?
    Val( LTest, LResD, LPos );  // LRes 123.0   LPos 4     MAYBE acceptable as TRUNC, if LPos < length ?

    LTest := '123a';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes 123     LPos 4     ERR, acceptable as split before char, if LPos = length ?
    Val( LTest, LResD, LPos );  // LRes 123.0   LPos 4     ERR, acceptable as split before char, if LPos = length ?

    LTest := '123.4a';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes 123     LPos 4     ERR
    Val( LTest, LResD, LPos );  // LRes 1234.0  LPos 6     ERR ???, MAYBE useful to "digitize" a float ?

    LTest := '123,4a';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes 123     LPos 4     ERR
    Val( LTest, LResD, LPos );  // LRes 123.0   LPos 4     ERR




    LTest := '.5';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes   0     LPos 1     ERR
    Val( LTest, LResD, LPos );  // LRes 0.5     LPos 0     OK

    LTest := ',5';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes   0     LPos 1     ERR
    Val( LTest, LResD, LPos );  // LRes   0     LPos 1     ERR

    LTest := '.5a';
    LResD := 0.0;  LResI := -1;  LPos := -1;
    Val( LTest, LResI, LPos );  // LRes   0     LPos 1     ERR
    Val( LTest, LResD, LPos );  // LRes   5     LPos 3     ERR



end;

 

Edited by Rollo62

6 minutes ago, Anders Melander said:

I would not say that, with regard to Val().

Personally, I knew of Val(), but never really worked with it.

So it's fair to find out where its benefits and disadvantages are, as well as its do's and don'ts.

 

I expected a hidden pearl, 10 min ago, but it seems that I only found an empty oyster.

 

 

 

Edited by Rollo62

On 11/17/2021 at 7:06 PM, wuwuxin said:

I know I can use TryStrToFloat, but is there any other (better, and faster) way of checking if a string can be converted to number?

With all the above-mentioned number formats, don't forget that the decimal separator can be customized to ANY character, and that a digit could even be something other than 0-9: https://en.wikipedia.org/wiki/List_of_numeral_systems
