Skullcode

Regex Validate string


I have created this function to allow only Arabic and English letters and digits in a string:

 

var
  Regexs : TRegEx;
begin
  if Regexs.IsMatch(astr, '[ء-ي-A-Z-a-z-0-9 ]+') then
  begin
    Result := True;
  end else
  begin
    Result := False;
  end;

If I type a string such as abcdefgÄ, the result is True, even though I did not specify Ä in the regex pattern.

 

How can I make the regex match return True only for strings that fit the given pattern?

Quote

If I type a string such as abcdefgÄ, the result is True, even though I did not specify Ä in the regex pattern.

You need to anchor the pattern to the start of the string (^) and to the end of the string ($):

if Regexs.IsMatch(astr, '^[ء-ي-A-Z-a-z-0-9 ]+$') then // match from the beginning to the end of astr
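To illustrate why the anchors matter: IsMatch succeeds as soon as any substring matches, so without ^ and $ the run abcdefg inside abcdefgÄ is enough. Below is a minimal console sketch, assuming a Delphi version that ships System.RegularExpressions. The program name, the constant names and the \x{...} spelling of the Arabic range (ء is U+0621, ي is U+064A) are my own choices, not from the thread. I have also left out the hyphens between the ranges, because as far as I know a hyphen placed right after a range is treated as a literal '-' and would therefore be accepted as input.

```delphi
program AnchorDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Same character set twice; only the anchors differ.
  Unanchored = '[\x{0621}-\x{064A}A-Za-z0-9 ]+';
  Anchored   = '^[\x{0621}-\x{064A}A-Za-z0-9 ]+$';

var
  Sample: string;
begin
  Sample := 'abcdefg' + #$00C4; // 'abcdefgÄ', written with a char code

  // Unanchored: it is enough that some substring ('abcdefg') matches.
  Writeln('unanchored: ', TRegEx.IsMatch(Sample, Unanchored));

  // Anchored: every character from start to end must be in the set.
  Writeln('anchored:   ', TRegEx.IsMatch(Sample, Anchored));
end.
```

Writing the Arabic letters as code points is only a convenience so that the result does not depend on how the source file is encoded.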

 


Out of curiosity, why are you mixing Arabic with Latin letters? Does such an ArabicLatin word make sense to you?

30 minutes ago, Mahdi Safsafi said:

Out of curiosity, why are you mixing Arabic with Latin letters? Does such an ArabicLatin word make sense to you?

I am trying to validate input that will be typed by a client. I want to block special characters and allow only Arabic and English letters.


Currently I use this function. Is it fine, or does it need more correction?

 

 


Function Checkstr(const astr: string):Boolean;
var
  Regexs : TRegEx;
  i : integer;
  svalue : string;
  Allowed : string;
begin
  svalue := Trim(astr);
  for i := 1 to Length(svalue) do
  begin
    if Regexs.IsMatch(svalue[i], '^[ء-يA-Za-z0-9$&+=?@#~<>.^*()%!\s]+$') then
    begin
      Allowed := 'YES';
    end else
    begin
      Allowed := 'NO';
      Break;
    end;
  end;

  if Allowed = 'YES' then
  begin
    Result := True;
  end else
  begin
    Result := False;
  end;
end;

 

18 minutes ago, Skullcode said:

Currently I use this function. Is it fine, or does it need more correction?

 

 




 

Ummm... Why use a string for the Allowed flag?

 

Why not get rid of Allowed and set Result directly? And why go through the string character by character if RegEx can validate the whole string at once?

If I'm not mistaken you'll achieve the same with this one line:

 

Result := TRegEx.IsMatch(astr.Trim, '^[ء-يA-Za-z0-9$&+=?@#~<>.^*()%!\s]+$');
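Put together, the whole function shrinks to a single statement. The following is only a sketch of that idea, not the original poster's final code: the program wrapper, the constant name AllowedPattern and the sample inputs are assumptions of mine, and the Arabic range is written as \x{0621}-\x{064A} (ء to ي) so the source file encoding does not matter.

```delphi
program CheckStrDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

function CheckStr(const AStr: string): Boolean;
const
  // Inside [...] the characters $ + ? . * ( ) and a non-leading ^ are literals.
  AllowedPattern = '^[\x{0621}-\x{064A}A-Za-z0-9$&+=?@#~<>.^*()%!\s]+$';
begin
  // IsMatch is a class function, so no TRegEx variable is needed.
  // The + quantifier means a string that is empty after trimming does not match.
  Result := TRegEx.IsMatch(AStr.Trim, AllowedPattern);
end;

begin
  Writeln(CheckStr('  hello 123  '));      // letters, digits, spaces
  Writeln(CheckStr('abcdefg' + #$00C4));   // ends with Ä, which is not in the set
  Writeln(CheckStr(''));                   // empty input
end.
```

The regex engine does the per-character loop internally, which is why the explicit for loop and the Allowed flag become unnecessary.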


Why on earth are you using RegEx at all?

Just use one of the many existing functions or bake your own:

function ValidateChars(const Value, ValidChars: string): boolean;
begin
  for var v in Value do
  begin
    var Valid: boolean := False;
    for var c in ValidChars do
      if (c = v) then
      begin
        Valid := True;
        break;
      end;
    if (not Valid) then
      Exit(False);
  end;
  Result := True;
end;

Also consider using character classes. See TCharHelper in the System.Character unit.
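A rough sketch of what the character-class route could look like, assuming a Delphi version where System.Character provides the Char helper methods. The function name IsArabicOrEnglish, the choice of the range U+0621..U+064A (ء to ي, as in the question's pattern) and the rule that an empty string is rejected are illustrative assumptions, not something stated in the thread.

```delphi
program CharClassDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Character;

// Hypothetical helper: True when Value is non-empty and every character is
// a space, an ASCII letter or digit, or lies in U+0621..U+064A.
function IsArabicOrEnglish(const Value: string): Boolean;
var
  C: Char;
begin
  for C in Value do
  begin
    if C = ' ' then
      Continue;
    if (C >= #$0621) and (C <= #$064A) then
      Continue;
    // IsLetterOrDigit on its own would also accept letters such as Ä,
    // so it is restricted to the ASCII range here.
    if (Ord(C) < 128) and C.IsLetterOrDigit then
      Continue;
    Exit(False);
  end;
  Result := Value <> '';
end;

begin
  Writeln(IsArabicOrEnglish('abc 123'));
  Writeln(IsArabicOrEnglish('abcdefg' + #$00C4));
end.
```

Compared with passing a ValidChars string, the range and class checks avoid listing every allowed character by hand.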


Don't validate per character, as that becomes a maze to follow with Unicode. And don't use RegEx either, as it brings a different problem: the number settings in the OS might be Arabic for any locale (those digits are in fact Hindu, mistakenly called Arabic), and they will cause havoc.

I do it differently. I wrote a filter that strips out any non-English and non-Arabic characters:

 

TEncoding.GetEncoding(1256).GetString(TEncoding.GetEncoding(1256).GetBytes(aText))

 

Characters from any other language become '?', and so do many special characters. Special characters were not a problem in my case. The requirement was to prevent bad (offensive) words written in other languages that the administrators don't understand.

 

I don't use TEncoding but the Windows API directly, but I think you get the idea.
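The one-liner above can be wrapped in a small function. This is a sketch under a few assumptions: the function name FilterTo1256 is made up, code page 1256 must be available on the machine, and, as far as I know, TEncoding.GetEncoding returns a new instance that the caller should free, which the one-liner does not do. Note also that Windows-1256 contains some accented Latin letters, and the conversion may substitute a similar-looking character instead of '?', so the exact output for a given input is worth checking on the target system.

```delphi
program CodePageFilterDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: round-trips the text through Windows-1256 so that
// characters the code page cannot represent are replaced during conversion.
function FilterTo1256(const AText: string): string;
var
  Enc: TEncoding;
begin
  // Create one encoding instance and reuse it for both directions.
  Enc := TEncoding.GetEncoding(1256);
  try
    Result := Enc.GetString(Enc.GetBytes(AText));
  finally
    Enc.Free;
  end;
end;

var
  Original, Filtered: string;
begin
  Original := 'abc ' + #$4E2D; // Latin text plus one CJK character
  Filtered := FilterTo1256(Original);
  // If the text changed, it contained something outside the code page.
  Writeln('changed by filter: ', Filtered <> Original);
end.
```

Comparing the filtered text with the original turns the filter into a validator: any difference means the input held characters outside the code page.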


 

Quote

Why on earth are you using RegEx at all?

There are many reasons:

- This is a good place where using RegEx makes sense.

- RegEx works well with the Unicode database, e.g. '^[\p{Arabic}\p{P}A-Za-z0-9\s]+$'. How would someone represent \p{P} or \p{Arabic} as smoothly in hand-written code?

- Often (but not always), RegEx can outperform a hand-written function, especially if you compile the patterns. I didn't test it, but I believe the RegEx solution performs better than your hand-written solution.

- RegEx is much faster to type and to read.

- Extending a RegEx pattern is much simpler than extending a function.
- ...
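The Unicode-property pattern from the list above can be tried out with a few lines. This is a sketch only: the program name and the sample strings are my own, and it assumes the regex engine behind TRegEx was built with Unicode property support so that \p{Arabic} (script) and \p{P} (punctuation) are recognized.

```delphi
program UnicodePropsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Pattern quoted from the post: Arabic script, punctuation,
  // ASCII letters and digits, and whitespace.
  Pattern = '^[\p{Arabic}\p{P}A-Za-z0-9\s]+$';

var
  ArabicWord: string;
begin
  // An Arabic word spelled with code points (U+0633 U+0644 U+0627 U+0645)
  // so the source file encoding does not matter.
  ArabicWord := #$0633#$0644#$0627#$0645;

  // Mixed Arabic and English text with punctuation.
  Writeln(TRegEx.IsMatch(ArabicWord + ' well-7, ok!', Pattern));

  // Ä is a Latin letter outside A-Za-z and is neither Arabic nor punctuation.
  Writeln(TRegEx.IsMatch('abcdefg' + #$00C4, Pattern));
end.
```

Extending the rule, for example to accept symbols, means adding another property such as \p{S} to the class rather than writing new code.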

 

Quote

And don't use RegEx either, as it brings a different problem: the number settings in the OS might be Arabic for any locale (those digits are in fact Hindu, mistakenly called Arabic), and they will cause havoc.

AFAIK, RegEx engines do not rely on the OS. They have their own Unicode database.

 


@Mahdi Safsafi Thank you, I didn't know that.

 

It was just one line to filter out all other Unicode characters. It will be interesting to see how RegEx handles digits (Hindu and Arabic); I will investigate later.

29 minutes ago, Kas Ob. said:

It will be interesting to see how RegEx handles digits (Hindu and Arabic); I will investigate later.

\d handles digits from any language.
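A quick way to check this on a given Delphi version is to test an Arabic-Indic digit against a few digit classes. The sketch below is an illustration with made-up names; I deliberately give no expected output, because whether \d covers non-ASCII digits can depend on the engine version and the options it is compiled with.

```delphi
program DigitClassDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // U+0663 ARABIC-INDIC DIGIT THREE, written as a code point.
  ArabicIndicThree = #$0663;

begin
  // Shorthand class: behaviour for non-ASCII digits is what we want to observe.
  Writeln('\d     : ', TRegEx.IsMatch(ArabicIndicThree, '^\d$'));

  // Unicode property "decimal digit number".
  Writeln('\p{Nd} : ', TRegEx.IsMatch(ArabicIndicThree, '^\p{Nd}$'));

  // Explicit ASCII range: only the characters 0 through 9.
  Writeln('[0-9]  : ', TRegEx.IsMatch(ArabicIndicThree, '^[0-9]$'));
end.
```

If only ASCII digits should be accepted, spelling the class as [0-9] avoids depending on how \d is interpreted.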

On 7/1/2020 at 9:12 AM, Mahdi Safsafi said:

Out of curiosity, why are you mixing Arabic with Latin letters? Does such an ArabicLatin word make sense to you?

In my experience, when reporting is done on oil wells in the Middle East, the Arabic text is often interspersed with English technical terms. And for a developer it is quite a challenge to get mixed LTR-RTL text input right.

1 hour ago, A.M. Hoornweg said:

In my experience, when reporting is done on oil wells in the Middle East, the Arabic text is often interspersed with English technical terms. And for a developer it is quite a challenge to get mixed LTR-RTL text input right.

Thanks, man!

A sentence of the form "ArabicWord EnglishWord" makes sense to me, but a single word that mixes Arabic and English letters, such as "ArabicLettersEnglishLetters", is much harder to make sense of.
