PCRE, the regular expression engine used in Delphi, has a large number of compile-time options, only a few of which are exposed in the high-level (System.RegularExpressions) or the low-level (System.RegularExpressionsCore) Delphi interface. For example, a useful PCRE option that is not exposed is PCRE_UCP, which controls the meaning of \w, \d, etc. When this option is set, \w, for example, matches any Unicode letter, digit, or the _ character. If it is not set (in Delphi usage), it matches only ASCII letters, digits, and _. Class helpers can come to the rescue again.
uses
  System.RegularExpressionsAPI,
  System.RegularExpressionsCore,
  System.RegularExpressions;
type
  { TPerlRegExHelper }
  TPerlRegExHelper = class helper for TPerlRegEx
    procedure SetAdditionalPCREOptions(PCREOptions: Integer);
  end;

procedure TPerlRegExHelper.SetAdditionalPCREOptions(PCREOptions: Integer);
begin
  // OR the extra flags into the private FPCREOptions field,
  // which the helper reaches through "with Self do".
  with Self do FPCREOptions := FPCREOptions or PCREOptions;
end;
type
  { TRegExHelper }
  TRegExHelper = record helper for TRegEx
  public
    procedure Study;
    procedure SetAdditionalPCREOptions(PCREOptions: Integer);
  end;

procedure TRegExHelper.Study;
begin
  // Forward to the wrapped TPerlRegEx instance (private field FRegEx).
  with Self do FRegEx.Study;
end;

procedure TRegExHelper.SetAdditionalPCREOptions(PCREOptions: Integer);
begin
  // Delegate to the TPerlRegEx helper declared above.
  with Self do FRegEx.SetAdditionalPCREOptions(PCREOptions);
end;
Example usage:
var
  RE: TRegEx;
  Match: TMatch;
begin
  RE := TRegEx.Create('\w+');
  RE.SetAdditionalPCREOptions(PCRE_UCP); // No match without this
  Match := RE.Match('汉堡包/漢堡包');
  if Match.Success then
    ShowMessage(Match.Groups[0].Value);
end;
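The same option can be set through the low-level interface as well. The following is a minimal sketch rather than a definitive implementation: it is a console program (the name UcpDemo is made up for illustration) that repeats the TPerlRegEx helper so that it compiles on its own, and it assumes the extra flag has to be set before the first call to Match, since that is when the pattern is compiled.

```pascal
program UcpDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.RegularExpressionsAPI,   // declares PCRE_UCP
  System.RegularExpressionsCore;  // declares TPerlRegEx

type
  // Same helper as in the text, repeated so the sketch is self-contained.
  TPerlRegExHelper = class helper for TPerlRegEx
    procedure SetAdditionalPCREOptions(PCREOptions: Integer);
  end;

procedure TPerlRegExHelper.SetAdditionalPCREOptions(PCREOptions: Integer);
begin
  with Self do FPCREOptions := FPCREOptions or PCREOptions;
end;

var
  PerlRE: TPerlRegEx;
begin
  PerlRE := TPerlRegEx.Create;
  try
    PerlRE.RegEx := '\w+';
    // Assumption: set the flag before the first Match call,
    // because the pattern is compiled at that point.
    PerlRE.SetAdditionalPCREOptions(PCRE_UCP);
    PerlRE.Subject := '汉堡包/漢堡包';
    if PerlRE.Match then
      Writeln(PerlRE.MatchedText); // text of the first match
  finally
    PerlRE.Free;
  end;
end.
```

Because the source contains non-ASCII string literals, the file should be saved as UTF-8 with a BOM so that the compiler reads the subject string correctly.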