Mike Torrettinni 198 Posted June 26, 2020 I'm trying to generalize my method to remove formatting tokens from the string, and just want to be sure that I handle all cases. So, the question is: besides d, e, f, g, m, n, p, s, u, x, are there other possible options in the Format (SysUtils) function? I have this method and would like to make sure it covers all options:

    // Remove any %s %d... from string
    // Used for cleaning formatted ready strings to be display ready
    // 'Customer name %s contains invalid characters.' -> 'Customer name contains invalid characters.'
    function RemoveFormatSettingsFromString(const aString: string): string;
    const
      cReplacements: array [1..10] of string = ('%s ', '%d ', '%e ', '%f ', '%g ', '%m ', '%n ', '%p ', '%u ', '%x ');
    var
      vReplacement: string;
    begin
      Result := aString;
      for vReplacement in cReplacements do
        Result := StringReplace(Result, vReplacement, '', [rfReplaceAll]);
    end;
Guest Posted June 26, 2020 Not an answer to your question, but a suggestion: change the function to return a Boolean that tells you whether such invalid characters were detected, and return the sanitized string in a var parameter. That way you can show and log the original invalid string, yet still silently discard it and use the fixed one.
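A minimal sketch of that suggestion, assuming the same ten single-character tokens as the original function. The name TryRemoveFormatTokens and its signature are hypothetical and do not come from the thread.

```pascal
program TryRemoveDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns True when at least one token was removed.
// aCleaned always receives the sanitized text, so the caller keeps the
// original string for logging and uses aCleaned for display.
function TryRemoveFormatTokens(const aString: string; var aCleaned: string): Boolean;
const
  cTokens: array [1..10] of string =
    ('%s ', '%d ', '%e ', '%f ', '%g ', '%m ', '%n ', '%p ', '%u ', '%x ');
var
  vToken: string;
begin
  aCleaned := aString;
  for vToken in cTokens do
    aCleaned := StringReplace(aCleaned, vToken, '', [rfReplaceAll]);
  Result := aCleaned <> aString;
end;

var
  vCleaned: string;
begin
  if TryRemoveFormatTokens('Customer name %s contains invalid characters.', vCleaned) then
    Writeln(vCleaned); // prints: Customer name contains invalid characters.
end.
```

Like the original function, this sketch only recognizes a bare % followed by one type character and a space.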
Guest Posted June 26, 2020 Quote:

    "%" [index ":"] ["-"] [width] ["." prec] type

A format specifier begins with a % character. After the % come the following elements, in this order:

- An optional argument zero-offset index specifier (that is, the first item has index 0), [index ":"]
- An optional left justification indicator, ["-"]
- An optional width specifier, [width]
- An optional precision specifier, ["." prec]
- The conversion type character, type

So the cases where all formatters are a % and just one char are specific. What about matching for %[anything w/o space][type][one space], where type is one of those type characters?
Achim Kalwa 61 Posted June 26, 2020 (edited) 57 minutes ago, Mike Torrettinni said: I'm trying to generalize my method to remove formatting tokens from the string, and just want to be sure that I handle all cases. So, the question is: besides d, e, f, g, m, n, p, s, u, x, are there other possible options in the Format (SysUtils) function? I have this method and would like to make sure it covers all options:

    function RemoveFormatSettingsFromString(const aString: string): string;
    const
      cReplacements: array [1..10] of string = ('%s ', '%d ', '%e ', '%f ', '%g ', '%m ', '%n ', '%p ', '%u ', '%x ');

That is just a simple start. There might be index, width and precision information, like 'Total amount: %1.2f'. You will need much more parsing, perhaps some RegEx filtering. Edited June 26, 2020 by Achim Kalwa
Mike Torrettinni 198 Posted June 26, 2020 4 minutes ago, Dany Marmur said: So the cases where all formatters are a % and just one char are specific.

Wow, a lot more options. I see it now. I was reading this page and didn't scroll down: http://www.delphibasics.co.uk/RTL.asp?Name=format
Guest Posted June 26, 2020 IMHO Quote:

    "%" [index ":"] ["-"] [width] ["." prec] type

is the "key". This line specifies ALL the possible combinations you can expect. Obviously the formatter ALWAYS starts with % and ends with a "type" char. You should be able to skip testing for a space after the type char.
Guest Posted June 26, 2020 I dunno, but I seem to remember an escape character in the mix. Cannot find it now on the phone. I.e. how would you print a "%"?
Anders Melander 1783 Posted June 26, 2020 You can find a function to strip out format specifiers (among other things), including the index, width and precision stuff, here: https://bitbucket.org/anders_melander/better-translation-manager/src/a9e47ac90e7f80b67176cdb61b72aa34f4a8f165/Source/amLocalization.Normalization.pas#lines-306 The code as-is replaces %... with space to make the result readable. This is the relevant code:

    Result := Value;

    // Find first format specifier
    n := PosEx('%', Result, 1);

    while (n > 0) and (n < Length(Result)) do
    begin
      Inc(n);

      if (Result[n] = '%') then
      begin
        // Escaped % - ignore
        Delete(Result, n, 1);
      end else
      if (IsAnsi(Result[n])) and (AnsiChar(Result[n]) in ['0'..'9', '-', '.', 'd', 'u', 'e', 'f', 'g', 'n', 'm', 'p', 's', 'x']) then
      begin
        Result[n-1] := ' '; // Replace %... with space

        // Remove chars until end of format specifier
        while (Result[n].IsDigit) do
          Delete(Result, n, 1);

        if (Result[n] = ':') then
          Delete(Result, n, 1);

        if (Result[n] = '-') then
          Delete(Result, n, 1);

        while (Result[n].IsDigit) do
          Delete(Result, n, 1);

        if (Result[n] = '.') then
          Delete(Result, n, 1);

        while (Result[n].IsDigit) do
          Delete(Result, n, 1);

        if (IsAnsi(Result[n])) and (AnsiChar(Result[n]) in ['d', 'u', 'e', 'f', 'g', 'n', 'm', 'p', 's', 'x']) then
          Delete(Result, n, 1)
        else
        begin
          // Not a format string - undo
          Result := Value;
          break;
        end;
      end else
      begin
        // Not a format string - undo
        Result := Value;
        break;
      end;

      // Find next format specifier
      n := PosEx('%', Result, n);
    end;
Anders Melander 1783 Posted June 26, 2020 14 minutes ago, Dany Marmur said: how would you print a "%"

    '%%'
Mike Torrettinni 198 Posted June 26, 2020 26 minutes ago, Anders Melander said: The code as-is replaces %... with space to make the result readable.

Thanks! Did you never need to just delete the %... instead of replacing it with a space? I usually have spaces around the %..., so in this case you end up with a triple space, right? 'Customer name %s is wrong' -> 'Customer name   is wrong'. No?
Anders Melander 1783 Posted June 26, 2020 1 minute ago, Mike Torrettinni said: Did you never need to just delete the %... instead of replacing it with a space? I usually have spaces around the %..., so in this case you end up with a triple space, right? 'Customer name %s is wrong' -> 'Customer name   is wrong'. No?

It depends on what I use the string for afterwards. For example, if I need to compare with another string I just trim consecutive spaces down to a single one. If I need to parse out the individual words I leave the spaces there since the parser will skip over them anyway. The function is for use in a translation tool. AFAIR it can also remove shortcut accelerators, () [] {} <> pairs and punctuation .:;? etc.
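Trimming consecutive spaces down to a single one could look roughly like the sketch below. CollapseSpaces is a hypothetical helper written for illustration; it is not taken from the linked translation tool.

```pascal
program CollapseSpacesDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: repeatedly replaces every pair of spaces with one
// space until no run of two or more spaces is left.
function CollapseSpaces(const aString: string): string;
begin
  Result := aString;
  while Pos('  ', Result) > 0 do
    Result := StringReplace(Result, '  ', ' ', [rfReplaceAll]);
end;

begin
  // Three spaces between 'name' and 'is', as left behind after replacing %s
  Writeln(CollapseSpaces('Customer name   is wrong')); // prints: Customer name is wrong
end.
```

The loop is needed because a single StringReplace pass turns three spaces into two, not one.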
Mike Torrettinni 198 Posted June 26, 2020 17 minutes ago, Anders Melander said: It depends on what I use the string for afterwards. For example, if I need to compare with another string I just trim consecutive spaces down to a single one. If I need to parse out the individual words I leave the spaces there since the parser will skip over them anyway. The function is for use in a translation tool. AFAIR it can also remove shortcut accelerators, () [] {} <> pairs and punctuation .:;? etc.

OK, makes sense. It looks like a very versatile function!
Mahdi Safsafi 225 Posted June 26, 2020 @Mike Torrettinni I don't know if this is going to help. A while ago I wrote a regular expression to match the Format pattern "%" [index ":"] ["-"] [width] ["." prec] type:

    %((\d+|\*)\:)?[\-]?(\d+|\*)?(\.(\d+|\*))?[duefgnmpsx]

The regex was used with Perl, but I believe it is still compatible with PCRE-like engines.
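For anyone who wants to try that pattern from Delphi, here is a minimal sketch using TRegEx from System.RegularExpressions. The function name StripFormatSpecifiersRegEx is hypothetical, and the pattern is copied unchanged from the post, so it matches lowercase type characters only and does not treat an escaped '%%' specially.

```pascal
program RegExStripDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Pattern quoted from the post above, unchanged
  cFormatSpecifier = '%((\d+|\*)\:)?[\-]?(\d+|\*)?(\.(\d+|\*))?[duefgnmpsx]';

// Hypothetical helper: deletes every match of the pattern from aString
function StripFormatSpecifiersRegEx(const aString: string): string;
begin
  Result := TRegEx.Replace(aString, cFormatSpecifier, '');
end;

begin
  // Brackets make the remaining trailing space visible
  Writeln('[' + StripFormatSpecifiersRegEx('Total amount: %1.2f') + ']'); // prints: [Total amount: ]
end.
```

Passing a space instead of the empty string as the replacement would mimic the "replace %... with space" behaviour described earlier in the thread.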
Mike Torrettinni 198 Posted June 26, 2020 37 minutes ago, Mahdi Safsafi said: @Mike Torrettinni I don't know if this is going to help. A while ago I wrote a regular expression to match the Format pattern "%" [index ":"] ["-"] [width] ["." prec] type:

    %((\d+|\*)\:)?[\-]?(\d+|\*)?(\.(\d+|\*))?[duefgnmpsx]

The regex was used with Perl, but I believe it is still compatible with PCRE-like engines.

Thanks, but I rarely use RegEx and only with very simple expressions.
Anders Melander 1783 Posted June 26, 2020 1 hour ago, Mike Torrettinni said: Thanks, but I rarely use RegEx and only with very simple expressions.

Wise decision. They are fun to write but a nightmare to maintain.
Anders Melander 1783 Posted June 26, 2020 2 hours ago, Mahdi Safsafi said:

    %((\d+|\*)\:)?[\-]?(\d+|\*)?(\.(\d+|\*))?[duefgnmpsx]

Hmm. Looking at that RegEx I just realized that I forgot to handle the asterisk parameter specifier in my own code. To fix it, replace all three occurrences of:

    while (Result[n].IsDigit) do
      Delete(Result, n, 1);

with:

    if (Result[n] = '*') then
      Delete(Result, n, 1)
    else
      while (Result[n].IsDigit) do
        Delete(Result, n, 1);
Fr0sT.Brutal 900 Posted July 24, 2020 Don't forget the '%%d' (meant to produce '%d' after formatting) and '%s%s' cases.
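Those two cases can be checked with a small hand-written scanner. The sketch below is only an illustration under stated assumptions: StripFormatSpecifiers is a hypothetical name, strings are assumed to be 1-based, only lowercase type characters are recognized, and the grammar is applied leniently (digits or '*', ':', '-', digits or '*', '.', digits or '*', then the type character). An escaped '%%' is emitted as a single '%', and a % that does not start a valid specifier is copied through unchanged.

```pascal
program StripFormatDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper, not taken from the thread
function StripFormatSpecifiers(const aString: string): string;
const
  cTypes: TSysCharSet = ['d', 'u', 'e', 'f', 'g', 'n', 'm', 'p', 's', 'x'];
var
  i, j, n: Integer;

  // Advance j past a single '*' or a run of digits, if present
  procedure SkipNumber;
  begin
    if (j <= n) and (aString[j] = '*') then
      Inc(j)
    else
      while (j <= n) and CharInSet(aString[j], ['0'..'9']) do
        Inc(j);
  end;

  // Advance j past aChar, if present
  procedure SkipChar(aChar: Char);
  begin
    if (j <= n) and (aString[j] = aChar) then
      Inc(j);
  end;

begin
  Result := '';
  n := Length(aString);
  i := 1;
  while i <= n do
  begin
    if aString[i] <> '%' then
    begin
      // Ordinary character: copy it
      Result := Result + aString[i];
      Inc(i);
    end
    else if (i < n) and (aString[i + 1] = '%') then
    begin
      // Escaped percent sign: keep one '%'
      Result := Result + '%';
      Inc(i, 2);
    end
    else
    begin
      // Try to read [index ":"] ["-"] [width] ["." prec] type
      j := i + 1;
      SkipNumber;
      SkipChar(':');
      SkipChar('-');
      SkipNumber;
      SkipChar('.');
      SkipNumber;
      if (j <= n) and CharInSet(aString[j], cTypes) then
        i := j + 1 // valid specifier: drop it
      else
      begin
        // Not a specifier: keep the '%' and continue after it
        Result := Result + '%';
        Inc(i);
      end;
    end;
  end;
end;

begin
  Writeln('[' + StripFormatSpecifiers('%%d') + ']');                 // prints: [%d]
  Writeln('[' + StripFormatSpecifiers('%s%s') + ']');                // prints: []
  Writeln('[' + StripFormatSpecifiers('Total amount: %1.2f') + ']'); // prints: [Total amount: ]
end.
```

Building the result character by character keeps the sketch short; for long strings a TStringBuilder would avoid the repeated concatenation.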