JohnLM 27 Posted Friday at 08:19 AM (edited)

I was wondering if there are algorithms for date detection inside strings.

I have a text file full of hundreds of URLs, and some of the URLs have dates and/or times embedded, either on the left or right side of the URL. These were entered by me over the years, since around 2014. I would copy and paste URLs and sometimes write notes on the same line as the URL. It didn't mean much to me at the time when I was writing down the URLs. Sometimes I was in a rush or under stress and just jotted them down the way I saw it in my head and fingers at the time. But now the thought has occurred to me that the dates could be important or useful today.

As for the dates, I did not standardise them. Sometimes I would enter them as 1/1/2020, or 1.1.2020, or 1-1-2020, or 01012020. For instance:

1-1-2020 car sale http://www.cars.com
1/2/2020 car sale http://cars.com
car sale http://cars.com 3/1/2020

Right now:

1. I have an app that snips just the URLs (any lines that have them), and then
2. I remove the "www." portion for sorting and then removing duplicate URLs.
3. Now I want to insert the dates, if any.
4. Finally, display a report output log. I will use the TMemo for this.

Is there any custom function for date detection? Otherwise I will have to conjure one up myself. I don't mind doing one, but it may not be as efficient as expected.
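Steps 1 and 2 above could look roughly like the sketch below. It is only an illustration of the idea, not the poster's actual app: `StripWww` is a hypothetical helper name, and the approach assumes a sorted `TStringList` with `Duplicates := dupIgnore` is acceptable for de-duplication (that setting only has an effect while `Sorted` is True).

```pascal
program DedupeUrls;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: removes the first "://www." so that
// http://www.cars.com and http://cars.com compare as equal.
function StripWww(const Url: string): string;
begin
  Result := StringReplace(Url, '://www.', '://', [rfIgnoreCase]);
end;

var
  Urls: TStringList;
  S: string;
begin
  Urls := TStringList.Create;
  try
    Urls.Sorted := True;           // keep the list ordered as items are added
    Urls.Duplicates := dupIgnore;  // silently skip entries that already exist
    Urls.Add(StripWww('http://www.cars.com'));
    Urls.Add(StripWww('http://cars.com'));
    Urls.Add(StripWww('http://example.com/page'));
    for S in Urls do
      Writeln(S);
  finally
    Urls.Free;
  end;
end.
```

With the sample input, the two cars.com lines collapse into a single entry. Any date found on the original line would have to be carried along separately, for example in a parallel list or as the object attached to each string.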
dummzeuch 1730 Posted Friday at 08:27 AM

I have got my own function for converting strings to TDate, but I doubt that you will like it, because it does not support that strange date format you are using in your examples. But here goes anyway:

    function TryStr2Date(const _s: string; out _dt: TDateTime): Boolean;
    var
      UKSettings: TFormatSettings;
    begin
      Result := True;
      // Try several different formats
      // format configured in Windows
      if not TryStrToDate(_s, _dt) then
        // German dd.mm.yyyy
        if not Tryddmmyyyy2Date(_s, _dt) then
          // ISO yyyy-mm-dd
          if not TryIso2Date(_s, _dt) then begin
            // United Kingdom: dd/mm/yyyy
            UKSettings := GetUserDefaultLocaleSettings;
            UKSettings.DateSeparator := '/';
            UKSettings.ShortDateFormat := 'dd/mm/yyyy';
            if not TryStrToDate(_s, _dt, UKSettings) then
              // nothing worked, give up
              Result := False;
          end;
    end;

    function Str2Date(const _s: string): TDateTime;
    begin
      if not TryStr2Date(_s, Result) then
        raise EConvertError.CreateFmt(_('''%s'' is not a valid date'), [_s]);
    end;

They are part of my dzlib, where you can find those other functions called above:
https://sourceforge.net/p/dzlib/code/HEAD/tree/dzlib/trunk/src/u_dzDateUtils.pas#l493
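The same "try one format after another" idea can be applied to the separators from the original question using only the RTL. The sketch below is an assumption-laden illustration, not part of dzlib: `TryLooseDate` is a hypothetical name, and it assumes the notes were written month first (m/d/yyyy). If they were written day first, the `ShortDateFormat` would need to change.

```pascal
program LooseDates;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: tries the same m/d/yyyy layout with each of
// the separators mentioned in the question ('/', '.' and '-').
function TryLooseDate(const S: string; out D: TDateTime): Boolean;
const
  Separators: array[0..2] of Char = ('/', '.', '-');
var
  FS: TFormatSettings;
  Sep: Char;
begin
  FS := TFormatSettings.Create('en-US');
  FS.ShortDateFormat := 'm/d/yyyy';  // assumption: month comes first
  for Sep in Separators do
  begin
    FS.DateSeparator := Sep;         // '/' in the format stands for this char
    if TryStrToDate(S, D, FS) then
      Exit(True);
  end;
  Result := False;
end;

var
  D: TDateTime;
begin
  if TryLooseDate('1.2.2020', D) then
    Writeln(FormatDateTime('yyyy-mm-dd', D))
  else
    Writeln('not a date');
end.
```

This only handles a string that is nothing but a date. Pulling the date out of a longer line (notes plus URL) still needs a search step first.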
Anders Melander 2121 Posted Friday at 09:28 AM

Sounds like a good job for regex.

You can use https://regex101.com/ to experiment with the data and different expressions. For example, this is a simple solution against your test data: https://regex101.com/r/VHdo1k/1
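For readers who want to try the regex approach directly in Delphi, here is a minimal sketch using `TRegEx` from `System.RegularExpressions`. The pattern is purely illustrative and is not the expression behind the regex101 link above. It only covers the separated forms (1/1/2020, 1.1.2020, 1-1-2020) and does not check that both separators are the same character.

```pascal
program FindDates;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  // Illustrative pattern (an assumption, not from the thread):
  // 1-2 digits, a separator, 1-2 digits, a separator,
  // then a 4-digit or 2-digit year.
  DatePattern = '\b(\d{1,2})[./-](\d{1,2})[./-](\d{4}|\d{2})\b';

  Lines: array[0..2] of string = (
    '1-1-2020 car sale http://www.cars.com',
    '1/2/2020 car sale http://cars.com',
    'car sale http://cars.com 3/1/2020');

var
  Line: string;
  M: TMatch;
begin
  for Line in Lines do
  begin
    M := TRegEx.Match(Line, DatePattern);
    if M.Success then
      // Groups 1..3 hold the three numeric parts in the order written
      Writeln(M.Value, ' -> parts: ', M.Groups[1].Value, ', ',
        M.Groups[2].Value, ', ', M.Groups[3].Value)
    else
      Writeln('no date found: ', Line);
  end;
end.
```

A URL that itself contains something date-like (for example a path such as /2020/01/02/) could produce false positives, so it may be safer to remove the URL from the line before searching.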
JohnLM 27 Posted Friday at 09:42 PM

@dummzeuch, you are correct, but thank you for your comment.
JohnLM 27 Posted Friday at 09:51 PM

I was not complete with that list. For what it's worth, I have other "odd" formats that I did not mention, like 1.1.2020 and 1.1.20 and 01.01.20 and 010120mo, sept/2020 and 012020 and a few more.

@Anders Melander - your regex suggestion looks interesting and promising. I have some learning to do. Thank you for your tip and the example.

At first, I thought that I would have to do an overall cleanse of the file: discover certain parts (patterns), break them up, and redo the dates in a more standard format. Then, as a final stage, I would go through the list and add the dates, if any, to the URL output log in my app.

Once that is complete, I am also considering changing the date to mm/yyyy or yyyy/mo format in the output log.
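The "normalise first, then format for the log" plan could be sketched as follows for the compact 8-digit form. `TryCompactToDate` is a hypothetical helper, and reading the digits as mmddyyyy is an assumption that has to match how the notes were actually written. The quoted `"/"` in the format string keeps the slash literal, because an unquoted `/` in `FormatDateTime` is replaced by the locale's date separator.

```pascal
program NormalizeDates;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper: accepts exactly eight ASCII digits and reads
// them as mmddyyyy (assumption). TryEncodeDate rejects impossible
// dates such as month 13 or day 32.
function TryCompactToDate(const S: string; out D: TDateTime): Boolean;
begin
  Result := TRegEx.IsMatch(S, '^[0-9]{8}$')
    and TryEncodeDate(
          StrToInt(Copy(S, 5, 4)),  // year
          StrToInt(Copy(S, 1, 2)),  // month (assumed to come first)
          StrToInt(Copy(S, 3, 2)),  // day
          D);
end;

var
  D: TDateTime;
begin
  if TryCompactToDate('01012020', D) then
  begin
    Writeln(FormatDateTime('mm"/"yyyy', D));  // month/year form
    Writeln(FormatDateTime('yyyy"/"mm', D));  // year/month form
  end
  else
    Writeln('not a compact date');
end.
```

Forms such as 1.1.20, sept/2020 or 012020 would each need their own rule, since the two-digit year and the missing day leave the intended date open to interpretation.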
JohnLM 27 Posted Friday at 10:48 PM

I googled and found out that Regex is included in Delphi, under System.RegularExpressions and RegularExpressionsCore.

Time to do some more research into this area, and maybe try out the regex string that Anders posted earlier.
JohnLM 27 Posted yesterday at 09:18 AM (edited)

Progress. . .

With a bit of time and effort from that website, so far I have the following success.

I have not started anything in Delphi (using its support for RegEx), as that is another learning curve, but it will be interesting to see how I create something to help me identify dates and pull them.
Anders Melander 2121 Posted yesterday at 04:08 PM

17 hours ago, JohnLM said:
    I googled and found out that Regex is included in Delphi, under System.RegularExpressions and RegularExpressionsCore.

Or you could have just followed the link I provided 🙂:

On 9/26/2025 at 11:28 AM, Anders Melander said:
    Sounds like a good job for regex.

Note that there's also TPerlRegEx if you need Perl regular expressions (PCRE). PCRE has some additional features but is also a bit more complicated to use. YMMV.

6 hours ago, JohnLM said:

Beware of using . unescaped if you want to match the '.' character. Normally . matches any character but in this case it works because it's inside a [ ] list. If you want to match the '.' character then it's usually better to escape it like this: \.
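The difference between an unescaped dot, an escaped dot and a dot inside a [ ] list can be checked with a few `TRegEx.IsMatch` calls. The test strings below are made up for illustration.

```pascal
program DotEscaping;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

begin
  // Unescaped: . matches any character, so the letter x is accepted (matches)
  Writeln(TRegEx.IsMatch('1x1x2020', '^\d+.\d+.\d+$'));

  // Escaped: \. only matches a literal dot (no match, then match)
  Writeln(TRegEx.IsMatch('1x1x2020', '^\d+\.\d+\.\d+$'));
  Writeln(TRegEx.IsMatch('1.1.2020', '^\d+\.\d+\.\d+$'));

  // Inside a [ ] list the dot is already literal (no match)
  Writeln(TRegEx.IsMatch('1x1x2020', '^\d+[.]\d+[.]\d+$'));
end.
```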
pyscripter 842 Posted yesterday at 04:31 PM

22 minutes ago, Anders Melander said:
    PCRE has some additional features but is also a bit more complicated to use. YMMV.

System.RegularExpressions uses PCRE.
Anders Melander 2121 Posted yesterday at 04:41 PM

2 minutes ago, pyscripter said:
    System.RegularExpressions uses PCRE.

Ah... Lovely.

Still, TPerlRegEx optionally supports greedy/ungreedy mode. AFAIK TRegEx is always greedy. Just last week I implemented a simple C to Delphi converter for some generated C code and I had to use TPerlRegEx instead of TRegEx because I needed ungreedy matching.
pyscripter 842 Posted yesterday at 08:22 PM (edited)

7 hours ago, Anders Melander said:
    AFAIK TRegEx is always greedy

You can do it in (at least) three ways:

1. Use AddRawOptions

    RegEx.AddRawOptions(PCRE_UNGREEDY);

With older versions you can use this code (see [RSP-21733] Compile PCRE with JIT enabled - Embarcadero Technologies):

    {$IF (CompilerVersion <= 35) and not Declared(RTLVersion112)}
    type
      TPerlRegExHelper = class helper for TPerlRegEx
        procedure AddRawOptions(PCREOptions: Integer);
      end;

    procedure TPerlRegExHelper.AddRawOptions(PCREOptions: Integer);
    begin
      with Self do FPCREOptions := FPCREOptions or PCREOptions;
    end;

    type
      TRegExHelper = record helper for TRegEx
      public
        procedure AddRawOptions(PCREOptions: Integer);
      end;

    procedure TRegExHelper.AddRawOptions(PCREOptions: Integer);
    begin
      with Self do FRegEx.AddRawOptions(PCREOptions);
    end;
    {$ENDIF}

2. Another use of a class helper

    type
      TRegExHelper = record helper for TRegEx
      public
        procedure AddPCREOptions(PCREOptions: TPerlRegExOptions);
      end;

    procedure TRegExHelper.AddPCREOptions(PCREOptions: TPerlRegExOptions);
    begin
      with Self do FRegEx.FOptions := FRegEx.FOptions + PCREOptions;
    end;

and use it (not tested):

    RegEx.AddPCREOptions([preUnGreedy]);

3. Use the ungreedy ?

Instead, you can use the ungreedy ? in your regular expressions, that is, put a ? after the quantifier (for example .*? instead of .*). See How can I write a regex which matches non greedy? - Stack Overflow.
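Option 3 needs no helper and no access to private fields, so it is the easiest one to try. A minimal sketch with a made-up input string, comparing a greedy and an ungreedy quantifier on plain `TRegEx`:

```pascal
program LazyMatch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

const
  Input = '<a><b>';

begin
  // Greedy: .* takes as much as possible, so the match runs to the last >
  Writeln(TRegEx.Match(Input, '<.*>').Value);   // <a><b>

  // Ungreedy: .*? takes as little as possible and stops at the first >
  Writeln(TRegEx.Match(Input, '<.*?>').Value);  // <a>
end.
```

The ? modifier applies per quantifier, whereas the options in 1 and 2 switch the default for the whole expression.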
JohnLM 27 Posted yesterday at 10:32 PM

I started looking at ways to use Regex in Delphi. I read that Regex support started in XE. I am using XE7 (Win 7). I also have D12.2 (Win 10) on my tablet, but I haven't had the chance to test my app against it, since I would have to manually type the source into it in order to test. So my main testing and debugging of regex under Delphi is in XE7.

However, I found an issue with getting regex to work in XE7 and was wondering: are there different implementation versions in Delphi?
Vincent Parrett 893 Posted 22 hours ago

One major difference is that later versions use the UTF-16 version of PCRE, whereas the earlier versions use the UTF-8 version and do conversions between UTF-8 and UTF-16.

I diffed the source between XE7 and 13.0 and there are a lot of small changes, but mostly around the encoding handling.

What issue are you having in XE7?