clubreseau 0 Posted November 4, 2020 (edited) Hi, Question 1 I have a listbox with full or URL like http://test.com/demo.php https://test.com/demo.php http://www.test.com/demo.php if my string have somethink like https://test.com/test/test.php then do not add in listbox1 because test.com are in the listbox1. maybe the solution is to remove http:// and https:// at the begining until first / and after compare to see. Question 2 I have a listbox with 3000 URL when I press the button to paste all items from listbox to memo, its feeze and take time. Someone have the solution for long text for memo. Thank you, please im visual need a demo of part of code. Edited November 4, 2020 by clubreseau Share this post Link to post
Guest Posted November 4, 2020 Question 1 Try something like this function ExtractDomainName(const Url: string): string; var FirstChar, LastChar, Len: Integer; begin Result := ''; FirstChar := Pos('://', Url); LastChar := Pos('/', Url, FirstChar + 3); if FirstChar = 0 then FirstChar := 1 // means no protocol there, so we adjust for the first char else FirstChar := FirstChar + 3; // to compensate the length of "://" if LastChar = 0 then LastChar := Length(Url) + 1; // adjust for the last char, add 1 for Len calculation if FirstChar >= LastChar then Exit; // we have broken an empty URL or broken one Len := LastChar - FirstChar; SetLength(Result, Len); Move(Url[FirstChar], Result[1], Len * SizeOf(Char)); end; Question 2 You need to understand the following TMemo is a control that will issue an invalidate and redraw..., every time you add an item, this behaviour is not strict to TMemo, you can avoid the redraw on every items when you will add many items in a loop by locking it. Memo.Lines.BeginUpdate; try // We loop here and add items finally Memo.Lines.EndUpdate; end; try..finally is crucial here, because if you locked some control like a Memo or ListBox, then in that loop some exception raised, that will mean EndUpdate is not called and your control is frozen and might cause you whole application to freeze. Share this post Link to post
clubreseau 0 Posted November 6, 2020 On 11/4/2020 at 3:22 AM, Kas Ob. said: Question 1 Try something like this function ExtractDomainName(const Url: string): string; var FirstChar, LastChar, Len: Integer; begin Result := ''; FirstChar := Pos('://', Url); LastChar := Pos('/', Url, FirstChar + 3); if FirstChar = 0 then FirstChar := 1 // means no protocol there, so we adjust for the first char else FirstChar := FirstChar + 3; // to compensate the length of "://" if LastChar = 0 then LastChar := Length(Url) + 1; // adjust for the last char, add 1 for Len calculation if FirstChar >= LastChar then Exit; // we have broken an empty URL or broken one Len := LastChar - FirstChar; SetLength(Result, Len); Move(Url[FirstChar], Result[1], Len * SizeOf(Char)); end; Question 2 You need to understand the following TMemo is a control that will issue an invalidate and redraw..., every time you add an item, this behaviour is not strict to TMemo, you can avoid the redraw on every items when you will add many items in a loop by locking it. Memo.Lines.BeginUpdate; try // We loop here and add items finally Memo.Lines.EndUpdate; end; try..finally is crucial here, because if you locked some control like a Memo or ListBox, then in that loop some exception raised, that will mean EndUpdate is not called and your control is frozen and might cause you whole application to freeze. I use this to Load 10K of Items from listbox1 to Memo1 and it take 4 minut to see items in memo1 and the APP Freeze. here my code procedure TForm1.Button1(Sender: TObject); var i:integer; begin Memo1.lines.BeginUpdate; try for i := 0 to ListBox1.Items.Count -1 do Memo1.Lines := ListBox1.Items; finally Memo1.lines.EndUpdate; end; end; Share this post Link to post
Guest Posted November 6, 2020 1 hour ago, clubreseau said: for i := 0 to ListBox1.Items.Count -1 do Memo1.Lines := ListBox1.Items; What is this ???!!!!!! You are loading all items 10000 items, and yes it will take 4 minutes. You should either do this Memo1.Lines.BeginUpdate; try for I := 0 to ListBox1.Items.Count - 1 do Memo1.Lines.Add(ListBox1.Items.Strings[I]) finally Memo1.Lines.EndUpdate; end; Or simply this one line, (no looping) Memo1.Lines := ListBox1.Items; // no need to BeginUpdate..EndUpdate here the difference is this, if you need to process each item then use BeginUpdate..EndUpdate and loop through the items, when you don't need to process then you don't need to loop, means just one line will do it. Here is a demo with 10k lines, you go and time it and what is the difference for yourself. ListBoxMemo.zip Share this post Link to post
clubreseau 0 Posted November 6, 2020 12 hours ago, Kas Ob. said: What is this ???!!!!!! You are loading all items 10000 items, and yes it will take 4 minutes. You should either do this Memo1.Lines.BeginUpdate; try for I := 0 to ListBox1.Items.Count - 1 do Memo1.Lines.Add(ListBox1.Items.Strings[I]) finally Memo1.Lines.EndUpdate; end; Or simply this one line, (no looping) Memo1.Lines := ListBox1.Items; // no need to BeginUpdate..EndUpdate here the difference is this, if you need to process each item then use BeginUpdate..EndUpdate and loop through the items, when you don't need to process then you don't need to loop, means just one line will do it. Here is a demo with 10k lines, you go and time it and what is the difference for yourself. ListBoxMemo.zip why this freeze the APP and take 1 minut to load 10K of items ? procedure TForm1.Button1(Sender: TObject); begin Form1.Memo1.Lines := Form2.Listbox1.Items; end; Share this post Link to post
Anders Melander 1783 Posted November 6, 2020 18 minutes ago, clubreseau said: why this freeze the APP and take 1 minut to load 10K of items ? Generally speaking, the API of most Windows common controls goes through the windows message queue. So my guess is that for each item in the listbox there's a lot of messages going back and forth while the listbox is populated and that's an enormous overhead. If you break the application in the debugger while the list is loading you can see what it's doing. Share this post Link to post
Guest Posted November 6, 2020 29 minutes ago, clubreseau said: why this freeze the APP and take 1 minut to load 10K of items ? I have no idea ! On my PC both take almost the same time and that is less than 2 seconds. Share this post Link to post
Remy Lebeau 1396 Posted November 6, 2020 (edited) 14 minutes ago, Anders Melander said: Generally speaking, the API of most Windows common controls goes through the windows message queue. So my guess is that for each item in the listbox there's a lot of messages going back and forth Messages are used, yes (LB_ADDSTRING, etc). But not the message queue, no. The messages in question would be SENT directly to the ListBox's window procedure for immediate processing, not POSTED to the message queue of the thread that owns the ListBox awaiting dispatching to the ListBox's window procedure. Edited November 6, 2020 by Remy Lebeau 1 Share this post Link to post
Anders Melander 1783 Posted November 6, 2020 1 minute ago, Remy Lebeau said: But not the message queue, no. Yes, it was poorly expressed on my part. If they went through the message queue then the application wouldn't appear hung. Share this post Link to post
clubreseau 0 Posted November 6, 2020 (edited) Im confused ! then the fastest way is ? someone tell me to use tstringlist ? Edited November 6, 2020 by clubreseau Share this post Link to post
clubreseau 0 Posted November 6, 2020 17 hours ago, clubreseau said: I use this to Load 10K of Items from listbox1 to Memo1 and it take 4 minut to see items in memo1 and the APP Freeze. here my code procedure TForm1.Button1(Sender: TObject); var i:integer; begin Memo1.lines.BeginUpdate; try for i := 0 to ListBox1.Items.Count -1 do Memo1.Lines := ListBox1.Items; finally Memo1.lines.EndUpdate; end; end; the function do not delete all duplicate domain in listbox1 ? function ExtractDomainName(const Url: string): string; var FirstChar, LastChar, Len: Integer; begin Result := ''; FirstChar := Pos('://', Url); LastChar := Pos('/', Url, FirstChar + 3); if FirstChar = 0 then FirstChar := 1 // means no protocol there, so we adjust for the first char else FirstChar := FirstChar + 3; // to compensate the length of "://" if LastChar = 0 then LastChar := Length(Url) + 1; // adjust for the last char, add 1 for Len calculation if FirstChar >= LastChar then Exit; // we have broken an empty URL or broken one Len := LastChar - FirstChar; SetLength(Result, Len); Move(Url[FirstChar], Result[1], Len * SizeOf(Char)); end; Share this post Link to post
Mahdi Safsafi 225 Posted November 7, 2020 Inserting lines one by one is a very expensive operation for memo. Also your 'All items at once' is wrong ! procedure TForm10.Button5Click(Sender: TObject); begin Memo1.Lines := ListBox1.Items; // this just inserts item one by one => same as 'One item with loop' ! // if you want to insert all items in a faster way than what you called 'One item with loop', you need to use text property or SetTextBuf (depending on your need): // - Memo1.Text := ListBox1.Items.Text; // - Memo1.SetTextBuf(PChar(ListBox1.Items.Text)); end; 1 Share this post Link to post
clubreseau 0 Posted November 7, 2020 I have a listbox with full or URL like http://test.com/demo.php https://test.com/demo.php http://www.test.com/demo.php if my string have somethink like https://test.com/test/test.php then do not add in listbox1 because test.com are in the listbox1. maybe the solution is to remove http:// and https:// at the begining until first / and after compare to see. someone have a solution to delete duplicate domain in my listbox1 ? Share this post Link to post
Guest Posted November 7, 2020 Not sure if i do understand what you need exactly, try this project and see what you can use, then refactor it into your need. ListBoxMemo2.rar Share this post Link to post
FPiette 383 Posted November 7, 2020 (edited) 3 hours ago, clubreseau said: someone have a solution to delete duplicate domain in my listbox1 ? function ExtractDomain(const URL : String) : String; var I, J : Integer; begin I := Pos('://', URL); if I <= 0 then I := 1 else Inc(I, 3); J := Pos('/', URL, I); if J <= 0 then begin Result := Copy(URL, I, MAXINT); Exit; end; Result := Copy(URL, I, J - I); end; procedure TForm1.Button1Click(Sender: TObject); var Index : Integer; Dict : TDictionary<String, Integer>; URL : String; Domain : String; Value : Integer; begin Dict := TDictionary<String, Integer>.Create(10000); try ListBox1.Items.BeginUpdate; try for Index := ListBox1.Items.Count - 1 downto 0 do begin URL := ListBox1.Items[Index]; if URL = '' then begin ListBox1.Items.Delete(Index); continue; end; Domain := ExtractDomain(Trim(UpperCase(URL))); if Dict.TryGetValue(Domain, Value) then begin // Domain already found, delete from ListBox ListBox1.Items.Delete(Index); continue; end; // Domain not seen before, add to dictionary and don't remove from list Dict.Add(Domain, 0); end; finally ListBox1.Items.EndUpdate; end; finally FreeAndNil(Dict); end; end; This will check for domain by ignoring character casing and ignoring the protocol. You may adept this to a TStrings easily. I used a dictionary because you said you have 10K items. A dictionary should be faster than a simple list when there are a lot of items but I have not checked how many items are required so that dictionary is faster than list. Edited November 7, 2020 by FPiette 1 Share this post Link to post
clubreseau 0 Posted November 8, 2020 16 hours ago, FPiette said: function ExtractDomain(const URL : String) : String; var I, J : Integer; begin I := Pos('://', URL); if I <= 0 then I := 1 else Inc(I, 3); J := Pos('/', URL, I); if J <= 0 then begin Result := Copy(URL, I, MAXINT); Exit; end; Result := Copy(URL, I, J - I); end; procedure TForm1.Button1Click(Sender: TObject); var Index : Integer; Dict : TDictionary<String, Integer>; URL : String; Domain : String; Value : Integer; begin Dict := TDictionary<String, Integer>.Create(10000); try ListBox1.Items.BeginUpdate; try for Index := ListBox1.Items.Count - 1 downto 0 do begin URL := ListBox1.Items[Index]; if URL = '' then begin ListBox1.Items.Delete(Index); continue; end; Domain := ExtractDomain(Trim(UpperCase(URL))); if Dict.TryGetValue(Domain, Value) then begin // Domain already found, delete from ListBox ListBox1.Items.Delete(Index); continue; end; // Domain not seen before, add to dictionary and don't remove from list Dict.Add(Domain, 0); end; finally ListBox1.Items.EndUpdate; end; finally FreeAndNil(Dict); end; end; This will check for domain by ignoring character casing and ignoring the protocol. You may adept this to a TStrings easily. I used a dictionary because you said you have 10K items. A dictionary should be faster than a simple list when there are a lot of items but I have not checked how many items are required so that dictionary is faster than list. Thank you its work perfectly ! How to convert it, to look at a file called url.txt that contain URL, and delete all duplicate domains in my listbox1 ? and like your code I want its look only domain root. If domain root are in Url.txt delete the entire URL in listbox1. Thank you Share this post Link to post
FPiette 383 Posted November 8, 2020 5 hours ago, clubreseau said: How to convert it, to look at a file called url.txt that contain URL, and delete all duplicate domains in my listbox1 ? The idea is just the same as above and then read the url file line by line, extract the domain, check the dictionary. If already there, just read next line if not there, add the domain in the dictionary and add the url to the listbox. You should be able to do that easily yourself. Quote and like your code I want its look only domain root. If domain root are in Url.txt delete the entire URL in listbox1. I don't understand what you call "domain root". Give an example. Share this post Link to post
clubreseau 0 Posted November 8, 2020 2 hours ago, FPiette said: The idea is just the same as above and then read the url file line by line, extract the domain, check the dictionary. If already there, just read next line if not there, add the domain in the dictionary and add the url to the listbox. You should be able to do that easily yourself. I don't understand what you call "domain root". Give an example. this is my code and not working trying to loook into files and delete all duplicated URL in listbox1. procedure TForm1.DELETEURLClick(Sender: TObject); var Index : Integer; Dict : TDictionary<String, Integer>; URL : String; Domain : String; Value : Integer; sl : TStringList; ix : Integer; begin sl := TStringList.Create; Dict := TDictionary<String, Integer>.Create(100000); try ListBox1.Items.BeginUpdate; try for Index := ListBox1.Items.Count - 1 downto 0 do begin URL := ListBox1.Items[Index]; sl.LoadFromFile('C:\Users\myname\Documents\Url.txt'); ix := sl.IndexOf(URL); if URL = '' then begin ListBox1.Items.Delete(Index); continue; end; Domain := ExtractDomain(Trim(UpperCase(ix.ToString))); if Dict.TryGetValue(Domain, Value) then begin // Domain already found, delete from ListBox ListBox1.Items.Delete(Index); continue; end; // Domain not seen before, add to dictionary and don't remove from list Dict.Add(Domain, 0); end; finally ListBox1.Items.EndUpdate; end; finally FreeAndNil(Dict); sl.Free; end; end; function ExtractDomain(const URL : String) : String; var I, J : Integer; begin I := Pos('://', URL); if I <= 0 then I := 1 else Inc(I, 3); J := Pos('/', URL, I); if J <= 0 then begin Result := Copy(URL, I, MAXINT); Exit; end; Result := Copy(URL, I, J - I); end; thank you SIR ! Share this post Link to post
FPiette 383 Posted November 8, 2020 You have to keep the code as I designed it to initialize the dictionary and clean - if needed - the list box content. Then, after the for/loop, without clearing the dictionary, add a second for/loop to scan the URL file as I explained in my previous message. To read the URL file, of course you may use a string list or you may use TStreamReader or traditional file I/O to read the file line by line. The later will avoid loading the entire file in memory and only then scan for the lines. Share this post Link to post
FPiette 383 Posted November 8, 2020 8 hours ago, clubreseau said: look at a file called url.txt that contain URL, and delete all duplicate domains in my listbox1 ? What if an URL from url.txt is not in the ListBox? Simply ignore it or add it to the ListBox? In other words, does url.txt contain unwanted URL or does it contain URL to be added to the ListBox avoiding duplicates? And you didn't answered my question about "root domain". Share this post Link to post
FPiette 383 Posted November 8, 2020 Here is code to add all URL from url.txt except those already existing in the ListBox and avoiding all duplicates. function ExtractDomain(const URL : String) : String; var I, J : Integer; begin I := Pos('://', URL); if I <= 0 then I := 1 else Inc(I, 3); J := Pos('/', URL, I); if J <= 0 then begin Result := Copy(URL, I, MAXINT); Exit; end; Result := Copy(URL, I, J - I); end; procedure TForm1.Button1Click(Sender: TObject); var Index : Integer; Dict : TDictionary<String, Integer>; URL : String; Domain : String; Value : Integer; UrlFile : TStreamReader; begin Dict := TDictionary<String, Integer>.Create(10000); try ListBox1.Items.BeginUpdate; try for Index := ListBox1.Items.Count - 1 downto 0 do begin URL := Trim(ListBox1.Items[Index]); if URL = '' then begin ListBox1.Items.Delete(Index); continue; end; Domain := ExtractDomain(UpperCase(URL)); if Dict.TryGetValue(Domain, Value) then begin // Domain already found, delete from ListBox ListBox1.Items.Delete(Index); continue; end; // Domain not seen before, add to dictionary and don't remove Dict.Add(Domain, 0); end; // Now process url.txt file to add to the ListBox all URL found in // it, avoiding to add duplicates UrlFile := TStreamReader.Create('url.txt'); try while not UrlFile.EndOfStream do begin URL := Trim(UrlFile.ReadLine); if URL = '' then continue; Domain := ExtractDomain(UpperCase(URL)); if Dict.TryGetValue(Domain, Value) then // Domain already found, ignore it continue; // Domain not seen before, add to dictionary Dict.Add(Domain, 0); // and add the URL to the ListBox ListBox1.Items.Add(URL); end; finally FreeAndNil(UrlFile); end; finally ListBox1.Items.EndUpdate; end; finally FreeAndNil(Dict); end; end; Share this post Link to post
Guest Posted November 8, 2020 May suggest to look at my last demo, i produced a list mask where duplicated domains are replaced with empty strings, i think it should fit your need, thanks to Francois and his code, you can use both of them. The code is simple and short and support what you called root domains or subs, but i didn't use case insensitive comparison. Share this post Link to post
FPiette 383 Posted November 8, 2020 52 minutes ago, Kas Ob. said: what you called root domains or subs I still do see what he call "root domain". What are the root domains of "www.company.co.uk", "www.ulg.ac.be" and "www.company.com"? Share this post Link to post
clubreseau 0 Posted November 8, 2020 5 hours ago, FPiette said: I still do see what he call "root domain". What are the root domains of "www.company.co.uk", "www.ulg.ac.be" and "www.company.com"? domain = http://welcome.com/hi.html the root or this domain is welcome.com remove http:// until / Share this post Link to post
FPiette 383 Posted November 8, 2020 1 hour ago, clubreseau said: domain = http://welcome.com/hi.html the root or this domain is welcome.com remove http:// until / Sorry but for me "welcome.com" is simply the domain part of the URL. 1 Share this post Link to post