clubreseau

remove part of string and compare


43 minutes ago, FPiette said:

Sorry but for me "welcome.com" is simply the domain part of the URL.

Sorry, I misspoke then.

11 hours ago, FPiette said:

Here is code to add all URLs from url.txt to the ListBox, except those already in it, while avoiding all duplicates.


 


function ExtractDomain(const URL : String) : String;
var
    I, J : Integer;
begin
    I := Pos('://', URL);
    if I <= 0 then
        I := 1
    else
        Inc(I, 3);
    J := Pos('/', URL, I);
    if J <= 0 then begin
        Result := Copy(URL, I, MAXINT);
        Exit;
    end;
    Result := Copy(URL, I, J - I);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
    Index   : Integer;
    Dict    : TDictionary<String, Integer>;
    URL     : String;
    Domain  : String;
    Value   : Integer;
    UrlFile : TStreamReader;
begin
    Dict  := TDictionary<String, Integer>.Create(10000);
    try
        ListBox1.Items.BeginUpdate;
        try
            for Index := ListBox1.Items.Count - 1 downto 0 do begin
                URL := Trim(ListBox1.Items[Index]);
                if URL = '' then begin
                    ListBox1.Items.Delete(Index);
                    continue;
                end;
                Domain := ExtractDomain(UpperCase(URL));
                if Dict.TryGetValue(Domain, Value) then begin
                    // Domain already found, delete from ListBox
                    ListBox1.Items.Delete(Index);
                    continue;
                end;
                // Domain not seen before, add to dictionary and don't remove
                Dict.Add(Domain, 0);
            end;
            // Now process the url.txt file and add every URL found in it
            // to the ListBox, skipping duplicates
            UrlFile := TStreamReader.Create('url.txt');
            try
                while not UrlFile.EndOfStream do begin
                    URL := Trim(UrlFile.ReadLine);
                    if URL = '' then
                        continue;
                    Domain := ExtractDomain(UpperCase(URL));
                    if Dict.TryGetValue(Domain, Value) then
                        // Domain already found, ignore it
                        continue;
                    // Domain not seen before, add to dictionary
                    Dict.Add(Domain, 0);
                    // and add the URL to the ListBox
                    ListBox1.Items.Add(URL);
                end;
            finally
                FreeAndNil(UrlFile);
            end;
        finally
            ListBox1.Items.EndUpdate;
        end;
    finally
        FreeAndNil(Dict);
    end;
end;

 

It's the opposite that I would like: delete URLs from ListBox1. Search url.txt and remove from ListBox1 every URL that also appears there.

My ListBox1 already holds 1000 URLs. I don't want to load URLs from url.txt. I want the code to look in url.txt and, if a line in ListBox1 is also in url.txt, delete that line from the ListBox.

Edited by clubreseau

9 hours ago, clubreseau said:

It's the opposite that I would like: delete URLs from ListBox1. Search url.txt and remove from ListBox1 every URL that also appears there.

My ListBox1 already holds 1000 URLs. I don't want to load URLs from url.txt. I want the code to look in url.txt and, if a line in ListBox1 is also in url.txt, delete that line from the ListBox.

Why can't you use my example and do the reverse? It is trivial! Maybe a little more programming practice is required, and also some effort to specify the problem correctly. We could have avoided all this if the question had been asked clearly from the beginning.

 

// Requires System.SysUtils, System.Classes and System.Generics.Collections
// in the uses clause.
function ExtractDomain(const URL : String) : String;
var
    I, J : Integer;
begin
    I := Pos('://', URL);
    if I <= 0 then
        I := 1
    else
        Inc(I, 3);
    J := Pos('/', URL, I);
    if J <= 0 then begin
        Result := Copy(URL, I, MAXINT);
        Exit;
    end;
    Result := Copy(URL, I, J - I);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
    Index   : Integer;
    Dict    : TDictionary<String, Integer>;
    URL     : String;
    Domain  : String;
    Value   : Integer;
    UrlFile : TStreamReader;
begin
    Dict  := TDictionary<String, Integer>.Create(10000);
    try
        ListBox1.Items.BeginUpdate;
        try
            // Load the dictionary with url.txt file
            UrlFile := TStreamReader.Create('url.txt');
            try
                while not UrlFile.EndOfStream do begin
                    URL := Trim(UrlFile.ReadLine);
                    if URL = '' then
                        continue;
                    Domain := ExtractDomain(UpperCase(URL));
                    if Dict.TryGetValue(Domain, Value) then
                        // Domain already found, ignore it
                        continue;
                    // Domain not seen before, add to dictionary
                    Dict.Add(Domain, 0);
                end;
            finally
                FreeAndNil(UrlFile);
            end;

            // Now filter the ListBox to remove duplicates and items which
            // are already in the dictionary (because they come from url.txt)
            for Index := ListBox1.Items.Count - 1 downto 0 do begin
                URL := Trim(ListBox1.Items[Index]);
                if URL = '' then begin
                    ListBox1.Items.Delete(Index);
                    continue;
                end;
                Domain := ExtractDomain(UpperCase(URL));
                if Dict.TryGetValue(Domain, Value) then begin
                    // Domain already found, delete from ListBox
                    ListBox1.Items.Delete(Index);
                    continue;
                end;
                // Domain not seen before, add to dictionary and don't remove
                Dict.Add(Domain, 0);
            end;
        finally
            ListBox1.Items.EndUpdate;
        end;
    finally
        FreeAndNil(Dict);
    end;
end;
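
To make the comparison key concrete, here is a minimal sketch of what the code above actually compares. `DomainKey` is a hypothetical helper name that does not appear in the thread; it only combines the `Trim`, `UpperCase` and `ExtractDomain` calls already used above. `ExtractDomain` is repeated so that the fragment compiles on its own.

```pascal
uses
  System.SysUtils;

// Same logic as ExtractDomain in the post above, repeated here only so
// that this fragment is self-contained.
function ExtractDomain(const URL: string): string;
var
  I, J: Integer;
begin
  I := Pos('://', URL);
  if I <= 0 then
    I := 1            // no scheme: the host starts at the first character
  else
    Inc(I, 3);        // skip the '://' separator
  J := Pos('/', URL, I);
  if J <= 0 then
    Result := Copy(URL, I, MaxInt)   // no path: keep everything to the end
  else
    Result := Copy(URL, I, J - I);   // keep the text before the first '/'
end;

// Hypothetical helper: the key stored in the dictionary for a given line.
function DomainKey(const URL: string): string;
begin
  Result := ExtractDomain(UpperCase(Trim(URL)));
end;
```

With this key, `http://Welcome.com/index.html`, `welcome.com` and `HTTPS://welcome.com/a/b` all map to `WELCOME.COM` and are treated as duplicates. `www.welcome.com` and `welcome.com:8080` produce different keys, so they are kept as separate entries unless the key is normalized further.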

 

13 hours ago, FPiette said:

Why can't you use my example and do the reverse? It is trivial! Maybe a little more programming practice is required, and also some effort to specify the problem correctly. We could have avoided all this if the question had been asked clearly from the beginning.


You rock !

 

Last question: what is the command to delete a line in url.txt?

10 hours ago, clubreseau said:

What is the command to delete a line in url.txt?

1) Open the file in read mode (the "source").

2) Open a temporary file in write mode.

3) If the end of the source file is reached, go to step 7.

4) Read a line from the source.

5) If the line must be kept, write it to the temporary file.

6) Loop to step 3.

7) Close the source.

8) Close the temporary file.

9) Rename the source to .bak.

10) Rename the temporary file to the source name.

 

To read the source file, use TStreamReader.

To write the temporary file, use TStreamWriter.
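
The steps above could be sketched roughly as follows. This is only an illustration under assumptions: `FilterTextFile` and `TKeepLine` are hypothetical names, the `.tmp` suffix is arbitrary, and encodings are left at the class defaults (pass an explicit `TEncoding` to both constructors if your file needs one).

```pascal
uses
  System.SysUtils, System.Classes;

type
  // Hypothetical callback type: returns True when a line must be kept.
  TKeepLine = reference to function(const Line: string): Boolean;

// Rewrites FileName so that it contains only the lines accepted by Keep.
// The previous content is preserved as FileName + '.bak'.
procedure FilterTextFile(const FileName: string; const Keep: TKeepLine);
var
  TempName, BakName, Line: string;
  Source: TStreamReader;
  Dest: TStreamWriter;
begin
  TempName := FileName + '.tmp';   // assumed temporary name
  BakName  := FileName + '.bak';
  Source := TStreamReader.Create(FileName);        // step 1
  try
    Dest := TStreamWriter.Create(TempName);        // step 2
    try
      while not Source.EndOfStream do begin        // steps 3 and 6
        Line := Source.ReadLine;                   // step 4
        if Keep(Line) then                         // step 5
          Dest.WriteLine(Line);
      end;
    finally
      Dest.Free;     // step 8 (closed before the source here; order is harmless)
    end;
  finally
    Source.Free;     // step 7
  end;
  // RenameFile fails when the target exists, so drop any old backup first.
  if FileExists(BakName) then
    DeleteFile(BakName);
  if not RenameFile(FileName, BakName) then        // step 9
    raise Exception.CreateFmt('Cannot rename %s to %s', [FileName, BakName]);
  if not RenameFile(TempName, FileName) then       // step 10
    raise Exception.CreateFmt('Cannot rename %s to %s', [TempName, FileName]);
end;

// Possible usage: drop every line equal to one given URL, ignoring case.
//   FilterTextFile('url.txt',
//     function(const Line: string): Boolean
//     begin
//       Result := not SameText(Trim(Line), 'http://welcome.com');
//     end);
```

Writing to a temporary file and renaming at the end means the original url.txt is never left half-written if something fails in the middle of the loop.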

 


I wonder.

I used test speed.rar.

Under the D10.2.3 IDE the numbers are 17xx, but when I execute the exe directly the numbers are 28xx, almost twice as much.

How come?

 

Edited by limelect

