David Schwartz

Trying to DL image yields a web page instead


We have a ticketing system where we get tickets for work, and sometimes clients send image files to us for different purposes. They get attached to a ticket and the ticket is forwarded to us. They're supposedly PNG files.

 

Currently, I have to right-click on the image link and then Save As... and save it to a specific folder with a slight change to the name.

 

Then I open an imaging tool we have and I have to click the Open button, find the file, select it, then click some buttons in the program, then click a Save button, saving it with a slightly different name, then move on. I'm trying to simplify an otherwise tedious manual process.

 

I want an app that lets me click on the image file URL, drag it to a form, then DL the image, do all of the stuff, and save it. 

 

Easy peasy, right?

 

I've got the drag&drop and creating the URL and target filename. When I try to load the file via Indy's http.Get method, it loads something, but I finally figured out that it's not an image. It's a frigging HTML file! When I saved it to disk and looked at it in the browser, it's a stinking LOGIN page. I'm like WTF?

 

The initial response from the http.Get is a 302 redirect, which isn't surprising. There's a HandleRedirects checkbox on TIdHTTP and I checked it. That lets the http.Get succeed. But then ... how in the heck do I get it to work just like a right-click -> Save As ... ?

 

  MS := TMemoryStream.Create;
  image := TWICImage.Create;
  try
    IdHTTP1.Request.Accept := 'image/png, image/gif, image/jpg, image/jpeg, image/tif, image/tiff, image/bmp, image/x-bmp;q=0.9,*/*;q=0.8';
    IdHTTP1.Get(URL, MS);
    hdr := IdHTTP1.Response.ContentType;  // this is: text
    MS.Position := 0;
    memo2.Lines.LoadFromStream(MS);  // saving to a memo to see what it is, since it seems to be text
    // turns out it's an html login page!
  finally
    // release both objects even if the Get raises
    image.Free;
    MS.Free;
  end;

 

Guest
3 hours ago, David Schwartz said:

how in the heck do I get it to work just like a right-click -> Save As ... ?

You need to parse the HTML in full, then walk the tags (elements) looking for <IMG> (or maybe <picture>). Most likely the dimensions of the images and the class name or id within the tag will help you recognize the one you need. From there, use IdHTTP again to grab it. As you said, easy peasy!

 

 

A different approach, as a suggestion:

Ditch the approach of walking the whole web page yourself in Delphi and use TWebBrowser. It will handle most of the hard parts: login, redirects, fetching and rendering the page. Once the document-loaded event has fired, you can parse the final page. At that point you are where your example starts: TWebBrowser's IHTMLDocument2 already has all the HTML you need, including the image data, so you just walk the elements (tags). Here is a great example from Remy on SO:

https://stackoverflow.com/questions/37399382/scraping-images-from-website-in-delphi-with-twebbrowser

In that example you can even parse the login page to fill in the login data too.

Everything you did in your example, plus the earlier steps (redirecting, logging in, ...), comes down to mostly one line of Delphi to load the page with TWebBrowser.

43 minutes ago, Kas Ob. said:

You need to parse the HTML in full, then walk the tags (elements) looking for <IMG> (or maybe <picture>). Most likely the dimensions of the images and the class name or id within the tag will help you recognize the one you need. From there, use IdHTTP again to grab it. As you said, easy peasy!

 

You missed the part where I said this HTML page is a LOGIN page. The only image on it is the company's logo. That's not what I'm looking for.

 

The URL is pointing to an IMAGE FILE. Not a login page.

 

The MIME type is "image/png" not "text".

 

And a half-dozen questions on Stack Overflow where people asked how to DL specific files, this is the approach that was recommended. Not one of them even hinted that a TWebBrowser is needed.

 

Right-click --> Save As ... actually saves a PNG file. Not an HTML page. Always. The image is in a download folder. Not on a LOGIN page. 

Edited by David Schwartz

Guest

@David Schwartz You mentioned "Right-click --> Save As". To my knowledge, no component in a Delphi application simply does that; it is something web browsers do.

 

20 minutes ago, David Schwartz said:

The URL is pointing to an IMAGE FILE. Not a login page.

Exactly, and a web browser will log in and send the cookies and authorization data needed. Do you have an idea how to do this with Delphi? No one can say in general, because there are countless methods, and most of the time it is unique to the server side of each website.

 

23 minutes ago, David Schwartz said:

The MIME type is "image/png" not "text".

You are accepting "*/*", hence a normal server can rightfully answer with text. Still, most of the time a server built to keep you from reaching resources without authorization will simply redirect you from the link provided to another page to force the login.

26 minutes ago, David Schwartz said:

And a half-dozen questions on Stack Overflow where people asked how to DL specific files, this is the approach that was recommended. Not one of them even hinted that a TWebBrowser is needed.

OK, I am sure there are, but do you need a Delphi application to do it? If, as you say, no one hinted that a TWebBrowser is needed, then what would you right-click on: a TButton, a TForm, a TImage...?!

27 minutes ago, David Schwartz said:

Right-click --> Save As ... actually saves a PNG file. Not an HTML page. Always. The image is in a download folder. Not on a LOGIN page. 

I think you don't fully understand how TWebBrowser works. With TWebBrowser, as explained, you already have everything, including the images and their data. So if an image is a JPEG, grab it, re-encode it as PNG, and save it to a file, without even a right-click.

 

 

Anyway, David, I am sorry for wasting your time. My bad, never again.


Looks like your ticket-system requires credentials. Often they're stored in a cookie or a session-cookie, this is why downloading the image using your browser usually works. You will need some kind of authentication for your IdHTTP-request. You could look into the docs of your ticket-system to get information about authentication, or you could use a http-proxy like Fiddler to inspect what your browser is actually sending while fetching the image. This might also work using the usual developer-tools inside your browser. In the end you'll have to look at the request-header. I am pretty sure that there will be some kind of authentication.
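
A minimal sketch of that idea with Indy, assuming the ticketing system really does authenticate with a session cookie. The `Cookie` header value is copied by hand from the request shown in the browser's developer tools (or Fiddler). The helper name `FetchImageWithCookie` and its parameters are hypothetical, the server may tie the session to more than the cookie, and HTTPS additionally needs the OpenSSL libraries that Indy's SSL IOHandler loads.

```pascal
uses
  System.SysUtils, System.StrUtils, System.Classes, IdHTTP, IdSSLOpenSSL;

// Hypothetical helper: downloads AURL into ADest while sending a Cookie
// header copied from the logged-in browser. Returns True only if the
// server answered with an image rather than, say, an HTML login page.
function FetchImageWithCookie(const AURL, ACookieHeader: string;
  ADest: TStream): Boolean;
var
  Http: TIdHTTP;
begin
  Http := TIdHTTP.Create(nil);
  try
    // The IOHandler is owned by Http, so it is freed together with it.
    Http.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
    Http.HandleRedirects := True;
    // Assumption: the value is the raw "name=value; name2=value2" string
    // exactly as the browser sent it.
    Http.Request.CustomHeaders.Values['Cookie'] := ACookieHeader;
    Http.Get(AURL, ADest);
    // A redirect to the login page shows up here as text/html.
    Result := StartsText('image/', Http.Response.ContentType);
  finally
    Http.Free;
  end;
end;
```

Checking `Response.ContentType` before treating the stream as an image is what catches the login page early, instead of handing HTML to the image loader.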


You can try:

 

http(s)://ticketing.system.local/images/imagetobedownloaded.jpg?username={URL_Encoded_Username}&password={URL_Encoded_Password}

 

or, add a header to the request: Authorization: Basic {Base64-encoded username:password}

 

or, if you are lucky, there will be a (now unhandled) onAuth event in the component. When it fires, you can set the username / password to access the resource.

 

All of these require that the system allows this kind of authentication, though. If you are unlucky, you can still "log on" first and then download the picture with the component. Just make sure you re-attach the session cookie in the second request.
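
The Basic-authentication variant could look roughly like this with Indy, assuming the server accepts Basic authentication at all (many ticketing systems use a form login instead). The procedure name `SaveImageWithBasicAuth` and its parameters are made up for illustration.

```pascal
uses
  System.SysUtils, System.StrUtils, System.Classes, IdHTTP, IdSSLOpenSSL;

// Hypothetical helper: fetches AURL with HTTP Basic authentication and
// writes it to ATargetFile only if the response really is an image.
procedure SaveImageWithBasicAuth(const AURL, AUser, APassword,
  ATargetFile: string);
var
  Http: TIdHTTP;
  Buffer: TMemoryStream;
begin
  Http := TIdHTTP.Create(nil);
  try
    Buffer := TMemoryStream.Create;
    try
      // The IOHandler is owned by Http and freed with it.
      Http.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(Http);
      Http.HandleRedirects := True;
      // Let Indy build the "Authorization: Basic ..." header.
      Http.Request.BasicAuthentication := True;
      Http.Request.Username := AUser;
      Http.Request.Password := APassword;
      Http.Get(AURL, Buffer);
      // Download to memory first so a login page never lands on disk
      // under an image file name.
      if not StartsText('image/', Http.Response.ContentType) then
        raise Exception.CreateFmt('Expected an image but got "%s"',
          [Http.Response.ContentType]);
      Buffer.SaveToFile(ATargetFile);
    finally
      Buffer.Free;
    end;
  finally
    Http.Free;
  end;
end;
```

If the system turns out to need a form login, the "log on first" route would probably mean posting the login form and then issuing the Get on the same TIdHTTP instance, since Indy's cookie handling (`AllowCookies`, optionally with a `TIdCookieManager`) is meant to carry the session cookie from one request to the next.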

 


Yeah, I woke up this morning thinking it's probably looking for a login cookie. I wonder if there's some way to have the http component look up the cookie in the other browser's cache? 

 

I'm clicking and dragging from browser window A to the app, and I guess the http component looks like an unrelated browser window B.

 

I don't really want to force the user into a second login. That said, I could add Name + Pwd fields to this little app and save them, but that's getting into a very muddy area here....

 

There are a few Authorization events in IdHttp component:

* OnAuthorization

* OnProxyAuthorization

* OnSelectAuthorization

* OnSelectProxyAuthorization

 

I guess Right-Click --> Save As ... runs in the security context of browser window A, but a drag&drop runs in the context of browser window B.

 

I wonder if I can set up a proxy of some kind? They really should be the same context.

Edited by David Schwartz

48 minutes ago, David Schwartz said:

I'm clicking and dragging from browser window A to the app, and I guess the http component looks like an unrelated browser window B.

That is exactly what it looks like from the webapp's perspective. A is logged on, B is not.

49 minutes ago, David Schwartz said:

I guess Right-Click --> Save As ... runs in the security context of browser window A, but a drag&drop runs in the context of browser window B.

Well, not security context but it has a valid session open. But effectively yes.

49 minutes ago, David Schwartz said:

I don't really want to force the user into a second login. That said, I could add Name + Pwd fields to this little app and save them, but that's getting into a very muddy area here....

 

I wonder if I can set up a proxy of some kind? They really should be the same context.

Having your username / password is always going to be much easier. Normally session identifiers can distinguish between browser instances running on the same PC with the same user, which means that even if you can "catch" that session ID somehow, it's not going to work from your Delphi app.

 

I'd just drop a TWebBrowser / TEdgeBrowser on a form, make the users use this program to use the ticketing system. From within the program you can easily get the page source, discover and download attachments automatically within the exact same session. Even without drag & drop.
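
A rough sketch of the "discover attachments" part, assuming a TWebBrowser that has finished loading the ticket page and assuming attachment links can be recognized by a `.png` ending. That filter and the name `ListPngLinks` are guesses for illustration, not something the ticketing system is known to guarantee.

```pascal
uses
  System.SysUtils, System.StrUtils, System.Classes, SHDocVw, MSHTML;

// Hypothetical helper: collects the href of every link on the loaded page
// whose URL ends in ".png" (an assumed naming convention).
procedure ListPngLinks(Browser: TWebBrowser; Links: TStrings);
var
  Doc: IHTMLDocument2;
  Col: IHTMLElementCollection;
  Anchor: IHTMLAnchorElement;
  I: Integer;
begin
  // Document is nil until a page has been loaded.
  if not Supports(Browser.Document, IHTMLDocument2, Doc) then
    Exit;
  Col := Doc.links;
  for I := 0 to Col.length - 1 do
    // The links collection can also hold <area> elements; skip those.
    if Supports(Col.item(I, 0), IHTMLAnchorElement, Anchor) then
      if EndsText('.png', Anchor.href) then
        Links.Add(Anchor.href);
end;
```

Calling it from the browser's OnDocumentComplete handler and showing the result in a list would give the user a way to pick only the latest links rather than processing everything on the ticket.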

6 hours ago, aehimself said:

I'd just drop a TWebBrowser / TEdgeBrowser on a form, make the users use this program to use the ticketing system. From within the program you can easily get the page source, discover and download attachments automatically within the exact same session. Even without drag & drop.

hmmmm ... now that's an interesting approach ... I'll still need a way to select each individual DL link because the way we do things in this case uses the same ticket to collect these requests up for a whole month, as there can be a dozen requests or more. We only want to deal with the latest ones, and there might be one, two, or even three at once.

 

Edited by David Schwartz


ok, I set up a TWebBrowser in one tab and the image processing stuff in another tab. I open the ticket and can drag-n-drop the image link to an area at the top. It switches to the 2nd tab and I click a Process button. That causes the browser to navigate to the image, which it loads into the browser window. Perfect. Now I just need to grab it.

 

But ... while I'm getting the height and width of the image, I'm not getting the image to show up most of the time. Sometimes, but mostly not.

 

I'm finding the image files on the page using IHTMLElement2.getElementsByTagName('img') and returning the first one (since that's all there is on the page).

 

  img := getFirstImage;
  Image1_frame.Height := img.height+2;
  Image1_frame.Width := img.width+2;
  rnd := img as IHTMLElementRender ;
  rnd.DrawToDC(Image1.Canvas.Handle);

Image1 is aligned to Client on a panel Image1_frame. So I set the frame's H & W -- they get set ok.

 

But the image is usually not visible.

 

I see that DrawToDC is deprecated, but I haven't found what to replace it with.

 

What am I missing here?
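
One thing that may be worth trying, offered as a guess rather than a confirmed diagnosis: drawing straight onto `Image1.Canvas.Handle` bypasses the VCL, so the TImage is not told that anything changed and its backing bitmap may not match the new size. The sketch below draws into an explicitly sized TBitmap and then assigns it, which makes the control repaint. `RenderImgElement` is a hypothetical name, and `DrawToDC` remains deprecated and may not render at all in newer document modes.

```pascal
uses
  System.SysUtils, Vcl.Graphics, Vcl.ExtCtrls, MSHTML;

// Hypothetical helper: renders an <img> element into an off-screen bitmap
// of matching size, then hands the bitmap to the TImage.
procedure RenderImgElement(const Img: IHTMLImgElement; Target: TImage);
var
  Bmp: TBitmap;
  Renderer: IHTMLElementRender;
begin
  Bmp := TBitmap.Create;
  try
    // Size the bitmap before asking for its canvas handle.
    Bmp.SetSize(Img.width, Img.height);
    if Supports(Img, IHTMLElementRender, Renderer) then
      Renderer.DrawToDC(Bmp.Canvas.Handle);
    // Assign copies the bitmap and triggers a repaint of the control.
    Target.Picture.Assign(Bmp);
  finally
    Bmp.Free;
  end;
end;
```

With the snippet above, the call would be `RenderImgElement(img, Image1)` after the frame has been resized.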
