
All Activity


  1. Past hour
  2. David Schwartz

    web scraping or web parsing for project?

    That's pretty interesting... ScrapeStack lets you unwind the javascript encoding, but they say it returns a raw HTML page, which means it still has to be parsed. So CSS can still trip you up unless the headless browser used for unwinding the javascript handles that for you as well. (They don't say, but it probably does.) So even if there's no javascript that needs to be processed, you might want to select that option anyway just to get the CSS processed.

    I played with SerpStack a bit, but then I found ScaleSerp and I like it better. Both return a JSON data packet. My only beef with it is that it has several types of queries, and while they have very similar results, it's as if different people wrote each one, because the field names used in the JSON data for the same data items often aren't the same. So the code that processes each of the different types of queries needs to be different. Luckily I'm only interested in a half-dozen of the fields returned in each of the different JSON packets. But it's just odd to see.

    I'm mentioning this because this represents the "state of the art" at this point in time. We're early on in this technology curve. (I've been playing with this code for a couple of months, and the data results have changed several times because Google changes their page layouts and encoding mechanisms a lot. I've even found some bugs in the JSON data. It's worrisome to think of publishing a product that relies on something like this, where you KNOW that what you're working with is a moving target.)
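The field-name mismatch described above can be smoothed over with a small normalization layer, so only one code path has to process the results. A minimal sketch in Python — the query types and field names here are invented for illustration; the real ScaleSerp responses use their own (and differing) names per query type:

```python
# Map each query type's raw field names onto one canonical schema.
# NOTE: "web"/"news" and all field names below are hypothetical examples,
# not the actual ScaleSerp API schema.
FIELD_MAP = {
    "web":  {"title": "title", "link": "url", "snippet": "summary"},
    "news": {"headline": "title", "source_url": "url", "excerpt": "summary"},
}

def normalize(query_type: str, item: dict) -> dict:
    """Rename the fields of one result item into the canonical schema."""
    mapping = FIELD_MAP[query_type]
    return {canon: item[raw] for raw, canon in mapping.items() if raw in item}

web_item = {"title": "Delphi", "link": "https://example.com", "snippet": "..."}
news_item = {"headline": "Delphi", "source_url": "https://example.com", "excerpt": "..."}

# Different raw shapes, identical normalized shape:
assert normalize("web", web_item) == normalize("news", news_item)
```

When the vendor renames a field (a moving target, as noted above), only the mapping table changes, not the downstream code.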
  3. Today
  4. Serge_G

    Interbase Update

    What if you use a BEFORE UPDATE trigger on the Master table?
  5. I was playing with IdSSLIOHandlerSocketOpenSSL1 on D11. It seems a 32-bit Windows release lacks the Indy DLLs? I got the exception "could not load ssl library" there, while it works for a 64-bit release.
  6. Lajos Juhász

    TFDConnection.ExecSQL with ResultSet

    That's correct: ExecSQL will create the TFDQuery object without an owner, and you have to free it yourself (whether or not an exception was raised).
  7. corneliusdavid

    web scraping or web parsing for project?

    I would suggest looking at one of Embarcadero's latest acquisitions, ApiLayer; they have a web-scraping API. Don't know anything about it but "Turn web pages into actionable data" (scraped from their website) sounds like what you're trying to do.
  8. vfbb

    Skia4Delphi

    @Rollo62 The SVG limitations are explained in the documentation; see the considerations section for how to avoid them.
  9. vfbb

    Skia4Delphi

    We are very pleased to announce the 2.0 release of the Skia4Delphi project. The library was almost completely rewritten, bringing many new features such as visual components (VCL/FMX), easy installation, easy configuration, error handling, and much more. It now also has many more features as well as more examples.

    Main changes in v2.0.0:

    - Added support for Delphi 11
    - Added support for OSX ARM 64-bit
    - Added Setup (friendly install)
    - Added an IDE plugin to enable Skia in your project, replacing the manual changes to project settings, deployment and DLLs that were needed before. Now it's automatic: just right-click your project inside the IDE and choose "Enable Skia"
    - Built the "icudtl.dat" file into the DLL by default
    - Added the controls TSkLottieAnimation, TSkPaintBox and TSkSvg
    - Added SkParagraph (multiple styles in text)
    - Added support for Telegram stickers (using TSkLottieAnimation)
    - Added error handling in sensitive methods
    - Added the source of our website: https://skia4delphi.org
    - Rewrote the library
    - Improved the integration with Apple's MetalKit
    - Improved library usage, e.g. replacing SkStream with TStream
    - Fixed an AccessViolation when using TStream
    - Fixed an issue with transparent bitmaps in the VCL sample
    - Fixed minor issues
    - Explained SVG limitations in the documentation
    - We are now using the latest inactive milestone release of Skia (M88), because the other branches are updated frequently
    - Deprecated non-Windows platforms on Delphi 10.3 Rio or older

    Homepage: skia4delphi.org
    GitHub: https://github.com/viniciusfbb/skia4delphi

    A little preview:

    Setup: just download the Skia4Delphi_2.0.0_Setup.exe attached on the Releases page.

    Enabling your project, Telegram stickers (tgs files), SkParagraph, controls, WebP (kung_fu_panda.webp):

    Format               Size
    Png (100% quality)   512 KB
    Jpeg (80% quality)   65.1 KB
    WebP (80% quality)   51.6 KB

    And much more...
  10. David Schwartz

    web scraping or web parsing for project?

    Hello! Before you do this, you should know that in most cases what you're proposing to do is going to violate copyright laws in most countries. Are you sure you want to take that risk?

    To answer your question, "screen-scraping" used to be a means by which applications written to run on PCs were made to interact with virtual terminals attached to what were usually bigger time-sharing computers (mainframes and mini-computers). There would be a form on a screen, and the data fields were in certain positions that were always fixed on the screen. (This was a time when terminals were like TVs that showed text as green or white on a black background. They had 80 columns and around 24 lines.) The software would "scrape" (basically, "copy") the data out of those fixed locations and put it into its own variables. (Crawling is not really related to what you're asking.)

    For web sites, this approach (page scraping) isn't very practical for a number of reasons, unless the material you're trying to scrape is from a report that uses a fixed-width typeface on a simulated paper sheet. Rather, it's much easier to just take the raw page data, which is in HTML, and "parse" it to find the fields you want. But this is not nearly as simple as it sounds, especially if CSS is involved.

    Parsing is a rather complex topic and is usually taught as part of a computer science course on compiler construction. There are two parts to building a compiler: the first is parsing the input file; the second is emitting some other code based on the input stream, usually a bunch of machine instructions that get "fixed up" by another program called a "linker". Most compilers are called "two-pass" compilers. (Some do more than two passes.) In the first pass, the parser builds what's called a "parse tree" that the second pass crawls down and processes, either depth-first or breadth-first. As it crawls that tree, it emits code.
    If there's any kind of optimization going on, the tree can be adjusted first to eliminate unused code and combine parts of the tree that are semantic duplicates. Every programming language, and most things even down to simple math equations, need to be parsed in some way. In this case, what you're emitting is not machine instructions but the content you find in and around the HTML tags.

    Also, be aware that CSS is often used to position content in different places on the page, and it does not have to appear in the same order as the text around it. That is, the page could lay down a bunch of boxes or tables, then the headers, then the footers, then the data -- either in columns or rows. And there can be text data with CSS tags that say to hide it, or display it in the same color as the background so it's invisible; that can be done to scatter garbage all over the place that your parser will think is legitimate content, but a user looking at it in their web browser won't see any of it. Parsing the HTML (which is a linear encoding) won't give you any clue that the content is being generated in some apparently random pattern. So what your parser puts out could look like someone sucked up the words in a chapter of a book and just randomly spat them out across the pages. You'll have to sit there and study it, look closely at the CSS, and figure out how to unravel it all. Then the next page you get could use a different ordering and you'll be back to square one.

    The thing is, with the increase in the use of client-side javascript and CSS to encode content and render it in a non-linear order, it's getting harder and harder to algorithmically extract content from a web page. It should also be fairly simple to render the input data on the virtual page that's displayed on the screen, which is what the web browser seems to be showing you. But my experience is that's not necessarily the case.
You could always take the bitmap image from the browser's canvas and process it with OCR and see if that's simpler than parsing the input stream. It really just depends on how much effort the vendor wants to put into making it difficult to extract their content. For example, run a simple google search query and then look at the page source. Copy it all and paste it into a text editor and turn on word-wrapping. Good luck making sense of it! Repeat this by running the SAME QUERY a few times, then do a diff on each of the resulting files (if it's not obvious just looking at them) and you'll see what you're dealing with. Pick some phrases in the original page and search for them in the text editor. Some you'll find, and most you won't. Yes, it's fully HTML-compliant code. Yes it can be parsed with any HTML parser. But I guarantee you that a lot of the text you want is embedded inside of encoded javascript methods, and no two take the same approach. To make matters worse, they change the names of the javascript functions and parameter names from one query to the next, so you can't even build a table of common functions to look for. So you'll need a javascript parser that can execute enough of it to extract the content, but not go any further. A lot of it is structured like a macro language, and it uses multiple levels of "encoding" or "embedding". When you try unwinding it, you don't know how deep it goes, and if you're not fully executing the code at the same time as parsing it, you can end up "over-zooming" and miss what you're looking for. They can also bury little land-mines in the code and if you try decoding something it can get scrambled if you don't have something else loaded in the run-time environment that's totally unrelated. 
Or it could use a function loaded way earlier that doesn't look related that unscrambles some important bit of text that will stop the parser or run-time interpreter if it's not correct, and what you'll end up with is just a bunch of gibberish. It used to be fairly easy to parse Google search result pages (SERPs), up until 2-3 years ago when they started making them quite hard. Some other sites are starting to do this now, like Amazon, eBay, and others. Why? Because it's the only way to deter people from stealing their copyrighted content! They know that YOU DON'T GIVE A RIP about THEIR COPYRIGHTS. And it's a lot easier now to use multiple layers of javascript encoding to hide content than anything else. I've also seen CSS used to embed content. CSS is NOT HTML. You can parse HTML and end up with just a bunch of CSS tags and little if any content. Good luck with that as well. So now you need a CSS parser! Is your head spinning yet? Ask yourself how much you think the client is willing to pay you to figure all of that out, and realize that it's all being constantly changed by the vendor, so it's a moving target that will work one day and not the next. Honestly, if you're going to risk stealing copyrighted content from other sites, hire people in China or India and pay them per-piece to copy stuff by hand from the other systems into yours. It will be a lot faster than trying to write and maintain code that parses all of this stuff. (I found there are a few companies that do this for Google SERPs and they charge a bit to get the data you want. Maybe some exist for the sites you're interested in robbing of their intellectual property?) Even if it's not that complex to parse their data, pray that you don't get caught. 
TIP: the fact that you asked your question the way you did tells me you're looking at a minimum of 6-9 months to write the software you think you need, if you can even get it working steadily, because you're going to have to learn a lot about parsing first. I suggest you hire someone with a computer science degree who already knows this stuff. TIP: writing a parser can look seductively simple at first. It's not. And the obvious ways of digging into it without understanding anything about parsing will usually lead you into a dead-end alley. HTML is fairly easy to parse, and there are several components you can get that do an excellent job of it. But again, CSS and javascript are not HTML, and they'll stop you dead in your tracks if they're used to obfuscate content in any way, or even in the case of CSS to do page layout in a non-linear fashion.
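To make the hidden-content problem above concrete, here is a minimal sketch using only Python's standard-library html.parser. It collects page text while skipping elements hidden with inline CSS. As the post explains, content hidden through external stylesheets or javascript sails right past a parser like this, which is exactly the problem:

```python
from html.parser import HTMLParser

class VisibleTextParser(HTMLParser):
    """Collects text while skipping subtrees hidden via inline CSS.

    Only style="display:none" / "visibility:hidden" inline attributes are
    handled; hiding done in external stylesheets or javascript is invisible
    to a plain HTML parser. NB: void tags (<br>, <img>) inside a hidden
    region would skew the depth count; a real implementation would need to
    special-case them.
    """
    def __init__(self):
        super().__init__()
        self.hidden_depth = 0   # >0 while inside a hidden subtree
        self.chunks = []        # visible text fragments

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style", "").replace(" ", "").lower()
        if self.hidden_depth or "display:none" in style or "visibility:hidden" in style:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.chunks.append(data.strip())

page = '<p>real content</p><p style="display:none">decoy garbage</p>'
parser = VisibleTextParser()
parser.feed(page)
print(parser.chunks)  # ['real content']
```

Even this toy filter needs per-element state; once the hiding moves into generated class names or scripts, you are effectively forced to run a full browser engine, which is the post's point.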
  11. Yesterday
  12. Anders Melander

    DevExpress PDF Viewer

    I'm sorry about that, although I don't quite understand why you find it offensive. I'm sure Joe knows his stuff but appearance does matter. I actually looked at the site trying to find more information about the library but eventually gave up. Now that I look at it again I can see that what I thought was just more bullet points is actually links to sub pages. After that I had to read through all the FAQ to deduce that it's not a native Delphi library. If I had actually been looking to buy a PDF library (I'm not since I have a DevExpress subscription) I would have taken one look at that page and quickly moved on. Joe or not.
  13. Dave Nottage

    iOS 15 support

    With the direct download method I used, it took around 30-40 minutes total to update. I'm also using Xcode 13 with Delphi 10.4.2 and it appears to be working OK, aside from the issue mentioned earlier about deploying for the App Store when using the Debug config.
  14. Rolf Fankhauser

    How to use Open XML SDK in C++Builder

    OK, I found Bruno Sonnino's blog. He shows how to create OpenXML files in Delphi without using the SDK, by directly generating the XML code (but it looks rather cumbersome...). I think that should also be possible in C++Builder. So, many thanks to Bruno for this tutorial and for sharing the code! But hints on how to use the SDK in C++Builder would still be appreciated...
  15. Hi, TFDConnection has the following method: function ExecSQL(const ASQL: String; var AResultSet: TDataSet): LongInt; When using this, do I need to free the returned AResultSet? And if so, and an exception is raised when opening the query, does that mean there will be a memory leak of the TFDQuery object?
  16. aehimself

    RADStudio-DemoKit/11-demos/

    Were there any changes to TZipFile altogether? I'd love to see native LZMA support and the ability to open some "corrupted" ZIP files too (like 7-Zip does). I have a file which was compressed by C# code. Delphi either throws an "Invalid central header signature" exception, or manages to open the archive but sees no files in it.
  17. Rolf Fankhauser

    How to use Open XML SDK in C++Builder

    I would like to use Open XML SDK in C++Builder. I found instructions, docs and forums only for VS C++. I tried the Direct Office library for Delphi but it doesn't support C++Builder. Thanks in advance for any hints!
  18. Joseph MItzen

    Missing The Old Forums

    I won't go into the whole story, but I once had an online exchange with David Intersimone, then VP of Developer Relations, about a survey he'd run, and tried to explain to him how it was only going to tell him what he wanted to hear and not what he needed to know. Despite my being a professional data analyst at the time, it felt like he was trying to lecture me on how surveys worked. I tried to explain that when you survey the first 500 people to buy a new release, you're missing those who chose not to upgrade because of bugs, price or features, which were the three biggest complaints at the time (still are). Of course, it also missed those who had already opted to leave for another platform. That's when David gave me insight into the thought processes at Embarcadero and I knew there was no helping them... he wrote "People leave Delphi for C#; people leave C# for Delphi; so we just keep on doing what we're doing." In short, they're like a black box whose outputs are not influenced by the inputs.

    This survey asked questions like (approximately) "What's the most awesome thing you can think of about Delphi?" and nothing about what your biggest problem with it is. Answers to this question (from the biggest fans, who were quickest to order the upgrade) were used within days by the marketing team in a press release and a blog entry, showing the survey was nothing more than quote mining for marketing. Marco Cantu tried to claim that they have lots of surveys - dozens, hundreds, thousands! - where they ask all the questions I suggested and more, although no one seems to have ever received an invitation to take one of these alleged surveys.

    In another instance, a person described being a subject-domain expert with no Delphi experience who was hired by a Delphi shop. They took him to... he called it a user group meeting, but it sounded like one of the old World Tour events.
I'll skip over his general impressions, but he was unsettled by the small number and age of the participants. He brought this up after the meeting with the Embarcadero employee who was there and said that the employee responded to him: "We don't like for new people to show up at these meetings; they're filled with angry middle-aged white men". I've got a few more examples I won't go into, but I'll say I've seen and heard enough to be convinced that for the *majority* of Embarcadero employees, the concerns of customers aren't really high on their agenda (David Millington being a notable exception). When two of the people who actually develop Delphi showed up in the old forums one day and someone brought up a critical bug they were experiencing and not getting help with, one of them wrote: "See - this is why we don't like to come here." Tony de la Llama, who was in sales, showed up in the forums once and he was a nice, friendly guy. When someone told him about a problem they were having, he expressed personal sorrow over their experience and promised to escalate the issue personally so they'd get a fix. He said we were all swell people and he really enjoyed interacting with the users and planned on doing it again. Days later the EMBT CEO blamed Tony personally for low Delphi sales (I was told this by another EMBT employee) and fired him right before Christmas. 😞 So anyway, they did not have a history of listening to their customers, rarely showed up in the forum and when they did it often didn't go well. And the one person who really listened got fired. Hence I don't think it's that great a loss. (Again, I want to give David Millington credit for being one of the few EMBT employees to visit forums, including Reddit, and actually offer help to users with problems and listen to their suggestions and feedback. He's the only one I've seen do so since Mr. de la Llama.)
  19. Joseph MItzen

    Missing The Old Forums

    I've been thinking about it, and since C++ isn't exactly an obscure language, maybe they've gone to the same place non-C++ Builder C++ users go to.
  20. Stéphane Wierzbicki

    DevExpress PDF Viewer

    Did you see the TMS announcement? https://www.tmssoftware.com/site/blog.asp?post=836
  21. dwrbudr

    DevExpress PDF Viewer

    ImageEn has PDF support; one DLL of about 4 MB is required for it. I've tested it and it works well. https://www.imageen.com/info/history.html
  22. Dany Marmur

    DevExpress PDF Viewer

    I do not think this is an OK comment. We have a load of well-written libs with very old sites, not just Joe's. A separate discussion is that when new kids look, Delphi stuff looks "old"; but that is another discourse altogether.
  23. Remy Lebeau

    Missing The Old Forums

    ♫ Into the Unknooo-ooo-ooo-own... ♫ (sorry, couldn't help myself )
  24. That notice has been in the documentation since XE4. All ANSI types (P/AnsiChar, AnsiString/(N), ShortString) were disabled in the NEXTGEN mobile compilers in XE3. Under NEXTGEN, UTF8String and RawByteString were re-enabled in 10.1 Berlin, and then the rest of the ANSI types were re-enabled in 10.4 Sydney when NEXTGEN was dropped completely so mobile platforms now match desktop platforms in terms of language features. Looks like the ShortString documentation has not been updated yet to reflect that. Yes. Only in XE3 through 10.3. In 10.4 onward, ShortString can be used on mobile. No. Because ALL ANSI types were initially eliminated, as mobile platforms are Unicode-based. Then the types slowly started being re-introduced as needs arose, until eventually Embarcadero decided that having separate compiler architectures just wasn't working out. Hardly.
  25. billyb

    iOS 15 support

    Should have asked before: can I update to Xcode 13 and still use Delphi 10.4, or do I have to also update to Delphi 11? Bill
  26. Remy Lebeau

    How to connect to office 365 using proxy server

    ?
  27. Marketing strategy? Really? Just because you disagree with the decision to deprecate it doesn't make it a stupid decision, driven by "marketing". The deprecation of "object" was also controversial but that wasn't driven by marketing either. I'm pretty sure management, marketing and sales couldn't care less about these things.
  28. Dalija Prasnikar

    iOS 15 support

    For iOS 15 you need Xcode 13. You can find direct download links here https://developer.apple.com/download/all/