Jacek Laskowski 57 Posted March 26, 2020
I'm looking for a library/class to minify (compress) HTML files. Is there such a thing?
Edwin Yip 154 Posted March 26, 2020
As far as I know, there is no ready-made one for Delphi. Maybe find a C library and wrap it for Delphi?
Alex7691 7 Posted March 27, 2020 (edited)
Would you mind elaborating on why you need it? In general, it is not possible to minify HTML content effectively unless it contains heaps of comments. In that case, simple gzip compression for HTTP(S) transmission is just as effective, if not more so: the cost of gzip alone is lower than the cost of minifying and then compressing, and the benefit is almost the same.
Cheers
Edited March 27, 2020 by Alex7691
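The gzip-versus-minify argument above is easy to check on your own files. Below is a minimal, language-neutral sketch, written here in Python; `naive_minify` is a hypothetical and deliberately crude stand-in for a real minifier (it only drops comments and collapses whitespace), and the sample markup is invented for illustration, so the printed sizes say nothing about real mail templates.

```python
import gzip
import re


def naive_minify(html: str) -> str:
    """Hypothetical, crude minifier: drop comments, collapse whitespace runs."""
    html = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
    return re.sub(r"\s+", " ", html).strip()


def sizes(html: str) -> dict:
    """Return byte counts for the raw, minified and gzip-compressed variants."""
    raw = html.encode("utf-8")
    mini = naive_minify(html).encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(mini),
        "gzip": len(gzip.compress(raw)),
        "minified+gzip": len(gzip.compress(mini)),
    }


# Invented sample: heavily indented, repetitive markup.
sample = (
    "<html>\n  <body>\n"
    + '    <p class="row">   Hello,   world   </p>\n' * 200
    + "  </body>\n</html>\n"
)

result = sizes(sample)
for name, size in result.items():
    print(f"{name:>14}: {size} bytes")
```

Running it on a representative document shows how much of the minifier's saving survives once gzip is applied as well.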
Jacek Laskowski 57 Posted March 27, 2020
I need this for HTML mailing (e-mail in HTML format)
aehimself 396 Posted March 28, 2020
I wrote a beautifier/minimizer for XML; HTML should work with minor adjustments. If you wish, I can share the source.
David Schwartz 426 Posted March 29, 2020
On 3/27/2020 at 9:27 AM, Jacek Laskowski said: I need this for html mailing (e-mail in html format)
Aside from the tags that cannot be replaced, and the text that you want displayed, I'm not sure what you can remove. It's not like JavaScript, where you can change all of the variable names to 2- or 3-letter codes and remove most of the white space. What do you imagine can be compressed out?
Jacek Laskowski 57 Posted March 30, 2020
10 hours ago, David Schwartz said: Aside from the tags that cannot be replaced, and the text that you want displayed, I'm not sure what you can remove. It's not like javascript where you can change all of the variable names to 2- or 3-letter codes and remove most of the white space. What do you imagine can be compressed out?
You can:
- delete end of lines (CR, CRLF)
- delete spaces between fields in css: style="color: white; border: 5px" => style="color:white;border:5px"
- delete all spaces and tabs of which there is more than one next to each other (except the <pre></pre> tag)
- if the styles are defined within HTML in <style type="text/css"> then all long style names like bold_red_important_text can be replaced by short names
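The first three steps in the list above can be sketched with a few regular expressions. This is only an illustration under stated assumptions, written in Python rather than Delphi: the names `minify_html`, `minify_chunk` and `compact_style` are hypothetical, the code does not parse HTML, it replaces line breaks with a single space instead of deleting them outright (so adjacent words are not glued together), and it skips the class-renaming step, which would need a CSS-aware pass.

```python
import re

# Assumption: <pre> blocks are well formed and not nested.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)
STYLE_ATTR = re.compile(r'style="([^"]*)"', re.IGNORECASE)


def compact_style(match):
    """Remove spaces around ':' and ';' inside one inline style attribute."""
    css = re.sub(r"\s*([:;])\s*", r"\1", match.group(1))
    return 'style="' + css.strip() + '"'


def minify_chunk(chunk):
    """Minify a piece of markup that is known to lie outside <pre>."""
    chunk = STYLE_ATTR.sub(compact_style, chunk)
    chunk = re.sub(r"[\r\n]+", " ", chunk)    # line breaks become one space
    chunk = re.sub(r"[ \t]{2,}", " ", chunk)  # collapse runs of spaces/tabs
    return chunk


def minify_html(html):
    """Apply minify_chunk to everything except <pre>...</pre> blocks."""
    # re.split with a capturing group keeps the <pre> blocks at odd indices.
    parts = PRE_BLOCK.split(html)
    return "".join(
        part if index % 2 == 1 else minify_chunk(part)
        for index, part in enumerate(parts)
    )


source = (
    '<p style="color: white; border: 5px">Hello,\r\n    world</p>\n'
    "<pre>  keep\n  this  </pre>"
)
print(minify_html(source))
```

A regex pass like this will also rewrite text that merely looks like a style attribute and knows nothing about embedded scripts, so it is a starting point for experiments rather than something to run on arbitrary mail templates.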
Fr0sT.Brutal 900 Posted April 3, 2020
On 3/30/2020 at 1:38 PM, Jacek Laskowski said: You can:
- delete end of lines (CR, CRLF)
- delete spaces between fields in css: style="color: white; border: 5px" => style="color:white;border:5px"
- delete all spaces and tabs of which there is more than one next to each other (except the <pre></pre> tag)
- if the styles are defined within HTML in <style type="text/css"> then all long style names like bold_red_important_text can be replaced by short names
Will it be worth the effort?
aehimself 396 Posted April 4, 2020
On 4/3/2020 at 6:27 PM, Fr0sT.Brutal said: Will it worth the efforts?
Oh, you would be surprised. The last WYSIWYG program that wrote clean HTML code was Front Page Express around 2000. My favorite was creating a simple page, saving it in Word 2003 as HTML and then manually trimming the waste out. I could usually save anywhere between 5 and 20 kilobytes while keeping the original design. Now imagine trimming the "useless" characters too, used only to make the code readable. We deal with XMLs mostly at work. As an example, a beautified XML is 687722 characters, the same minimized is 455528; that's about 33% "compression".
Fr0sT.Brutal 900 Posted April 4, 2020
3 hours ago, aehimself said: creating a simple page, saving it in Word 2003 as HTML and then manually trimming the waste out
Word has never been a good HTML generator )) CSS styles in all pages it created were a nightmare.
3 hours ago, aehimself said: Now imagine trimming the "useless" characters too, used only to make the code readable.
3 hours ago, aehimself said: As an example, a beautified XML is 687722 characters, the same minimized is 455528; that's about 33% "compression".
Well, I tried the first online minifier Google gave me on the source of this very page. Initial 76217, minified 63707, "compression" 16%. On the one hand, that's more than nothing. On the other hand, it makes the page source unreadable, could cause errors (e.g. in embedded scripts or styles), and in the end saves only 13 kB.
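For reference, the two percentages quoted in this exchange follow from (before - after) / before. A tiny Python check, where `saving` is just an illustrative helper name:

```python
def saving(before: int, after: int) -> float:
    """Relative size reduction in percent."""
    return (before - after) / before * 100


xml_saving = saving(687722, 455528)   # beautified vs. minimized XML
page_saving = saving(76217, 63707)    # forum page vs. online minifier output

print(f"XML:  {xml_saving:.1f}%")
print(f"HTML: {page_saving:.1f}%")
```

Both figures are sizes before any gzip or deflate step, so they are an upper bound on what minifying saves once transport compression is involved.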
David Heffernan 2345 Posted April 5, 2020
How are you generating the original HTML?
aehimself 396 Posted April 5, 2020
12 hours ago, Fr0sT.Brutal said: Word has never been a good HTML generator )) CSS styles in all pages it created were a nightmare.
I know, I just wanted to bring a really negative example. Most of the CMS WYSIWYG editors are no exception, though, and since they love to use shared libraries I suppose webmail interfaces are guilty as well.
12 hours ago, Fr0sT.Brutal said: Well, I tried the first online minifier Google has given me and the source of this very page. Initial 76217, minified 63707, "compression" 16%. On the one hand, that's more than nothing. On the other hand, that makes the page source unreadable, could cause errors (f.ex. in embedded scripts or styles), and finally saves only 13 kB.
As I mentioned, I mainly work with XML, which is similar to (but not the same as) HTML, and in XML the results are surprising already. I'll go ahead and call minifying a compression method (a fairly inefficient one, though) from now on. As with most compression methods, the output heavily depends on the input, but usually the larger the input, the better the ratio becomes. The purpose of compression is not to make the result readable, and if it causes errors, it's a bug in the code. All I'm saying is that, whether it's efficient or not, minifying is still viable, as it is the only form of compression whose output is still accepted by the target software. Just out of curiosity, I'll run my XML minifier on an HTML page and post the results.
Alexander Sviridenkov 356 Posted April 5, 2020
@aehimself 95% of real-world HTML cannot be parsed by an XML parser. Also, you cannot correctly modify HTML without calculating CSS styles. For example, in this case
<p class="p1">This is text</p> <p class="p1">This is text2</p>
the spaces between the p elements can be removed, but in this other case
<p class="p2">This is text</p> <p class="p2">This is text2</p>
they should be preserved.
Fr0sT.Brutal 900 Posted April 5, 2020
5 hours ago, David Heffernan said: How are you generating the original html?
Are you addressing me?
2 hours ago, aehimself said: As with most compressions the output heavily depends on the input; but usually the larger the input, the better the ratio becomes. The purpose of a compression is not to make it readable, and if it causes errors it’s a bug in the code.
Btw, what do you need XML minification for? As far as I'm aware, you rarely see plain XML in the wild; most formats assume zipped XML (docx, fb2.zip). DB engines often use compression internally as well, and the HTTP protocol allows deflate compression.
David Heffernan 2345 Posted April 5, 2020
2 hours ago, Fr0sT.Brutal said: Are you addressing me?
Of course not. You aren't generating the HTML.
aehimself 396 Posted April 5, 2020
10 hours ago, Alexander Sviridenkov said: @aehimself 95% of real world HTMLs cannot be parsed by XML parser. Also you cannot correctly modify HTML without calculating CSS styles. F.e. in this case <p class="p1">This is text</p> <p class="p1">This is text2</p> spaces between p can be removed, but in other case <p class="p2">This is text</p> <p class="p2">This is text2</p> they should be preserved.
I quickly realized. It threw a null pointer exception anyway because of embedded JavaScript. And XML does not have that 🙂
9 hours ago, Fr0sT.Brutal said: Btw, what you need XML minification for? As far as I'm aware you can rarely see plain XML in the wild; most of formats suppose Zipped XML (docx, fb2.zip). DB engines often use compression internally as well, HTTP protocol allows deflate compression.
Our overgrown legacy system talks to another overgrown legacy system through a custom protocol, transporting the XMLs in a beautified form. The issue is that these two systems sometimes get misaligned, because of either a third overgrown legacy system, a bug in their software or a bug in ours. After a bug is fixed, the affected information packets have to be re-processed or re-sent. And because of how the protocol was designed, you always need a reference to an earlier packet as well. It is quite complicated, and it results in us having to save all communication between our software instances and their system.
Originally, someone decided to keep the XMLs in a BLOB field in a table. Unfortunately, the code is so big and disorganized (20+ years old) that changing the saving scheme would require rewriting half our application plus the web interface. We know it should be done, but we also know that it never will be, as the clients do not pay for refactoring.
The good thing about legacy systems is the gems you can find. My favorite goes like this:
    Function GetUserID: Integer;
    Begin
      Case MenuID Of
        0: Result := -1;
        1: Result := -1;
        2: Result := -1;
        [...]
        100: Result := -1;
        Else Result := -1;
      End;
    End;
... and oooooh, those sweet comments 🙂 Too bad that most of them are in Hungarian, otherwise I'd have posted them on DevHumor.
Alexander Sviridenkov 356 Posted April 5, 2020
19 minutes ago, aehimself said: I quickly realized. It threw a nullpointer exception anyway because of embedded JavaScript. And XML does not have that 🙂
This is only one of hundreds of problems. Parsing HTML is much harder than parsing XML.
aehimself 396 Posted April 5, 2020
7 minutes ago, Alexander Sviridenkov said: This is only one of hundreds problems. Parsing HTML is much harder than XML.
Thank god we use XMLs in this case 🙂
Tntman 14 Posted April 5, 2020 (edited)
18 hours ago, David Heffernan said: How are you generating the original html?
I think the answer to this question is crucial for helping him out. Anyway, I can recommend "webpack" for this job. For example, in web development I use webpack and some other tools/libraries to minify code, and it helps a lot. That said, maybe the owner of this topic could try to incorporate it somehow into his project.
Edited April 5, 2020 by Tntman