
As far as I know, there isn't a ready-made one for Delphi. Maybe find a C library and wrap it for Delphi?


Would you mind elaborating on why you need it?

In general, HTML content cannot be minified very effectively unless it contains heaps of comments.

In the general case, simple gzip compression for HTTP(S) transmission is just as effective, if not more so: compressing with gzip alone costs less than minifying and then compressing, and the benefit is almost the same.

 

Cheers

Edited by Alex7691

I wrote a beautifier/minifier for XML; HTML should work with minor adjustments. If you wish, I can share the source.

On 3/27/2020 at 9:27 AM, Jacek Laskowski said:

I need this for HTML mailing (e-mails in HTML format).

Aside from the tags, which cannot be replaced, and the text you want displayed, I'm not sure what you can remove. It's not like JavaScript, where you can rename all the variables to 2- or 3-letter names and remove most of the whitespace.

 

What do you imagine can be compressed out?

10 hours ago, David Schwartz said:

Aside from the tags, which cannot be replaced, and the text you want displayed, I'm not sure what you can remove. It's not like JavaScript, where you can rename all the variables to 2- or 3-letter names and remove most of the whitespace.

 

What do you imagine can be compressed out? 

You can:
- delete line breaks (CR, LF, CRLF)
- delete the spaces between declarations in inline CSS: style="color: white; border: 5px" => style="color:white;border:5px"
- collapse every run of consecutive spaces and tabs into a single space (except inside <pre></pre>)
- if the styles are defined within the HTML in <style type="text/css">, replace all long class names like bold_red_important_text with short ones
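A minimal sketch of the first and third points (line breaks and runs of spaces/tabs) in Delphi; it should also compile with Free Pascal in Delphi mode. It is not a parser. It replaces every run of spaces, tabs, CR and LF with a single space instead of deleting it, so words separated only by a line break do not get glued together, and it copies the content of <pre>, <textarea>, <script> and <style> verbatim. Treating the last three like <pre> is an assumption added here (a newline can end a JavaScript // comment, for instance), and the names CollapseWhitespace and TagAt are purely illustrative.

```pascal
program MinifyHtmlWhitespace;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

const
  // Elements whose content is copied verbatim (assumed list, not exhaustive).
  ProtectedTags: array[0..3] of string = ('pre', 'textarea', 'script', 'style');

function IsWs(C: Char): Boolean;
begin
  Result := (C = ' ') or (C = #9) or (C = #10) or (C = #13);
end;

// True if S contains Prefix (e.g. '<pre' or '</pre') at position I, followed
// by '>' or whitespace, so that '<prefix>' is not mistaken for '<pre>'.
function TagAt(const S: string; I: Integer; const Prefix: string): Boolean;
var
  J: Integer;
begin
  Result := SameText(Copy(S, I, Length(Prefix)), Prefix);
  if Result then
  begin
    J := I + Length(Prefix);
    Result := (J <= Length(S)) and ((S[J] = '>') or IsWs(S[J]));
  end;
end;

// Replaces each run of spaces, tabs, CR and LF with one space, except inside
// the protected elements listed above.
function CollapseWhitespace(const Html: string): string;
var
  I, K, OutLen: Integer;
  C: Char;
  CloseTag: string;      // e.g. '</pre' while inside <pre>, else ''
  PendingSpace: Boolean; // a whitespace run was seen but not yet written
begin
  SetLength(Result, Length(Html)); // the output is never longer than the input
  OutLen := 0;
  CloseTag := '';
  PendingSpace := False;
  for I := 1 to Length(Html) do
  begin
    C := Html[I];
    if C = '<' then
    begin
      if CloseTag = '' then
      begin
        for K := Low(ProtectedTags) to High(ProtectedTags) do
          if TagAt(Html, I, '<' + ProtectedTags[K]) then
          begin
            CloseTag := '</' + ProtectedTags[K];
            Break;
          end;
      end
      else if TagAt(Html, I, CloseTag) then
        CloseTag := '';
    end;
    if (CloseTag = '') and IsWs(C) then
    begin
      PendingSpace := True;
      Continue;
    end;
    if PendingSpace then
    begin
      Inc(OutLen);
      Result[OutLen] := ' ';
      PendingSpace := False;
    end;
    Inc(OutLen);
    Result[OutLen] := C;
  end;
  if PendingSpace then
  begin
    Inc(OutLen);
    Result[OutLen] := ' ';
  end;
  SetLength(Result, OutLen);
end;

begin
  // prints: <p>Hello, world!</p> <pre>  keep   this  </pre>
  WriteLn(CollapseWhitespace('<p>Hello,'#13#10'    world!</p>'#13#10 +
    '<pre>  keep   this  </pre>'));
end.
```

Collapsing rather than deleting is the conservative choice: in normal text a browser already renders a run of collapsible whitespace as at most one space, so the visible result should not change unless CSS sets white-space: pre on some other element. Comments, attribute values, the inline-style trick and class-name shortening are not handled.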

On 3/30/2020 at 1:38 PM, Jacek Laskowski said:

You can:
- delete line breaks (CR, LF, CRLF)
- delete the spaces between declarations in inline CSS: style="color: white; border: 5px" => style="color:white;border:5px"
- collapse every run of consecutive spaces and tabs into a single space (except inside <pre></pre>)
- if the styles are defined within the HTML in <style type="text/css">, replace all long class names like bold_red_important_text with short ones

Will it be worth the effort?

On 4/3/2020 at 6:27 PM, Fr0sT.Brutal said:

Will it be worth the effort?

Oh, you would be surprised. The last WYSIWYG program that wrote clean HTML code was FrontPage Express, around 2000. My favorite was creating a simple page, saving it in Word 2003 as HTML and then manually trimming out the waste. I could usually save anywhere between 5 and 20 kilobytes while keeping the original design.

Now imagine also trimming the "useless" characters that are only there to make the code readable.

 

At work we mostly deal with XML. As an example, a beautified XML file is 687722 characters, and the same file minimized is 455528; that's about 34% "compression".
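For data-oriented XML like this, most of the gain comes from the indentation a beautifier puts between tags. A rough sketch in the same spirit, assuming no mixed content and no xml:space="preserve" (it does not special-case comments or CDATA either): it drops whitespace runs that sit directly between a '>' and a '<' and keeps whitespace inside text. MinifyXml and SavingPercent are hypothetical names; the percentage helper only reproduces the arithmetic for the sizes above, (687722 - 455528) / 687722, which is about 33.8%.

```pascal
program MinifyXmlDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

function IsWs(C: Char): Boolean;
begin
  Result := (C = ' ') or (C = #9) or (C = #10) or (C = #13);
end;

// Drops whitespace-only runs that sit between a '>' and a '<' (the
// indentation added by a beautifier); whitespace inside text content is kept.
// Heuristic: assumes no mixed content, no xml:space="preserve", and does not
// special-case comments or CDATA sections.
function MinifyXml(const Xml: string): string;
var
  I, J, K, OutLen: Integer;
  AfterTag, BeforeTag: Boolean;
begin
  SetLength(Result, Length(Xml)); // the output is never longer than the input
  OutLen := 0;
  I := 1;
  while I <= Length(Xml) do
  begin
    if IsWs(Xml[I]) then
    begin
      J := I;
      while (J <= Length(Xml)) and IsWs(Xml[J]) do
        Inc(J); // J now points just past the whitespace run
      AfterTag := (OutLen = 0) or (Result[OutLen] = '>');
      BeforeTag := (J > Length(Xml)) or (Xml[J] = '<');
      if not (AfterTag and BeforeTag) then
        for K := I to J - 1 do // keep whitespace that belongs to text
        begin
          Inc(OutLen);
          Result[OutLen] := Xml[K];
        end;
      I := J;
    end
    else
    begin
      Inc(OutLen);
      Result[OutLen] := Xml[I];
      Inc(I);
    end;
  end;
  SetLength(Result, OutLen);
end;

// Size reduction in percent.
function SavingPercent(OrigLen, NewLen: Integer): Double;
begin
  Result := 100.0 * (OrigLen - NewLen) / OrigLen;
end;

begin
  // prints: <a><b>x y</b></a>
  WriteLn(MinifyXml('<a>'#13#10'  <b>x y</b>'#13#10'</a>'));
  // about 33.76 (the decimal separator depends on the locale)
  WriteLn(Format('%.2f', [SavingPercent(687722, 455528)]));
end.
```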


3 hours ago, aehimself said:

creating a simple page, saving it in Word 2003 as HTML and then manually trimming the waste out

Word has never been a good HTML generator :) The CSS styles in all the pages it created were a nightmare.

3 hours ago, aehimself said:

Now imagine also trimming the "useless" characters that are only there to make the code readable.

 

3 hours ago, aehimself said:

As an example, a beautified XML file is 687722 characters, and the same file minimized is 455528; that's about 34% "compression".

Well, I ran the source of this very page through the first online minifier Google gave me. Initial size 76217, minified 63707: a "compression" of 16%. On the one hand, that's better than nothing. On the other hand, it makes the page source unreadable, could cause errors (e.g. in embedded scripts or styles), and in the end saves only about 12.5 kB.

12 hours ago, Fr0sT.Brutal said:

Word has never been a good HTML generator :) The CSS styles in all the pages it created were a nightmare.

I know, I just wanted to bring up a really negative example. Most CMS WYSIWYG editors are no exception, though, and since they love to use shared libraries, I suppose webmail interfaces are guilty as well.

 

12 hours ago, Fr0sT.Brutal said:

Well, I ran the source of this very page through the first online minifier Google gave me. Initial size 76217, minified 63707: a "compression" of 16%. On the one hand, that's better than nothing. On the other hand, it makes the page source unreadable, could cause errors (e.g. in embedded scripts or styles), and in the end saves only about 12.5 kB.

As I mentioned, I mainly work with XML, which is similar to (but not the same as) HTML, and with XML the results are already surprising. I'll go ahead and call minifying a compression method (a fairly inefficient one, though) from now on.

As with most compression, the output heavily depends on the input, but usually the larger the input, the better the ratio. The purpose of compression is not to keep the data readable, and if it causes errors, that's a bug in the code.

 

All I'm saying is that, efficient or not, minifying is still viable, because it is the only kind of "compression" whose output is still accepted by the target software.

 

Just out of curiosity, I'll run my XML minifier on an HTML page and post the results.


@aehimself 95% of real-world HTML pages cannot be parsed by an XML parser. Also, you cannot correctly modify HTML without computing the CSS styles. For example, in <p class="p1">This is text</p>   <p class="p1">This is text2</p> the spaces between the p elements can be removed, but in <p class="p2">This is text</p>   <p class="p2">This is text2</p> they should be preserved.
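The p1/p2 classes are not defined in the post, so here is one plausible reading, stated as an assumption: .p1 keeps the default display: block of <p>, while .p2 is styled display: inline (or inline-block). Whitespace between two block-level boxes is not rendered at all, whereas between two inline boxes it renders as a visible space. A conservative, hypothetical decision helper that a minifier could call once it knows the computed styles; CanDropWhitespaceBetween and its parameters are illustrative only.

```pascal
program WhitespaceBetweenBoxes;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// True for the common block-level values of the computed 'display' property.
function IsBlockLevel(const Display: string): Boolean;
begin
  Result := SameText(Display, 'block') or SameText(Display, 'list-item') or
    SameText(Display, 'table') or SameText(Display, 'flex') or
    SameText(Display, 'grid') or SameText(Display, 'flow-root');
end;

// True if the parent's 'white-space' value preserves whitespace or newlines.
function PreservesWhitespace(const WhiteSpace: string): Boolean;
begin
  Result := SameText(WhiteSpace, 'pre') or SameText(WhiteSpace, 'pre-wrap') or
    SameText(WhiteSpace, 'pre-line') or SameText(WhiteSpace, 'break-spaces');
end;

// May the whitespace between two sibling elements be removed entirely?
// Conservative: only when both siblings are block-level and the parent
// collapses whitespace. Otherwise the run must stay (collapsed to a single
// space at most, and only when the parent does not preserve whitespace).
function CanDropWhitespaceBetween(const DisplayA, DisplayB,
  ParentWhiteSpace: string): Boolean;
begin
  Result := not PreservesWhitespace(ParentWhiteSpace) and
    IsBlockLevel(DisplayA) and IsBlockLevel(DisplayB);
end;

begin
  // <p class="p1">, assumed to keep the default display: block
  if CanDropWhitespaceBetween('block', 'block', 'normal') then
    WriteLn('p1: whitespace can be removed');
  // <p class="p2">, assumed to be styled display: inline
  if not CanDropWhitespaceBetween('inline', 'inline', 'normal') then
    WriteLn('p2: keep one space');
end.
```

Real layout has more cases (flex and grid containers, floats, absolute positioning, generated content), which is why a correct HTML minifier cannot avoid computing styles.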

5 hours ago, David Heffernan said:

How are you generating the original html? 

Are you addressing me?

2 hours ago, aehimself said:

As with most compression, the output heavily depends on the input, but usually the larger the input, the better the ratio. The purpose of compression is not to keep the data readable, and if it causes errors, that's a bug in the code.

 

Btw, what do you need XML minification for? As far as I'm aware, you rarely see plain XML in the wild; most formats use zipped XML (docx, fb2.zip). DB engines often use compression internally as well, and HTTP allows deflate compression.


 

10 hours ago, Alexander Sviridenkov said:

@aehimself 95% of real-world HTML pages cannot be parsed by an XML parser. Also, you cannot correctly modify HTML without computing the CSS styles. For example, in <p class="p1">This is text</p>   <p class="p1">This is text2</p> the spaces between the p elements can be removed, but in <p class="p2">This is text</p>   <p class="p2">This is text2</p> they should be preserved.

I realized that quickly. It threw a null pointer exception anyway because of the embedded JavaScript, which XML does not have 🙂

 

9 hours ago, Fr0sT.Brutal said:

Btw, what do you need XML minification for? As far as I'm aware, you rarely see plain XML in the wild; most formats use zipped XML (docx, fb2.zip). DB engines often use compression internally as well, and HTTP allows deflate compression.

Our overgrown legacy system talks to another overgrown legacy system through a custom protocol, transporting the XMLs in beautified form. The problem is that sometimes these two systems get out of sync, because of a third overgrown legacy system, a bug in their software or a bug in ours. So after a bug is fixed, the affected information packets have to be re-processed or re-sent. And the way the protocol was designed, you always need a reference to an earlier packet as well... it's quite complicated, and as a result we have to save all communication between our software instances and their system. Originally, someone decided to keep the XMLs in a BLOB field in a table. Unfortunately, the code is so big and disorganized (20+ years old) that changing the storage scheme would require rewriting half our application plus the web interface. We know it should be done, but we also know it never will be, as the clients do not pay for refactoring.

 

The good thing about legacy systems is the gems you can find. My favorite goes like this:

Function GetUserID: Integer;
Begin
 Case MenuID Of
  0: Result := -1;
  1: Result := -1;
  2: Result := -1;
  [...]
  100: Result := -1;
  Else Result := -1;
 End;
End;

... and oooooh, those sweet comments 🙂 Too bad most of them are in Hungarian, otherwise I'd have posted them on DevHumor.

19 minutes ago, aehimself said:

 

I realized that quickly. It threw a null pointer exception anyway because of the embedded JavaScript, which XML does not have 🙂

 

 

This is only one of hundreds of problems. Parsing HTML is much harder than parsing XML.

7 minutes ago, Alexander Sviridenkov said:

This is only one of hundreds of problems. Parsing HTML is much harder than parsing XML.

Thank god we use XML in this case 🙂


 

18 hours ago, David Heffernan said:

How are you generating the original html? 

I think the answer to this question is crucial for helping him out.

 

Anyway, I can recommend webpack for this job.

For example, in web development I use webpack and some other tools/libraries to minify code, and it helps a lot. That said, maybe the owner of this topic could try to incorporate it into his project somehow.

Edited by Tntman
