pyscripter

XML Parsing and Processing


XML used to be considered the universal data format. Now it is a bit passé, with JSON, YAML, etc. being "in". I got involved in XML parsing because SVG files are in XML format.

 

XML Delphi support

On the surface, the XML support in Delphi is very good:

  • You have TXMLDocument/IXMLDocument offering high-level support (Xml.xmldoc)
  • Support for the standard DOM interfaces (Xml.XmlDom)
  • Multiple implementations (MSXML, OmniXML, OpenXML and more)
  • Ability to plug in your own implementation
  • Multiple platform support.

The most common way of accessing XML is through TXMLDocument/IXMLDocument. However, there is a big catch: PERFORMANCE. Say you want to use MSXML and you specify 'MSXML' as your DefaultDOMVendor (or you simply include the implementation unit Xml.Win.msxmldom in your uses clause). You create an XML document and access the top node:

var Doc: IXMLDocument := TXMLDocument.Create(nil);
var Node: IXMLNode := Doc.DocumentElement;

Node is an IXMLNode interface implemented by TXMLNode (a TInterfacedObject descendant defined in Xml.XMLDoc).

TXMLNode wraps an IDOMNode stored in a private field FDOMNode. IDOMNode is defined in Xml.xmldom.

The IDOMNode is implemented by the selected vendor, in this case by the class TMSDOMNode in Xml.Win.msxmldom.

TMSDOMNode (also a TInterfacedObject descendant) wraps an IXMLDOMNode stored in a private field FMSNode. IXMLDOMNode is defined in Winapi.msxml.

 

As a result, when you create any IXMLNode, a TXMLNode is created, and this in turn creates a TMSDOMNode which points to an IXMLDOMNode. Any call or property access on IXMLNode translates into a call to IDOMNode, which then calls IXMLDOMNode. The created TInterfacedObject instances also need to be destroyed when you release your XML node. The same two-level indirection applies to all XML objects (attributes, children) and causes a huge degradation of performance.
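
To make the layering concrete, here is a minimal sketch that peels the wrappers off one node. It assumes Windows with MSXML 6 installed and relies on the IXMLDOMNodeRef interface declared in Xml.Win.msxmldom to reach the raw MSXML node; the program name, the Run procedure and the sample XML are made up for illustration.

```pascal
program PeelLayers;
{$APPTYPE CONSOLE}

uses
  Winapi.ActiveX, Winapi.msxml,
  Xml.xmldom, Xml.XMLIntf, Xml.XMLDoc, Xml.Win.msxmldom;

procedure Run;
var
  Doc: IXMLDocument;
  Node: IXMLNode;       // layer 1: implemented by TXMLNode
  DomNode: IDOMNode;    // layer 2: implemented by TMSDOMNode
  MsNode: IXMLDOMNode;  // layer 3: the MSXML COM object itself
begin
  DefaultDOMVendor := 'MSXML';
  Doc := TXMLDocument.Create(nil);
  Doc.LoadFromXML('<svg><g id="a"/></svg>');  // illustrative sample only

  Node := Doc.DocumentElement;
  DomNode := Node.DOMNode;
  // IXMLDOMNodeRef exposes the wrapped MSXML node
  MsNode := (DomNode as IXMLDOMNodeRef).GetXMLDOMNode;

  // The same element, seen through each of the three layers
  Writeln(Node.NodeName, ' / ', DomNode.nodeName, ' / ', MsNode.nodeName);
end;

begin
  CoInitialize(nil);  // MSXML is COM-based
  try
    Run;              // all interface references are released when Run returns
  finally
    CoUninitialize;
  end;
end.
```

Every IXMLNode you touch costs two extra reference-counted Delphi objects on top of the COM object, which is where the overhead described above comes from.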

 

Conclusion

If you care about speed, forget about TXMLDocument. You can access the vendor implementation directly or, even better in the case of MSXML, the Microsoft ActiveX objects:

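uses Winapi.msxml;
// COM must be initialized on the calling thread (CoInitialize) before creating the object
var XML: IXMLDOMDocument3 := CoDOMDocument60.Create;
XML.loadXML(XMLString);  // returns False if parsing fails
var DocNode: IXMLDOMNode := XML.documentElement;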

In SVG parsing and processing, accessing the ActiveX objects directly reduced processing time by more than 50%.

 

Additional tip

A common performance pitfall with MSXML is explained in http://www.gerixsoft.com/blog/delphi/msxml. The fastest way to iterate through child nodes is via firstChild/nextSibling, and through attributes via nextNode.
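
A minimal sketch of that iteration pattern, assuming MSXML 6 is installed and using the Winapi.msxml import unit. The Walk procedure and the sample XML are hypothetical, made up for illustration only.

```pascal
program WalkMsxml;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, Winapi.ActiveX, Winapi.msxml;

// Prints a node, its attributes and its children, indented by depth.
procedure Walk(const Node: IXMLDOMNode; Depth: Integer);
var
  Attrs: IXMLDOMNamedNodeMap;
  Attr, Child: IXMLDOMNode;
begin
  Writeln(StringOfChar(' ', Depth * 2), Node.nodeName);

  // Attributes: call nextNode until it returns nil (no indexed access)
  Attrs := Node.attributes;  // nil for nodes that cannot have attributes
  if Attrs <> nil then
  begin
    Attr := Attrs.nextNode;
    while Attr <> nil do
    begin
      Writeln(StringOfChar(' ', Depth * 2 + 2), '@', Attr.nodeName, '=', Attr.text);
      Attr := Attrs.nextNode;
    end;
  end;

  // Children: follow firstChild, then nextSibling
  Child := Node.firstChild;
  while Child <> nil do
  begin
    Walk(Child, Depth + 1);
    Child := Child.nextSibling;
  end;
end;

var
  XML: IXMLDOMDocument3;
begin
  CoInitialize(nil);
  try
    XML := CoDOMDocument60.Create;
    if XML.loadXML('<svg width="10"><g id="a"><rect/></g></svg>') then
      Walk(XML.documentElement, 0);
    XML := nil;  // release the COM object before CoUninitialize
  finally
    CoUninitialize;
  end;
end.
```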

 

 

 

 

 


If you are interested in performance, http://kluug.net (Ondřej Pokorný) has two excellent libraries: the freeware OmniXML and the commercial OXml. I'm a user and customer of OXml.

 

On this page: http://kluug.net/oxml.php choose the Performance tab and you will see a thorough comparison between most Delphi XML libraries. OXml is shown to be about 10 times faster than MSXML.

 

I purchased both OXml and OExport, and I am quite happy with them.

 

Note: No, I'm not related to Kluug in any possible way, other than being a customer/user of his libraries.


I don't necessarily agree that the default XML document is bad, considering all the features that you get. I do agree that if an alternative lets you be faster but gives you access to fewer features, then that might be a compromise that developers are willing to make. And I think that other offerings basically take that approach: they don't necessarily give you everything, and you compromise on that.

 

One thing that bothers me a little bit is that there seems to be no mention of SAX processing, which is probably much better suited to SVG than a full-blown DOM.

 

This is why I don't necessarily agree with what you're saying. 

 

A

2 hours ago, Andrea Raimondi said:

This is why I don't necessarily agree with what you're saying. 

Where exactly do we disagree? I started by saying the features are great. But if XML processing is a performance bottleneck, you can improve performance drastically (see below) by accessing the underlying implementation directly with minimal changes to your code, since implementations follow the standard DOM interfaces.

18 hours ago, Javier Tarí said:

If you are interested in performance, http://kluug.net (Ondřej Pokorný) has two excellent libraries: the freeware OmniXML and the commercial OXml. I'm a user and customer of OXml.

OmniXML is included in Delphi, but I am not sure how it compares to the standalone one.

 

If you look at the OXml benchmarks, you will see that if you access the library through Delphi XML with the OXml vendor, you consume almost 10 times more memory and 5.7 times more CPU time compared to raw access. The navigation time increases 32 times!! Which is exactly the point I was trying to make.

 

15 hours ago, pyscripter said:

Where exactly do we disagree? I started by saying the features are great. But if XML processing is a performance bottleneck, you can improve performance drastically (see below) by accessing the underlying implementation directly with minimal changes to your code, since implementations follow the standard DOM interfaces.

We disagree on the notion that the performance is an issue 🙂 

I think it's not, given how much stuff you can do out of the box. 

 

I would also like to point out that OXml is quite a neat library; we use it for importing Excel files (xlsx ones) and it's awesome.

On 8/29/2020 at 5:44 PM, pyscripter said:

If you care about speed forget about TXMLDocument

If you care about speed use a SAX parser instead. The DOM model has different priorities.

57 minutes ago, Anders Melander said:

If you care about speed use a SAX parser instead

... of TXMLDocument 😀. Fully agree.

 

MSXML contains a SAX reader/writer.  Delphi implementation examples at http://www.craigmurphy.com/bug/.

 

XmlLite is a good alternative to SAX on Windows. See the note about push and pull parsers. It offers similar speed and is much easier to program with. And there is a Delphi wrapper (just a single unit to add to your project).


I am using sOmniXmlVendor for mobile app compatibility. The Xml.XMLIntf.IXMLNode.XML property is not available with it.

How can I get the XML of an IXMLNode? Or how can I copy an IXMLNode to a new IXMLDocument?

Thank you in advance

SOLVED:

I did the following:
 

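function GetXml(const x: IXMLNode): string;
var
  // Interface reference: a TXMLDocument created with a nil owner is
  // reference-counted, so it must not be freed manually.
  srcx: IXMLDocument;
begin
  srcx := TXMLDocument.Create(nil);
  srcx.XML.Text := '';
  srcx.Active := True;
  // Attach the node's underlying DOM node to the new, empty document
  srcx.DOMDocument.appendChild(x.DOMNode);
  srcx.SaveToXML(Result);
end;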

 

14 hours ago, shineworld said:

I've used MSXML for many years and it works very well and fast.

Fine, unless you are on 32-bit and memory is at its limit when you save the XML. It will save without complaint, but truncated.

Fast, if you don't have too many attributes in a node. With 100 attributes it starts to slow down perceptibly. I had a case with tens of thousands.

These are corner cases but showstoppers for me.

 

Better try Neslib.Xml.

I have an implementation that is not faster than Neslib.Xml in general (it is slightly slower and uses more memory), but it is faster in my case when a node has many attributes, because I use a dictionary for storing the attributes.


Actually, I use MSXML only in 32-bit applications (I have never built a 64-bit one so far).
To increase the amount of available memory I use FastMM with these options:

{$IFDEF USES_EMBEDDED_FASTMM_MEMORY_MANAGER}

  {$IFDEF USES_FASTMM_OLD}
    {$SetPEFlags $20}
    {$DEFINE USES_FASTMM}
    FastMM4 in 'sources\memory-managers\FastMM4-old\FastMM4.pas',
    FastMM4Messages in 'sources\memory-managers\FastMM4-old\FastMM4Messages.pas',
  {$ENDIF}

  {$IFDEF USES_FASTMM_NEW}
    {$SetPEFlags $20}
    {$DEFINE USES_FASTMM}
    FastMM4 in 'sources\memory-managers\FastMM4-new\FastMM4.pas',
    FastMM4Messages in 'sources\memory-managers\FastMM4-new\FastMM4Messages.pas',
  {$ENDIF}

  {$IFDEF USES_FASTMM_AVX}
    {$SetPEFlags $20}
    {$DEFINE USES_FASTMM}
    FastMM4 in 'sources\memory-managers\FastMM4-AVX\FastMM4.pas',
    FastMM4Messages in 'sources\memory-managers\FastMM4-AVX\FastMM4Messages.pas',
  {$ENDIF}

{$ENDIF}

In this FastMM.inc file I can choose between three different FastMM versions.
"{$SetPEFlags $20}" extends the 2 GB limit of Win32.
I don't know if that also influences MSXML, but the memory available to the Delphi program increases a lot.

I also use a lot of attributes (XMLMemento uses them to store data), but I have never reached thousands of elements.

I'm searching for a good native Delphi library to substitute MSXML and have all in Delphi project...

2 minutes ago, shineworld said:

{$SetPEFlags $20}

IMAGE_FILE_LARGE_ADDRESS_AWARE. I use this too, but it is not enough in some cases. And 64-bit is not yet an option, because I don't want to ship both 32-bit and 64-bit builds and about 10% of my users still run a 32-bit OS.
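
For reference, a minimal sketch of setting that flag with the named constant instead of the literal $20, plus a quick check of the address space the process actually gets. The program name is made up; the directive is placed after the uses clause so that the constant from Winapi.Windows is visible.

```pascal
program LargeAddressDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

// Same effect as {$SetPEFlags $20}, spelled with the named constant
{$SetPEFlags IMAGE_FILE_LARGE_ADDRESS_AWARE}

var
  Status: TMemoryStatusEx;
begin
  Status.dwLength := SizeOf(Status);
  if GlobalMemoryStatusEx(Status) then
    // Reports the user-mode address space available to this process
    Writeln('Total virtual address space (MB): ',
      Status.ullTotalVirtual div (1024 * 1024));
end.
```

The flag only raises the address-space ceiling of a 32-bit process; how much is actually gained depends on the Windows version and configuration it runs on.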

 

3 minutes ago, shineworld said:

I'm searching for a good native Delphi library to substitute MSXML and have all in Delphi project...

https://github.com/neslib/Neslib.Xml

On 8/29/2020 at 2:01 PM, Javier Tarí said:

If you are interested in performance, http://kluug.net (Ondřej Pokorný) has two excellent libraries: the freeware OmniXML and the commercial OXml. I'm a user and customer of OXml.

 

On this page: http://kluug.net/oxml.php choose the Performance tab and you will see a thorough comparison between most Delphi XML libraries. OXml is shown to be about 10 times faster than MSXML.

 

I purchased both OXml and OExport, and I am quite happy with them.

 

Note: No, I'm not related to Kluug in any possible way, other than being a customer/user of his libraries.

 

I am glad I found this thread the other day. I was able to get a large 3MF file to load much faster using OXml instead of the DOM method, so I concur with the OXml suggestion.
