narag

XML parsing problems


I've been assigned the task of connecting to a SOAP web service and reading a list of offices that it sends me as XML. I've worked with SOAP and Delphi before, but not with both at the same time in the last decade, so I've been following what seems to be the usual procedure: creating an interface unit from the WSDL URL with the IDE wizard (Delphi 11), using the properties that interface gives me to log in (that works), and passing the XML to TXMLDocument.

 

This is where the joy ends.

 

The text I get doesn't have an XML header, it contains spaces inside the values and non-ANSI characters. I've tried to follow the execution, but it goes into sections of assembler and external calls; it seems that TXMLDocument is a wrapper around a Windows library.

 

The final error is usually an AV when I ask for "ChildNodes" from the root.

 

From this description, can you think of any steps where I might be screwing up? Is there any component or library that can replace the ones I use? I also tried a unit that comes with SuperObject (I have to convert the list to JSON later), but it also gets stuck with the received XML. If it were just to get the list of offices, I would even consider making an ad hoc parser, but I'm afraid I'll soon have to interact a lot more with that API.

 

Thanks in advance!


I hate SOAP. I've used it on a couple of Delphi projects, and worked on a couple more where someone else was dealing with it, and the amount of effort needed just to get it to do basic stuff is horribly distorted. But when you're forced to interact with a system where all of their external APIs are built in SOAP, you're stuck with a pig-in-a-poke, IMHO. I know one guy who'd first build it in C# just to make sure the specs he was given were correct, because he said that devs frequently update the interface but don't update the public docs (that often involves two different teams). In one of my cases, Delphi's WSDL compiler generated crap, and I had to use a 3rd-party compiler to make my Delphi code work. Good luck with it!

6 hours ago, narag said:

The text I get doesn't have an XML header

Do you mean the XML prolog? Can't you just add one, if it is missing, before you parse the XML?

6 hours ago, narag said:

it contains spaces inside the values

What's wrong with that?

6 hours ago, narag said:

and non-ANSI characters

And? XML supports Unicode characters. Any conformant XML parser will handle that.

6 hours ago, narag said:

it seems that TXMLDocument is a wrapper around a Windows library.

By default, it uses MSXML on Windows. But you can change the library used via the DOMVendor property.
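A minimal sketch of switching the DOM vendor, assuming a Delphi version (10.2 or later, so Delphi 11 qualifies) in which the OmniXML vendor ships in `Xml.omnixmldom`. The function name `RootNameWithOmniXml` and its parameter are made up for illustration:

```delphi
uses
  Xml.xmldom, Xml.XMLIntf, Xml.XMLDoc,
  Xml.omnixmldom; // registers the OmniXML vendor (pure Pascal, no MSXML/COM)

// Hypothetical helper: parses XmlText with OmniXML instead of MSXML and
// returns the name of the root element.
function RootNameWithOmniXml(const XmlText: string): string;
var
  XmlDoc: TXMLDocument;
  Doc: IXMLDocument;
begin
  XmlDoc := TXMLDocument.Create(nil);
  Doc := XmlDoc; // an owner-less document is ref-counted, so hold an interface
  // Select the vendor before the document is loaded/activated
  XmlDoc.DOMVendor := GetDOMVendor(sOmniXmlVendor);
  Doc.LoadFromXML(XmlText);
  Result := Doc.DocumentElement.NodeName;
end;
```

Setting the global `DefaultDOMVendor` variable from `Xml.xmldom` to `sOmniXmlVendor` before creating any document should have the same effect application-wide.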

6 hours ago, narag said:

The final error is usually an AV when I ask for "ChildNodes" from the root.

Can you provide an example XML that you are having trouble with?

 

On 11/30/2024 at 12:30 AM, Remy Lebeau said:

And? XML supports Unicode characters. Any conformant XML parser will handle that.

And still, that's the error that I get:

Project Xml2Json.exe raised exception class EDOMParseError with message 'An invalid character was found in text content.

Line: 10
    <Agent>SOMEMF Ga'.

And yes, that's the position where the first non-ANSI char is.

 

I started with the SOAP interface directly feeding the XML parser. That caused an AV. Debugging step by step doesn't help very much, because most of the time I fall into System unit assembler that (I guess) dispatches interface calls. So I wrote the XML to a file and then read it back to make sure the encoding is right, but it still complains.

 

Is there another library that I can use?

 

Thank you!

On 11/29/2024 at 10:48 PM, David Schwartz said:

I hate SOAP. I've used it on a couple of Delphi projects, and worked on a couple more where someone else was dealing with it, and the amount of effort needed just to get it to do basic stuff is horribly distorted. But when you're forced to interact with a system where all of their external APIs are built in SOAP, you're stuck with a pig-in-a-poke, IMHO. I know one guy who'd first build it in C# just to make sure the specs he was given were correct, because he said that devs frequently update the interface but don't update the public docs (that often involves two different teams). In one of my cases, Delphi's WSDL compiler generated crap, and I had to use a 3rd-party compiler to make my Delphi code work. Good luck with it!

Most new interfaces use REST and JSON, which seem to be easier to get right. Still, I'm not sure where to put the blame exactly. Maybe it's just me 🙂

2 minutes ago, narag said:

And still, that's the error that I get:

Project Xml2Json.exe raised exception class EDOMParseError with message 'An invalid character was found in text content.

Line: 10
    <Agent>SOMEMF Ga'.

And yes, that's the position where the first non-ANSI char is.

Again, can you provide the actual XML? The error message is complaining about an invalid character. Non-ANSI characters on their own are not invalid characters in XML, but certain characters in certain contexts may be. For instance, text content does not allow most control characters (anything below U+0020 other than tab, line feed, and carriage return) or unpaired surrogates. We need more information to diagnose the problem.
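If the real file can't be shared, a quick scan can at least tell whether it contains characters that XML 1.0 forbids. This is only a diagnostic sketch, assuming the text is already in a Delphi `string` (UTF-16) with 1-based indexing; the function name `FirstInvalidXmlChar` is made up for illustration:

```delphi
uses
  System.SysUtils;

// Returns the 1-based index of the first UTF-16 code unit that XML 1.0
// does not allow in text content, or 0 if none is found.
function FirstInvalidXmlChar(const S: string): Integer;
var
  I: Integer;
  C: Char;
begin
  I := 1;
  while I <= Length(S) do
  begin
    C := S[I];
    if (C >= #$D800) and (C <= #$DBFF) then
    begin
      // High surrogate: allowed only when a low surrogate follows it
      if (I < Length(S)) and (S[I + 1] >= #$DC00) and (S[I + 1] <= #$DFFF) then
      begin
        Inc(I, 2);
        Continue;
      end;
      Exit(I);
    end;
    if ((C < #$20) and not CharInSet(C, [#9, #10, #13])) // C0 control chars
      or ((C >= #$DC00) and (C <= #$DFFF))                // lone low surrogate
      or (C >= #$FFFE) then                               // U+FFFE and U+FFFF
      Exit(I);
    Inc(I);
  end;
  Result := 0;
end;
```

A result of 0 suggests the characters themselves are legal, which would point toward an encoding mismatch between the bytes and what the parser assumes.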

2 minutes ago, narag said:

I started with the SOAP interface directly feeding the XML parser. That caused an AV.

SOAP uses well-defined XML.  So, either your server is not sending valid XML correctly, or more likely you are just not parsing it correctly.

 

Also, to be clear - you are not getting an Access Violation.  You are getting a Parsing error.  Those are two different things.

2 minutes ago, narag said:

Debugging step by step doesn't help very much, because most of the time I fall into System unit assembler that (I guess) dispatches interface calls. So I wrote the XML to a file and then read it back to make sure the encoding is right, but it still complains.

Then the encoding is probably not correct to begin with.  Are you receiving and saving/parsing the XML as a string, or as raw bytes?  You should be doing the latter, not the former.
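A rough sketch of the raw-bytes approach, assuming the response is available as a `TBytes` value; the function name `LoadXmlFromBytes` is hypothetical, and how you obtain the bytes depends on your SOAP code:

```delphi
uses
  System.SysUtils, System.Classes, Xml.XMLIntf, Xml.XMLDoc;

// Hypothetical helper: hands the bytes to the parser exactly as received,
// so the parser works out the encoding itself (BOM or XML declaration)
// and no string conversion happens in between.
function LoadXmlFromBytes(const RawXml: TBytes): IXMLDocument;
var
  Stream: TBytesStream;
begin
  Stream := TBytesStream.Create(RawXml);
  try
    Result := TXMLDocument.Create(nil); // owner-less: lifetime is ref-counted
    Result.LoadFromStream(Stream);      // EncodingType defaults to xetUnknown
  finally
    Stream.Free;
  end;
end;
```

With no BOM and no encoding declaration, a conformant parser treats the bytes as UTF-8, so ANSI-encoded bytes with accented characters would be rejected as invalid. Note also that the default MSXML vendor needs COM initialized on the calling thread (for example via `CoInitialize` in a console application).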


Remy, thank you for your help. I can't post the real file because it contains confidential data. Any example would be made up anyway.

 

You were right: there were two different problems. The AV was caused by my misunderstanding of the load methods. The encoding errors went away when I fed the SOAP output directly to the parser input. It seems that using a string in the middle was triggering automatic conversions that the parser didn't like.
