Question:

I'm requesting an XML document over the web. XDocument.Load(stream) throws an exception because the XML contains a bare &, so the parser expects a terminating ; as in &amp;.

I read the stream into a string and replaced & with &amp;, but that broke all the other, correctly encoded special characters such as ø.

Is there a simple way to encode all disallowed chars in the string before parsing to XDocument?

Answer:

Try CDATA sections in XML.

A CDATA section can only be used in places where you could have a text node.

<foo><![CDATA[Here is some data including < , > or & etc) ]]></foo>
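
To illustrate, here is a minimal sketch of how a parser treats such a section. The class and method names are made up for this example, and it assumes you control how the XML is produced, since the CDATA markers have to be in the document before it is parsed.

```csharp
using System;
using System.Xml.Linq;

public static class CDataDemo
{
    // Hypothetical helper: parses the XML and returns the root element's text.
    public static string ReadText(string xml)
    {
        return XDocument.Parse(xml).Root.Value;
    }

    public static void Main()
    {
        // Same sample as above: raw <, > and & are legal inside a CDATA section.
        string xml = "<foo><![CDATA[Here is some data including < , > or & etc) ]]></foo>";

        // Value returns the text exactly as written inside the CDATA section.
        Console.WriteLine(ReadText(xml));
    }
}
```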

Answer:

This kind of method is not encouraged, and the reason lies in your own question:

(replacing & by &amp; turns &gt; to &amp;gt;)
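
To see that effect concretely, and what a regex-based repair might look like, here is a rough sketch. The class name, method name, and the pattern itself are my own assumptions: the pattern leaves alone anything that already looks like a named or numeric reference and escapes only the remaining bare ampersands. Named references that XML does not define (for example HTML-only names) would pass through untouched and could still fail in the parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AmpersandDemo
{
    // Matches '&' only when it does NOT already start something shaped like
    // a reference: &name; or &#123; or &#x1F;
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: escapes only the bare ampersands.
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    public static void Main()
    {
        string input = "<a>You & I &gt; them &#248;</a>";

        // Naive replace: every '&' is escaped, including those in references.
        Console.WriteLine(input.Replace("&", "&amp;"));
        // <a>You &amp; I &amp;gt; them &amp;#248;</a>

        // Lookahead version: only the bare '&' is escaped.
        Console.WriteLine(EscapeBareAmpersands(input));
        // <a>You &amp; I &gt; them &#248;</a>
    }
}
```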

A better suggestion than using a regex is to fix the source code that generates this unencoded XML.
I have come across (.NET) code that uses string concatenation to build XML, when it should use the XML DOM instead.
If you have access to that source code, go ahead and fix it there, because repairing half-encoded XML after the fact can never be guaranteed to be perfect.
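
As a rough illustration of the difference, assuming you can change the generating code: the sketch below uses LINQ to XML's XElement rather than XmlDocument (both escape text when they serialize), and the class and method names are invented for the example.

```csharp
using System;
using System.Xml.Linq;

public static class BuildXmlDemo
{
    // Hypothetical helper: lets the XML API do the escaping.
    public static string Build(string value)
    {
        return new XElement("specialchild2", value).ToString();
    }

    public static void Main()
    {
        string value = "You.. & I in this beautiful world";

        // String concatenation copies the '&' verbatim, so the result is not well-formed.
        Console.WriteLine("<specialchild2>" + value + "</specialchild2>");

        // XElement escapes the text when the element is serialized.
        Console.WriteLine(Build(value));
        // <specialchild2>You.. &amp; I in this beautiful world</specialchild2>
    }
}
```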

Answer:

@espvar,

This is an input XML:

<root><child>nospecialchars</child><specialchild>data&data</specialchild><specialchild2>You.. & I in this beautiful world</specialchild2>data&amp;</root>

And the Main function:

        string EncodedXML = encodeWithCDATA(XMLInput); //Calling our Custom function

        XmlDocument xdDoc = new XmlDocument();

        xdDoc.LoadXml(EncodedXML); //passed

The function encodeWithCDATA():

    private string encodeWithCDATA(string stringXML)
    {
        if (stringXML.IndexOf('&') != -1)
        {
            // Index of the '>' that closes the tag in front of the text containing '&'.
            int indexofClosingtag = stringXML.Substring(0, stringXML.IndexOf('&')).LastIndexOf('>');
            // Offset, counted from that '>', of the next '<' (where the text node ends).
            int indexofNextOpeningtag = stringXML.Substring(indexofClosingtag).IndexOf('<');

            // Wrap only the text between the '>' and the '<' (hence +1 / -1),
            // so neither bracket ends up inside the CDATA section.
            string CDATAsection = string.Concat("<![CDATA[", stringXML.Substring(indexofClosingtag + 1, indexofNextOpeningtag - 1), "]]>");

            string encodedLeftPart = string.Concat(stringXML.Substring(0, indexofClosingtag + 1), CDATAsection);
            string UncodedRightPart = stringXML.Substring(indexofClosingtag + indexofNextOpeningtag);
            // Recurse on the remainder, which may hold further text nodes with '&'.
            return (string.Concat(encodedLeftPart, encodeWithCDATA(UncodedRightPart)));
        }
        else
        {
            return (stringXML);
        }
    }

Encoded XML (i.e., xdDoc.OuterXml, indented here for readability):

<root>
  <child>nospecialchars</child>
  <specialchild>
    <![CDATA[data&data]]>
  </specialchild>
  <specialchild2>
    <![CDATA[You.. & I in this beautiful world]]>
  </specialchild2>
  <![CDATA[data&amp;]]>
</root>

All I have used is Substring, IndexOf, string.Concat and a recursive function call. Let me know if you don't understand any part of the code.

The sample XML that I have provided has data directly in the parent node as well, which is the kind of mixed content you see in HTML, e.g. <div>this is <b>bold</b> text</div>. My code also takes care of wrapping the text outside the <b> tag if it contains the special character &.

Please note that I have only taken care of '&'. The data must not contain characters like '<' or '>', or single or double quotes.
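
One more consequence worth checking, visible in the last text node of the sample (data&amp;): once text sits inside a CDATA section, the parser no longer decodes references in it. The small sketch below, with an invented class and method name, shows the difference in the value you read back.

```csharp
using System;
using System.Xml.Linq;

public static class CDataEntityDemo
{
    // Hypothetical helper: parses the XML and returns the root element's text.
    public static string ReadText(string xml)
    {
        return XDocument.Parse(xml).Root.Value;
    }

    public static void Main()
    {
        // Outside CDATA the parser decodes the reference.
        Console.WriteLine(ReadText("<root>data&amp;</root>"));
        // data&

        // Inside CDATA nothing is decoded, so the text keeps the literal "&amp;".
        Console.WriteLine(ReadText("<root><![CDATA[data&amp;]]></root>"));
        // data&amp;
    }
}
```

So if the incoming document mixes bare ampersands with correctly escaped ones in the same text node, the wrapped result will not carry the same text as a properly escaped document would.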
