
xml.dom.minidom: Getting CDATA Values

I'm able to get the value in the image tag (see XML below), but not the Category tag. The difference is one is a CDATA section and the other is just a string. Any help would be ap

Solution 1:

p.getElementsByTagName('Category')[0].firstChild

minidom does not flatten away <![CDATA[ sections to plain text, it leaves them as DOM CDATASection nodes. (Arguably it should, at least optionally. DOM Level 3 LS defaults to flattening them, for what it's worth, but minidom is much older than DOM L3.)

So the firstChild of Category is a Text node representing the whitespace between the <Category> open tag and the start of the CDATA section. It has two siblings: the CDATASection node, and another trailing whitespace Text node.
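To see the three children for yourself, you can print each child's node type and data. This is only a sketch: the XML string below is a made-up stand-in, since the question's actual document is not reproduced here, and the `product` and `Image` element names are assumptions.

```python
from xml.dom import minidom

# Hypothetical sample document (not the XML from the question).
XML = """<product>
  <Image>http://example.com/a.jpg</Image>
  <Category>
    <![CDATA[Books & Media]]>
  </Category>
</product>"""

doc = minidom.parseString(XML)
category = doc.getElementsByTagName("Category")[0]

# Collect (nodeType, data) for every direct child of <Category>.
children = [(child.nodeType, child.data) for child in category.childNodes]
for node_type, data in children:
    print(node_type, repr(data))
```

Node type 3 is TEXT_NODE and 4 is CDATA_SECTION_NODE, so the whitespace nodes and the CDATA node are easy to tell apart in the output.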

What you probably want is the textual data of all children of Category. In DOM Level 3 Core you'd just call:

p.getElementsByTagName('Category')[0].textContent

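but minidom doesn't support that yet. Recent versions do, however, support another Level 3 property, wholeText, which returns the data of a Text node joined with that of its adjacent Text and CDATASection siblings. You can use it to do the same thing in a more roundabout way: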

p.getElementsByTagName('Category')[0].firstChild.wholeText
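Here is a minimal sketch of the wholeText approach next to a small recursive fallback. The sample XML is invented for illustration, and `text_content` is a hypothetical helper of my own naming, not something minidom provides.

```python
from xml.dom import minidom

# Hypothetical sample; the question's actual XML is not reproduced here.
XML = """<product>
  <Category>
    <![CDATA[Books & Media]]>
  </Category>
</product>"""

def text_content(node):
    """Hypothetical stand-in for DOM Level 3 textContent (not a minidom API)."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            # Recurse so text inside nested elements is included too.
            parts.append(text_content(child))
    return "".join(parts)

category = minidom.parseString(XML).getElementsByTagName("Category")[0]

# wholeText joins the first Text node with its adjacent Text/CDATASection siblings.
via_whole_text = category.firstChild.wholeText.strip()
via_helper = text_content(category).strip()
print(via_whole_text)
print(via_helper)
```

Note that firstChild.wholeText assumes Category has at least one child and that the first child is a text or CDATA node. An empty element has firstChild set to None, and wholeText stops at the first sibling that is not text or CDATA, which is where the recursive helper may be the safer choice.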

Solution 2:

CDATA is its own node, so the Category elements here actually have three children: a whitespace text node, the CDATA node, and another whitespace text node. You're just looking at the wrong one, is all. I don't see any more obvious way to query for the CDATA node, but you can pull it out like this:

[n for n in category.childNodes if n.nodeType == category.CDATA_SECTION_NODE][0]

Solution 3:

I ran into a similar problem. My solution was similar to what ironfroggy answered, but implemented in a more general fashion:

    for node in parentNode.childNodes:
        if node.nodeType == 4:
            cdataContent = node.data.strip()

CDATA's node type is 4 (CDATA_SECTION_NODE), which is also available as the named constant Node.CDATA_SECTION_NODE.
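The same idea can be wrapped in a function that returns every CDATA value under a parent instead of keeping only the last one. This is a sketch under assumptions: `cdata_values` is a hypothetical name, and the one-line XML is invented to show an element with two CDATA sections.

```python
from xml.dom import minidom
from xml.dom import Node

def cdata_values(parent):
    """Return the stripped data of each CDATA section directly under parent."""
    return [
        child.data.strip()
        for child in parent.childNodes
        if child.nodeType == Node.CDATA_SECTION_NODE  # same value as the literal 4
    ]

# Hypothetical input with two CDATA sections separated by whitespace.
doc = minidom.parseString(
    "<Category> <![CDATA[ first ]]> <![CDATA[second]]> </Category>"
)
values = cdata_values(doc.documentElement)
print(values)
```

Using the named constant rather than the literal 4 makes the intent clearer, and returning a list avoids silently overwriting earlier values when an element holds more than one CDATA section.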
