xml.dom.minidom: Getting CDATA Values
Solution 1:
p.getElementsByTagName('Category')[0].firstChild
minidom does not flatten <![CDATA[ sections into plain text; it leaves them as DOM CDATASection nodes. (Arguably it should flatten them, at least optionally. DOM Level 3 LS defaults to flattening them, for what it's worth, but minidom is much older than DOM L3.)
So the firstChild of Category is a Text node representing the whitespace between the <Category> open tag and the start of the CDATA section. It has two siblings: the CDATASection node, and another trailing whitespace Text node.
What you probably want is the textual data of all children of Category. In DOM Level 3 Core you'd just call:
p.getElementsByTagName('Category')[0].textContent
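but minidom doesn't support that yet. Recent versions do, however, support another Level 3 property, wholeText, which you can use to do the same thing in a more roundabout way. It joins a Text node with its adjacent Text and CDATASection siblings, so the result still includes the surrounding whitespace and you may want to strip it: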
p.getElementsByTagName('Category')[0].firstChild.wholeText
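To make the three-children layout concrete, here is a minimal sketch. The sample XML is an assumption made up for illustration (the question's actual document isn't reproduced here), and the variable names are hypothetical:

```python
from xml.dom import minidom

# Hypothetical sample document; stands in for the XML from the question.
SAMPLE = """<Products>
  <Product>
    <Category>
      <![CDATA[Electronics]]>
    </Category>
  </Product>
</Products>"""

doc = minidom.parseString(SAMPLE)
category = doc.getElementsByTagName('Category')[0]

# Node types of the children: whitespace Text, CDATASection, whitespace Text.
child_types = [child.nodeType for child in category.childNodes]

# wholeText joins the first Text node with its adjacent Text/CDATA siblings,
# so the surrounding whitespace has to be stripped afterwards.
category_text = category.firstChild.wholeText.strip()

print(child_types)    # 3 is TEXT_NODE, 4 is CDATA_SECTION_NODE
print(category_text)
```

Note that wholeText only spans adjacent text-like siblings, so this shortcut stops at the first child element or comment inside Category.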
Solution 2:
CDATA is its own node, so the Category elements here actually have three children, a whitespace text node, the CDATA node, and another whitespace node. You're just looking at the wrong one, is all. I don't see any more obvious way to query for the CDATA node, but you can pull it out like this:
[n for n in category.childNodes if n.nodeType==category.CDATA_SECTION_NODE][0]
Solution 3:
I've run into a similar problem. My solution was similar to what ironfroggy answered, but implemented in a more general fashion:
for node in parentNode.childNodes:
    if node.nodeType == 4:
        cdataContent = node.data.strip()
CDATA's node type is 4 (CDATA_SECTION_NODE).
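The filtering idea from Solutions 2 and 3 could be wrapped in a small helper, roughly like the sketch below. The function name cdata_values and the sample XML are assumptions for illustration, not part of minidom. It uses the named constant instead of the literal 4, and returns a list so that an element with no CDATA section gives an empty result rather than an IndexError:

```python
from xml.dom import minidom, Node


def cdata_values(parent):
    """Return the stripped contents of each CDATA section directly under parent."""
    return [
        child.data.strip()
        for child in parent.childNodes
        if child.nodeType == Node.CDATA_SECTION_NODE
    ]


# Hypothetical sample: one Category with a CDATA section, one with plain text.
doc = minidom.parseString(
    "<Items>"
    "<Category> <![CDATA[ Books & Media ]]> </Category>"
    "<Category>plain</Category>"
    "</Items>"
)

values = [cdata_values(el) for el in doc.getElementsByTagName('Category')]
print(values)
```

The second Category contains only an ordinary Text node, so its list comes back empty.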