Handling Multiple Nodes When Parsing Xml With Python

For an assignment, I need to parse a 2-million-line XML file and load the data into a MySQL database. Since we are using a Python environment with SQLite for the class, I

Solution 1:

Notice that you were getting the number of characters in the first author's name. The code takes only the first author element (index 0), reads its text, and then takes the length of that string:

author = authors.getElementsByTagName("author")[0].firstChild.data
num_authors = len(author)
print("Number of authors: ", num_authors )

To count all the authors, drop the index so that you keep the whole list of author elements:

author = authors.getElementsByTagName("author")
num_authors = len(author)
print("Number of authors: ", num_authors )

You can use a list comprehension to get a list of all the author names, rather than the author elements:

author = [a.firstChild.data for a in authors.getElementsByTagName("author")]
print(author)
# [u'J. K. Schneider', u'C. E. Richardson', u'F. W. Kiefer', u'Venu Govindaraju']
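Putting the pieces together, here is a minimal self-contained sketch using `xml.dom.minidom`. The sample XML, including the `<article>` wrapper and the `<authors>` container, is an assumption made for illustration, since the original file's layout is not shown; only the author names are taken from the output above.

```python
from xml.dom import minidom

# Hypothetical sample document; the real file's structure may differ.
SAMPLE_XML = """<article>
  <authors>
    <author>J. K. Schneider</author>
    <author>C. E. Richardson</author>
    <author>F. W. Kiefer</author>
    <author>Venu Govindaraju</author>
  </authors>
</article>"""

doc = minidom.parseString(SAMPLE_XML)
authors = doc.getElementsByTagName("authors")[0]

# All <author> elements under <authors>, not just the first one.
author_elements = authors.getElementsByTagName("author")
num_authors = len(author_elements)

# Text content of each <author> element.
author_names = [a.firstChild.data for a in author_elements]

# What the original code computed: the length of the first name string.
first_name_length = len(author_elements[0].firstChild.data)

print("Number of authors:", num_authors)
print(author_names)
print("Characters in first author's name:", first_name_length)
```

Note that `firstChild` is `None` for an empty element such as `<author/>`, so real data may need a guard before reading `.data`.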
