
Passing Over Nonetype Attributes In Beautifulsoup

I am parsing an XML feed from Google using BeautifulStoneSoup and Python, and it works great. I am also creating a CSV and uploading it to Google Docs, which works fine as well. Th

Solution 1:

You can wrap the find call like this:

def findnonempty(entry, arg):
    result = entry.find(arg)
    if result is not None:  # find() returns None when no tag matches
        return result.text
    else:
        return ""

Then you can either make the 7 calls one after another, or use map(), like this:

tags = ['ns1:familyname', 'ns1:givenname', ... ] # your tags
s = map(lambda tag: findnonempty(entry, tag), tags)
"".join(s)

Solution 2:

At first I didn't see why you thought it would break...you didn't have an "offending" data snippet. BeautifulSoup will gladly return an empty string.

At the END of your "have to scroll over there to see it" line it's finally clear that you are (as you did say in your intro) looking for an attribute.

entry.find('ns1:email',primary=True)['address']

The empty attribute will not return as silently as an empty text node (e.g. entry.find('ns1:familyname').text).

Never fear: just replace the ['address'] notation with .get('address', '') and it will return an empty string if the attribute is missing, rather than raising a KeyError.
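A minimal sketch of that substitution, which also guards against find() returning None when the tag itself is absent. The helper name find_attr and the FakeTag/FakeEntry stand-ins are hypothetical; the stand-ins exist only so the example runs without BeautifulSoup installed, and they mimic just the find() and get() methods that a real tag provides.

```python
def find_attr(entry, name, attr, default="", **filters):
    """Return attribute `attr` of the first matching tag, or `default`."""
    tag = entry.find(name, **filters)
    if tag is None:                # the tag itself is missing
        return default
    return tag.get(attr, default)  # the tag exists but may lack the attribute


# Hypothetical stand-ins that mimic only the two methods used above.
class FakeTag:
    def __init__(self, attributes):
        self.attributes = attributes

    def get(self, key, default=None):
        return self.attributes.get(key, default)


class FakeEntry:
    def __init__(self, tags):
        self.tags = tags

    def find(self, name, **filters):
        # The stand-in ignores the filters; a real find() would apply them.
        return self.tags.get(name)


with_address = FakeEntry({"ns1:email": FakeTag({"address": "a@example.com"})})
no_attribute = FakeEntry({"ns1:email": FakeTag({})})
no_tag = FakeEntry({})

addresses = [
    find_attr(e, "ns1:email", "address", primary=True)
    for e in (with_address, no_attribute, no_tag)
]
```

With a real entry the call would look the same: find_attr(entry, 'ns1:email', 'address', primary=True).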

Solution 3:

It's easy enough to encapsulate the value-getting and printing into functions.

def find(entry, spec, default=None):
    value = entry.find(spec)
    return default if value is None else value.text

def findandprint(entry, spec, default=None, newline=True):
    value = find(entry, spec, default)
    if value is not None:    # if we still don't have a value even after
        print value,         # considering default, don't print anything
        if newline:
            print

Then you can just:

for entry in a:
    findandprint(entry, 'ns1:orgtitle',   default="")
    findandprint(entry, 'ns1:familyname', default="")

If you have a lot of attributes, and want to handle them all the same, then iterate over those too:

for entry in a:
    for attribute in ('ns1:orgtitle', 'ns1:familyname', ...):
        findandprint(entry, attribute, default="")
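The helpers in this answer use the Python 2 print statement. As a rough Python 3 sketch of the same loop, the values can be collected into one list per entry instead of being printed, which also suits the CSV output mentioned in the question. The names find_text and TAGS are hypothetical, and the FakeTag/FakeEntry stand-ins are there only so the example runs without BeautifulSoup installed; they mimic just the find() method and .text attribute used here.

```python
import csv
import io

# Tag names taken from the loop above; extend with your own.
TAGS = ("ns1:orgtitle", "ns1:familyname")


def find_text(entry, spec, default=""):
    """Return the text of the first tag matching `spec`, or `default`."""
    tag = entry.find(spec)
    return default if tag is None else tag.text


# Hypothetical stand-ins for parsed feed entries.
class FakeTag:
    def __init__(self, text):
        self.text = text


class FakeEntry:
    def __init__(self, tags):
        self.tags = tags

    def find(self, spec):
        return self.tags.get(spec)


entries = [
    FakeEntry({"ns1:orgtitle": FakeTag("Editor"), "ns1:familyname": FakeTag("Smith")}),
    FakeEntry({"ns1:familyname": FakeTag("Jones")}),  # no ns1:orgtitle tag
]

buffer = io.StringIO()
writer = csv.writer(buffer)
for entry in entries:
    # One CSV row per entry; missing tags become empty cells.
    writer.writerow([find_text(entry, tag) for tag in TAGS])

rows = buffer.getvalue().splitlines()
```

Writing to an in-memory buffer keeps the sketch self-contained; a real script would pass an open file to csv.writer instead.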
