
Encoding Problem Downloading Html Using Mechanize And Python 2.6

browser = mechanize.Browser()
page = browser.open(url)
html = page.get_data()
print html

It shows some strange characters. I suppose that it is UTF-8 string but Python doesn't kn

Solution 1:

The response was gzipped, so it has to be decompressed before the HTML is readable:

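import gzip

def ungzipResponse(r, b):
    # Decompress a gzip-encoded mechanize response in place.
    headers = r.info()
    # .get() avoids a KeyError when the server sends no Content-Encoding header
    if headers.get('Content-Encoding', '').lower() == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # Assumes the page is UTF-8 HTML
        headers["Content-type"] = "text/html; charset=utf-8"
        r.set_data(html)
        b.set_response(r)

response = browser.open(url)
ungzipResponse(response, browser)
html = response.read()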
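
The decompression step can also be tried without mechanize or a network connection. The following is only a minimal sketch: `maybe_gunzip` and `gzip_bytes` are hypothetical helper names, and the sample body is compressed in memory to stand in for a gzipped server response.

```python
import gzip
import io


def maybe_gunzip(body, content_encoding):
    """Return body decompressed if content_encoding names gzip, else unchanged.

    Hypothetical helper for illustration; not part of mechanize.
    """
    if (content_encoding or '').strip().lower() != 'gzip':
        return body
    gz = gzip.GzipFile(fileobj=io.BytesIO(body), mode='rb')
    try:
        return gz.read()
    finally:
        gz.close()


def gzip_bytes(data):
    """Compress data in memory, standing in for a gzipped response body."""
    buf = io.BytesIO()
    gz = gzip.GzipFile(fileobj=buf, mode='wb')
    gz.write(data)
    gz.close()
    return buf.getvalue()


sample = u'<p>caf\u00e9</p>'.encode('utf-8')
compressed = gzip_bytes(sample)

# With a gzip Content-Encoding the original bytes come back
restored = maybe_gunzip(compressed, 'gzip')
# Without the header the body is passed through untouched
untouched = maybe_gunzip(sample, None)
```

Printing a gzipped body directly is what produces the "strange characters": the bytes are compressed data, not text in any character encoding.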

Solution 2:

u = html.decode('utf-8')
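
Decoding as UTF-8 only works if the page really is UTF-8. As a rough sketch, the charset can instead be read from the Content-Type header when the server sends one. `charset_from_content_type` is a hypothetical helper, and falling back to UTF-8 is an assumption, not something the page guarantees.

```python
def charset_from_content_type(content_type, default='utf-8'):
    """Pull the charset parameter out of a Content-Type header value.

    Hypothetical helper; returns `default` when no charset is given.
    """
    for part in (content_type or '').split(';')[1:]:
        name, _, value = part.strip().partition('=')
        if name.strip().lower() == 'charset' and value:
            return value.strip().strip('"\'').lower()
    return default


# Sample bytes standing in for the downloaded (already decompressed) HTML
raw = b'caf\xc3\xa9'
charset = charset_from_content_type('text/html; charset=UTF-8')
text = raw.decode(charset)
```

With mechanize, the header value would come from something like `response.info().get('Content-Type')`.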

Solution 3:

You need to declare the encoding at the top of your script, like this:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-

mechanize needs it.

For more information, see http://www.python.org/dev/peps/pep-0263/
