Encoding Problem Downloading HTML Using Mechanize and Python 2.6
browser = mechanize.Browser()
page = browser.open(url)
html = page.get_data()
print html

It shows some strange characters. I suppose that it is a UTF-8 string but Python doesn't kn
Solution 1:
The response was gzipped, so it has to be decompressed before it can be read as text:
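import gzip

def ungzipResponse(r, b):
    # Decompress a gzip-encoded mechanize response in place.
    headers = r.info()
    # .get() avoids a KeyError when the server sent no Content-Encoding header
    if headers.get('Content-Encoding') == 'gzip':
        gz = gzip.GzipFile(fileobj=r, mode='rb')
        html = gz.read()
        gz.close()
        # Assumes the page is UTF-8 HTML
        headers["Content-type"] = "text/html; charset=utf-8"
        r.set_data(html)
        b.set_response(r)

response = browser.open(url)
ungzipResponse(response, browser)
html = response.read()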
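If you want to confirm that the "strange characters" really are gzip data, you can check the first two bytes of the body, which are always 0x1f 0x8b for a gzip stream. The following is only a minimal sketch using the standard library; `maybe_gunzip` is a hypothetical helper name (not part of mechanize), and the sample body is built in memory to stand in for a real server response.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def maybe_gunzip(body):
    """Return body decompressed if it looks gzipped, else unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(body), mode="rb").read()
    return body

# Build a gzipped sample in memory to stand in for a server response.
buf = io.BytesIO()
gz = gzip.GzipFile(fileobj=buf, mode="wb")
gz.write(b"<html>caf\xc3\xa9</html>")
gz.close()
compressed = buf.getvalue()

plain = maybe_gunzip(compressed)
```

Here `plain` holds the original bytes again, and a body that is not gzipped passes through untouched.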
Solution 2:
u = html.decode('utf-8')
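Note that `decode('utf-8')` raises `UnicodeDecodeError` if the bytes are not valid UTF-8. A minimal sketch of a more forgiving variant follows; `decode_html` is a hypothetical helper, and the list of candidate encodings is an assumption that you should adjust to the site you are scraping.

```python
def decode_html(raw, encodings=("utf-8", "iso-8859-15")):
    """Try each encoding in turn; return the first successful decode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Only reached if every candidate failed: decode with the first
    # encoding and replace undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], "replace")
```

UTF-8 is tried first because it rejects most input that is not UTF-8, whereas a single-byte encoding such as ISO-8859-15 accepts any byte sequence and would hide a wrong guess.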
Solution 3:
You need to declare the source encoding at the top of your script, like this:
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
mechanize needs it.
For more information, see PEP 263: http://www.python.org/dev/peps/pep-0263/