Webscraper Will Not Work
I have followed a tutorial pretty much to the letter, and I want my scraper to scrape all the links to the specific pages containing the info about each police station, but it retu
Solution 1:
Use BeautifulSoup
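from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3; in Python 2 this lived in urllib2
# Download the page that lists the police stations
f = urlopen("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html").read()
# Name the parser explicitly so bs4 does not have to pick one for you
bs = BeautifulSoup(f, "html.parser")
# Print the href of the <a> inside each <span class="listlink-police">
for tag in bs.find_all('span', {'class': 'listlink-police'}):
    print(tag.a['href'])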
Solution 2:
You are using regex to parse HTML. You shouldn't, because you end up with exactly this type of problem. For a start, the .* wildcard is greedy: it will match as much text as it can, so a single match can run from the first link to the last one. But once you fix that, you will pluck another fruit from the Tree of Frustration. Use a proper HTML parser instead.
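To make the greedy-match point concrete, here is a minimal sketch. The SAMPLE markup and the LinkCollector class are made up for illustration (the real page's HTML may well differ), and the parser is the standard library's html.parser rather than BeautifulSoup, only so the snippet runs with no extra installs.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup, invented for this example; not copied from the real page.
SAMPLE = (
    '<span class="listlink-police"><a href="/station-a.html">A</a></span>'
    '<span class="listlink-police"><a href="/station-b.html">B</a></span>'
)

# Greedy: .* runs on to the LAST double quote, so both links end up in one match.
greedy = re.findall(r'href="(.*)"', SAMPLE)

# Non-greedy: .*? stops at the first closing quote, giving one match per link.
lazy = re.findall(r'href="(.*?)"', SAMPLE)


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


collector = LinkCollector()
collector.feed(SAMPLE)
parsed = collector.links

print(len(greedy))  # one oversized match
print(lazy)
print(parsed)
```

The non-greedy pattern happens to work on this tidy sample, but it still depends on attribute order, quoting style and whitespace staying exactly as written. The parser version only cares that there is an a tag with an href attribute, which is why it tends to survive changes to the page.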