
Webscraper Will Not Work

I have followed a tutorial pretty much to the letter, and I want my scraper to scrape all the links to the specific pages containing the info about each police station, but it retu

Solution 1:

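Use BeautifulSoup:

from urllib.request import urlopen  # Python 3; this module was urllib2 on Python 2

from bs4 import BeautifulSoup

url = "http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html"
html = urlopen(url).read()

# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, "html.parser")

# Each station link sits inside a <span class="listlink-police">.
for tag in soup.find_all("span", {"class": "listlink-police"}):
    if tag.a is not None:  # skip spans that contain no link
        print(tag.a["href"])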

Solution 2:

You are using regex to parse HTML. You shouldn't, because you end up with exactly this type of problem. For a start, the .* wildcard is greedy: it matches as much text as it can, so a single match can run far past the link you wanted. But once you fix that, you will pluck another fruit from the Tree of Frustration. Use a proper HTML parser instead.
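
To make the greedy-wildcard point concrete, here is a minimal sketch. The sample markup, both patterns, and the LinkCollector class are made up for illustration and are not the asker's code. Python's standard-library html.parser stands in for "a proper HTML parser" here only so the example runs without installing anything.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup, modeled on the 'listlink-police' spans in Solution 1.
SAMPLE = (
    '<span class="listlink-police"><a href="/station-a.html">A</a></span>'
    '<span class="listlink-police"><a href="/station-b.html">B</a></span>'
)

# Greedy: .* runs on to the LAST double quote, so both links collapse into one match.
greedy = re.findall(r'href="(.*)"', SAMPLE)

# Non-greedy: .*? stops at the first double quote after each href.
lazy = re.findall(r'href="(.*?)"', SAMPLE)


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside span.listlink-police."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a matching span
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = (attrs.get("class") or "").split()
            # Count nested spans too, so the matching </span> is tracked correctly.
            if self.depth or "listlink-police" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1


parser = LinkCollector()
parser.feed(SAMPLE)

print(len(greedy), "greedy match(es)")
print(lazy)
print(parser.links)
```

The non-greedy pattern happens to work on this tidy sample, but it still breaks on single-quoted attributes, extra whitespace, or links outside the spans you care about. The parser version keys on the document structure rather than on the exact text, which is the reason for the advice above.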

Solution 3:

There are over 1.6k links with that class on the page.

I think it's working correctly... What makes you think it's not working?

And you should definitely use Beautiful Soup; it's stupid simple and extremely usable.
