Webscraper Will Not Work

July 31, 2024

I have followed a tutorial pretty much to the letter, and I want my scraper to scrape all the links to the specific pages containing the info about each police station, but it retu

Solution 1:

Use BeautifulSoup

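from urllib.request import urlopen  # Python 3 home of the old urllib2.urlopen

from bs4 import BeautifulSoup

# Download the page that lists the police stations.
html = urlopen("http://www.emergencyassistanceuk.co.uk/list-of-uk-police-stations.html").read()

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, "html.parser")

# Print the link held in each <span class="listlink-police">.
for tag in soup.find_all("span", {"class": "listlink-police"}):
    print(tag.a["href"])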

Solution 2:

You are using regex to parse HTML. You shouldn't, because you end up with exactly this type of problem. For a start, the .* wildcard is greedy: it will match as much text as it can. But once you fix that, you will pluck another fruit from the Tree of Frustration. Use a proper HTML parser instead.
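To make the greedy-wildcard point concrete, here is a small sketch. The markup is a made-up sample (the real page may be laid out differently), and the patterns are illustrative rather than the ones from the question, which is not shown in full here.

```python
import re

# Made-up sample markup on a single line; the real page's structure may differ.
html = (
    '<span class="listlink-police"><a href="/a.html">A</a></span>'
    '<span class="listlink-police"><a href="/b.html">B</a></span>'
)

# Greedy: .* runs on to the LAST closing quote it can reach.
greedy = re.findall(r'href="(.*)"', html)

# Non-greedy: .*? stops at the first closing quote after each href.
lazy = re.findall(r'href="(.*?)"', html)

print(len(greedy), greedy)
print(len(lazy), lazy)
```

The greedy pattern returns one long match that swallows both links, while the non-greedy one returns the two hrefs separately. Even the non-greedy version stays fragile (single quotes, extra attributes, and line breaks can all trip it up), which is why a parser is the safer choice.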

Solution 3:

There are over 1.6k links with that class on it.

I think it's working correctly... What makes you think it's not working?

And you should definitely use Beautiful Soup; it's stupid simple and extremely usable.
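If you want to check the count yourself before installing anything, a rough sketch with the standard library's html.parser (swapped in here for Beautiful Soup) could look like this. The class name PoliceLinkCollector and the sample markup are hypothetical; the only thing assumed about the real page is that each link sits inside a span with class listlink-police, as in Solution 1.

```python
from html.parser import HTMLParser


class PoliceLinkCollector(HTMLParser):
    """Collect hrefs of <a> tags nested in <span class="listlink-police">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a matching <span>
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = (attrs.get("class") or "").split()
            # Track nested spans too, so the matching close is found correctly.
            if self.depth or "listlink-police" in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1


# Made-up sample; feed the downloaded page text here instead.
sample = """
<span class="listlink-police"><a href="/stations/a.html">A</a></span>
<span class="other"><a href="/ignored.html">X</a></span>
<span class="listlink-police"><a href="/stations/b.html">B</a></span>
"""

collector = PoliceLinkCollector()
collector.feed(sample)
print(len(collector.links), collector.links)
```

On the sample this finds two links and skips the one in the unrelated span. Run against the real page, the length of collector.links is the number to compare with what your scraper returns.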
