Pass Values Into Scrapy Callback
Solution 1:
I'm not 100% sure but I think you can't rename the scrapy image files however you want, scrapy does that.
What you want to do looks like a job for CrawlSpider
instead of Spider
.
CrawlSpider by itself follows every link it finds in every page recursively and you can set rules on what pages you want to scrap. Here are the docs.
If you are stubborn enough to keep Spider
you can use the meta tag on requests to pass the items and save links in them.
for link in soup.find_all("a"):
item=crawlerItem()
item['url'] = response.urljoin(link.get('href'))
request=scrapy.Request(url,callback=self.scrape_page)
request.meta['item']=item
yield request
To get the item just go look for it in the response:
defscrape_page(self, response):
item=response.meta['item']
In this specific example the item passed item['url']
is obsolete as you can get the current url with response.url
Also,
It's a bad idea to use Beautiful soup in scrapy as it just slows you down, the scrapy library is really well developed to the extent that you don't need anything else to extract data!
Post a Comment for "Pass Values Into Scrapy Callback"