XPATH For Scrapy
So I am using Scrapy to scrape the books off a website. I have the crawler working and it crawls fine, but when it comes to cleaning the HTML using the select in XPath it is ki
Solution 1:
There are different ways to do this.
The best way to select multiple nodes is by id or class, e.g.:
sel.xpath("//div[@id='id']")
You can also select by position. XPath positions start at 1, so the loop starts at 1 as well:
for i in range(1, upto_num_of_divs + 1):
    divs = sel.xpath("//div[%d]" % i)
Or you can select a range of positions with a single expression, with no loop needed:
divs = sel.xpath("//div[position() >= 1 and position() < %d]" % upto_num_of_divs)
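The attribute and position styles above can be tried on a small document. This is only a minimal sketch: it uses Python's standard-library xml.etree.ElementTree instead of a Scrapy selector so that it runs without a crawl, and the markup and names (HTML, books, by_position) are made up for illustration. ElementTree supports only a subset of XPath (no position() function), so the range form is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a scraped page.
HTML = """
<html><body>
  <div id="books">
    <div class="book">First</div>
    <div class="book">Second</div>
    <div class="book">Third</div>
  </div>
</body></html>
"""

root = ET.fromstring(HTML)

# Select by attribute: usually the most robust choice.
books = root.findall(".//div[@id='books']")

# Positional predicates are 1-based: div[1] is the first div child of its parent.
by_position = []
for i in range(1, 4):
    matches = root.findall(".//div[@id='books']/div[%d]" % i)
    by_position.extend(el.text for el in matches)

# Note: //div[1] means "every div that is the first div child of its parent",
# not "the first div in the document", so it can match more than one node.
first_divs = root.findall(".//div[1]")
```

Here first_divs contains both the outer div and the "First" book, which is why a bare //div[n] often returns more than expected on real pages.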
Solution 2:
Here is an example of how you can parse your example HTML:
lis = hxs.select('//div/div[3]/div/div/div[2]/div/ul/li')
for li in lis:
    # relative XPath, evaluated against each li in turn
    book_el = li.select('a/span/text()')
Often you can do something like //div[@class="final-price"]//span to get the list of all the spans in one XPath expression. The exact expression depends on your HTML; this is just to give you an idea.
Otherwise the code above should do the trick.
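The two ideas in this answer (loop over the li nodes with a relative path, or grab everything with one class-based expression) can be compared side by side. Again this is only a sketch under assumptions: the page structure, the PAGE string and the variable names are invented, and the standard-library ElementTree stands in for the Scrapy selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing markup; the real page will differ.
PAGE = """
<div>
  <ul>
    <li><a href="/b/1"><span>Book One</span></a>
        <div class="final-price"><span>10.00</span></div></li>
    <li><a href="/b/2"><span>Book Two</span></a>
        <div class="final-price"><span>12.50</span></div></li>
  </ul>
</div>
"""

root = ET.fromstring(PAGE)

# Approach 1: select each <li>, then use a path relative to that node.
titles = []
for li in root.findall(".//ul/li"):
    titles.append(li.findtext("a/span"))

# Approach 2: one expression keyed on the class attribute.
prices = [span.text for span in root.findall(".//div[@class='final-price']//span")]
```

The class-based expression tends to survive layout changes better than a long chain of positional steps such as div[3]/div/div.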