XPATH For Scrapy

December 28, 2022

So I am using Scrapy to scrape the books off a website. I have the crawler working and it crawls fine, but when it comes to cleaning the HTML using the select in XPath it is ki

Solution 1:

There are several ways to do this.

The best way to select multiple nodes is to select by id or class, e.g.:
```
sel.xpath("//div[@id='id']")
```
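
To make the id/class idea concrete, here is a minimal sketch that runs without Scrapy installed. It uses the standard library's `xml.etree.ElementTree` as a stand-in for the Scrapy selector, and the markup, ids and class names are made up for illustration. ElementTree needs the path to start with `.//` instead of `//`; with Scrapy you would pass the `//div[...]` form to `sel.xpath(...)`.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment; the ids and classes are assumptions.
HTML = """
<html><body>
  <div id="books">
    <div class="book"><span>Dune</span></div>
    <div class="book"><span>Emma</span></div>
  </div>
  <div id="footer"><span>About</span></div>
</body></html>
"""

root = ET.fromstring(HTML.strip())

# Select a single container by its id.
books_div = root.findall(".//div[@id='books']")

# Select several nodes at once by their shared class.
book_divs = root.findall(".//div[@class='book']")
titles = [div.find("span").text for div in book_divs]
print(titles)
```

An id or class predicate keeps matching if the page gains or loses wrapper elements, which is why it tends to be more robust than counting positions.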

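You can also select divs by position, one at a time. XPath positions start at 1, not 0:

```
for i in range(1, upto_num_of_divs + 1):
    # divs, not list, to avoid shadowing the built-in
    divs = sel.xpath("//div[%d]" % i)
```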

Or you can select a range of positions with a single expression, with no loop needed:

```
divs = sel.xpath("//div[position() >= 1 and position() < %d]" % upto_num_of_divs)
```
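
One thing worth checking before relying on positions: a predicate such as `[2]` is counted among the siblings under each parent, not across the whole document. The sketch below shows this on made-up markup, again using the standard library's `xml.etree.ElementTree` as a stand-in for the Scrapy selector so it runs anywhere. ElementTree does not support `position()`, so the range is taken here by slicing the result list in Python, which is a different technique from the XPath range predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two outer divs, each holding inner divs.
HTML = """
<body>
  <div id="a"><div id="a1"/><div id="a2"/></div>
  <div id="b"><div id="b1"/><div id="b2"/><div id="b3"/></div>
</body>
"""

root = ET.fromstring(HTML.strip())

# [2] is evaluated per parent: every div that is the 2nd div child of its parent.
second_divs = [d.get("id") for d in root.findall(".//div[2]")]
print(second_divs)

# A range of positions, taken by slicing the matched list in Python.
inner = root.findall("./div[@id='b']/div")
first_two = [d.get("id") for d in inner[:2]]
print(first_two)
```

Here `.//div[2]` matches three elements, one per parent that has at least two div children, so a positional loop over `//div[i]` can return more nodes than expected on nested pages.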

Solution 2:

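Here is an example of how you can parse your example HTML:

```
# hxs.select() is the older selector API; newer Scrapy versions use .xpath()
lis = hxs.select('//div/div[3]/div/div/div[2]/div/ul/li')
for li in lis:
    # This path is relative to the current <li>
    book_el = li.select('a/span/text()')
```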

Often enough you can do something like `//div[@class="final-price"]//span` to get the list of all the spans in one XPath. The exact expression depends on your HTML; this is just to give you an idea.

Otherwise the code above should do the trick.
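
The two approaches in this answer can be compared side by side in a small sketch. The markup is hypothetical (the titles, prices and the `final-price` placement are assumptions), and the standard library's `xml.etree.ElementTree` stands in for the Scrapy selector so the snippet runs without Scrapy. The path strings follow the same pattern you would pass to the selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical book list; the structure is assumed for illustration.
HTML = """
<div>
  <ul>
    <li><a href="/b/1"><span>Dune</span></a>
        <div class="final-price"><span>9.99</span></div></li>
    <li><a href="/b/2"><span>Emma</span></a>
        <div class="final-price"><span>7.50</span></div></li>
  </ul>
</div>
"""

root = ET.fromstring(HTML.strip())

# Two-step: select the rows, then use a path relative to each row.
titles = [li.findtext("a/span") for li in root.findall("./ul/li")]

# One-step: a class predicate plus // collects every matching span at once.
prices = [s.text for s in root.findall(".//div[@class='final-price']//span")]

print(titles)
print(prices)
```

The two-step form keeps fields from the same row together, which helps when some rows are missing a value. The one-step form is shorter when you only need a flat list.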
