XPATH For Scrapy

So I am using Scrapy to scrape the books off a website. I have the crawler working and it crawls fine, but when it comes to cleaning the HTML using the select in XPath it is ki

Solution 1:

There are several ways to do this:

  1. The best way to select multiple nodes is by id or class, e.g.:

    sel.xpath("//div[@id='id']")
    
  2. You can select by position. XPath positions start at 1, so the loop starts at 1 as well:

    for i in range(1, upto_num_of_divs + 1):
        divs = sel.xpath("//div[%s]" % i)
    
  3. You can select a range of positions in a single expression, without a loop:

    divs = sel.xpath("//div[position() >= 1 and position() < %s]" % upto_num_of_divs)
    

Solution 2:

Here is an example of how you can parse your example HTML:

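    # Select every <li> in the book list, then query relative to each one.
    # (Newer Scrapy versions use .xpath() in place of .select().)
    lis = hxs.select('//div/div[3]/div/div/div[2]/div/ul/li')
    for li in lis:
        # Relative path: no leading slash, so it starts from this <li>.
        book_el = li.select('a/span/text()')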

Often you can use something like //div[@class="final-price"]//span to get all of the spans in a single XPath expression. The exact expression depends on your HTML; this is just to give you an idea.

Otherwise the code above should do the trick.
