
Python Scrapy Not Crawling All Urls In Scraped List

I am trying to scrape information from the pages listed on this page: https://pardo.ch/pardo/program/archive/2017/catalog-films.html The XPath selector: film_page_urls_startpage =

Solution 1:

I checked the length of the result:

len(film_page_urls_startpage)

and I get only 11 URLs, not 23.

If I use xpath('//article/a/@href') instead, I get all 23 URLs.

There is no need to add a condition on @class, because the page has no other article elements.


EDIT:

If I do

for item in sel.xpath('//article/@class').extract():
    print('class:', item)

then I get

class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even
class: strip-list_link_all strip-list strip--color row row--5
class: strip-list_link_all strip-list strip--color row row--5 even

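So some articles have an extra "even" at the end of the class string, and this was your problem: a @class condition that expects the string without "even" skips those articles.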
