
Python Beautifulsoup Extracting Titles According To Id

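This is a sub-question of this one: "Python associate urls's ids and url's titles in lists". I have this HTML script (the markup itself is missing here; a short sample appears in the session further down).

The loop that collects the titles for each URL (the URL prefix that followed ^= in the selector is not preserved):

    links = {}
    for link in soup.select('a[href^=]'):
        title = link.get_text().strip()
        if title and title not in (u'Voir cette vidéo', u'Lire la vidéo'):
            url = link['href']
            links.setdefault(url, []).append(title)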

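The dict.setdefault() call stores an empty list for URLs not yet encountered and returns the list held under that key, so each title is appended to the list for its URL. This produces a dictionary with the URLs as keys and, for each URL, a list of its titles as the value.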
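To see the grouping step in isolation, here is a minimal sketch that leaves BeautifulSoup out entirely. The `pairs` list is a hand-written stand-in (an assumption for illustration) for the (href, title) values the loop would pull from the page.

```python
# Hypothetical (url, title) pairs standing in for the scraped links.
pairs = [
    (',101973832.html', 'Monte le son'),
    (',101973832.html', '"Rubin_Steiner"'),
    (',102103928.html', 'Fare maohi'),
]

links = {}
for url, title in pairs:
    # The first time a URL is seen, setdefault stores [] under it and
    # returns that list; on later calls it returns the existing list.
    links.setdefault(url, []).append(title)

print(links)
```

A collections.defaultdict(list) would give the same grouping; setdefault() has the advantage that the result stays a plain dict.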


>>> from bs4 import BeautifulSoup
>>> page = '''\
... <a href=",101973832.html"
...    class="ss-titre">Monte le son</a>
... <div class="rs-cell-details">
...     <a href=",101973832.html"
...        class="ss-titre">"Rubin_Steiner"</a>
...     <a href=",102103928.html"
...        class="ss-titre">Fare maohi</a>
... '''
>>> links = {}
>>> soup = BeautifulSoup(page, 'html.parser')
>>> for link in soup.select('a[href^=]'):  # URL prefix after ^= not preserved
...     title = link.get_text().strip()
...     if title and title not in (u'Voir cette vidéo', u'Lire la vidéo'):
...         url = link['href']
...         links.setdefault(url, []).append(title)
...
>>> from pprint import pprint
>>> pprint(links)
{',101506826.html': [u'Ce soir (ou jamais !)',
                     u'"Qui est propri\xe9taire de quoi ? La propri\xe9t\xe9 mise \xe0 mal dans tous les domaines"'],
 ',102890631.html': [u'Clips'],
 ',102152859.html': [u'Fare maohi'],
 ',102292937.html': [u'Fare maohi'],
 ',102365651.html': [u'Fare maohi'],
 ',101972045.html': [u'Inspecteur Barnaby',
                     u'"La musique en h\xe9ritage"'],
 ',101215383.html': [u'Le Lab.\xd4',
                     u'"Episode 22"',
                     u'Saison 3'],
 ',101970319.html': [u'Les Monsieur Madame',
                     u'Saison 1'],
 ',101973832.html': [u'Monte le son !',
                     u'"Rubin Steiner"'],
 ',101215382.html': [u'Music Explorer : les chasseurs de sons',
                     u'"Episode 3/6"',
                     u'Saison 1'],
 ',101641108.html': [u'Retour \xe0 Gor\xe9e'],
 ',101507102.html': [u'Singe mi singe moi',
                     u'"Le chat"'],
 ',101777072.html': [u'Singe mi singe moi',
 ',102472310.html': [u'T.N.T'],
 ',102472336.html': [u'T.N.T'],
 ',102721018.html': [u'T.N.T'],
 ',103216774.html': [u'T.N.T.'],
 ',103216788.html': [u'T.N.T'],
 ',101959892.html': [u'Via cultura',
                     u'"L\'Ochju, le Mauvais oeil"']}
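Every key in the output above ends in a numeric id followed by .html, so the dictionary could be re-keyed by that id. The sketch below assumes that pattern holds for every URL; `ID_RE` and `titles_by_id` are illustrative names and not part of the original answer.

```python
import re

# Assumption: every URL ends in ",<digits>.html", as in the output above.
ID_RE = re.compile(r',(\d+)\.html$')

def titles_by_id(links):
    """Re-key a {url: [titles]} dict by the numeric id found in each URL."""
    result = {}
    for url, titles in links.items():
        match = ID_RE.search(url)
        if match is None:
            continue  # skip URLs that do not follow the assumed pattern
        result.setdefault(match.group(1), []).extend(titles)
    return result

links = {
    ',101973832.html': ['Monte le son !', '"Rubin Steiner"'],
    ',102152859.html': ['Fare maohi'],
}
by_id = titles_by_id(links)
print(by_id['101973832'])
```

URLs that do not match the pattern are skipped silently here; raising an error instead may be preferable if every link is expected to carry an id.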
