How do I print the first link from each find result using Beautiful Soup?

#html #web-scraping #beautifulsoup

Question:

Code:

    import urllib.request
    from bs4 import BeautifulSoup
    from requests import get
    import urllib
    import requests

    week_11_picURL = "https://www.packers.com/photos/game-photos-packers-at-vikings-week-11-2021#9258618e-e793-41ae-8d9a-d3792366dcbb"

    response = get(week_11_picURL)
    print(response)

    html_page = requests.get(week_11_picURL)
    soup = BeautifulSoup(html_page.content, 'html.parser')
    image = soup.findAll('div', class_="nfl-c-photo-album__picture-wrapper")

Result:

    <div class="nfl-c-photo-album__picture-wrapper" data-id="146a902d-8de3-484b-ba55-1cf9d26b129c" data-name="Game Photos: Packers at Vikings | Week 11:1">
    <button aria-label="Open Lightbox View" class="nfl-c-photo-album__enlarge-button" title="Open Lightbox View">
    </button>
    <picture><!--[if IE 9]><video style="display: none; "><![endif]-->
    <source media="(min-width:1024px)" srcset="https://static.clubs.nfl.com/image/private/t_new_photo_album/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 1x, https://static.clubs.nfl.com/image/private/t_new_photo_album_2x/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 2x, https://static.clubs.nfl.com/image/private/t_new_photo_album_3x/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg"/>
    <source media="(min-width:768px)" srcset="https://static.clubs.nfl.com/image/private/t_new_photo_album/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 1x, https://static.clubs.nfl.com/image/private/t_new_photo_album_2x/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 2x, https://static.clubs.nfl.com/image/private/t_new_photo_album_3x/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg"/>
    <source srcset="https://static.clubs.nfl.com/image/private/t_new_photo_album/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 1x, https://static.clubs.nfl.com/image/private/t_new_photo_album_2x/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 2x, https://static.clubs.nfl.com/image/private/t_new_photo_album_3x/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg"/>
    <!--[if IE 9]></video><![endif]-->
    <img alt="211121-game-photos-2560" class="img-responsive" src="https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg"/></picture>
    <div class="nfl-c-photo-album__picture-info">
    <div class="nfl-c-photo-album__progress">
    <span style="">  1 / 129  </span>
    </div>
    <div class="nfl-c-photo-album__football-divider">
    <span class="nfl-o-icon nfl-o-icon--medium">
    <svg aria-hidden="true" class="nfl-o-icon--football" viewbox="0 0 24 24">
    <use xlink:href="#football"></use>
    </svg>
    </span>
    </div>
    <div class="nfl-c-photo-album__copyright nfl-c-photo-album__copyright--centered">  Evan Siegle, packers.com  </div>
    </div>
    </div>
    <div class="nfl-c-photo-album__picture-wrapper" data-id="27ff497e-e149-45b7-b10a-19baa179e8a1" data-name="Game Photos: Packers at Vikings | Week 11:2">
    <button aria-label="Open Lightbox View" class="nfl-c-photo-album__enlarge-button" title="Open Lightbox View">
    </button>
    <picture is-lazy="/t_lazy"><!--[if IE 9]><video style="display: none; "><![endif]-->
    <source data-srcset="https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 1x, https://static.clubs.nfl.com/image/private/t_new_photo_album_2x/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 2x, https://static.clubs.nfl.com/image/private/t_new_photo_album_3x/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg" media="(min-width:1024px)"/>
    <source data-srcset="https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 1x, https://static.clubs.nfl.com/image/private/t_new_photo_album_2x/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 2x, https://static.clubs.nfl.com/image/private/t_new_photo_album_3x/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg" media="(min-width:768px)"/>
    <source data-srcset="https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 1x, https://static.clubs.nfl.com/image/private/t_new_photo_album_2x/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 2x, https://static.clubs.nfl.com/image/private/t_new_photo_album_3x/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg"/>
    <!--[if IE 9]></video><![endif]-->
    <img alt="211121-packers-vikings-1st-half-siegle-WM-001" class="img-responsive" data-src="https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg" src=""/></picture>
    <div class="nfl-c-photo-album__picture-info">
    <div class="nfl-c-photo-album__progress">
    <span style="">

I want to print only the links that come out of parsing this HTML. How would I do that?

Specifically, I am trying to isolate the link that appears right after data-srcset="

For example, this link:

"https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 1x"

Answer #1:

Note that both srcset and data-srcset occur in your soup, and that each picture contains several source elements. Also, do not use findAll() in new code; use the newer find_all() syntax instead.

How to fix it?

You can select the target elements more specifically with CSS selectors.

Option #1

Focuses only on sources that have data-srcset:

    data = [
        x['data-srcset'].split(',')[0]
        for x in soup.select('.nfl-c-photo-album__picture-wrapper picture source[data-srcset]:first-child')
    ]

Option #2

Also includes the sources that have srcset:

    soup.select('.nfl-c-photo-album__picture-wrapper picture source:first-child')
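
To see how the two selectors differ, here is a minimal sketch run against a hand-written fragment that only mimics the page's markup. The `sample_html` string and its `example.com` URLs are made up for illustration and are not taken from the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the first picture uses srcset,
# the second one is lazy-loaded and uses data-srcset.
sample_html = """
<div class="nfl-c-photo-album__picture-wrapper">
  <picture>
    <source media="(min-width:1024px)" srcset="https://example.com/a.jpg 1x, https://example.com/a2.jpg 2x"/>
    <source srcset="https://example.com/a-small.jpg 1x"/>
  </picture>
</div>
<div class="nfl-c-photo-album__picture-wrapper">
  <picture>
    <source media="(min-width:1024px)" data-srcset="https://example.com/b.jpg 1x, https://example.com/b2.jpg 2x"/>
    <source data-srcset="https://example.com/b-small.jpg 1x"/>
  </picture>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
base = ".nfl-c-photo-album__picture-wrapper picture "

# Option #1: first <source> of each picture, only if it carries data-srcset.
lazy_only = soup.select(base + "source[data-srcset]:first-child")

# Option #2: first <source> of each picture, whatever attribute it carries.
all_first = soup.select(base + "source:first-child")

print(len(lazy_only), len(all_first))
```

On this fragment Option #1 skips the first picture entirely, which is why Option #2 needs to handle both attribute names.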

Iterate over the result set with try and except to avoid errors, and append the results to a list:

    data = []

    for x in soup.select('.nfl-c-photo-album__picture-wrapper picture source:first-child'):
        try:
            data.append(x['srcset'].split(',')[0])
        except KeyError:
            # No srcset on this source, so fall back to data-srcset.
            data.append(x['data-srcset'].split(',')[0])
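
As an alternative to the try/except approach, the same fallback can be sketched with `Tag.get()`, which returns `None` for a missing attribute instead of raising. This is only one possible variant, shown on a made-up fragment (`sample_html` and the `example.com` URLs are assumptions for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: one eager source (srcset) and one lazy source (data-srcset).
sample_html = """
<div class="nfl-c-photo-album__picture-wrapper"><picture>
  <source srcset="https://example.com/a.jpg 1x, https://example.com/a2.jpg 2x"/>
</picture></div>
<div class="nfl-c-photo-album__picture-wrapper"><picture>
  <source data-srcset="https://example.com/b.jpg 1x, https://example.com/b2.jpg 2x"/>
</picture></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

data = []
for source in soup.select(".nfl-c-photo-album__picture-wrapper picture source:first-child"):
    # get() yields None when the attribute is absent, so `or` falls through
    # to data-srcset without needing an exception handler.
    candidates = source.get("srcset") or source.get("data-srcset")
    if candidates:
        data.append(candidates.split(",")[0])

print(data)
```

Unlike the try/except version, this variant silently skips a source that has neither attribute rather than raising.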

Example

    import requests
    from bs4 import BeautifulSoup

    week_11_picURL = "https://www.packers.com/photos/game-photos-packers-at-vikings-week-11-2021#9258618e-e793-41ae-8d9a-d3792366dcbb"

    # Fetch the page once and parse it.
    html_page = requests.get(week_11_picURL)
    soup = BeautifulSoup(html_page.content, 'html.parser')

    data = []

    for x in soup.select('.nfl-c-photo-album__picture-wrapper picture source:first-child'):
        try:
            data.append(x['srcset'].split(',')[0])
        except KeyError:
            # Lazy-loaded pictures keep their URLs in data-srcset.
            data.append(x['data-srcset'].split(',')[0])

    print(data)

Output

    ['https://static.clubs.nfl.com/image/private/t_new_photo_album/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/zsogvqrqgaauqcdgejde.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/jyegqthuab2hsuygirqp.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/kwsq1fvn41f6kzqo4nkl.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/xludbah0g8oqlyvr7d0p.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/n6tkqlr65hv39hadt6tl.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/mhtylxhf2ito5f3y7cb7.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/an8onb7coak1psw7inp5.jpg 1x',
     'https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/ttas30klcrtagdxnl2af.jpg 1x',
     ...]
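
Each entry above still ends with the srcset density descriptor (" 1x"). If only the bare URL is wanted, splitting on whitespace is one simple way to drop it. The helper name `strip_descriptor` below is hypothetical, and the two entries are copied from the output above:

```python
def strip_descriptor(candidate: str) -> str:
    """Return the URL part of one srcset candidate such as 'https://... 1x'."""
    parts = candidate.split()
    # An empty or whitespace-only candidate has no URL to return.
    return parts[0] if parts else ""


entries = [
    "https://static.clubs.nfl.com/image/private/t_new_photo_album/f_auto/packers/hjmcucejx2vmfshjkdkj.jpg 1x",
    "https://static.clubs.nfl.com/image/private/t_new_photo_album/t_lazy/f_auto/packers/rgsvjp6sxu89ditolacv.jpg 1x",
]

urls = [strip_descriptor(entry) for entry in entries]

for url in urls:
    print(url)
```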