Is there a way to check whether an img tag's src contains a specific string when scraping with BS4?

#python #web-scraping #beautifulsoup

Question:

I want to scrape all the images in the 6th column of a table based on their src attribute, because column 1 also contains other images. So I am building a list of images that match a specific src.

The page I want to scrape is https://reelgood.com/tv, and I am scraping only the 6th column, which contains the channel logos. Its image tags look like <img src="https://img.reelgood.com/service-logos/hbo_max.svg" alt="hbo_max">. I just want to check that every image I scrape contains "img.reelgood.com/service-logos" in its src, so that I get only the logos.
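A minimal sketch of the substring check described above, run against a small inline HTML sample rather than the live page. The sample markup, the poster URL, and the name `find_logo_images` are assumptions made for illustration; only the logo URL pattern comes from the question. BeautifulSoup accepts a function as an attribute filter and calls it with the attribute value, or with None when the attribute is missing.

```python
from bs4 import BeautifulSoup

# Substring that identifies a service logo (taken from the question)
LOGO_MARKER = "img.reelgood.com/service-logos"

# Hypothetical sample markup; the real page structure may differ
SAMPLE_HTML = """
<table>
  <tr>
    <td><img src="https://example.com/posters/show1.jpg" alt="poster"></td>
    <td><div><img src="https://img.reelgood.com/service-logos/hbo_max.svg" alt="hbo_max"></div></td>
  </tr>
  <tr>
    <td><img alt="no-src"></td>
    <td><div><img src="https://img.reelgood.com/service-logos/netflix.svg" alt="netflix"></div></td>
  </tr>
</table>
"""


def find_logo_images(html, marker=LOGO_MARKER):
    """Return every <img> whose src attribute contains the marker substring."""
    soup = BeautifulSoup(html, "html.parser")
    # The filter receives None for <img> tags that have no src attribute
    return soup.find_all("img", src=lambda value: value is not None and marker in value)


logos = find_logo_images(SAMPLE_HTML)
for img in logos:
    print(img["alt"], img["src"])
```

Filtering on src rather than on the column position or the CSS class means images from other columns are skipped regardless of where they sit in the table.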

Comments:

1. Please try to be more specific. Which website? Which table? What exactly do you want to do? Provide more details.

2. I have quickly edited the question.

3. Okay... That's great! But what have you tried so far?

4. I tried images = soup.find_all('div', class_='css-1flk2s8 e11eoopx0'), which returns <div class="css-1flk2s8 e11eoopx0"><img alt="netflix" src="img.reelgood.com/service-logos/netflix.svg"/></div>

5. I only want to get the alt of the image (netflix in this case), so that I can check whether that channel is in the client's list of allowed channels.
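The allow-list check mentioned in the last comment could look roughly like the sketch below. It selects logos with a CSS attribute selector (`src*=` means "contains") and compares each alt value against a set. The set `ALLOWED_CHANNELS`, the function name `channel_names`, and the sample markup are hypothetical; the thread does not show the client's actual list.

```python
from bs4 import BeautifulSoup

# Hypothetical allow-list; the client's real list is not shown in the thread
ALLOWED_CHANNELS = {"netflix", "hbo_max"}

# Hypothetical sample markup modelled on the snippet in the comment above
SAMPLE_HTML = """
<div class="css-1flk2s8 e11eoopx0"><img alt="netflix" src="https://img.reelgood.com/service-logos/netflix.svg"/></div>
<div class="css-1flk2s8 e11eoopx0"><img alt="hulu_plus" src="https://img.reelgood.com/service-logos/hulu.svg"/></div>
<div><img alt="poster" src="https://example.com/posters/show1.jpg"/></div>
"""


def channel_names(html):
    """Return the alt text of every image whose src contains the logo path."""
    soup = BeautifulSoup(html, "html.parser")
    imgs = soup.select('img[src*="img.reelgood.com/service-logos"]')
    # Fall back to an empty string if a logo has no alt attribute
    return [img.get("alt", "") for img in imgs]


names = channel_names(SAMPLE_HTML)
allowed = [name for name in names if name in ALLOWED_CHANNELS]
rejected = [name for name in names if name not in ALLOWED_CHANNELS]
print("allowed:", allowed)
print("rejected:", rejected)
```

Using a set for the allow-list keeps each membership test cheap even when many logos are scraped.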

Answer #1:

This should help you:

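import requests
from bs4 import BeautifulSoup

# Class combinations observed on the <div> wrappers around the service logos
LOGO_CLASSES = {'css-1flk2s8 e11eoopx0', 'css-1lcdwq3 e11eoopx0', 'css-1d5ll0r e11eoopx0'}

r = requests.get('https://reelgood.com/tv')

# The 'html5lib' parser requires the html5lib package to be installed
soup = BeautifulSoup(r.text, 'html5lib')

# Match every <div> whose full class string is one of the known combinations
divs = soup.find_all('div', class_=lambda value: value in LOGO_CLASSES)

# Print the channel name (alt) and logo URL (src) for each match
for div in divs:
    print(f"Streamer = {div.img['alt']} , Image = {div.img['src']}")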
  

Output:

 Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = hoopla , Image = https://img.reelgood.com/service-logos/hoopla.svg
Streamer = adult_swim_tveverywhere , Image = https://img.reelgood.com/service-logos/adult_swim.svg
Streamer = adult_swim , Image = https://img.reelgood.com/service-logos/adult_swim.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = fx_tveverywhere , Image = https://img.reelgood.com/service-logos/fx.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = amazon_prime , Image = https://img.reelgood.com/service-logos/amazon_prime.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = cbs_all_access , Image = https://img.reelgood.com/service-logos/cbs_all_access.svg
Streamer = hoopla , Image = https://img.reelgood.com/service-logos/hoopla.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = philo , Image = https://img.reelgood.com/service-logos/philo.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = comedycentral_tveverywhere , Image = https://img.reelgood.com/service-logos/comedy.svg
Streamer = comedy , Image = https://img.reelgood.com/service-logos/comedy.svg
Streamer = southpark , Image = https://img.reelgood.com/service-logos/southpark.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = tbs , Image = https://img.reelgood.com/service-logos/tbs.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = thecw , Image = https://img.reelgood.com/service-logos/thecw.svg
Streamer = tnt , Image = https://img.reelgood.com/service-logos/tnt.svg
Streamer = disney_plus , Image = https://img.reelgood.com/service-logos/disney_plus.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = crunchyroll_premium , Image = https://img.reelgood.com/service-logos/crunchyroll.svg
Streamer = funimation , Image = https://img.reelgood.com/service-logos/funimation.svg
Streamer = tubi_tv , Image = https://img.reelgood.com/service-logos/tubi_tv.svg
Streamer = crunchyroll_free , Image = https://img.reelgood.com/service-logos/crunchyroll.svg
Streamer = adult_swim_tveverywhere , Image = https://img.reelgood.com/service-logos/adult_swim.svg
Streamer = adult_swim , Image = https://img.reelgood.com/service-logos/adult_swim.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = philo , Image = https://img.reelgood.com/service-logos/philo.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = comedycentral_tveverywhere , Image = https://img.reelgood.com/service-logos/comedy.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = philo , Image = https://img.reelgood.com/service-logos/philo.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = amc_premiere , Image = https://img.reelgood.com/service-logos/amc_premiere.svg
Streamer = amc , Image = https://img.reelgood.com/service-logos/amc.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = amazon_prime , Image = https://img.reelgood.com/service-logos/amazon_prime.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = philo , Image = https://img.reelgood.com/service-logos/philo.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = peacock_free , Image = https://img.reelgood.com/service-logos/peacock.svg
Streamer = comedycentral_tveverywhere , Image = https://img.reelgood.com/service-logos/comedy.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = amazon_prime , Image = https://img.reelgood.com/service-logos/amazon_prime.svg
Streamer = peacock , Image = https://img.reelgood.com/service-logos/peacock.svg
Streamer = nbc_tveverywhere , Image = https://img.reelgood.com/service-logos/nbc.svg
Streamer = nbc , Image = https://img.reelgood.com/service-logos/nbc.svg
Streamer = amazon_prime , Image = https://img.reelgood.com/service-logos/amazon_prime.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = disney_plus , Image = https://img.reelgood.com/service-logos/disney_plus.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = fx_tveverywhere , Image = https://img.reelgood.com/service-logos/fx.svg
Streamer = fox_tveverywhere , Image = https://img.reelgood.com/service-logos/fox.svg
Streamer = fox , Image = https://img.reelgood.com/service-logos/fox.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = showtime , Image = https://img.reelgood.com/service-logos/showtime.svg
Streamer = showtime_free , Image = https://img.reelgood.com/service-logos/showtime.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = amazon_prime , Image = https://img.reelgood.com/service-logos/amazon_prime.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = peacock , Image = https://img.reelgood.com/service-logos/peacock.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = tbs , Image = https://img.reelgood.com/service-logos/tbs.svg
Streamer = nbc_tveverywhere , Image = https://img.reelgood.com/service-logos/nbc.svg
Streamer = nbc , Image = https://img.reelgood.com/service-logos/nbc.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = imdb_tv , Image = https://img.reelgood.com/service-logos/imdb_tv.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = hbo , Image = https://img.reelgood.com/service-logos/hbo.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = showtime , Image = https://img.reelgood.com/service-logos/showtime.svg
Streamer = cbs_all_access , Image = https://img.reelgood.com/service-logos/cbs_all_access.svg
Streamer = hbo_max , Image = https://img.reelgood.com/service-logos/hbo_max.svg
Streamer = tbs , Image = https://img.reelgood.com/service-logos/tbs.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = amc_premiere , Image = https://img.reelgood.com/service-logos/amc_premiere.svg
Streamer = imdb_tv , Image = https://img.reelgood.com/service-logos/imdb_tv.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = abc_tveverywhere , Image = https://img.reelgood.com/service-logos/abc.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
Streamer = youtube_premium , Image = https://img.reelgood.com/service-logos/youtube_premium.svg
Streamer = youtube_premium_free , Image = https://img.reelgood.com/service-logos/youtube_premium.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = amazon_prime , Image = https://img.reelgood.com/service-logos/amazon_prime.svg
Streamer = peacock_free , Image = https://img.reelgood.com/service-logos/peacock.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = crunchyroll_premium , Image = https://img.reelgood.com/service-logos/crunchyroll.svg
Streamer = funimation , Image = https://img.reelgood.com/service-logos/funimation.svg
Streamer = crunchyroll_free , Image = https://img.reelgood.com/service-logos/crunchyroll.svg
Streamer = adult_swim_tveverywhere , Image = https://img.reelgood.com/service-logos/adult_swim.svg
Streamer = adult_swim , Image = https://img.reelgood.com/service-logos/adult_swim.svg
Streamer = free , Image = https://img.reelgood.com/service-logos/free.svg
Streamer = imdb_tv , Image = https://img.reelgood.com/service-logos/imdb_tv.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = tbs , Image = https://img.reelgood.com/service-logos/tbs.svg
Streamer = hulu_plus , Image = https://img.reelgood.com/service-logos/hulu.svg
Streamer = philo , Image = https://img.reelgood.com/service-logos/philo.svg
Streamer = fubo_tv , Image = https://img.reelgood.com/service-logos/fubo_tv.svg
Streamer = fx_tveverywhere , Image = https://img.reelgood.com/service-logos/fx.svg
Streamer = viceland_tve , Image = https://img.reelgood.com/service-logos/viceland.svg
Streamer = fox_tveverywhere , Image = https://img.reelgood.com/service-logos/fox.svg
Streamer = netflix , Image = https://img.reelgood.com/service-logos/netflix.svg
  

Comments:

1. Why does it retrieve only 39 results? The table has 50 rows of TV shows, and each row contains at least one channel logo (some rows actually contain more than one). So there should be at least 50 results, but there are only 39. That worries me.

2. Actually, there should be 167, but only 94 are retrieved.

3. Oh... Check out my latest edit. The actual number of images to retrieve is 151, and my latest code retrieves all of them.
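One possible explanation for the mismatched counts, offered as an assumption rather than something confirmed in the thread, is that some logos sit inside wrappers whose generated class names are not in the hard-coded list. The sketch below compares class-based and src-based selection on a hypothetical sample; the class `css-zzzzzzz` is invented to stand in for an unlisted wrapper.

```python
from bs4 import BeautifulSoup

# The three class combinations used in the answer
KNOWN_CLASSES = {"css-1flk2s8 e11eoopx0", "css-1lcdwq3 e11eoopx0", "css-1d5ll0r e11eoopx0"}

# Hypothetical markup; "css-zzzzzzz" is an invented, unlisted class
SAMPLE_HTML = """
<div class="css-1flk2s8 e11eoopx0"><img alt="netflix" src="https://img.reelgood.com/service-logos/netflix.svg"/></div>
<div class="css-1lcdwq3 e11eoopx0"><img alt="hbo" src="https://img.reelgood.com/service-logos/hbo.svg"/></div>
<div class="css-zzzzzzz e11eoopx0"><img alt="tbs" src="https://img.reelgood.com/service-logos/tbs.svg"/></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Selection by wrapper class: only finds logos inside the listed wrappers
by_class = [
    div.img
    for div in soup.find_all("div")
    if " ".join(div.get("class", [])) in KNOWN_CLASSES and div.img is not None
]

# Selection by src substring: independent of the wrapper's class name
by_src = soup.select('img[src*="img.reelgood.com/service-logos"]')

# Logos found by src but missed by the class list
seen = {id(img) for img in by_class}
missed = [img["alt"] for img in by_src if id(img) not in seen]

print(len(by_class), len(by_src), missed)
```

Comparing the two counts on the real page is a quick way to tell whether the class list is incomplete.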