#python #python-3.x #web-scraping #beautifulsoup #python-requests
Question:
I'm trying to scrape the series name and the product code from a web page. The script I've written fetches the product code flawlessly, but I can't figure out how to get the series name along with the product code.
What I've tried so far:
import requests
from bs4 import BeautifulSoup

link = 'https://www.theimagingsource.com/products/industrial-cameras/usb-3.1-monochrome/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    for item in soup.select("td.product-code > a[title]"):
        print(item.get_text(strip=True))
Output I'm getting:
DMK 38UX267
DMK 38UX255
DMK 38UX304
DMK 38UX253
DMK 37AUX287
DMK 37AUX273
DMK 37AUX290
Output I'd like to get:
38 series - USB 3.1 monochrome industrial cameras DMK 38UX267
38 series - USB 3.1 monochrome industrial cameras DMK 38UX255
38 series - USB 3.1 monochrome industrial cameras DMK 38UX304
38 series - USB 3.1 monochrome industrial cameras DMK 38UX253
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX287
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX273
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX290
and so on.
Answer #1:
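Use .find_previous('h3'). For each product link it returns the closest <h3> that appears before the link in the document, which on this page is the series heading: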
import requests
from bs4 import BeautifulSoup

link = "https://www.theimagingsource.com/products/industrial-cameras/usb-3.1-monochrome/"

with requests.Session() as s:
    s.headers[
        "User-Agent"
    ] = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    for item in soup.select("td.product-code > a[title]"):
        print(
            item.find_previous("h3").get_text(strip=True),
            item.get_text(strip=True),
        )
Prints:
38 series - USB 3.1 monochrome industrial cameras DMK 38UX267
38 series - USB 3.1 monochrome industrial cameras DMK 38UX255
38 series - USB 3.1 monochrome industrial cameras DMK 38UX304
38 series - USB 3.1 monochrome industrial cameras DMK 38UX253
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX287
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX273
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX290
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX252
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX265
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX250
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX264
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX178
37 A series - USB 3.1 monochrome industrial cameras DMK 37AUX226
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX287
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX273
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX290
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX252
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX265
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX250
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX264
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX178
37 B series - USB 3.1 monochrome industrial cameras DMK 37BUX226
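To try the same idea without hitting the live site, here is a minimal offline sketch. The HTML in SAMPLE_HTML is a hypothetical, simplified stand-in for the page (the real markup may differ), and pair_series_with_codes is an illustrative helper name, not part of BeautifulSoup. It uses the built-in "html.parser" so lxml is not needed, and it guards against the case where no <h3> precedes a link, in which find_previous returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the real page, not a copy of it.
SAMPLE_HTML = """
<h3>38 series - USB 3.1 monochrome industrial cameras</h3>
<table>
  <tr><td class="product-code"><a title="DMK 38UX267" href="#">DMK 38UX267</a></td></tr>
  <tr><td class="product-code"><a title="DMK 38UX255" href="#">DMK 38UX255</a></td></tr>
</table>
<h3>37 A series - USB 3.1 monochrome industrial cameras</h3>
<table>
  <tr><td class="product-code"><a title="DMK 37AUX287" href="#">DMK 37AUX287</a></td></tr>
</table>
"""


def pair_series_with_codes(html):
    """Return (series name, product code) tuples in document order."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for link in soup.select("td.product-code > a[title]"):
        # Nearest <h3> that occurs before this link in the document.
        heading = link.find_previous("h3")
        # find_previous returns None when no <h3> precedes the link.
        series = heading.get_text(strip=True) if heading else ""
        pairs.append((series, link.get_text(strip=True)))
    return pairs


pairs = pair_series_with_codes(SAMPLE_HTML)
for series, code in pairs:
    print(series, code)
```

Returning a list of tuples rather than printing directly makes it easy to write the pairs to a CSV file or group them by series afterwards.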