#web-scraping #beautifulsoup #tags
Question:
I'm stuck extracting information from HTML with BeautifulSoup. I extracted the HTML fragment below with the following steps:
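import requests
from bs4 import BeautifulSoup

result = requests.get(url, headers=headers)  # url and headers are assumed to be defined already
soup = BeautifulSoup(result.text, 'lxml')
tably = soup.find("table", id="table4")
last_row = tably.find_all('tr')[-1]  # only the last <tr> of the table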
Now I want to get the following output:
Classification: Mass murderer
Characteristics: Militant Al-Takfir wa al-Hijran (Renunciation and Exile) faction
Number of victims: 23
Sample HTML:
<tr>
<td style="font-size: 8pt; color: #000000" width="100%">
<font color="#000000" face="Verdana">
Classification: <b>Mass murderer</b></font></td>
</tr>
<tr>
<td width="100%" style="font-size: 8pt; color: #000000">
<font style="font-size: 8pt" color="#000000" face="Verdana">
Characteristics:&nbsp;<b>Militant Al-Takfir wa
al-Hijran </b>(Renunciation and Exile)<b> faction</b></font></td>
</tr>
<tr>
<td width="100%" style="font-size: 8pt; color: #000000">
<font style="font-size: 8pt" color="#000000" face="Verdana">
Number of victims:&nbsp;<b>23</b></font></td>
</tr>
</font>
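To check the extraction logic without requesting the live page, the three sample rows can be parsed directly. This is a minimal sketch, assuming BeautifulSoup 4 with Python's built-in `html.parser`; `SAMPLE_HTML` is a simplified, hypothetical copy of the rows above wrapped in a `<table id="table4">`, and `extract_fields` is a hypothetical helper name, not part of bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified copy of the question's sample rows (styles dropped)
SAMPLE_HTML = """
<table id="table4">
<tr><td><font color="#000000" face="Verdana">
Classification: <b>Mass murderer</b></font></td></tr>
<tr><td><font color="#000000" face="Verdana">
Characteristics:&nbsp;<b>Militant Al-Takfir wa
al-Hijran </b>(Renunciation and Exile)<b> faction</b></font></td></tr>
<tr><td><font color="#000000" face="Verdana">
Number of victims:&nbsp;<b>23</b></font></td></tr>
</table>
"""


def extract_fields(html):
    """Hypothetical helper: return (label, value) pairs from 'Label: value' cells."""
    table = BeautifulSoup(html, "html.parser").find("table", id="table4")
    pairs = []
    for cell in table.find_all("td"):
        # Join the text fragments with a space, then collapse runs of whitespace
        text = " ".join(cell.get_text(" ", strip=True).split())
        if ":" not in text:
            continue  # skip cells that are not laid out as "Label: value"
        label, value = text.split(":", 1)  # split only at the first colon
        pairs.append((label.strip(), value.strip()))
    return pairs


for label, value in extract_fields(SAMPLE_HTML):
    print(f"{label}: {value}")
```

Run against the sample, this prints the three lines in the format requested above. Splitting at the first colon only keeps any colon inside a value intact.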
Answer #1:
You might want to try this:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36"
}

page = requests.get("https://murderpedia.org/male.A/a/abbas.htm", headers=headers).text
# html5lib parses the malformed markup the way a web browser would
table = BeautifulSoup(page, "html5lib").find("table", {"id": "table4"})

# For each non-empty cell: collapse whitespace, then split "Label:value" at the first colon.
# Keep only the first 9 label/value pairs.
output = [
    " ".join(cell.get_text(strip=True).split()).split(":", 1)
    for cell in table.find_all("td")
    if cell.get_text(strip=True)
][:9]

print(tabulate(output))
Output:
-----------------  --------------------------------------------------------------
Classification     Mass murderer
Characteristics    Militant Al-Takfir wa al-Hijran(Renunciation and Exile)faction
Number of victims  23
Date of murders    December 8,2000
Date of birth      1967
Victims profile    Maleworshippers
Method of murder   Shooting(Kalashnikov assault rifle)
Location           Omdurman, Sudan
Status             Shot to death by police
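Some values above have lost spaces ("al-Hijran(Renunciation and Exile)faction", and probably "Maleworshippers" and "December 8,2000" for the same reason). With no separator argument, `get_text(strip=True)` strips each text fragment and glues the fragments together with an empty string, so words that sit on either side of a tag boundary run together. Passing a separator such as `" "` keeps them apart. The sketch below compares both calls on the Characteristics cell from the question's sample; `CELL_HTML` is a hypothetical, simplified copy of that cell.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified copy of the Characteristics cell from the question
CELL_HTML = (
    "<td>Characteristics:&nbsp;<b>Militant Al-Takfir wa\n"
    "al-Hijran </b>(Renunciation and Exile)<b> faction</b></td>"
)

cell = BeautifulSoup(CELL_HTML, "html.parser").td

# No separator: stripped fragments are concatenated with ""
glued = " ".join(cell.get_text(strip=True).split())
# Separator " ": stripped fragments are joined with a single space
spaced = " ".join(cell.get_text(" ", strip=True).split())

print(glued)   # Characteristics:Militant Al-Takfir wa al-Hijran(Renunciation and Exile)faction
print(spaced)  # Characteristics: Militant Al-Takfir wa al-Hijran (Renunciation and Exile) faction
```

If the separator is adopted in the answer's list comprehension, the value after the colon then starts with a space, so strip it if the pairs are used anywhere other than for display.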