#python #selenium #beautifulsoup #screen-scraping
Question:
I want to scrape GitHub's trending page and came up with this code. For some reason it did not work as expected: instead of the repository names, it printed objects containing a session ID. Any ideas why? Here is my code:
#!/usr/bin/python3
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
driver.get('https://github.com/trending')
content_element = driver.find_elements_by_xpath("/html/body/div[4]/main/div[3]/div/div[2]/article[1]/h1/a")
for element in content_element:
    print(element)
driver.close()
Thanks
Answer 1:
You can extract all the trending repositories with requests and BeautifulSoup, as in this example:
import requests
from bs4 import BeautifulSoup

url = 'https://github.com/trending'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Each trending repository is a link inside an <article>'s <h1>
for a in soup.select('article h1 a'):
    # The href is relative, so prepend the site root to get a full URL
    print('{:<50} {}'.format(a.get_text(strip=True, separator=' '), 'https://github.com' + a['href']))
Prints:
cli / cli https://github.com/cli/cli
gnebbia / kb https://github.com/gnebbia/kb
schollz / croc https://github.com/schollz/croc
onevcat / Kingfisher https://github.com/onevcat/Kingfisher
moby / moby https://github.com/moby/moby
matterport / Mask_RCNN https://github.com/matterport/Mask_RCNN
google / googletest https://github.com/google/googletest
FreeCAD / FreeCAD https://github.com/FreeCAD/FreeCAD
iamadamdev / bypass-paywalls-chrome https://github.com/iamadamdev/bypass-paywalls-chrome
vuejs / vue-next https://github.com/vuejs/vue-next
microsoft / onefuzz https://github.com/microsoft/onefuzz
twintproject / twint https://github.com/twintproject/twint
lyhue1991 / eat_tensorflow2_in_30_days https://github.com/lyhue1991/eat_tensorflow2_in_30_days
snakers4 / silero-models https://github.com/snakers4/silero-models
hediet / vscode-debug-visualizer https://github.com/hediet/vscode-debug-visualizer
tannerlinsley / react-query https://github.com/tannerlinsley/react-query
proxysu / windows https://github.com/proxysu/windows
mozilla / send https://github.com/mozilla/send
jaywcjlove / linux-command https://github.com/jaywcjlove/linux-command
material-shell / material-shell https://github.com/material-shell/material-shell
iamkun / dayjs https://github.com/iamkun/dayjs
swisskyrepo / PayloadsAllTheThings https://github.com/swisskyrepo/PayloadsAllTheThings
TheCherno / Hazel https://github.com/TheCherno/Hazel
HeroTransitions / Hero https://github.com/HeroTransitions/Hero
pytorch / pytorch https://github.com/pytorch/pytorch
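
If you want to check the parsing step without hitting the network, the same selector can be wrapped in a function and run against a small HTML string. This is only a sketch: `parse_trending`, `BASE_URL` and `SAMPLE_HTML` are names made up for illustration, and the sample markup is a hand-written fragment that imitates the `article h1 a` structure, not a copy of GitHub's real page. It also uses `urljoin` in place of string concatenation to build the full URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://github.com'

# Hypothetical, hand-written fragment (an assumption, not GitHub's actual markup)
SAMPLE_HTML = """
<article><h1><a href="/cli/cli"><span>cli /</span> cli</a></h1></article>
<article><h1><a href="/moby/moby"><span>moby /</span> moby</a></h1></article>
"""


def parse_trending(html):
    """Return (name, url) pairs for every link matched by 'article h1 a'."""
    soup = BeautifulSoup(html, 'html.parser')
    repos = []
    for a in soup.select('article h1 a'):
        # Collapse the nested text nodes into 'owner / repo'
        name = a.get_text(strip=True, separator=' ')
        # Turn the relative href into an absolute URL
        repos.append((name, urljoin(BASE_URL, a['href'])))
    return repos


for name, url in parse_trending(SAMPLE_HTML):
    print('{:<50} {}'.format(name, url))
```

To use it on the live page, pass `requests.get('https://github.com/trending').content` to `parse_trending` instead of the sample string. As for the Selenium attempt in the question: printing a WebElement shows the object's representation, which includes the session ID; printing `element.text` would show the link text instead.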