#python #web-scraping #data-structures #beautifulsoup
Question:
I was trying to fetch the details of all upcoming events from an institute's website:
import requests
from bs4 import BeautifulSoup

# Download the events page and parse it.
response = requests.get("http://www.iitg.ac.in/home/eventsall/events")
soup = BeautifulSoup(response.content, "html.parser")
cards = soup.find_all("div", attrs={"class": "newsarea"})

# Three parallel lists: one entry per event.
iitg_title = []
iitg_date = []
iitg_link = []
for card in cards[0:6]:
    iitg_date.append(card.find("div", attrs={"class": "ndate"}).text)
    iitg_title.append(card.find("div", attrs={"class": "ntitle"}).text.strip())
    iitg_link.append(card.find("div", attrs={"class": "ntitle"}).a['href'])

print("Upcoming event details scraped from iitg website:- \n")
for i in range(len(iitg_title)):
    print("Title:- ", iitg_title[i])
    print("Dates:- ", iitg_date[i])
    print("Link:- ", iitg_link[i])
    print('\n')
The code above gave me these details:
Upcoming event details scraped from iitg website:-
Title:- 4 batch for the certification programme on AI & ML by Eckovation in association with E&ICT Academy IIT Guwahati
Dates:- 15 Aug 2020 - 15 Aug 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 8th International and 47th National conference on Fluid Mechanics and Fluid Power
Dates:- 09 Dec 2020 - 11 Dec 2020
Link:- https://event.iitg.ac.in/fmfp2020/
Title:- 4 months Internship programme on VLSI Circuit Design
Dates:- 10 Aug 2020 - 10 Dec 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on AI & ML under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Industry 4.0 (Industrial IoT) under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Robotics Fundamentals under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
For the last five hours I have been racking my brain over how to store my results so that I can access them later with a simple for loop.
How can I do that?
Comments:
1. Do you mean writing to a file?
2. Maybe you would like to save the list of these events with the pickle library?
3. No, I don't want to write this data to an external file. I tried that approach too, but for some reason it didn't work for me. I just need to store the values of these three variables in a single one.
Answer #1:
You can use, for example, the json module to write the data to disk:
import json
import requests
from bs4 import BeautifulSoup

response = requests.get("http://www.iitg.ac.in/home/eventsall/events")
soup = BeautifulSoup(response.content, "html.parser")
cards = soup.find_all("div", attrs={"class": "newsarea"})

# Collect each event as one (title, dates, link) tuple.
events = []
for card in cards[0:6]:
    events.append((
        card.find("div", attrs={"class": "ntitle"}).text.strip(),
        card.find("div", attrs={"class": "ndate"}).text,
        card.find("div", attrs={"class": "ntitle"}).a['href']
    ))

# save data:
with open('data.json', 'w') as f_out:
    json.dump(events, f_out)

# ...

# load data back:
with open('data.json', 'r') as f_in:
    events = json.load(f_in)

print("Upcoming event details scraped from iitg website:- \n")
for t, d, l in events:
    print("Title:- ", t)
    print("Dates:- ", d)
    print("Link:- ", l)
    print('\n')
Prints:
Upcoming event details scraped from iitg website:-
Title:- 4 batch for the certification programme on AI & ML by Eckovation in association with E&ICT Academy IIT Guwahati
Dates:- 15 Aug 2020 - 15 Aug 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 8th International and 47th National conference on Fluid Mechanics and Fluid Power
Dates:- 09 Dec 2020 - 11 Dec 2020
Link:- https://event.iitg.ac.in/fmfp2020/
Title:- 4 months Internship programme on VLSI Circuit Design
Dates:- 10 Aug 2020 - 10 Dec 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on AI & ML under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Industry 4.0 (Industrial IoT) under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
Title:- 6 week Training cum Internship programme on Robotics Fundamentals under TEQIP-III orgainsed by Assam Science Technology University
Dates:- 10 Aug 2020 - 20 Sep 2020
Link:- http://eict.iitg.ac.in/online_courses_training.html
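One detail of the JSON round trip is worth knowing: JSON has no tuple type, so records saved as tuples come back as lists. The sketch below shows this in memory with json.dumps and json.loads, using two sample records copied from the scraped output. The variable name restored is only illustrative.

```python
import json

# Two sample records copied from the scraped output, as (title, dates, link) tuples.
events = [
    ("8th International and 47th National conference on Fluid Mechanics and Fluid Power",
     "09 Dec 2020 - 11 Dec 2020",
     "https://event.iitg.ac.in/fmfp2020/"),
    ("4 months Internship programme on VLSI Circuit Design",
     "10 Aug 2020 - 10 Dec 2020",
     "http://eict.iitg.ac.in/online_courses_training.html"),
]

# Serialize to a JSON string and parse it back; json.dump/json.load
# apply the same conversion when a file is involved.
restored = json.loads(json.dumps(events))

# Each record is now a list instead of a tuple.
print(type(events[0]).__name__, "->", type(restored[0]).__name__)

# Three-way unpacking in a for loop works the same on lists.
for title, dates, link in restored:
    print("Title:- ", title)
    print("Dates:- ", dates)
    print("Link:- ", link)
    print()
```

The loop is unaffected, but code that relies on tuple behaviour (for example, using a record as a dictionary key) would need to convert the loaded lists back with tuple().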
Comments:
1. Can't this problem be solved without writing the data to a new file?
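One file-free option, offered as a minimal sketch rather than a definitive answer, is to merge the three parallel lists into a single list of dictionaries. It assumes the lists are named iitg_title, iitg_date and iitg_link as in the question, and fills them with two sample entries from the scraped output instead of fetching the page. The dictionary keys title, dates and link are hypothetical names chosen for illustration.

```python
# Parallel lists as built in the question (sample entries, no network access needed).
iitg_title = [
    "8th International and 47th National conference on Fluid Mechanics and Fluid Power",
    "4 months Internship programme on VLSI Circuit Design",
]
iitg_date = [
    "09 Dec 2020 - 11 Dec 2020",
    "10 Aug 2020 - 10 Dec 2020",
]
iitg_link = [
    "https://event.iitg.ac.in/fmfp2020/",
    "http://eict.iitg.ac.in/online_courses_training.html",
]

# zip() pairs the i-th title, date and link; each event becomes one dict.
events = [
    {"title": t, "dates": d, "link": l}
    for t, d, l in zip(iitg_title, iitg_date, iitg_link)
]

# Everything now lives in one variable and can be read with a simple for loop.
for event in events:
    print("Title:- ", event["title"])
    print("Dates:- ", event["dates"])
    print("Link:- ", event["link"])
    print()
```

The same dictionaries could also be appended directly inside the scraping loop, which would make the three separate lists unnecessary. The data lives only as long as the Python process, so it has to be scraped again on the next run.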