Get only the values for a specific month and year

#python #html #html-table #python-requests

Question:

I have some HTML that contains a table. I would like to write all of its columns to a list.

The most recent data is always marked with <tr class="">. However, I don't always want the newest data, only the data for a specific period. If you look at the website, you will see that it has data for every month.

Say I want the data for August 2021. My problem is this: there are seven files for each month. The first five are labelled with the day, month and year. The last two, however, are labelled N/A, even though they belong to the same day and month.

How can I get all the information for August 2021?

    import requests
    import re
    from bs4 import BeautifulSoup
    from datetime import datetime

    DATASET_URL = "http://insideairbnb.com/get-the-data.html"
    DATASET_CITY = "Antwerp"
    DATASET_MONTHYEAR = "09.2021"


    # Converts 09.2021 to a date string such as "01 September, 2021"
    def datetimeConverter(d):
        if d == 'N/A':
            return 'N/A'
        return datetime.strptime(d, '%m.%Y').strftime('%d %B, %Y')


    # use requests
    r = requests.get(DATASET_URL)
    content = r.content

    # soup!
    soup = BeautifulSoup(content, "html.parser")

    city_table = soup.find(class_=DATASET_CITY.lower())
    print(city_table)

This prints:

    <table class="table table-hover table-striped antwerp">
      <thead>
        <tr>
          <th class="col-md-3" data-field="host_id">Date Compiled</th>
          <th class="col-md-3" data-field="host_id">Country/City</th>
          <th class="col-md-3" data-field="host_id">File Name</th>
          <th class="col-md-3" data-align="right" data-field="count">Description</th>
        </tr>
      </thead>
      <tbody>
        <tr class="">
          <td>29 September, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz" onclick="var that=this;ga('send','event', 'download','listings',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv.gz</a></td>
          <td>Detailed Listings data for Antwerp</td>
        </tr>
        <tr class="">
          <td>29 September, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/calendar.csv.gz" onclick="var that=this;ga('send','event', 'download','calendar',this.href);setTimeout(function(){location.href=that.href;},200);return false;">calendar.csv.gz</a></td>
          <td>Detailed Calendar Data for listings in Antwerp</td>
        </tr>
        <tr class="">
          <td>29 September, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/reviews.csv.gz" onclick="var that=this;ga('send','event', 'download','reviews',this.href);setTimeout(function(){location.href=that.href;},200);return false;">reviews.csv.gz</a></td>
          <td>Detailed Review Data for listings in Antwerp</td>
        </tr>
        <tr class="">
          <td>29 September, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/listings.csv" onclick="var that=this;ga('send','event', 'download','listings_visualisation',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv</a></td>
          <td>Summary information and metrics for listings in Antwerp (good for visualisations).</td>
        </tr>
        <tr class="">
          <td>29 September, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/reviews.csv" onclick="var that=this;ga('send','event', 'download','reviews_visualisation',this.href);setTimeout(function(){location.href=that.href;},200);return false;"> reviews.csv</a></td>
          <td>Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).</td>
        </tr>
        <tr class="">
          <td>N/A</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/neighbourhoods.csv" onclick="var that=this;ga('send','event', 'download','neighbourhoods',this.href);setTimeout(function(){location.href=that.href;},200);return false;">neighbourhoods.csv</a></td>
          <td>Neighbourhood list for geo filter. Sourced from city or open source GIS files.</td>
        </tr>
        <tr class="">
          <td>N/A</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/neighbourhoods.geojson" onclick="var that=this;ga('send','event', 'download','geojson',this.href);setTimeout(function(){location.href=that.href;},200);return false;">neighbourhoods.geojson</a></td>
          <td>GeoJSON file of neighbourhoods of the city.</td>
        </tr>
        <tr class="archived">
          <td>27 August, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/listings.csv.gz" onclick="var that=this;ga('send','event', 'download','listings',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv.gz</a></td>
          <td>Detailed Listings data for Antwerp</td>
        </tr>
        <tr class="archived">
          <td>27 August, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/calendar.csv.gz" onclick="var that=this;ga('send','event', 'download','calendar',this.href);setTimeout(function(){location.href=that.href;},200);return false;">calendar.csv.gz</a></td>
          <td>Detailed Calendar Data for listings in Antwerp</td>
        </tr>
        <tr class="archived">
          <td>27 August, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/reviews.csv.gz" onclick="var that=this;ga('send','event', 'download','reviews',this.href);setTimeout(function(){location.href=that.href;},200);return false;">reviews.csv.gz</a></td>
          <td>Detailed Review Data for listings in Antwerp</td>
        </tr>
        <tr class="archived">
          <td>27 August, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/listings.csv" onclick="var that=this;ga('send','event', 'download','listings_visualisation',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv</a></td>
          <td>Summary information and metrics for listings in Antwerp (good for visualisations).</td>
        </tr>
        <tr class="archived">
          <td>27 August, 2021</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/reviews.csv" onclick="var that=this;ga('send','event', 'download','reviews_visualisation',this.href);setTimeout(function(){location.href=that.href;},200);return false;"> reviews.csv</a></td>
          <td>Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).</td>
        </tr>
        <tr class="archived">
          <td>N/A</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.csv" onclick="var that=this;ga('send','event', 'download','neighbourhoods',this.href);setTimeout(function(){location.href=that.href;},200);return false;">neighbourhoods.csv</a></td>
          <td>Neighbourhood list for geo filter. Sourced from city or open source GIS files.</td>
        </tr>
        <tr class="archived">
          <td>N/A</td>
          <td>Antwerp</td>
          <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.geojson" onclick="var that=this;ga('send','event', 'download','geojson',this.href);setTimeout(function(){location.href=that.href;},200);return false;">neighbourhoods.geojson</a></td>
          <td>GeoJSON file of neighbourhoods of the city.</td>
        </tr>

What I Want

In the end, the list should look like this:

    list = [
        ["http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/listings.csv.gz",
         "listings.csv.gz", "Description", "27 August, 2021"],
        [...],
        ["http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.geojson",
         "neighbourhoods.geojson", "Description", "27 August, 2021"],
    ]

The HTML I want:

    <tr class="archived">
      <td>27 August, 2021</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/listings.csv.gz" onclick="var that=this;ga('send','event', 'download','listings',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv.gz</a></td>
      <td>Detailed Listings data for Antwerp</td>
    </tr>
    <tr class="archived">
      <td>27 August, 2021</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/calendar.csv.gz" onclick="var that=this;ga('send','event', 'download','calendar',this.href);setTimeout(function(){location.href=that.href;},200);return false;">calendar.csv.gz</a></td>
      <td>Detailed Calendar Data for listings in Antwerp</td>
    </tr>
    <tr class="archived">
      <td>27 August, 2021</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/reviews.csv.gz" onclick="var that=this;ga('send','event', 'download','reviews',this.href);setTimeout(function(){location.href=that.href;},200);return false;">reviews.csv.gz</a></td>
      <td>Detailed Review Data for listings in Antwerp</td>
    </tr>
    <tr class="archived">
      <td>27 August, 2021</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/listings.csv" onclick="var that=this;ga('send','event', 'download','listings_visualisation',this.href);setTimeout(function(){location.href=that.href;},200);return false;">listings.csv</a></td>
      <td>Summary information and metrics for listings in Antwerp (good for visualisations).</td>
    </tr>
    <tr class="archived">
      <td>27 August, 2021</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/reviews.csv" onclick="var that=this;ga('send','event', 'download','reviews_visualisation',this.href);setTimeout(function(){location.href=that.href;},200);return false;"> reviews.csv</a></td>
      <td>Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).</td>
    </tr>
    <tr class="archived">
      <td>N/A</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.csv" onclick="var that=this;ga('send','event', 'download','neighbourhoods',this.href);setTimeout(function(){location.href=that.href;},200);return false;">neighbourhoods.csv</a></td>
      <td>Neighbourhood list for geo filter. Sourced from city or open source GIS files.</td>
    </tr>
    <tr class="archived">
      <td>N/A</td>
      <td>Antwerp</td>
      <td><a href="http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.geojson" onclick="var that=this;ga('send','event', 'download','geojson',this.href);setTimeout(function(){location.href=that.href;},200);return false;">neighbourhoods.geojson</a></td>
      <td>GeoJSON file of neighbourhoods of the city.</td>
    </tr>

Answer 1:

You need to filter the result further to get what you want.

The code below should give you the desired results:

    import requests
    from bs4 import BeautifulSoup
    from datetime import datetime

    DATASET_URL = "http://insideairbnb.com/get-the-data.html"
    DATASET_CITY = "Antwerp"
    DATASET_MONTHYEAR = "09.2021"


    # Converts 09.2021 to 2021-09
    def datetimeConverter(d):
        return datetime.strptime(d, '%m.%Y').strftime('%Y-%m')


    # Keeps a row if the year and month appear in its download URL
    def filter_records_by_date(record):
        return datetimeConverter(DATASET_MONTHYEAR) in record.find_all('td')[2].a.get('href')


    # use requests
    r = requests.get(DATASET_URL)
    content = r.content

    # soup!
    soup = BeautifulSoup(content, "html.parser")
    # Skip the header row
    city_table = soup.find(class_=DATASET_CITY.lower()).find_all('tr')[1:]
    filtered_city_table = list(filter(filter_records_by_date, city_table))
    # Date of the first matching row, reused for the N/A rows
    query_date = filtered_city_table[0].td.text
    data = []
    for row in filtered_city_table:
        cells = row.find_all('td')
        val = [cells[2].find('a').get("href"), cells[3].text, query_date]
        data.append(val)
    print(data)
    print(len(data))

Output:

    [
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz',
         'Detailed Listings data for Antwerp', '29 September, 2021'],
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/calendar.csv.gz',
         'Detailed Calendar Data for listings in Antwerp', '29 September, 2021'],
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/reviews.csv.gz',
         'Detailed Review Data for listings in Antwerp', '29 September, 2021'],
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/listings.csv',
         'Summary information and metrics for listings in Antwerp (good for visualisations).', '29 September, 2021'],
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/reviews.csv',
         'Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).', '29 September, 2021'],
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/neighbourhoods.csv',
         'Neighbourhood list for geo filter. Sourced from city or open source GIS files.', '29 September, 2021'],
        ['http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/visualisations/neighbourhoods.geojson',
         'GeoJSON file of neighbourhoods of the city.', '29 September, 2021']
    ]
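
To request a different month without editing a global, the month filter can be turned into a parameter. The following is only a minimal sketch and not part of the answer's code: `month_key` and `rows_for_month` are hypothetical helpers, and the sketch assumes each table row has already been reduced to a `(date_text, href, file_name, description)` tuple (for example from `cells[0].text`, `cells[2].a.get("href")`, `cells[2].a.text` and `cells[3].text`). It also assumes every download URL contains the compile date as `/YYYY-MM-DD/`, as in the HTML shown in the question.

```python
from datetime import datetime


def month_key(month_year):
    """Convert 'MM.YYYY' (e.g. '08.2021') to the 'YYYY-MM' form used in the URLs."""
    return datetime.strptime(month_year, "%m.%Y").strftime("%Y-%m")


def rows_for_month(rows, month_year):
    """Return [href, file_name, description, date] for rows whose URL matches the month.

    rows: iterable of (date_text, href, file_name, description) tuples.
    Rows dated 'N/A' receive the first real date found among the matching rows.
    """
    key = "/" + month_key(month_year) + "-"
    matching = [row for row in rows if key in row[1]]
    # Fall back to 'N/A' only if none of the matching rows carries a date.
    date = next((r[0] for r in matching if r[0] != "N/A"), "N/A")
    return [[href, name, desc, date] for _, href, name, desc in matching]


# Three sample rows copied from the table in the question.
sample_rows = [
    ("29 September, 2021",
     "http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz",
     "listings.csv.gz", "Detailed Listings data for Antwerp"),
    ("27 August, 2021",
     "http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/listings.csv.gz",
     "listings.csv.gz", "Detailed Listings data for Antwerp"),
    ("N/A",
     "http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.geojson",
     "neighbourhoods.geojson", "GeoJSON file of neighbourhoods of the city."),
]

august = rows_for_month(sample_rows, "08.2021")
for row in august:
    print(row)
```

With the sample rows, only the two August entries are kept, and the `N/A` row is given the date of the dated August row. Each result also includes the file name, which matches the four-element layout asked for in the question.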

Comments:

1. Thank you very much. I would like to specify only 08.2021. How could I then tell it to consider 27 August, 2021 rather than September?

2. I want to set a date and read the matching data based on that date. The only thing missing is a way to set the date.

3. Would you specify a month and year (for example 08.2021) or a specific date such as 27 August, 2021?

4. I mean that I enter only the month and year, for example 08.2021 (August 2021), and it should then find all values that were created in August, which would be 27 August, 2021. So only the month and year should be considered; the day does not matter. But I would like to be able to read different months.

5. Like DATASET_MONTHYEAR = "08.2021"
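
The comments ask how entering only 08.2021 can lead to 27 August, 2021. One possible approach, sketched here under the assumption that every download URL embeds its compile date as `/YYYY-MM-DD/` (true for the HTML in the question, not checked for the whole site), is to read the date from the URL instead of the first table column, which also works for the `N/A` rows. `compile_dates` and `date_for_month` are hypothetical helper names.

```python
import re
from datetime import date, datetime

# Matches a path segment such as /2021-08-27/
DATE_IN_URL = re.compile(r"/(\d{4})-(\d{2})-(\d{2})/")


def compile_dates(hrefs):
    """Collect the distinct dates found in the URLs, newest first."""
    found = set()
    for href in hrefs:
        match = DATE_IN_URL.search(href)
        if match:
            year, month, day = map(int, match.groups())
            found.add(date(year, month, day))
    return sorted(found, reverse=True)


def date_for_month(hrefs, month_year):
    """Return the newest compile date within 'MM.YYYY', formatted like the table, or None."""
    wanted = datetime.strptime(month_year, "%m.%Y")
    for d in compile_dates(hrefs):
        if (d.year, d.month) == (wanted.year, wanted.month):
            # %B yields the English month name under Python's default locale.
            return f"{d.day} {d.strftime('%B')}, {d.year}"
    return None


hrefs = [
    "http://data.insideairbnb.com/belgium/vlg/antwerp/2021-09-29/data/listings.csv.gz",
    "http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/data/listings.csv.gz",
    "http://data.insideairbnb.com/belgium/vlg/antwerp/2021-08-27/visualisations/neighbourhoods.geojson",
]

print(date_for_month(hrefs, "08.2021"))  # 27 August, 2021
```

If a month happened to contain more than one compile date, this sketch would pick the newest one; whether that is the desired behaviour is a design choice left open here.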