Scrape multiple links from a JSON file

#python #json #scrapy #web-crawler

Question:

I am trying to scrape several links that I previously scraped and saved in a JSON file.

The code below works so far, but I don't want to scrape just this one URL; I want to scrape every URL in my JSON file.

    import scrapy
    import json


    class RatingSpider(scrapy.Spider):
        name = "rating"

        def start_requests(self):
            urls = [
                'https://www.darkpattern.games/game/3478/0/ragnarok-m-eternal-love-rom-.html'
            ]
            for url in urls:
                yield scrapy.Request(url=url, callback=self.parse)

        def parse(self, response):
            for rating in response.css('div.score_box'):
                yield {
                    'reported': rating.css('div.score_heading *::text').extract()
                }

The JSON file looks like this:

    [
        {
            "title": [
                "\n\t\t\t\t\t\t",
                "Ragnarok M: Eternal Love(ROM)",
                "\n\t\t\t\t\t\t",
                "\t\t\t\t\t\t",
                "The classic adventure returns",
                "\n\t\t\t\t\t"
            ],
            "link": [
                "/game/3478/0/ragnarok-m-eternal-love-rom-.html"
            ],
            "rating": [
                "\n\t\t\t\t\t\t",
                "\n\t\t\t\t\t\t",
                "-4.68",
                "\n\t\t\t\t\t"
            ]
        }
    ]

Any suggestions on how to do this?

Answer #1:

I don't see where in your example you read from your JSON file. You would need to do something like this:

    import json

    with open("your json file", "r") as f:
        jsonlist = json.load(f)

    for item in jsonlist:
        url = item["link"][0]
        # do something with url: run a request, store it in a list, etc.

Also, your sample JSON contains a relative URL, so I assume the rest of the file is the same and the base URL is https://www.darkpattern.games. You would need to join the base URL with each relative URL before running the requests.
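As a rough sketch of the approach described in this answer, the reading and URL-joining steps could be wrapped in two small helpers. The names `build_urls`, `load_urls`, and `BASE_URL` are hypothetical, and the sketch assumes every item in the file has the same shape as the sample (a `"link"` list of relative URLs):

```python
import json
from urllib.parse import urljoin

# Assumption: the base URL is inferred from the start URL in the question.
BASE_URL = "https://www.darkpattern.games"


def build_urls(items, base_url=BASE_URL):
    """Return absolute URLs for every entry in each item's "link" list."""
    urls = []
    for item in items:
        # Each item stores its links as a list; items without one are skipped.
        for link in item.get("link", []):
            urls.append(urljoin(base_url, link))
    return urls


def load_urls(path, base_url=BASE_URL):
    """Read the JSON file at `path` and return its absolute URLs."""
    with open(path, "r", encoding="utf-8") as f:
        return build_urls(json.load(f), base_url)
```

Inside the spider, `start_requests` could then loop over `load_urls("games.json")` (the file name is a placeholder) and yield `scrapy.Request(url=url, callback=self.parse)` for each result, just as the question's code does for its single hard-coded URL.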
