Python Scrapy - need a fix for a FormRequest used for pagination via AJAX calls - kind of lost here

#python #python-3.x #ajax #scrapy

Question:

import scrapy
from lxml.html import fromstring
from ..items import PontsItems
from scrapy.http import FormRequest


class Names(scrapy.Spider):
    name = 'enseafr'

    download_delay = 5.0
    current_page = 1

    def start_requests(self):

        my_url = 'https://www.ponts.org/annuaire/ajax/loadresult'
        formdata = {'page': str(self.current_page)}
        return [
            FormRequest(my_url, formdata=formdata, callback=self.parse)
        ]

    def parse(self, response):
        items = PontsItems()

        for item in response.xpath('//div[@class="single_desc"]'):
            name = item.xpath('./div[@class="single_libel"]/a/text()').get().strip()
            description = item.xpath('./div[@class="single_details"]/div/text()').get()
            description = fromstring(description).text_content().strip()
            year = item.xpath('./div[@class="single_details"]/div/b/text()').get()

            items['name'] = name
            items['description'] = description
            items['year'] = year
            yield items

        next_page = response.xpath('//a[@class="next"]/@href').get()
        if next_page is not None:
            self.current_page += 1
            formdata = {'page': str(self.current_page)}
            # my_url is local to start_requests, so reuse the URL of this response
            yield FormRequest(response.url, formdata=formdata, callback=self.parse)

All it does is print
2020-09-16 00:50:09 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.ponts.org/annuaire/ajax/loadresult> (referer: None)
2020-09-16 00:50:09 [scrapy.core.engine] INFO: Closing spider (finished)

I'm kind of lost on how to proceed here.

Answer #1:

Scrapy does not execute JavaScript. The page you requested does not have the form rendered in the response that Scrapy receives.
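One way to confirm this is to check whether the raw response body contains the elements the spider's XPath expects at all. The sketch below is only an illustration using the standard library: the helper names ClassCounter and count_class and the two sample HTML strings are hypothetical, and the real body returned by the endpoint may look different.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return how many elements in html carry class_name (hypothetical helper)."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical sample bodies, not taken from the real site:
# one with the markup the spider expects, one where a script would fill it in later.
rendered = '<div class="single_desc"><div class="single_libel"><a>Name</a></div></div>'
unrendered = '<div id="results"></div><script>loadResults()</script>'

print(count_class(rendered, "single_desc"))
print(count_class(unrendered, "single_desc"))
```

Inside the spider, the same check could be applied to response.text at the top of parse. If the count is zero, the for loop never runs and the spider closes without yielding anything, which matches the log in the question.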

If you need to render JavaScript in your spider, try