Scrapy XML pipeline

#python #xml #scrapy #pipeline

Question:

I need to build a spider that outputs a separate XML file for each article.

The pipeline.py:

from scrapy.exporters import XmlItemExporter
from datetime import datetime

class CommonPipeline(object):
    def process_item(self, item, spider):
        return item

class XmlExportPipeline(object):
    def __init__(self):
        self.files = {}

    def process_item(self, item, spider):
        # One file per item, named <spider>_<HHMMSSffffff>.xml
        file = open((spider.name + datetime.now().strftime("_%H%M%S%f.xml")), 'w+b')
        self.files[spider] = file
        self.exporter = XmlItemExporter(file)
        self.exporter.start_exporting()
        self.exporter.export_item(item)
        self.exporter.finish_exporting()
        file = self.files.pop(spider)
        file.close()
        return item

Output:

 <?xml version="1.0" encoding="utf-8"?>
    <items>
        <item>
            <text_img> Nelson Argaña. Foto: Gustavo Velázquez 970AM. Hace 1 hora  </text_img>
            <title>Nelson Argaña lamentó que Mario Abdo esté rodeado de corruptos </title>
            <url>https://www.lanacion.com.py/politica/2019/03/23/nelson-argana-lamento-que-mario-abdo-este-rodeado-de-corruptos/</url>
            <content>    Nelson Argaña, hijo de Luis María Arg ...</content>
            <sum_content>4805</sum_content>
            <time>14:30:06</time>
            <date>20190323</date>
        </item>
    </items>

But I need output like this:

 <?xml version="1.0" encoding="iso-8859-1"?>
    <article>
        <text_img> Nelson Argaña. Foto: Gustavo Velázquez 970AM. Hace 1 hora  </text_img>
        <title>Nelson Argaña lamentó que Mario Abdo esté rodeado de corruptos </title>
        <url>https://www.lanacion.com.py/politica/2019/03/23/nelson-argana-lamento-que-mario-abdo-este-rodeado-de-corruptos/</url>
        <content>    Nelson Argaña, hijo de Luis María Arg ...</content>
        <sum_content>4805</sum_content>
        <time>14:30:06</time>
        <date>20190323</date>
    </article>

The settings.py:

ITEM_PIPELINES = {
    'common.pipelines.XmlExportPipeline': 300,
}
FEED_EXPORTERS_BASE = {
    'xml': 'scrapy.contrib.exporter.XmlItemExporter',
}

I tried adding this to settings.py:

FEED_EXPORT_ENCODING = 'iso-8859-1'
FEED_EXPORT_FIELDS = ["article"]

But it doesn't work.

I am using Scrapy 1.4.0.

1. Try this (inside process_item): self.exporter = XmlItemExporter(file, item_element="article", root_element="articles"). See docs.scrapy.org/en/latest/topics/exporters.html#xmlitemexporter

2. Thanks for your comment. I tried that option, but I need only the <article> tag, not <articles><article>. That is a hard requirement, and so is the encoding.

3. Then use only item_element.

4. That doesn't work. The root element still appears as the default <items>. I tried root_element=False, root_element=None, and root_element='', but none of them work. The same thing happens the other way around.
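
Since the comments above found no way to make XmlItemExporter drop its root element, one possible workaround is to skip the exporter and write the document directly with the standard library's XMLGenerator. The following is only a minimal sketch under that assumption, not something confirmed in this thread: the helper name write_article_xml and its parameters are hypothetical, and it handles flat items only (no nested fields).

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_article_xml(fields, out, encoding="iso-8859-1", root="article"):
    """Write one flat mapping as a single-root XML document to a binary stream.

    Hypothetical helper: `fields` is any mapping of field name to value,
    `out` is a file object opened in binary mode.
    """
    xg = XMLGenerator(out, encoding=encoding)
    xg.startDocument()            # emits the <?xml ... encoding="..."?> declaration
    xg.startElement(root, {})     # <article> is the only wrapper element
    xg.characters("\n")
    for name, value in fields.items():
        xg.characters("    ")
        xg.startElement(name, {})
        xg.characters(str(value))
        xg.endElement(name)
        xg.characters("\n")
    xg.endElement(root)
    xg.endDocument()


# Usage with an in-memory buffer; a pipeline would pass an opened file instead.
buf = io.BytesIO()
write_article_xml({"title": "Nelson Argaña lamentó", "date": "20190323"}, buf)
xml_bytes = buf.getvalue()
```

Inside process_item, a call such as write_article_xml(dict(item), file) could then stand in for the three exporter calls, assuming the item converts cleanly to a dict. Characters that iso-8859-1 cannot represent are written as numeric character references by XMLGenerator rather than raising an error.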
