Moving specific rows from one XML file to another

#python #xml #elementtree

Question:

I'm processing all the posts from the Stack Overflow data dump. Because it is so large and any of my programs takes so long to run, I'd like to create a separate XML file containing only the posts with the tags I'm interested in. I'm trying to use ElementTree for this. I can find the posts I need, but I'm having trouble writing them to another XML file.

import xml.etree.ElementTree as ET

if __name__ == '__main__':
    posts = ET.Element('data')
    row = ET.SubElement(posts, "row")
    tree = ET.parse('Posts.xml')
    root = tree.getroot()

    for child in root:
        if child.get('Tags') and 'pytorch' in child.get('Tags') or child.get('Tags') and 'tensorflow' in child.get('Tags') or child.get('Tags') and 'keras' in child.get('Tags'):
            ET.SubElement(row, child)

    mydata = ET.tostring(posts)
    myfile = open("subposts.xml", "w")
    myfile.write(mydata)
  

However, I get this error:

  File "/local/mez2113/stackoverflow/create_sub_posts.py", line 13, in <module>
    mydata = ET.tostring(posts)
  File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 1136, in tostring
    short_empty_elements=short_empty_elements)
  File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 774, in write
    qnames, namespaces = _namespaces(self._root, default_namespace)
  File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 886, in _namespaces
    _raise_serialization_error(tag)
  File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 1058, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize <Element 'row' at 0x7f2b2f9dcf98> (type Element)
  

Sample of the source XML:

<posts>
    <row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="261" ViewCount="16799" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;strong&gt;percentage-based width&lt;/strong&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to '0' width on &lt;a href=&quot;http://en.wikipedia.org/wiki/Internet_Explorer_7&quot; rel=&quot;noreferrer&quot;&gt;Internet&amp;nbsp;Explorer&amp;nbsp;7&lt;/a&gt;, but not on Firefox or Safari.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;If I use &lt;strong&gt;pixel width&lt;/strong&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;em&gt;pixel-based width&lt;/em&gt; on the&#xA;child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="63550" LastEditorDisplayName="Rich B" LastEditDate="2016-03-19T06:05:48.487" LastActivityDate="2018-10-16T16:54:34.953" Title="Percentage width child element in absolutely positioned parent on Internet Explorer 7" Tags="&lt;pytorch&gt;&lt;hick&gt;&lt;css3&gt;&lt;internet-explorer-7&gt;" AnswerCount="6" CommentCount="0" FavoriteCount="12" />
    <row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="261" ViewCount="16799" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;strong&gt;percentage-based width&lt;/strong&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to '0' width on &lt;a href=&quot;http://en.wikipedia.org/wiki/Internet_Explorer_7&quot; rel=&quot;noreferrer&quot;&gt;Internet&amp;nbsp;Explorer&amp;nbsp;7&lt;/a&gt;, but not on Firefox or Safari.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;If I use &lt;strong&gt;pixel width&lt;/strong&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;em&gt;pixel-based width&lt;/em&gt; on the&#xA;child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="63550" LastEditorDisplayName="Rich B" LastEditDate="2016-03-19T06:05:48.487" LastActivityDate="2018-10-16T16:54:34.953" Title="Percentage width child element in absolutely positioned parent on Internet Explorer 7" Tags="&lt;pytorch&gt;&lt;css&gt;&lt;css3&gt;&lt;internet-explorer-7&gt;" AnswerCount="6" CommentCount="0" FavoriteCount="12" />
</posts>
  

Comments:

1. Please add a sample of the XML.

2. child is of type <class 'xml.etree.ElementTree.Element'>, which you can't pass to ElementTree.SubElement; use Element.append instead.

3. @stovfl So I would use ET.Element.append(child) in this case?
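
To illustrate the suggestion in comment 2, here is a minimal sketch using a made-up two-row document in place of Posts.xml (the rows and the names `source`, `out` and `serialized` are assumptions for the example, not part of the original code). The `append` method is called on the parent element instance and receives the existing child element, whereas `SubElement` expects a tag name string.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for Posts.xml.
source = ET.fromstring(
    '<posts>'
    '<row Id="1" Tags="&lt;pytorch&gt;&lt;css&gt;" />'
    '<row Id="2" Tags="&lt;java&gt;" />'
    '</posts>'
)

out = ET.Element('data')
for child in source:
    if 'pytorch' in child.get('Tags', ''):
        # append() attaches the existing Element object to the new parent.
        out.append(child)

# encoding='unicode' makes tostring() return str instead of bytes.
serialized = ET.tostring(out, encoding='unicode')
print(serialized)
```

So the call is `posts.append(child)` on the element that should receive the row, rather than `ET.Element.append(child)`.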

Answer #1:

Thanks for all the help in the comments!!

import xml.etree.ElementTree as ET

if __name__ == '__main__':
    posts = ET.Element('data')
    tree = ET.parse('Sub_posts.xml')
    root = tree.getroot()

    for child in root:
        # Tags may be missing on some rows, so default to an empty string.
        tags = child.get('Tags', '')
        if 'pytorch' in tags or 'tensorflow' in tags or 'keras' in tags:
            # append() takes an existing Element; SubElement() expects a tag name.
            posts.append(child)

    # tostring() returns bytes by default, so decode before writing in text mode.
    mydata = ET.tostring(posts).decode()
    with open("subposts.xml", "w") as myfile:
        myfile.write(mydata)
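
Because the question mentions that the dump is very large, a streaming variant may be worth considering. The following is only a sketch under assumptions: `filter_rows`, `WANTED`, `src_path` and `dst_path` are hypothetical names, and it assumes every post is a `row` element as in the sample. It uses `ET.iterparse` so the whole file is not parsed into one tree before filtering.

```python
import xml.etree.ElementTree as ET

WANTED = ('pytorch', 'tensorflow', 'keras')


def filter_rows(src_path, dst_path):
    """Stream <row> elements from src_path and copy matching ones to dst_path.

    Returns the number of rows written.
    """
    kept = 0
    with open(dst_path, 'w', encoding='utf-8') as out:
        out.write('<data>\n')
        for _event, elem in ET.iterparse(src_path, events=('end',)):
            if elem.tag == 'row':
                tags = elem.get('Tags', '')
                if any(name in tags for name in WANTED):
                    # Serialize this row on its own and write it straight out.
                    out.write(ET.tostring(elem, encoding='unicode'))
                    kept += 1
                # Release the row's attributes; the emptied element itself
                # is still referenced by the root.
                elem.clear()
        out.write('</data>\n')
    return kept
```

It would be called as `filter_rows('Posts.xml', 'subposts.xml')`.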
  

An alternative for matching 'Tags':

tags1 = {'pytorch', 'tensorflow', 'keras'}
for child in root:
    # Tags look like "<pytorch><css>": split on '>' and drop each leading '<'.
    post_tags = {t[1:] for t in (child.get('Tags') or '').split('>') if t}
    if tags1 & post_tags:
        print('match')
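
The set-based matching can also be wrapped in small helpers. This is a sketch, assuming the parsed `Tags` value has the `<a><b>` form shown in the sample; `parse_tags`, `is_wanted` and `WANTED` are hypothetical names. Comparing whole tag names, rather than using a substring test, means a wanted name does not match when it only appears inside a longer tag (the `keras-layer` tag below is made up for illustration).

```python
import re

WANTED = {'pytorch', 'tensorflow', 'keras'}


def parse_tags(raw):
    """Return the set of tag names in a string like '<pytorch><css>'."""
    # A missing Tags attribute arrives as None, so fall back to ''.
    return set(re.findall(r'<([^<>]+)>', raw or ''))


def is_wanted(raw):
    """True if at least one wanted tag is present as a whole tag name."""
    return bool(WANTED & parse_tags(raw))


print(is_wanted('<css><keras>'))   # whole-name match
print(is_wanted('<keras-layer>'))  # made-up tag, no whole-name match
```

In the loop this would be used as `if is_wanted(child.get('Tags')):`.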