#python #xml #elementtree
Question:
I'm processing all the posts from the Stack Overflow data dump. Because the dump is so large and every one of my programs takes so long to run, I'd like to create a separate XML file that contains only the posts with the tags I'm interested in. I'm trying to use ElementTree for this. I can find the posts I need, but I'm having trouble writing them to another XML file.
    import xml.etree.ElementTree as ET

    if __name__ == '__main__':
        posts = ET.Element('data')
        row = ET.SubElement(posts, "row")
        tree = ET.parse('Posts.xml')
        root = tree.getroot()
        for child in root:
            if child.get('Tags') and 'pytorch' in child.get('Tags') or child.get('Tags') and 'tensorflow' in child.get('Tags') or child.get('Tags') and 'keras' in child.get('Tags'):
                ET.SubElement(row, child)
        mydata = ET.tostring(posts)
        myfile = open("subposts.xml", "w")
        myfile.write(mydata)
However, I get this error:
      File "/local/mez2113/stackoverflow/create_sub_posts.py", line 13, in <module>
        mydata = ET.tostring(posts)
      File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 1136, in tostring
        short_empty_elements=short_empty_elements)
      File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 774, in write
        qnames, namespaces = _namespaces(self._root, default_namespace)
      File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 886, in _namespaces
        _raise_serialization_error(tag)
      File "/opt/anaconda3/lib/python3.7/xml/etree/ElementTree.py", line 1058, in _raise_serialization_error
        "cannot serialize %r (type %s)" % (text, type(text).__name__)
    TypeError: cannot serialize <Element 'row' at 0x7f2b2f9dcf98> (type Element)
Sample of the source XML:
<posts>
<row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="261" ViewCount="16799" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;strong&gt;percentage-based width&lt;/strong&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to '0' width on &lt;a href=&quot;http://en.wikipedia.org/wiki/Internet_Explorer_7&quot; rel=&quot;noreferrer&quot;&gt;Internet&amp;nbsp;Explorer&amp;nbsp;7&lt;/a&gt;, but not on Firefox or Safari.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;If I use &lt;strong&gt;pixel width&lt;/strong&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;em&gt;pixel-based width&lt;/em&gt; on the&#xA;child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="63550" LastEditorDisplayName="Rich B" LastEditDate="2016-03-19T06:05:48.487" LastActivityDate="2018-10-16T16:54:34.953" Title="Percentage width child element in absolutely positioned parent on Internet Explorer 7" Tags="&lt;pytorch&gt;&lt;hick&gt;&lt;css3&gt;&lt;internet-explorer-7&gt;" AnswerCount="6" CommentCount="0" FavoriteCount="12" />
<row Id="6" PostTypeId="1" AcceptedAnswerId="31" CreationDate="2008-07-31T22:08:08.620" Score="261" ViewCount="16799" Body="&lt;p&gt;I have an absolutely positioned &lt;code&gt;div&lt;/code&gt; containing several children, one of which is a relatively positioned &lt;code&gt;div&lt;/code&gt;. When I use a &lt;strong&gt;percentage-based width&lt;/strong&gt; on the child &lt;code&gt;div&lt;/code&gt;, it collapses to '0' width on &lt;a href=&quot;http://en.wikipedia.org/wiki/Internet_Explorer_7&quot; rel=&quot;noreferrer&quot;&gt;Internet&amp;nbsp;Explorer&amp;nbsp;7&lt;/a&gt;, but not on Firefox or Safari.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;If I use &lt;strong&gt;pixel width&lt;/strong&gt;, it works. If the parent is relatively positioned, the percentage width on the child works.&lt;/p&gt;&#xA;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Is there something I'm missing here?&lt;/li&gt;&#xA;&lt;li&gt;Is there an easy fix for this besides the &lt;em&gt;pixel-based width&lt;/em&gt; on the&#xA;child?&lt;/li&gt;&#xA;&lt;li&gt;Is there an area of the CSS specification that covers this?&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;" OwnerUserId="9" LastEditorUserId="63550" LastEditorDisplayName="Rich B" LastEditDate="2016-03-19T06:05:48.487" LastActivityDate="2018-10-16T16:54:34.953" Title="Percentage width child element in absolutely positioned parent on Internet Explorer 7" Tags="&lt;pytorch&gt;&lt;css&gt;&lt;css3&gt;&lt;internet-explorer-7&gt;" AnswerCount="6" CommentCount="0" FavoriteCount="12" />
</posts>
Comments:
1. Please add a sample of the XML.
2. child is of type <class 'xml.etree.ElementTree.Element'>, which you can't pass to ElementTree.SubElement; use Element.append instead.
3. @stovfl so I would use ET.Element.append(child) in this case?
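To illustrate the point made in comment 2, here is a minimal sketch on a one-row document. The sample XML and the function names serialize_with_subelement and serialize_with_append are made up for illustration. ET.SubElement(parent, tag) expects a tag name as its second argument, so passing an Element stores that Element as the tag of a new child, and serialization then fails. The append method is called on the parent instance (parent.append(child)), not on the ET.Element class.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-row sample standing in for Posts.xml.
source = ET.fromstring('<posts><row Id="1" Tags="&lt;pytorch&gt;" /></posts>')
child = source[0]


def serialize_with_subelement(child):
    """Repeat the question's mistake: an Element is passed where a tag name belongs."""
    parent = ET.Element("data")
    try:
        ET.SubElement(parent, child)  # child becomes the *tag* of a new element
        return ET.tostring(parent, encoding="unicode")
    except TypeError as exc:
        return "TypeError: " + str(exc)


def serialize_with_append(child):
    """Attach the existing Element to a new parent, as suggested in the comments."""
    parent = ET.Element("data")
    parent.append(child)  # called on the parent instance
    return ET.tostring(parent, encoding="unicode")
```

Calling serialize_with_subelement(child) should return a string starting with "TypeError", while serialize_with_append(child) returns the row wrapped in a data element.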
Answer #1:
Thanks for all the help in the comments!!
    import xml.etree.ElementTree as ET

    if __name__ == '__main__':
        posts = ET.Element('data')  # new root for the filtered posts
        tree = ET.parse('Sub_posts.xml')
        root = tree.getroot()
        for child in root:
            # keep rows whose Tags attribute mentions any of the three frameworks
            if child.get('Tags') and 'pytorch' in child.get('Tags') or child.get('Tags') and 'tensorflow' in child.get('Tags') or child.get('Tags') and 'keras' in child.get('Tags'):
                posts.append(child)  # append the Element itself instead of calling SubElement
        mydata = ET.tostring(posts).decode()  # tostring returns bytes by default
        with open("subposts.xml", "w") as myfile:
            myfile.write(mydata)
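Because the question mentions that the dump is very large, ET.parse may need a lot of memory, since it builds the whole tree before the loop starts. What follows is only a sketch of a streaming variant based on ET.iterparse, not part of the original answer. The names filter_posts and has_wanted_tag and the data wrapper written by hand are assumptions made for illustration; the tag test is the same substring check as in the code above.

```python
import xml.etree.ElementTree as ET

WANTED = ("pytorch", "tensorflow", "keras")


def has_wanted_tag(raw, wanted=WANTED):
    """Same substring test as the answer above, written with any()."""
    return bool(raw) and any(name in raw for name in wanted)


def filter_posts(src, dst, wanted=WANTED):
    """Stream <row> elements from src and copy the matching ones into dst.

    Returns the number of rows written.
    """
    kept = 0
    with open(dst, "w", encoding="utf-8") as out:
        out.write("<data>\n")
        context = ET.iterparse(src, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root element
        for event, elem in context:
            if event == "end" and elem.tag == "row":
                if has_wanted_tag(elem.get("Tags"), wanted):
                    elem.tail = "\n"  # one row per line in the output
                    out.write(ET.tostring(elem, encoding="unicode"))
                    kept += 1
                root.clear()  # drop rows already handled so memory stays bounded
        out.write("</data>\n")
    return kept
```

A call such as filter_posts('Posts.xml', 'subposts.xml') would then produce the reduced file without holding every post in memory at once.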
An alternative way to match 'Tags':
    tags1 = {'pytorch', 'tensorflow', 'keras'}
    for child in root:
        # '<pytorch><css>' -> {'pytorch', 'css'}; rows without Tags give an empty set
        if tags1 & {t[1:] for t in (child.get('Tags') or '').split('>') if t}:
            print('match')
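One practical difference between the two approaches: the substring test also matches longer tag names that merely contain the word, while the set intersection only matches whole tags. A small sketch of that difference, where the helper name parse_tags and the sample Tags value are chosen for illustration:

```python
def parse_tags(raw):
    """Split a Tags value such as '<pytorch><css>' into a set of tag names."""
    return {t[1:] for t in (raw or "").split(">") if t}


wanted = {"pytorch", "tensorflow", "keras"}
raw = "<pytorch-lightning><css>"  # illustrative Tags value

substring_hit = any(name in raw for name in wanted)  # matches inside 'pytorch-lightning'
exact_hit = bool(wanted & parse_tags(raw))           # no whole tag equals 'pytorch'
```

Here substring_hit is True and exact_hit is False, so which test to use depends on whether related tags should be included in the subset.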