python - XPath Child Traversal Methods and Performance
I'm using lxml on Python 2.7.
Given a node, node, and a child, child_element, what is the difference between these:
node.xpath('./child_element')
node.xpath("*[local-name()='child_element']")
In other words, what's going on under the hood here? Is there a reason one ought to be "better" (in terms of performance or correctness)?
I've read through the lxml docs and a good deal of other XPath query resources, but I'm not finding any real clarification.
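For reference, here is a minimal sketch (the sample document is made up) that spells out both expressions with explicit XPath 1.0 axes. Both walk the child axis of node; the first uses a name test, while the second selects every child element with * and then evaluates the local-name() predicate on each one. This describes XPath semantics, not how libxml2 executes them internally.

```python
from lxml import etree

# Hypothetical sample document with no namespaces.
node = etree.fromstring('<node><child_element/><other/><child_element/></node>')

# The two abbreviated expressions from the question.
a = node.xpath('./child_element')
b = node.xpath("*[local-name()='child_element']")

# Unabbreviated forms: '.' is self::node(), a bare name is child::name,
# and '*' is child::* (every child element, then filtered by the predicate).
a_full = node.xpath('self::node()/child::child_element')
b_full = node.xpath("child::*[local-name()='child_element']")


def paths(found):
    """Identify the selected nodes by their location in the tree."""
    tree = node.getroottree()
    return [tree.getpath(e) for e in found]


print(paths(a) == paths(b) == paths(a_full) == paths(b_full))  # True
```

Without namespaces, both expressions select the same children; the difference only shows up once namespaces are involved.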
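It's a good question, and it's not easy to find an answer.
The main difference is that local-name() does not consider the prefixes (namespaces) of tags: it matches on the local name alone, whatever the namespace.
For example, given the node <x:html xmlns:x="http://www.w3.org/1999/xhtml"/>, local-name() will match the html tag, while //html will not work, and neither will //x:html (unless the x prefix is passed to xpath() in a namespaces mapping).
Please consider the following code, and if you have questions feel free to ask.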
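A minimal sketch of the correctness side (the document and the prefix name h are made up for illustration): with one un-namespaced child and one XHTML child, the plain name test only sees the former, local-name() matches both, and a prefix registered through the namespaces argument of xpath() picks out only the XHTML one. Note that local-name() would equally match a child_element from any other namespace.

```python
from lxml import etree

XHTML = "http://www.w3.org/1999/xhtml"

# Hypothetical parent with one un-namespaced child and one XHTML child.
root = etree.fromstring(
    '<root xmlns:x="http://www.w3.org/1999/xhtml">'
    '<child_element/>'
    '<x:child_element/>'
    '</root>'
)

# A plain name test only matches children in no namespace.
plain = root.xpath('./child_element')

# local-name() ignores the namespace, so both children match.
by_local_name = root.xpath("*[local-name()='child_element']")

# Registering a prefix targets the XHTML child only. The prefix 'h' is
# arbitrary: it is resolved to the URI, which is what has to match.
namespaced = root.xpath('./h:child_element', namespaces={'h': XHTML})

print([e.tag for e in plain])
print([e.tag for e in by_local_name])
print([e.tag for e in namespaced])
```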
Show me the code
Setup:
from lxml.etree import fromstring
tree = fromstring('<x:html xmlns:x="http://www.w3.org/1999/xhtml"/>')
It is not possible to use the tag selector:
tree.xpath('//html')    # []
tree.xpath('//x:html')  # XPathEvalError: Undefined namespace prefix
But using local-name() you can still get the element, whatever its namespace:
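tree.xpath('//*[local-name() = "html"]')
# [<Element {http://www.w3.org/1999/xhtml}html at 0x103b8d848>]
Or, more strictly, match the prefixed name using name() (this compares the prefix as written in the document, not the namespace URI):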
tree.xpath('//*[name() = "x:html"]')
# [<Element {http://www.w3.org/1999/xhtml}html at 0x103b8d848>]
Performance
I parsed a website into a tree and used the following queries:
%timeit tree.xpath('//*[local-name() = "div"]')
# 1000 loops, best of 3: 570 µs per loop
%timeit tree.xpath('//div')
# 10000 loops, best of 3: 44.4 µs per loop
Now on to actual namespaces. I parsed the block here.
from lxml.etree import fromstring
example = """ ... """
tree = fromstring(example)
%timeit tree.xpath('//hr:author', namespaces={'hr': 'http://eric.van-der-vlist.com/ns/person'})
# 100000 loops, best of 3: 18.2 µs per loop
%timeit tree.xpath('//*[local-name() = "author"]')
# 10000 loops, best of 3: 37.7 µs per loop
Conclusion
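I had to rewrite the conclusion, since after using the namespace method it became obvious that there is a gain when using namespaces. Specifying the namespace is about 2 times faster (18.2 µs vs. 37.7 µs per call) than using local-name(), as it allows optimizations. Without namespaces the gap was even larger: //div took 44.4 µs, against 570 µs for the local-name() query.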
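If you want to reproduce the comparison, here is a sketch using the timeit module on a synthetic document. The document shape and sizes are assumptions (the benchmarked block is not reproduced here), and the numbers will vary by machine and lxml version. It also times a precompiled etree.XPath object, an extra option that was not part of the measurements above.

```python
import timeit

from lxml import etree

NS = "http://eric.van-der-vlist.com/ns/person"

# Hypothetical stand-in for the benchmarked block: 500 <hr:author> elements
# interleaved with 500 un-namespaced <note> elements.
doc = etree.fromstring(
    '<people xmlns:hr="%s">%s</people>'
    % (NS, '<hr:author>name</hr:author><note>x</note>' * 500)
)

# Compiled once, then reusable across calls (and across documents).
find_authors = etree.XPath('//hr:author', namespaces={'hr': NS})

candidates = {
    'namespaces=': lambda: doc.xpath('//hr:author', namespaces={'hr': NS}),
    'local-name()': lambda: doc.xpath('//*[local-name() = "author"]'),
    'compiled XPath': lambda: find_authors(doc),
}

timings = {}
for label, run in candidates.items():
    # Same selection, so the timings are comparable.
    assert len(run()) == 500
    # Best of 3 repeats of 200 calls, reported per call in microseconds.
    best = min(timeit.repeat(run, number=200, repeat=3))
    timings[label] = best / 200 * 1e6
    print('%-15s %.1f us per call' % (label, timings[label]))
```

Checking that every candidate selects the same nodes before timing it keeps the comparison honest: a faster query that silently matches fewer elements is not a win.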