python - XPath Child Traversal Methods and Performance -
i'm using lxml on python 2.7.
given node, node
, child, child_element
, difference between these: node.xpath('./child_element')
node.xpath("*[local-name()='child_element']")
in other words, what's going on under hood here? there reason 1 ought "better" (in terms of performance or correctness)?
i've read through lxml docs , deal of other xpath query resources , not finding real clarification.
it's question, not easy find answer.
the main difference local-name()
not consider prefixes (namespaces) tags.
for example, given node <x:html xmlns:x="http://www.w3.org/1999/xhtml"/>
, local-name
match html
tag, while //html
not work, , neither //x:html
.
please consider following code, if have questions feel free ask.
show me code
setup:
from lxml.etree import fromstring tree = fromstring('<x:html xmlns:x="http://www.w3.org/1999/xhtml"/>')
it not possible use tag selector:
tree.xpath('//html') # [] tree.xpath('//x:html') # xpathevalerror: undefined namespace prefix
but using local-name
can still element (considering namespace)
tree.xpath('//*[local-name() = "html"]') # [<element {http://www.w3.org/1999/xhtml}html @ 0x103b8d848>]
or strict namespace using name()
:
tree.xpath('//*[name() = "x:html"]') # [<element {http://www.w3.org/1999/xhtml}html @ 0x103b8d848>]
performance
i parsed website tree , used following queries:
%timeit tree.xpath('//*[local-name() = "div"]') # 1000 loops, best of 3: 570 µs per loop %timeit tree.xpath('//div') # 10000 loops, best of 3: 44.4 µs per loop
now onto actual namespaces. parsed block here.
example = """ ... """ lxml.etree import fromstring tree = fromstring(example) %timeit tree.xpath('//hr:author', namespaces = {'hr' : 'http://eric.van-der-vlist.com/ns/person'}) # 100000 loops, best of 3: 18.2 µs per loop %timeit tree.xpath('//*[local-name() = "author"]') # 10000 loops, best of 3: 37.7 µs per loop
conclusion
i had rewrite conclusion since after using namespace method became obvious gain when using namespaces there. 2 times faster when specifying namespace (causing optimizations), rather using local-name
.
Comments
Post a Comment