Python - XPath Child Traversal Methods and Performance


I'm using lxml on Python 2.7.

Given a node, node, and a child, child_element, what is the difference between these:

node.xpath('./child_element')

node.xpath("*[local-name()='child_element']")

In other words, what's going on under the hood here? Is there a reason one ought to be "better" (in terms of performance or correctness)?

I've read through the lxml docs and a good deal of other XPath query resources, and I'm not finding any real clarification.

It's a good question, and it is not easy to find an answer.

The main difference is that local-name() does not consider the prefixes (namespaces) of tags.

For example, given the node <x:html xmlns:x="http://www.w3.org/1999/xhtml"/>, local-name() will match the html tag, while //html will not work, and neither will //x:html unless the x prefix is bound for the query.
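To make the difference concrete for the two expressions in the question, here is a minimal sketch. The document, the urn:example:a and urn:example:b namespace URIs, and the variable names are all made up for illustration. It suggests that the plain name test only sees the child in no namespace, while the local-name() test picks up every child with that local name, whatever its namespace.

```python
from lxml import etree

# Hypothetical document: three children share the local name
# "child_element", one in no namespace and two in made-up namespaces.
root = etree.fromstring(
    '<root xmlns:a="urn:example:a" xmlns:b="urn:example:b">'
    '<child_element/>'
    '<a:child_element/>'
    '<b:child_element/>'
    '</root>'
)

# Plain name test: an unprefixed name only matches elements in no namespace.
plain = root.xpath('./child_element')

# local-name() test: compares the local part only, ignoring the namespace.
by_local_name = root.xpath("*[local-name()='child_element']")

print(len(plain))
print(len(by_local_name))
print([el.tag for el in by_local_name])
```

So on the correctness side, local-name() can return more than intended when unrelated namespaces happen to reuse the same element name.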

Please consider the following code; if you have questions, feel free to ask.

Show me the code

Setup:

from lxml.etree import fromstring
tree = fromstring('<x:html xmlns:x="http://www.w3.org/1999/xhtml"/>')

It is not possible to use a plain tag selector:

tree.xpath('//html')    # []
tree.xpath('//x:html')  # XPathEvalError: Undefined namespace prefix

But using local-name() you can still get the element (ignoring its namespace):

tree.xpath('//*[local-name() = "html"]')
# [<Element {http://www.w3.org/1999/xhtml}html at 0x103b8d848>]

Or be strict about the prefixed name using name():

tree.xpath('//*[name() = "x:html"]')
# [<Element {http://www.w3.org/1999/xhtml}html at 0x103b8d848>]
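As a complement to the queries above, the following sketch shows the prefixed selector working once a prefix is bound through the namespaces argument of xpath(). The h prefix and the variable names are chosen here purely for illustration. It also hints at why name() is tied to the prefix written in the document rather than to the namespace itself.

```python
from lxml.etree import fromstring

tree = fromstring('<x:html xmlns:x="http://www.w3.org/1999/xhtml"/>')
XHTML = 'http://www.w3.org/1999/xhtml'

# The query prefix is resolved through the mapping, so it does not have to
# be the same prefix the document uses; only the namespace URI matters.
with_doc_prefix = tree.xpath('//x:html', namespaces={'x': XHTML})
with_other_prefix = tree.xpath('//h:html', namespaces={'h': XHTML})

# name() compares against the qualified name as written in the document,
# so a different prefix for the same namespace does not match.
by_name = tree.xpath('//*[name() = "x:html"]')
by_other_name = tree.xpath('//*[name() = "h:html"]')

print(len(with_doc_prefix), len(with_other_prefix))
print(len(by_name), len(by_other_name))
```

Binding the prefix explicitly is the namespace-aware option: it matches on the namespace URI and does not depend on which prefix the document author picked.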

Performance

I parsed a website into a tree and used the following queries:

%timeit tree.xpath('//*[local-name() = "div"]')
# 1000 loops, best of 3: 570 µs per loop

%timeit tree.xpath('//div')
# 10000 loops, best of 3: 44.4 µs per loop

Now on to actual namespaces. I parsed a sample block that declares namespaces.

example = """ ... """
from lxml.etree import fromstring
tree = fromstring(example)

%timeit tree.xpath('//hr:author', namespaces={'hr': 'http://eric.van-der-vlist.com/ns/person'})
# 100000 loops, best of 3: 18.2 µs per loop

%timeit tree.xpath('//*[local-name() = "author"]')
# 10000 loops, best of 3: 37.7 µs per loop
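Since %timeit only exists inside IPython, a comparison like this can also be sketched with the standard timeit module. The document below is a hypothetical stand-in (the original example block is elided), and the function names are made up for illustration, so absolute numbers will differ from the ones quoted above.

```python
import timeit
from lxml.etree import fromstring

HR = 'http://eric.van-der-vlist.com/ns/person'

# Hypothetical stand-in document: 200 books, each with one hr:author child.
doc = (
    '<library xmlns:hr="%s">' % HR
    + ''.join('<book><hr:author>a%d</hr:author></book>' % i for i in range(200))
    + '</library>'
)
tree = fromstring(doc)

def with_namespace():
    # Namespace-aware name test with the prefix bound for this query.
    return tree.xpath('//hr:author', namespaces={'hr': HR})

def with_local_name():
    # Predicate on the local name only, ignoring namespaces.
    return tree.xpath('//*[local-name() = "author"]')

runs = 1000
for fn in (with_namespace, with_local_name):
    seconds = timeit.timeit(fn, number=runs)
    print('%s: %.1f us per call' % (fn.__name__, seconds / runs * 1e6))
```

Both functions select the same elements in this document, so any gap in the printed timings comes from how the query is evaluated, not from the size of the result.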

Conclusion

I had to rewrite the conclusion, since after using the namespace method it became obvious that there is a gain when using namespaces. The query is about 2 times faster when the namespace is specified (which enables optimizations) than when using local-name().

