regex - How to use sed to fix an xml issue


I have XML with the following (invalid) structure:

<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1> 

I want to use sed to change it into:

<tag1>text1<tag2>text2<tag3>text3</tag3></tag2>text4</tag1> 

That is, whenever I encounter the invalid XML substring <tag1></*, I want to remove the preceding </tag1> and that <tag1>, so that everything in between moves under the enclosing tag1.

I have tried using sed without success (one such attempt is below):

sed -e 's/<\/tag1>\(.*\)<tag1><\//\1<\//g' 

It works for the example above, but if there are two occurrences of the same condition, it removes the first </tag1> and the last <tag1> instead of performing the replacement twice:

echo '<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3><tag1></tag4>text8</tag1>' | sed -e 's/<\/tag1>\(.*\)<tag1><\//\1<\//g' 

This outputs:

<tag1>text1<tag2>text2<tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3></tag4>text8</tag1> 

I think sed expands the regex to cover the largest possible selection. What should I do if I do not want that?

You want non-greedy matching and, to the best of my knowledge, sed doesn't support it. Can you use Perl, or do you have to use sed?
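If sed is a hard requirement, one possible workaround is to emulate non-greedy matching with a negated bracket expression. The sketch below is an assumption-laden illustration, not a tested general solution: it assumes GNU sed (for \n in the replacement and inside [^\n]) and input where each record sits on one line, so a newline can safely stand in for </tag1>. The function name fix_tags is made up for this example.

```shell
# Hypothetical helper; assumes GNU sed and one record per input line.
fix_tags() {
  # 1. Replace every </tag1> with a newline, a character that cannot
  #    otherwise occur inside sed's pattern space for a single line.
  # 2. Drop a marker and the following <tag1> when that <tag1> is directly
  #    followed by "</"; [^\n]* keeps the match from crossing another marker.
  # 3. Turn the remaining markers back into </tag1>.
  sed -e 's/<\/tag1>/\n/g' \
      -e 's/\n\([^\n]*\)<tag1><\//\1<\//g' \
      -e 's/\n/<\/tag1>/g'
}

# Usage with the two-occurrence input from the question:
echo '<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3><tag1></tag4>text8</tag1>' | fix_tags
# prints:
# <tag1>text1<tag2>text2<tag3>text3</tag3></tag2>text4</tag1><tag1>text5<tag4>text6<tag3>text7</tag3></tag4>text8</tag1>
```

Because the middle part may not contain the marker, each match is confined to the span between one </tag1> and the next, which is what the non-greedy quantifier would otherwise provide.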

With Perl, try: perl -pe 's/<\/tag1>(.*?)<tag1>(<\/.+?<\/tag1>)/$1$2/g'

I think the issue is that the regex has to match through the end of the actual closing </tag1>; otherwise that closing tag becomes the beginning of the next match.

