regex - How to use sed to fix an XML issue

I have XML with the following (invalid) structure:

<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1>

I want to use sed to change it into:

<tag1>text1<tag2>text2<tag3>text3</tag3></tag2>text4</tag1>

That is, whenever I encounter the invalid XML substring <tag1></*, I want to remove the preceding </tag1> and that <tag1>, so that everything in between moves under the enclosing tag1.

I have tried using sed without success (one such attempt is below):

sed -e 's/<\/tag1>\(.*\)<tag1><\//\1<\//g'

It works for the example above, but if there are two occurrences of the same condition, it removes the first </tag1> and the last <tag1> instead of performing the replacement twice:

echo '<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3><tag1></tag4>text8</tag1>' | sed -e 's/<\/tag1>\(.*\)<tag1><\//\1<\//g'

This outputs:

<tag1>text1<tag2>text2<tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3></tag4>text8</tag1>

I think sed expands the regex to cover the largest possible selection. What should I do if I do not want that?

You want non-greedy matching and, to the best of my knowledge, sed doesn't support it. Can you use Perl, or do you have to use sed?

Try: perl -p -e 's/<\/tag1>(.*?)<tag1>(<\/.+?<\/tag1>)/$1$2/g'

I think the issue is that the regex has to match through the end of the actual closing </tag1>, or else that closing tag becomes the beginning of the next match.
