regex - How to use sed to fix an XML issue
I have XML with the following (invalid) structure:
<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1>
I want to use sed to change it into:
<tag1>text1<tag2>text2<tag3>text3</tag3></tag2>text4</tag1>
That is, whenever I encounter the invalid XML substring <tag1></*, I want to remove the surrounding </tag1>...<tag1> pair and keep whatever was in between under the enclosing tag1.
I have tried using sed without success. One such attempt:
sed -e 's/<\/tag1>\(.*\)<tag1><\//\1<\//g'
It works for the example above, but if the same condition occurs twice, it removes the first </tag1> and the last <tag1> instead of performing the replacement twice. For example,
echo '<tag1>text1<tag2>text2</tag1><tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3><tag1></tag4>text8</tag1>' | sed -e 's/<\/tag1>\(.*\)<tag1><\//\1<\//g'
outputs
<tag1>text1<tag2>text2<tag3>text3</tag3><tag1></tag2>text4</tag1><tag1>text5<tag4>text6</tag1><tag3>text7</tag3></tag4>text8</tag1>
I think sed expands the RE to cover the largest possible selection. What should I do if I don't want that?
You want non-greedy matching and, to the best of my knowledge, sed doesn't support it. Can you use Perl, or do you have to use sed?
Try:
perl -p -e 's/<\/tag1>(.*?)<tag1>(<\/.+?<\/tag1>)/$1$2/g'
I think the issue is that the regex has to match through the end of the actual closing tag; otherwise that closing tag becomes the beginning of the next match.