javascript - Regular expression: ignore HTML tags
I have HTML content like this:
<p>The bedding was hardly <strong>able to cover</strong> it, and seemed ready to slide off any moment.</p>
Here's the complete version of the HTML: http://collabedit.com/gkuc2
I need to search for the string hardly able to cover (just an example), and I want to ignore any HTML tags inside the string I'm looking for. Because the HTML file has tags inside that string, a simple search won't find it.
The use case is this: I have two versions of a file:
- an HTML file with text and tags
- the same file as raw text (with tags and extra spaces removed)
The substring I want to search for (the needle) comes from the text version (which doesn't contain any HTML tags), and I want to find its position in the HTML version (the file that has tags).
What regular expression would work?
Put this between each letter:
(?:<[^>]+>)*
and replace spaces with:
(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*
like:
h(?:<[^>]+>)*a(?:<[^>]+>)*r(?:<[^>]+>)*d(?:<[^>]+>)*l(?:<[^>]+>)*y(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*a(?:<[^>]+>)*b(?:<[^>]+>)*l(?:<[^>]+>)*e(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*t(?:<[^>]+>)*o(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*c(?:<[^>]+>)*o(?:<[^>]+>)*v(?:<[^>]+>)*e(?:<[^>]+>)*r
You only need the ones between each letter if you want to allow tags to break words, like: this is b<b>old</b>
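Writing that pattern out by hand gets tedious, so here is a rough sketch of how it could be generated from the needle. The names `TAGS`, `GAP`, `escapeRegExp` and `buildTagTolerantPattern` are made up for this example, and it assumes the needle contains no characters that are entity-encoded in the HTML (such as `&`).

```javascript
// Illustrative helper names; they are assumptions of this sketch.
const TAGS = '(?:<[^>]+>)*';                                  // optional tags between letters
const GAP = '(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*';   // whitespace with optional tags around it

// Escape regex metacharacters so each needle character matches literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a RegExp that matches `needle` even when tags split its words or letters.
function buildTagTolerantPattern(needle) {
  const words = needle.trim().split(/\s+/);
  const wordPatterns = words.map(word =>
    Array.from(word).map(escapeRegExp).join(TAGS)
  );
  return new RegExp(wordPatterns.join(GAP));
}

const html = '<p>The bedding was hardly <strong>able to cover</strong> it.</p>';
const match = buildTagTolerantPattern('hardly able to cover').exec(html);
// match.index is the needle's position in the HTML; match[0] is the matched HTML.
console.log(match.index, JSON.stringify(match[0]));
```

`match.index` gives the start of the needle in the HTML version, and `match.index + match[0].length` gives the end.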
This is the version without the letter break:
hardly(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*able(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*to(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*cover
This should work for most cases. However, if the HTML is malformed in that a < or > is not HTML-encoded, you may run into issues. It may also break on script blocks or other elements with CDATA sections.
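The word-level variant can be generated the same way. The following is only a sketch: `WORD_GAP`, `escapeWord` and `findWordsInHtml` are hypothetical names, and the function returns the needle's start and end offsets in the HTML string, or `null` when nothing matches.

```javascript
// Illustrative names; they are assumptions of this sketch.
// Whitespace between words, with optional tags on either side.
const WORD_GAP = '(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*';

// Escape regex metacharacters so each word matches literally.
function escapeWord(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Locate `needle` in `html`, allowing tags between words but not inside them.
function findWordsInHtml(html, needle) {
  const words = needle.trim().split(/\s+/).map(escapeWord);
  const found = new RegExp(words.join(WORD_GAP)).exec(html);
  if (found === null) return null;
  return { start: found.index, end: found.index + found[0].length };
}

const sampleHtml = '<p>The bedding was hardly <strong>able to cover</strong> it.</p>';
const position = findWordsInHtml(sampleHtml, 'hardly able to cover');
console.log(position, sampleHtml.slice(position.start, position.end));

// A tag inside a word is not matched by this variant:
console.log(findWordsInHtml('this is b<b>old</b>', 'bold')); // null
```

The trade-off is a much shorter pattern in exchange for missing needles whose words are split by a tag.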