JavaScript - regular expression: ignore HTML tags


I have HTML content like this:

<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>

Here's the complete version of the HTML: http://collabedit.com/gkuc2

I need to search for the string hardly able to cover (just an example), but I want to ignore any HTML tags inside the string I'm looking for. Because the HTML file has tags inside that string, a simple search won't find it.

The use case is: I have 2 versions of the file:

  • an HTML file with text and tags
  • the same file as raw text (with tags and spaces removed)

The substring I want to search for (the needle) comes from the text version (which doesn't contain HTML tags), and I want to find its position in the HTML version (the file that has tags).

What regular expression would work?

Put this between each letter:

(?:<[^>]+>)* 

And replace each space with:

(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)* 

Like this:

h(?:<[^>]+>)*a(?:<[^>]+>)*r(?:<[^>]+>)*d(?:<[^>]+>)*l(?:<[^>]+>)*y(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*a(?:<[^>]+>)*b(?:<[^>]+>)*l(?:<[^>]+>)*e(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*t(?:<[^>]+>)*o(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*c(?:<[^>]+>)*o(?:<[^>]+>)*v(?:<[^>]+>)*e(?:<[^>]+>)*r 
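Writing that pattern by hand for every needle is error-prone, so here is a minimal JavaScript sketch that builds it, assuming the needle is plain text taken from the tag-stripped version. The helper names (escapeRegExp, buildLetterPattern, findInHtml) are hypothetical, not from any library. It escapes regex metacharacters in the needle, joins letters with the tag pattern and words with the space pattern, and uses RegExp.prototype.exec to get the match position in the HTML.

```javascript
// Hypothetical helpers: build the letter-level pattern from a plain-text needle.
const TAGS = "(?:<[^>]+>)*";                                  // zero or more tags between letters
const SPACE = "(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*";  // whitespace, possibly surrounded by tags

// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildLetterPattern(needle) {
  return needle
    .trim()
    .split(/\s+/)                                             // words of the needle
    .map(word => Array.from(word, escapeRegExp).join(TAGS))   // tags allowed between letters
    .join(SPACE);                                             // tags allowed around spaces
}

// Returns the start index of the first match in the HTML, or -1 if there is none.
function findInHtml(html, needle) {
  const m = new RegExp(buildLetterPattern(needle)).exec(html);
  return m ? m.index : -1;
}

const html = "<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>";
console.log(findInHtml(html, "hardly able to cover")); // 19, the position of "hardly"
```

For "hardly able to cover" this produces the same pattern as above. The returned index is the position in the HTML version, which is what the question asks for; adding the length of the match gives the end position.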

You only need the ones between each letter if you want to allow tags to break up words, like this: b<b>old</b>
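A small JavaScript illustration of that difference, using a made-up input string: a plain pattern cannot see through a tag inside a word, while the per-letter version can.

```javascript
// Made-up input where a tag splits the word "bold".
const text = "this is b<b>old</b>";

const wordLevel = /bold/;                                       // no tags allowed inside the word
const letterLevel = /b(?:<[^>]+>)*o(?:<[^>]+>)*l(?:<[^>]+>)*d/; // tags allowed between letters

console.log(wordLevel.test(text));      // false
console.log(letterLevel.test(text));    // true
console.log(letterLevel.exec(text)[0]); // "b<b>old"
```

If your HTML never splits words with tags, the per-letter pieces only make the pattern longer and slower.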

This is the version without letter breaks:

hardly(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*able(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*to(?:\s*<[^>]+>\s*)*\s+(?:\s*<[^>]+>\s*)*cover 
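A minimal JavaScript sketch that builds this word-level pattern from the needle; the helper names (escapeRegExp, buildWordPattern) are hypothetical. It splits the needle on whitespace, escapes each word, and joins the words with the separator above.

```javascript
// Separator: at least one whitespace character, with optional tags on either side.
const SEP = "(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*";

// Escape characters that have a special meaning inside a RegExp.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: tags may appear between words, but not inside them.
function buildWordPattern(needle) {
  return needle.trim().split(/\s+/).map(escapeRegExp).join(SEP);
}

const html = "<p>The bedding was hardly <strong>able to cover</strong> it and seemed ready to slide off any moment.</p>";
const m = new RegExp(buildWordPattern("hardly able to cover")).exec(html);

console.log(m.index);               // 19, start of the match in the HTML
console.log(m[0]);                  // "hardly <strong>able to cover"
console.log(m.index + m[0].length); // end of the match in the HTML
```

Note that the match includes the tags it skipped over, so m[0] is HTML, not the plain needle.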

This should work in most cases. However, if the HTML is malformed, for example if < or > characters are not HTML-encoded, you may run into issues. It may also break on script blocks or other elements with CDATA sections.
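A JavaScript sketch of one such failure, using made-up markup: a raw ">" inside an attribute value ends the <[^>]+> match early, so the word-level pattern no longer finds the needle.

```javascript
// Word-level pattern for "hardly able to cover", built from the separator above.
const SEP = "(?:\\s*<[^>]+>\\s*)*\\s+(?:\\s*<[^>]+>\\s*)*";
const re = new RegExp(["hardly", "able", "to", "cover"].join(SEP));

// Made-up markup: the second one has an unencoded ">" inside an attribute value.
const ok = 'hardly <span title="a-b">able</span> to cover';
const broken = 'hardly <span title="a>b">able</span> to cover';

console.log(re.test(ok));     // true
console.log(re.test(broken)); // false: <[^>]+> stops at the ">" inside title="a>b"
```

Strictly speaking a ">" inside a quoted attribute value is allowed by HTML parsers, so this is a limitation of the regex approach rather than of the markup.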

