java - Regex: Want to change case of letter following one of a set, except HTML entity


Examples:

rythm&blues                   -> Rythm&Blues
.. don&apos;t wear white/live -> .. Don&apos;t Wear White/Live

I first convert the whole string to lowercase (because I want uppercase only at the start of a word).

I am using a split with the pattern [&/\\.\\s-], and then I convert each part's first letter to uppercase.

It works well, except that it also converts HTML entities, of course: e.g. don&apos;t is converted to don&Apos;t, but the entity should be left alone.
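To make the problem concrete, here is a minimal sketch of the approach described above. It is a reconstruction, not the original code: the class and method names are hypothetical, and instead of splitting and rejoining it walks the characters and capitalizes whatever follows one of the delimiters from the split pattern (with `Character.isWhitespace` standing in for `\s`).

```java
import java.util.Locale;

class NaiveTitleCase {
    // Delimiters taken from the split pattern [&/\\.\\s-]
    // (Character.isWhitespace is used as an approximation of \s).
    static boolean isDelimiter(char c) {
        return c == '&' || c == '/' || c == '.' || c == '-'
                || Character.isWhitespace(c);
    }

    static String capitalize(String input) {
        // Step 1: lowercase everything, entities included.
        String lower = input.toLowerCase(Locale.ROOT);
        StringBuilder out = new StringBuilder(lower.length());
        boolean startOfPart = true;
        // Step 2: uppercase the first character of every part.
        for (char c : lower.toCharArray()) {
            out.append(startOfPart ? Character.toUpperCase(c) : c);
            startOfPart = isDelimiter(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("rythm&blues"));
        // Rythm&Blues
        System.out.println(capitalize(".. don&apos;t wear white/live"));
        // .. Don&Apos;t Wear White/Live   (entity name damaged)
        System.out.println(capitalize("fa&ccedil;ade"));
        // Fa&Ccedil;ade                   (now a different character)
    }
}
```

Because `&` is one of the delimiters, the first letter of every entity name gets capitalized along with the ordinary words.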

While writing this I discovered an additional problem: the initial conversion to lowercase potentially messes up some HTML entities as well. So the entities should be left alone entirely (e.g. &Ccedil; is not the same as &ccedil;).

An HTML entity is matched by something like this: &[a-z][a-z][a-z]{1,5};

I was thinking of doing something with groups, but unfortunately I find it hard to figure out.

This pattern seems to handle your situation:

"\\w+|&#?\\w+;\\w*" 

There may be some edge cases, but you can adjust the pattern accordingly as they come up.

Pattern breakdown:

  • \\w+ - match any word
  • &#?\\w+;\\w* - match an HTML entity (named or numeric), plus any word characters directly after the semicolon
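As a quick check of the breakdown above, the following sketch lists the tokens the pattern yields for the two example strings. The class and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenDemo {
    static final Pattern TOKEN = Pattern.compile("\\w+|&#?\\w+;\\w*");

    // Collect every match of the pattern, in order.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("rythm&blues"));
        // [rythm, blues]
        System.out.println(tokens(".. don&apos;t wear white/live"));
        // [don, &apos;t, wear, white, live]
    }
}
```

Note that a bare `&` with no terminating semicolon (as in `rythm&blues`) does not match the entity branch, so `blues` is picked up as an ordinary word, while `&apos;t` comes back as one token that starts with `&`.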

Code sample:

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) throws Exception {
        String[] lines = {
            "rythm&blues",
            ".. don&apos;t wear white/live"
        };

        Pattern pattern = Pattern.compile("\\w+|&#?\\w+;\\w*");
        for (int i = 0; i < lines.length; i++) {
            Matcher matcher = pattern.matcher(lines[i]);
            while (matcher.find()) {
                if (matcher.group().startsWith("&")) {
                    // Handle HTML entities.
                    // If there are letters after the semicolon,
                    // they need to be lower case.
                    if (!matcher.group().endsWith(";")) {
                        String htmlEntity = matcher.group();
                        int semicolonIndex = htmlEntity.indexOf(";");
                        // Keep the entity (including its semicolon) untouched.
                        lines[i] = lines[i].replace(htmlEntity,
                                htmlEntity.substring(0, semicolonIndex + 1) +
                                        htmlEntity.substring(semicolonIndex + 1)
                                                .toLowerCase());
                    }
                } else {
                    // Uppercase the first letter of the word and lowercase
                    // the rest of the word.
                    lines[i] = lines[i].replace(matcher.group(),
                            Character.toUpperCase(matcher.group().charAt(0)) +
                                    matcher.group().substring(1).toLowerCase());
                }
            }
        }

        System.out.println(Arrays.toString(lines));
    }
}

Results:

[Rythm&Blues, .. Don&apos;t Wear White/Live]
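One edge case worth knowing about: `String.replace` substitutes every occurrence of the matched text in the line, so a word that also appears inside an entity name (for example `amp` in `amp rock&amp;roll`) would alter the entity. A possible variation, sketched below with hypothetical class and method names, builds the output in a single pass with `Matcher.appendReplacement`, so each match is rewritten only at its own position. It uses the same pattern and the same casing rules as the sample above.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityAwareTitleCase {
    static final Pattern TOKEN = Pattern.compile("\\w+|&#?\\w+;\\w*");

    static String convert(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            String replacement;
            if (token.startsWith("&")) {
                // Entity: keep everything up to and including the semicolon,
                // lowercase any letters that follow it.
                int end = token.indexOf(';') + 1;
                replacement = token.substring(0, end)
                        + token.substring(end).toLowerCase(Locale.ROOT);
            } else {
                // Plain word: uppercase first letter, lowercase the rest.
                replacement = Character.toUpperCase(token.charAt(0))
                        + token.substring(1).toLowerCase(Locale.ROOT);
            }
            // quoteReplacement guards against '$' and '\' in the text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("rythm&blues"));
        // Rythm&Blues
        System.out.println(convert(".. don&apos;t wear white/live"));
        // .. Don&apos;t Wear White/Live
        System.out.println(convert("amp rock&amp;roll"));
        // Amp Rock&amp;roll
    }
}
```

`StringBuffer` is used here because the `StringBuilder` overload of `appendReplacement` only exists from Java 9 onward.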
