performance - Java: Find similar strings -
i have java list (it map if necessary) lot of strings.
- list(hello,hell,car,cartoon,...)
i want find similar strings given string in efficient way.
i think should use levenshtein distance, don't want iterate through list.
do think idea divide main list in pieces common prefix?
then have map prefixes key , list value:
- hel -> list(hello,hell,...)
- car -> list(car,cartoon,...)
in way search strings same prefix searched one. apply levenshtein distance strings , not main-list.
is idea? thanks
you once calculate soundex code of every entry, , map soundex list of original word. soundex reducing code 1 single key similar sounding words.
map<string, set<string>> soundextowords = ... (string word : words) { string sdex = soundex(word); set<string> similarwords = soundextowords.get(sdex)); if (similarwords == null) { similarwords = new hashset<>(); soundextowords.put(sdex, similarwords); } similarwords.add(word); } set<string> similarwords(string word) { return soundextowords.get(soundex(word)); }
soundex typically 1 language, english, , english quite reducive.
Comments
Post a Comment