elasticsearch - How does the edge ngram token filter differ from the ngram token filter?


As I am new to Elasticsearch, I am not able to identify the difference between the ngram token filter and the edge ngram token filter.

How do these two differ from each other when processing tokens?

I think the documentation is pretty clear on this:

This tokenizer is very similar to ngram but only keeps n-grams that start at the beginning of a token.

And the best example for the ngram tokenizer again comes from the documentation:

curl 'localhost:9200/test/_analyze?pretty=1&analyzer=my_ngram_analyzer' -d 'fc schalke 04'
# fc, sc, sch, ch, cha, ha, hal, al, alk, lk, lke, ke, 04

with this tokenizer definition:

                    "type" : "ngram",                     "min_gram" : "2",                     "max_gram" : "3",                     "token_chars": [ "letter", "digit" ] 

In short:

  • The tokenizer, depending on its configuration, creates tokens. In this example: fc, schalke, 04.
  • ngram generates groups of characters of minimum min_gram size and maximum max_gram size from the input text. Basically, the tokens are split into small chunks, and each chunk is anchored on a character (it doesn't matter where in the token that character is; all of them create chunks).
  • edge ngram does the same, but the chunks always start from the beginning of each token. Basically, the chunks are anchored at the beginning of the tokens.

For the same text as above, an edge ngram with this configuration generates: fc, sc, sch, 04 (with a larger max_gram it would also produce scha, schal, and so on). Every "word" in the text is considered, and every word's first character is the starting point (f for fc, s for schalke, and 0 for 04).
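
To check this yourself, here is a minimal sketch of the edge ngram counterpart, assuming the same settings structure as above with only the tokenizer type changed (older Elasticsearch versions spell the type edgeNGram, newer ones edge_ngram; the index and analyzer names here are hypothetical):

curl -XPUT 'localhost:9200/test_edge' -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_edge_ngram_analyzer": {
          "tokenizer": "my_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": "2",
          "max_gram": "3",
          "token_chars": [ "letter", "digit" ]
        }
      }
    }
  }
}'

curl 'localhost:9200/test_edge/_analyze?pretty=1&analyzer=my_edge_ngram_analyzer' -d 'fc schalke 04'
# fc, sc, sch, 04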

