database - Unique filter on an Elasticsearch field not working (duplicate items inserted)
I've modified the contactnumber field to have a unique filter by updating the index settings as follows:
curl -XPUT localhost:9200/test-index2/_settings -d ' { "index":{ "analysis":{ "analyzer":{ "unique_keyword_analyzer":{ "only_on_same_position":"true", "filter":"unique" } } } }, "mappings":{ "business":{ "properties":{ "contactnumber":{ "analyzer":"unique_keyword_analyzer", "type":"string" } } } } }'
A sample item looks like this:
doc_type:"business" contactnumber:"(+12)415-3499" name:"sam's pizza" address:"somewhere on earth"
The filter does not work and duplicate items are still inserted. I don't want two documents to have the same contactnumber. In the settings above I've also set only_on_same_position -> true, so that existing duplicate values would be truncated/deleted.
What am I doing wrong in the settings?
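That's something Elasticsearch can't do out of the box... you need to make this uniqueness functionality available in your app. The only idea I can think of is to have the phone number as the _id of the document itself. Whenever you insert/update something, ES will use the contactnumber as the _id and will associate that document with the one that already exists, or create a new one.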
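To make the idea concrete, here is a minimal sketch in plain Python of why keying documents by contactnumber rules out duplicates. It is not Elasticsearch code: `TinyIndex` is a hypothetical in-memory stand-in, and its return value only imitates the `_id`, `_version` and `created` fields of an index response.

```python
class TinyIndex:
    """Toy in-memory stand-in for an index whose _id is taken from a document field."""

    def __init__(self, id_field):
        self.id_field = id_field
        self.docs = {}      # _id -> latest document body
        self.versions = {}  # _id -> number of times that _id was written

    def index(self, doc):
        doc_id = doc[self.id_field]
        created = doc_id not in self.docs   # first write for this _id?
        self.docs[doc_id] = dict(doc)       # same _id: overwrite, never duplicate
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        return {"_id": doc_id, "_version": self.versions[doc_id], "created": created}


idx = TinyIndex("contactnumber")
first = idx.index({"contactnumber": "(+12)415-3499", "address": "whatever 123"})
second = idx.index({"contactnumber": "(+12)415-3499", "address": "whatever 123 456"})
print(first)
print(second)
print(len(idx.docs))
```

The second call replaces the first document instead of adding a new one, so the store never holds two entries for the same phone number.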
For example:
PUT /test-index2 { "mappings": { "business": { "_id": { "path": "contactnumber" }, "properties": { "contactnumber": { "type": "string", "analyzer": "keyword" }, "address": { "type": "string" } } } } }
Then index something:
POST /test-index2/business { "contactnumber": "(+12)415-3499", "address": "whatever 123" }
Getting it back:
GET /test-index2/business/_search { "query": { "match_all": {} } }
It looks like this:
"hits": { "total": 1, "max_score": 1, "hits": [ { "_index": "test-index2", "_type": "business", "_id": "(+12)415-3499", "_score": 1, "_source": { "contactnumber": "(+12)415-3499", "address": "whatever 123" } } ] }
You can see there that the _id of the document is the phone number itself. If you want to change the document or insert another one (the address is different, there is a new field - whatever_field - but the contactnumber is the same):
POST /test-index2/business { "contactnumber": "(+12)415-3499", "address": "whatever 123 456", "whatever_field": "whatever value" }
Elasticsearch "updates" the existing document and responds with:
{ "_index": "test-index2", "_type": "business", "_id": "(+12)415-3499", "_version": 2, "created": false }
created is false, which means the document has been updated, not created. _version is 2, which again says the document has been updated. And the _id being the phone number shows that this is the document that was updated.
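If the app needs to know whether a write hit an existing phone number, it could inspect those same fields. A small sketch, assuming the response has already been parsed into a dict shaped like the one above; `describe_index_response` is a hypothetical helper name, not part of any client library.

```python
def describe_index_response(resp):
    """Summarise an index response using its 'created' and '_version' fields."""
    action = "created" if resp.get("created") else "updated"
    return "{} {} (version {})".format(action, resp["_id"], resp["_version"])


# The response shown above, as a Python dict.
resp = {
    "_index": "test-index2",
    "_type": "business",
    "_id": "(+12)415-3499",
    "_version": 2,
    "created": False,
}
summary = describe_index_response(resp)
print(summary)
```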
Looking again in the index, ES stores this:
"hits": [ { "_index": "test-index2", "_type": "business", "_id": "(+12)415-3499", "_score": 1, "_source": { "contactnumber": "(+12)415-3499", "address": "whatever 123 456", "whatever_field": "whatever value" } } ]
So, the new field is there, the address has changed, and the contactnumber and _id are the same.
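One caveat: as far as I know, the "_id": { "path": ... } mapping was deprecated in Elasticsearch 1.5 and removed in 2.0, so check it against your version. If it isn't available, the app can get the same effect by putting the phone number into the request URL itself. A minimal sketch of building that path; `document_path` is a hypothetical helper, and the percent-encoding is there because the number contains "(", "+" and ")".

```python
from urllib.parse import quote


def document_path(index, doc_type, doc, id_field="contactnumber"):
    """Build /index/type/id with the id taken from the document and percent-encoded."""
    parts = (index, doc_type, str(doc[id_field]))
    return "/" + "/".join(quote(part, safe="") for part in parts)


doc = {"contactnumber": "(+12)415-3499", "address": "whatever 123"}
path = document_path("test-index2", "business", doc)
print(path)
```

The document would then be sent with PUT to that path. If you would rather have a duplicate rejected than overwritten, adding op_type=create to the request should make ES answer with a conflict instead of updating, but verify that for your version too.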