perl - Index a continuous stream of documents in Solr


I need to continuously index thousands of documents into Solr. The documents are picked off a Redis-like queue (SSDB) and indexed into Solr.

Whenever I use the library function $solr->add(), a new HTTP call is made to Solr. Is there a better mechanism to index items into Solr in bulk?
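Independent of any particular Perl client, Solr's JSON update handler accepts many documents in a single POST, so one request can carry a whole batch instead of one document per call. A minimal sketch (in Python for illustration; the URL and core name `mycore` are assumptions, adjust for your setup):

```python
import json
import urllib.request

# Assumed endpoint; replace "mycore" with your core/collection name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"

def build_batch_payload(docs):
    """Serialize a list of document dicts into one Solr JSON update body.

    Solr's /update handler accepts a bare JSON array of documents,
    so N documents cost one HTTP round trip instead of N."""
    return json.dumps(docs)

def post_batch(docs, url=SOLR_UPDATE_URL):
    """Send an entire batch of documents in a single HTTP request."""
    req = urllib.request.Request(
        url,
        data=build_batch_payload(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)

if __name__ == "__main__":
    # 100 documents, one request (field names here are illustrative).
    batch = [{"id": str(i), "title_t": f"doc {i}"} for i in range(100)]
    print(len(json.loads(build_batch_payload(batch))))
```

The same idea applies from Perl: build one JSON body for the batch and POST it once, rather than calling add() per document.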

There are a couple of points you should consider:

First, I suggest you de-couple the process of pulling data off the queue from the process of indexing into Solr. That is, have one process pull data off the queue, massage it into the correct shape, and write it out as a Solr JSON file. Each file should hold multiple records; the number is determined by the size of each record and the total number of items you want in each batch. Play around with different settings.
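The batching step above can be sketched as follows (a Python illustration, not the asker's Perl code; the file-name scheme and batch size are arbitrary choices to tune):

```python
import itertools
import json

def write_batches(records, batch_size, prefix="batch"):
    """Group records into JSON files of up to batch_size documents each.

    Returns the list of file names written. batch_size is the knob to
    tune against record size and total throughput."""
    names = []
    it = iter(records)
    for n in itertools.count():
        chunk = list(itertools.islice(it, batch_size))
        if not chunk:
            break
        name = f"{prefix}-{n:05d}.json"
        with open(name, "w") as f:
            json.dump(chunk, f)  # a JSON array, ready for Solr's /update
        names.append(name)
    return names
```

The de-queuing process only ever touches the queue and the filesystem, so it stays fast regardless of how Solr is doing.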

Then have a separate process index those files; you can have multiple processes indexing into Solr as well. This ensures the de-queuing process is never held up waiting on items to be indexed into Solr, which is not quick.
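Fanning the batch files out across several indexer processes can be sketched like this (again an illustrative Python sketch; the actual HTTP POST to Solr is elided and `index_file` here just parses the file):

```python
import json
from multiprocessing import Pool

def index_file(path):
    """Worker: read one batch file and push it to Solr.

    The POST to Solr's /update handler is elided in this sketch;
    a real worker would send the file's JSON array in one request."""
    with open(path) as f:
        docs = json.load(f)
    # ... POST docs to Solr here ...
    return len(docs)

def index_all(paths, workers=4):
    """Index many batch files concurrently; returns total docs processed."""
    with Pool(workers) as pool:
        return sum(pool.map(index_file, paths))
```

Because each file is an independent unit of work, adding workers is a simple way to scale indexing throughput until Solr itself becomes the bottleneck.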

Next, evaluate your library of choice. If the data is local to the Solr server, you can pass the file path in the request and have Solr index it directly, or you can stream the data over the network. The first approach is faster, but it requires storing the JSON locally on the Solr box.
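The "local file" approach uses Solr's stream.file parameter on the update handler (remote streaming must be enabled in solrconfig.xml for this to work). A minimal sketch of building such a request URL, with an assumed base URL and core name:

```python
from urllib.parse import urlencode

# Assumed base URL; replace "mycore" with your core/collection name.
SOLR = "http://localhost:8983/solr/mycore"

def local_file_url(path):
    """Build an /update request that tells Solr to read a JSON batch
    file directly from its own disk, instead of receiving the body
    over the network. `path` is a path on the Solr server."""
    return f"{SOLR}/update?" + urlencode({
        "stream.file": path,
        "stream.contentType": "application/json",
    })
```

Issuing a GET on that URL makes Solr open and index the file itself, which skips shipping the payload over HTTP; the trade-off is that the batch files must land on the Solr box first.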

Finally, review your committing strategy. What kind of delay is acceptable? Are you doing a lot of faceting or fq queries? If some delay is alright and the answer to the second question is yes, I suggest an aggressive hard commit and a relaxed soft commit. If you need new items to be picked up quickly, consider lowering the soft commit interval while testing performance until you find a balance.
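That strategy maps onto the autoCommit/autoSoftCommit settings in solrconfig.xml. A sketch, with the intervals purely illustrative starting points to tune:

```xml
<!-- Aggressive hard commit: flush to disk often, but don't open a
     new searcher (so caches aren't invalidated on every commit). -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Relaxed soft commit: controls how soon new documents become
     visible to searches. Lower this if freshness matters more. -->
<autoSoftCommit>
  <maxTime>120000</maxTime>
</autoSoftCommit>
```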

Hope this helps.

