nlp - CoreNLP document-level multithreading


I have 8 million Wikipedia articles to parse. I want to run 7 operations: tokenize, ssplit, pos, lemma, ner, parse, dcoref. Each document takes approximately 20 seconds, so at this rate it would take months to parse the whole data set in a single thread. There is an 'nthreads' option for parsing successive sentences simultaneously, but the co-reference analyzer cannot work at the single-sentence level. I could split the documents into multiple buckets and run CoreNLP on each of them simultaneously, but that would be resource hungry. Is there a simpler way to run multi-threaded CoreNLP at the document level (not the sentence level)? (I have 100 GB of RAM and 50 cores.)
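To make the idea of document-level parallelism concrete, here is a minimal sketch in plain Java of a fixed thread pool in which each task handles one whole document. It deliberately does not call CoreNLP: the class `DocumentLevelPool`, the method `processAll`, and the `annotator` parameter are hypothetical names, and `annotator` stands in for whatever per-document call is actually used. The sketch assumes that one pipeline instance could be shared safely by all worker threads, which would need to be verified for the CoreNLP version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class DocumentLevelPool {

    // Applies `annotator` to every document, one task per document, on a
    // fixed pool of `numThreads` workers. Results are returned in input order.
    // `annotator` is a placeholder for a real per-document annotation call.
    public static <R> List<R> processAll(List<String> documents,
                                         Function<String, R> annotator,
                                         int numThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<R> task = () -> annotator.apply(doc);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                // get() blocks until that document is done and rethrows failures.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = List.of("First article text.", "Second article text.");
        // Stand-in annotator: returns the document length instead of parsing it.
        List<Integer> lengths = processAll(docs, String::length, 4);
        System.out.println(lengths);
    }
}
```

Because every task receives a complete document, a step that needs cross-sentence context, such as dcoref, still sees the whole text. For 8 million articles the futures would likely need to be submitted and drained in batches rather than held in memory all at once.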

