nlp - CoreNLP document-level multithreading
I have 8 million Wikipedia articles to parse and want to run 7 annotators: tokenize, ssplit, pos, lemma, ner, parse, dcoref. Each document takes roughly 20 seconds, so at that rate it would take months to process the whole data set in a single thread. There is an 'nthreads' option for annotating successive sentences simultaneously, but the coreference annotator cannot operate at the single-sentence level. I could split the documents into multiple buckets and run a separate CoreNLP process on each bucket simultaneously, but that is resource hungry. Is there a simpler way to run CoreNLP multi-threaded at the document level (not the sentence level)? (I have 100 GB of RAM and 50 cores.)
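A minimal sketch of one possible approach: share a single StanfordCoreNLP pipeline across a thread pool and submit one whole document per task. This assumes that annotate() can be called concurrently on a shared pipeline once the models are loaded (commonly reported to work, but worth verifying on your CoreNLP version). NUM_WORKERS, loadArticles(), and saveResult() are placeholders, not CoreNLP API.

```java
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;

import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelAnnotator {

    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,parse,dcoref");

        // One pipeline shared by all workers, so the models are loaded only once.
        final StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // NUM_WORKERS is an assumption -- tune it to available cores and heap.
        final int NUM_WORKERS = 40;
        ExecutorService pool = Executors.newFixedThreadPool(NUM_WORKERS);

        // loadArticles() is a placeholder for however the 8M articles are read in.
        List<String> articles = loadArticles();

        for (final String text : articles) {
            pool.submit(() -> {
                Annotation document = new Annotation(text);
                pipeline.annotate(document);   // full document-level annotation, incl. dcoref
                saveResult(document);          // placeholder: persist the annotated document
            });
        }

        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.DAYS);
    }

    private static List<String> loadArticles() {
        // placeholder: stream articles from disk instead of holding all 8M in memory
        return Collections.emptyList();
    }

    private static void saveResult(Annotation document) {
        // placeholder: write the annotations wherever they are needed
    }
}
```

Note that every concurrent parse/dcoref call needs its own working memory on the shared heap, so the practical worker count may be bounded by RAM as much as by the 50 cores.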