regex - How to get sentences from a paragraph with a custom list of words in Python
I am trying to read a paragraph and capture the sentences that contain words matching a dynamic list of words.
The Python pre-processing steps identify the list of words. I want to use this list to identify the sentences in the paragraph that contain at least one of the words from the list. The identified sentences are then appended to a new variable.
input: "machine learning science of getting computers act without being explicitly programmed. machine learning pervasive today use dozens of times day without knowing it. many researchers think best way make progress towards human-level ai."
list of words: computer, researcher
output: machine learning science of getting computers act without being explicitly programmed. many researchers think best way make progress towards human-level ai.
What is the best way to accomplish this?
Based partially on this answer:

import nltk

# Load the pre-trained Punkt sentence tokenizer for English.
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

text = "machine learning science of getting computers act without being explicitly programmed. machine learning pervasive today use dozens of times day without knowing it. many researchers think best way make progress towards human-level ai."
word_list = ['computer', 'researcher']
output_list = []

for sentence in tokenizer.tokenize(text):
    for word in word_list:
        # Substring test, so 'computer' also matches 'computers'.
        if word in sentence:
            output_list.append(sentence)
            break  # useful when word_list is large
You need to run nltk.download() beforehand and download punkt in the Models tab.
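If installing NLTK data is not an option, the same idea can be sketched with only the standard library. This is a rough alternative, not the approach from the answer above: it swaps the Punkt tokenizer for a naive regex split on '.', '!' or '?' followed by whitespace, so it will mis-split on abbreviations such as "e.g.". The function name sentences_with_words is made up for illustration. Unlike a plain substring test, it anchors each word at a word boundary and allows any suffix, so "computer" matches "computers" but not "microcomputer".

```python
import re


def sentences_with_words(text, word_list):
    """Return sentences of `text` containing a word that starts with an entry of `word_list`."""
    if not word_list:
        return []
    # Naive sentence splitter (assumption): break after '.', '!' or '?' plus whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # \b<word>\w* matches the listed word plus any suffix, case-insensitively.
    pattern = re.compile(
        r'\b(?:' + '|'.join(re.escape(w) for w in word_list) + r')\w*',
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]


text = ("machine learning science of getting computers act without being "
        "explicitly programmed. machine learning pervasive today use dozens of "
        "times day without knowing it. many researchers think best way make "
        "progress towards human-level ai.")

matches = sentences_with_words(text, ['computer', 'researcher'])
print(' '.join(matches))
```

Compiling all the words into a single pattern means each sentence is scanned once, which may help when the word list is large.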