python - Using semantic word representation (e.g. word2vec) to build a classifier


I want to build a classifier for forum posts that automatically assigns each post to one of a set of predefined categories (so multiclass classification, not binary classification) using semantic word representations. For this task I want to make use of word2vec and doc2vec and check the feasibility of using these models to support fast selection of training data for the classifier. At the moment I have tried both models and they work like a charm. However, I do not want to manually label each sentence to say what it is describing; I want to leave that task to the word2vec or doc2vec models. So, my question is: what algorithm can I use in Python for the classifier? (I was thinking of applying clustering on top of word2vec or doc2vec and then manually labelling each cluster, but this would require time and is not the best solution.) Previously, I made use of "LinearSVC" (from sklearn.svm) and OneVsRestClassifier; however, there I labelled each sentence (by manually building the vector "y_train") in order to predict which class a new test sentence would belong to. What algorithm and method in Python could I use for this type of classifier, one that makes use of semantic word representations to produce the training data?

The issue with things like word2vec/doc2vec and so on is that the model is unsupervised and only uses context. So, for example, if you have the sentences "today is a hot day" and "today is a cold day", it thinks "hot" and "cold" are very similar and should be in the same cluster.

This makes it pretty bad for tagging. Either way, there is an implementation of doc2vec and word2vec in the gensim module for Python. You can use the Google News dataset's prebuilt binary and test whether you get meaningful clusters.
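
To check whether the clusters are meaningful, one option is to average the word vectors of each post and run k-means over the result. The sketch below is only an illustration under stated assumptions: `TOY_VECTORS` holds made-up 3-dimensional values standing in for real embeddings (which you would load with gensim, e.g. `KeyedVectors.load_word2vec_format(path, binary=True)`), and `sentence_vector`, `kmeans` and the example posts are hypothetical names and data, not part of any library.

```python
# Made-up toy embeddings for illustration only. Real vectors would come from a
# trained model, e.g. gensim's KeyedVectors.load_word2vec_format(path, binary=True).
TOY_VECTORS = {
    "printer": [0.9, 0.1, 0.0],
    "driver":  [0.8, 0.2, 0.1],
    "install": [0.7, 0.3, 0.0],
    "refund":  [0.0, 0.9, 0.2],
    "invoice": [0.1, 0.8, 0.1],
    "payment": [0.0, 0.7, 0.3],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in a sentence (None if no word is known)."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with a deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point that is farthest from the centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


posts = [
    "printer driver install",
    "install printer",
    "refund payment",
    "invoice payment refund",
]
post_vectors = [sentence_vector(p, TOY_VECTORS) for p in posts]
clusters = kmeans(post_vectors, k=2)
for post, cluster in zip(posts, clusters):
    print(cluster, post)
```

You would still have to look at a few posts per cluster and decide by hand whether the cluster corresponds to one of your categories; with real embeddings, the hot/cold effect described above can easily put opposite posts into the same cluster.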

The other way you could try is to set up a simple Lucene/Solr system on your computer and begin tagging a few sentences at random. Over time Lucene/Solr will suggest tags clearly for a document, and they come out as pretty decent tags if your data is not bad.
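
The same "tag a few, then let the system suggest" workflow can be sketched in Python with word vectors instead of Lucene/Solr. To be clear, this is a swapped-in stand-in (nearest tag centroid by cosine similarity), not how Lucene/Solr works internally. `TOY_VECTORS`, the tag names, the `threshold` value and all function names are hypothetical and only there for illustration.

```python
import math

# Made-up toy embeddings; real ones would come from a trained word2vec model.
TOY_VECTORS = {
    "printer": [0.9, 0.1, 0.0],
    "driver":  [0.8, 0.2, 0.1],
    "install": [0.7, 0.3, 0.0],
    "refund":  [0.0, 0.9, 0.2],
    "invoice": [0.1, 0.8, 0.1],
    "payment": [0.0, 0.7, 0.3],
}


def sentence_vector(sentence, vectors):
    """Average the vectors of the known words in a sentence (None if no word is known)."""
    known = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tag_centroids(tagged, vectors):
    """Average the sentence vectors of the hand-tagged posts for each tag."""
    grouped = {}
    for text, tag in tagged:
        vec = sentence_vector(text, vectors)
        if vec is not None:
            grouped.setdefault(tag, []).append(vec)
    return {
        tag: [sum(col) / len(vecs) for col in zip(*vecs)]
        for tag, vecs in grouped.items()
    }


def suggest_tag(text, centroids, vectors, threshold=0.8):
    """Suggest the closest tag, or None if nothing is similar enough."""
    vec = sentence_vector(text, vectors)
    if vec is None:
        return None
    tag, score = max(
        ((t, cosine(vec, c)) for t, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return tag if score >= threshold else None


# A couple of hand-tagged posts are enough to start getting suggestions.
tagged = [("printer driver", "hardware"), ("refund payment", "billing")]
centroids = tag_centroids(tagged, TOY_VECTORS)
for post in ["install printer", "invoice", "hello world"]:
    print(post, "->", suggest_tag(post, centroids, TOY_VECTORS))
```

Posts that come back as `None` are the ones worth tagging by hand next, so the manual effort goes where the suggestions are weakest.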

The issue here is that the problem you're trying to solve isn't particularly easy, nor is it completely solvable. If you have very good, clear data, you may be able to auto-classify about 80-90% of your data; if it is bad, you won't be able to auto-classify much.

