Question:

Can someone help me with the syntax for hunpos tagging a corpus in nltk?

  1. What do I import for the hunpos.HunPosTagger module?

  2. How do I HunPosTag the corpus? See the code below.


import nltk
from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.util import LazyCorpusLoader

corpus_root = './'
reader = PlaintextCorpusReader (corpus_root, '.*')
ntuen = LazyCorpusLoader ('ntumultien', PlaintextCorpusReader, reader)
ntuen.fileids()
isinstance (ntuen, PlaintextCorpusReader)

# So how do I hunpos tag `ntuen`? I can't get the following code to work.
# please help me to correct my python syntax errors, I'm new to python
# but i really need this to work. sorry
##from nltk.tag import hunpos.HunPosTagger
ht = HunPosTagger('english.model')
for sentence in ntu.sent() ##looping through the no. of sentence
ht.tag(ntusent()[i])

Answer:
import nltk
from nltk.tag.hunpos import HunposTagger
from nltk.tokenize import word_tokenize

corpus = "so how do i hunpos tag my ntuen ? i can't get the following code to work."

# Note the class name: HunposTagger, not HunPosTagger.
ht = HunposTagger('en_wsj.model')
# tag() expects a list of tokens, so tokenize the text first
print(ht.tag(word_tokenize(corpus)))
ht.close()  # shut down the hunpos-tag subprocess

I think the main problem is that you're not tokenizing the text before passing it to the tagger, but there are other reasons your code won't work: the class is HunposTagger, not HunPosTagger, and it is imported from nltk.tag.hunpos. I made this simplified example from your question. If you have any more questions, please post a comment.

I got everything from here: http://code.google.com/p/hunpos/

python hunpos.py

[('so', 'RB'), ('how', 'WRB'), ('do', 'VBP'), ('i', 'FW'), ('hunpos', 'NN'), ('tag', 'NN'), ('my', 'PRP$'), ('ntuen', 'NN'), ('?', '.'), ('i', 'FW'), ('ca', 'MD'), ("n't", 'RB'), ('get', 'VB'), ('the', 'DT'), ('following', 'JJ'), ('code', 'NN'), ('to', 'TO'), ('work', 'VB'), ('.', '.')]
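
To tag a whole corpus instead of one string, something along these lines might work. This is only a sketch: tag_corpus and main are names I made up, the '.*\.txt' file pattern is an assumption about how the corpus is stored, and I'm assuming a plain PlaintextCorpusReader is enough (no LazyCorpusLoader). It relies on sents() returning sentences that are already split into tokens, so word_tokenize isn't needed here.

```python
def tag_corpus(reader, tagger):
    """Yield one tagged sentence at a time.

    reader: any object with a sents() method that returns lists of tokens
            (for example nltk's PlaintextCorpusReader).
    tagger: any object with a tag(tokens) method (for example HunposTagger).
    """
    for sentence in reader.sents():
        # sents() already yields tokenized sentences, so pass them straight in
        yield tagger.tag(list(sentence))


def main():
    # Imported here so tag_corpus can be tried without nltk/hunpos installed.
    from nltk.corpus import PlaintextCorpusReader
    from nltk.tag.hunpos import HunposTagger

    # Assumption: the corpus is a set of .txt files in the current directory.
    reader = PlaintextCorpusReader('./', r'.*\.txt')
    ht = HunposTagger('en_wsj.model')  # same model file as in the example above
    try:
        for tagged_sentence in tag_corpus(reader, ht):
            print(tagged_sentence)
    finally:
        ht.close()  # always stop the hunpos-tag subprocess
```

Call main() from the directory that holds the text files and the model. The hunpos-tag binary has to be somewhere NLTK can find it, and sents() needs NLTK's sentence tokenizer data to be installed.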
