Question:

I need to use the Porter stemmer algorithm to get word stems in my application, but when I test the implementation I got from http://www.tartarus.org/~martin/PorterStemmer, the stemming result isn't a correct word, e.g.:

happy --> happi

virus --> viru

etc

Can you help me solve this?

Answer:

Quoting from your link:

2. Why is the stemmer not producing proper words?

It is often taken to be a crude error that a stemming algorithm does not leave a real word after removing the stem. But the purpose of stemming is to bring variant forms of a word together, not to map a word onto its ‘paradigm’ form.

And connected with this,

3. Why are there errors?

The question normally comes in the form, why should word X be stemmed to x1, when one would have expected it to be stemmed to x2? It is important to remember that the stemming algorithm cannot achieve perfection. On balance it will (or may) improve IR performance, but in individual cases it may sometimes make what are, or what seem to be, errors. Of course, this is a different matter from suggesting an additional rule that might be included in the stemmer to improve its performance.
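To make the point concrete, here is a minimal sketch of just two of the Porter rules (step 1a, plural removal, and step 1c, final y to i). It is not the full algorithm, and the function names (`step_1a`, `step_1c`, `toy_stem`) are made up for illustration. It is enough to reproduce both outputs from the question and to show why a non-word stem is still useful: variant forms end up with the same stem.

```python
VOWELS = set("aeiou")


def is_vowel(word, i):
    # Porter treats 'y' as a vowel only when it follows a consonant.
    ch = word[i]
    if ch in VOWELS:
        return True
    return ch == "y" and i > 0 and not is_vowel(word, i - 1)


def contains_vowel(stem):
    return any(is_vowel(stem, i) for i in range(len(stem)))


def step_1a(word):
    # Step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (nothing)
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_1c(word):
    # Step 1c: a final 'y' becomes 'i' if the rest of the word contains a vowel.
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


def toy_stem(word):
    # Hypothetical helper: applies only steps 1a and 1c, NOT the whole stemmer.
    return step_1c(step_1a(word.lower()))


for w in ["happy", "virus", "pony", "ponies", "sky"]:
    print(w, "-->", toy_stem(w))
```

"pony" and "ponies" both come out as "poni", which is not a word but does bring the two forms together, and that is all the stemmer is meant to do. If your application needs real dictionary words, what you want is a lemmatizer rather than a stemmer.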
