Problem description:

I want to construct a feature vector for each document in a Lucene index.

I also have a set of keywords, and I want to construct a feature vector for them as well.

I will then match documents to the keywords according to the similarity between the document and keyword feature vectors.

Are there any hints on how Lucene can help me with these three tasks?

Many thanks.

Answers:

As bmargulies says, you can use Mahout. Here's some documentation on it: https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text#CreatingVectorsfromText-FromLucene
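
To make the three tasks in the question concrete, here is a minimal sketch in plain Python. It does not use Lucene or Mahout at all: the function names (`tf_vector`, `cosine`, `rank`), the whitespace tokenization, and the sample documents are all assumptions made up for illustration. It uses raw term counts as the feature vector and cosine similarity as the matching score, which is one common choice and not necessarily what Mahout produces.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Build a sparse term-frequency vector (term -> count)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(docs, keywords):
    """Score every document against the keyword vector, best match first."""
    query = tf_vector(k.lower() for k in keywords)
    scored = [
        (doc_id, cosine(tf_vector(text.lower().split()), query))
        for doc_id, text in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data standing in for documents read from an index.
docs = {
    "d1": "lucene index search engine",
    "d2": "cooking recipes for pasta",
}
ranking = rank(docs, ["lucene", "index"])
```

In a real setup the term counts would presumably come from the index (for example from stored term vectors) rather than from splitting strings, and a weighting such as TF-IDF would usually replace the raw counts.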
