

  1. Lucene's scoring model
  2. Boost


1. tf - Term Frequency. The frequency with which a term appears in a document. Given a search query, the higher the term frequency, the higher the document score.

2. idf - Inverse Document Frequency. The rarer a term is across all documents in the index, the higher its contribution to the score.

3. coord - Coordination Factor. If a query contains multiple terms, the more of those terms a document matches, the higher its score.

4. fieldNorm - Field length. The more words a field contains, the lower its score. This factor penalizes documents with longer field values. In other words, a match in a shorter field scores higher than a match in a longer field.
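
To make the four factors concrete, here is a minimal Python sketch of how they might combine. It assumes the formulas used by Lucene's classic TF-IDF similarity (tf as the square root of the term count, idf as 1 + ln(numDocs / (docFreq + 1)), fieldNorm as 1 / sqrt(number of terms in the field)). It leaves out queryNorm and boosts, and it ignores the precision lost when Lucene stores norms. The function names and sample data are hypothetical, chosen only for illustration.

```python
import math


def tf(freq):
    # Term frequency factor: grows with the count, but sub-linearly.
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def coord(overlap, max_overlap):
    # Fraction of the query terms that the document matches.
    return overlap / max_overlap


def field_norm(num_terms):
    # Length normalization: longer fields get a smaller value.
    return 1.0 / math.sqrt(num_terms)


def score(query_terms, field_terms, doc_freqs, num_docs):
    # Simplified score: coord * sum(tf * idf^2 * fieldNorm) over matched terms.
    matched = [t for t in query_terms if t in field_terms]
    if not matched:
        return 0.0
    norm = field_norm(len(field_terms))
    total = sum(
        tf(field_terms.count(t)) * idf(doc_freqs[t], num_docs) ** 2 * norm
        for t in matched
    )
    return coord(len(matched), len(query_terms)) * total


# Hypothetical index statistics: 10 documents, with per-term document counts.
doc_freqs = {"lucene": 2, "scoring": 5}
query = ["lucene", "scoring"]
short_field = ["lucene", "scoring"]
long_field = ["lucene", "scoring", "is", "based", "on", "tf", "idf", "model"]

print(score(query, short_field, doc_freqs, 10))
print(score(query, long_field, doc_freqs, 10))
```

Both fields match the same two terms once each, so the only difference between the two printed scores comes from fieldNorm: the shorter field scores higher.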

Boosts can be divided into index-time boosts and query-time boosts:

Index-time boosts are applied when adding documents, and apply to the entire document or to specific fields.

Query-time boosts are applied when constructing a search query, and apply to specific fields.
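
The effect of the two kinds of boost can be sketched as plain multipliers on each field's contribution to the score. This is a simplified model under stated assumptions, not Lucene's implementation: in classic Lucene the index-time boost is folded into the stored field norm, whereas here it is kept as a separate factor for readability. The function name and the sample field scores are hypothetical.

```python
def boosted_score(field_scores, query_boosts=None, index_boosts=None):
    # field_scores: unboosted score contributed by each matching field.
    # query_boosts: multipliers chosen when the query is built.
    # index_boosts: multipliers fixed when the document was indexed.
    query_boosts = query_boosts or {}
    index_boosts = index_boosts or {}
    return sum(
        s * query_boosts.get(field, 1.0) * index_boosts.get(field, 1.0)
        for field, s in field_scores.items()
    )


# Hypothetical per-field scores for one document.
field_scores = {"title": 1.0, "body": 1.0}

# Without any boost, both fields count equally.
print(boosted_score(field_scores))

# A query-time boost of 2.0 on the title field doubles its contribution.
print(boosted_score(field_scores, query_boosts={"title": 2.0}))
```

In Lucene's query syntax, a query-time boost like the one above is written with a caret, for example `title:lucene^2`. A query-time boost can be changed on every search, while an index-time boost only changes when the document is re-indexed.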