GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "term": { "title": "iphone" } },
        { "term": { "body": "iphone" } }
      ]
      // To enable the tie breaker, add a comma after "]" and uncomment:
      // "tie_breaker": 0.7
    }
  }
}
This is what the official documentation says about dis_max:
Returns documents matching one or more wrapped queries, called query clauses or clauses.
If a returned document matches multiple query clauses, the dis_max query assigns the document the highest relevance score from any matching clause, plus a tie breaking increment for any additional matching subqueries.
The first part is clear: when tie_breaker is not defined, a document gets the highest score among its matching clauses rather than the sum of them. The last part, “plus a tie breaking increment”, is less clear to me.
Hmm… Let’s play around with the queries on our machine and see what happens.
Say we have these docs:
"Doc 1": {
  "title": "iphone"
}
"Doc 2": {
  "body": "iphone"
}
"Doc 3": {
  "title": "iphone",
  "body": "iphone 13"
}
You might get the scores:
Doc 1: 0.54265 (Match “title”)
Doc 2: 0.54265 (Match “body”)
Doc 3: 0.54265 (Match both “title” and “body”, but only the max score will be taken.)
Since Doc 3 matches in both fields, it is arguably more relevant than the other two, so we might want it to rank higher. That’s what tie_breaker is for. If you add tie_breaker=0.7 to the above query, the scores become:
Doc 1: 0.54265 + 0*0.7 = 0.54265
Doc 2: 0.54265 + 0*0.7 = 0.54265
Doc 3: 0.54265 + 0.43211*0.7 = 0.84513
Now Doc 3 scores higher than the other two, because the score of its other matching clause (“body”: “iphone 13”), multiplied by the tie_breaker, is added on top of the max score.
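The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the formula as inferred from the example scores, not Elasticsearch or Lucene code; the function name dis_max_score and its parameters are made up for illustration, and the clause scores are the ones from the example.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Hypothetical helper: max clause score plus tie_breaker times the rest."""
    # Only clauses that actually matched contribute a score.
    matching = sorted(s for s in clause_scores if s > 0)
    if not matching:
        return 0.0
    best = matching[-1]          # highest-scoring clause
    others = sum(matching[:-1])  # every other matching clause
    return best + tie_breaker * others


# Scores taken from the example above.
doc1 = dis_max_score([0.54265], tie_breaker=0.7)           # matches "title" only
doc3 = dis_max_score([0.54265, 0.43211], tie_breaker=0.7)  # matches both fields

print(round(doc1, 5))  # 0.54265
print(round(doc3, 5))  # 0.84513
```

With tie_breaker=0.0 the helper returns just the max score, which matches the first set of scores, and with tie_breaker=1.0 it would return the plain sum of all matching clauses.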
We can see where the formula comes from by peeking at Lucene’s source code (line 2):