From RankNet to LambdaRank to LambdaMART: An Overview (2010)

https://www.microsoft.com/en-us/research/publication/from-ranknet-to-lambdarank-to-lambdamart-an-overview/

1 Introduction

RankNet, LambdaRank, and LambdaMART are three closely related learning-to-rank algorithms, each building on the last: RankNet learns from pairwise preferences with a cross-entropy cost, LambdaRank rescales the RankNet gradients by the change in NDCG, and LambdaMART implements the LambdaRank gradients in a gradient-boosted tree model.

2 RankNet

Given feature vectors $x_i$ and $x_j$ for two documents of the same query, the model computes scores

$s_i = f(x_i)$ and $s_j = f(x_j)$

and maps the score difference to the modeled probability that document $i$ should be ranked higher than document $j$:

$P_{ij} = P(U_i > U_j) = \frac{1}{1 + e^{-\sigma(s_i - s_j)}}$

The cost is the cross entropy between the modeled probability and the known target probability $\bar{P}_{ij}$:

$C = - \bar{P}_{ij} \log{P_{ij}} - (1 - \bar{P}_{ij})\log(1 - P_{ij})$

where $\bar{P}_{ij} = \frac{1}{2}(1 + S_{ij})$ and $S_{ij} \in \{-1, 0, +1\}$ is $1$ if document $i$ is labeled more relevant than document $j$, $-1$ if less relevant, and $0$ if they share the same label.
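As a sanity check of the formulas above, a minimal NumPy sketch (the function name and the default $\sigma$ are illustrative, not from the paper):

```python
import numpy as np

def ranknet_pair_cost(s_i, s_j, S_ij, sigma=1.0):
    """Pairwise RankNet cross-entropy cost for one document pair.

    S_ij: +1 if document i is labeled more relevant than j,
          -1 if less relevant, 0 if equally relevant.
    """
    P_bar = 0.5 * (1.0 + S_ij)                       # target probability
    P = 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))   # modeled probability
    P = np.clip(P, 1e-12, 1.0 - 1e-12)               # guard against log(0)
    return -P_bar * np.log(P) - (1.0 - P_bar) * np.log(1.0 - P)
```

For example, `ranknet_pair_cost(2.0, 1.0, 1)` gives $-\log(1/(1+e^{-1})) \approx 0.31$.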

Differentiating the cost with respect to the scores gives

$\frac{\partial C}{\partial s_i} = \sigma\left(\frac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma(s_i - s_j)}}\right) = -\frac{\partial C}{\partial s_j}$

The model weights $w_k$ are then learned by stochastic gradient descent:

$w_k \rightarrow w_k - \eta \frac{\partial C}{\partial w_k}$

2.1 Factoring RankNet: Speeding Up Ranknet Training

By the chain rule,

\[\frac{\partial C}{\partial w_k} = \frac{\partial C}{\partial s_i} \frac{\partial s_i}{\partial w_k} + \frac{\partial C}{\partial s_j} \frac{\partial s_j}{\partial w_k} = \sigma\left(\frac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma(s_i - s_j)}}\right)\left(\frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k}\right)\]

\[\frac{\partial C}{\partial w_k} = \lambda_{ij} \left(\frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k}\right)\]

where

\[\lambda_{ij} = \frac{\partial C(s_i - s_j)}{\partial s_i} = \sigma\left(\frac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma(s_i - s_j)}}\right)\]

Let $I$ denote the set of index pairs $\{i, j\}$ for which document $i$ is labeled more relevant than document $j$ (each pair included just once, so $S_{ij} = 1$). Summing the pairwise contributions turns the per-pair update into a per-document update:

\[\delta w_k = - \eta \sum_{\{i,j\} \in I}\left(\lambda_{ij} \frac{\partial s_i}{\partial w_k} - \lambda_{ij} \frac{\partial s_j}{\partial w_k}\right) = -\eta \sum_{i} \lambda_i \frac{\partial s_i}{\partial w_k}\]

where

\[\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ji}\]
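Each $\lambda_i$ acts as a signed force on document $i$, and the factorization is what speeds training up: score every document once, accumulate the pairwise $\lambda_{ij}$ into per-document $\lambda_i$, then take one gradient step per document instead of one per pair. A minimal sketch, assuming `pairs` holds the index pairs of $I$ (document $i$ labeled more relevant, so $S_{ij} = 1$); names are illustrative:

```python
import numpy as np

def accumulate_lambdas(scores, pairs, sigma=1.0):
    """Fold the pairwise lambda_ij into one lambda_i per document.

    scores: (n,) array of model scores s_i.
    pairs:  iterable of (i, j) index pairs with document i more relevant.
    """
    lambdas = np.zeros_like(scores, dtype=float)
    for i, j in pairs:
        # With S_ij = 1, lambda_ij = -sigma / (1 + e^{sigma (s_i - s_j)}).
        lam = -sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
        lambdas[i] += lam    # first sum in the definition of lambda_i
        lambdas[j] -= lam    # second sum, with the roles swapped
    return lambdas
```

The weight update is then $\delta w_k = -\eta \sum_i \lambda_i \, \partial s_i / \partial w_k$, i.e. a single backpropagation pass per document.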

3 Information Retrieval Measures

The paper reviews several rank-based measures (MRR, MAP, ERR, NDCG); these notes focus on NDCG (Normalized Discounted Cumulative Gain). For a ranked list truncated at position $T$, with relevance label $l_i$ at rank $i$,

\[DCG@T = \sum_{i=1}^{T} \frac{2^{l_i} - 1}{\log(1 + i)}\]

\[NDCG@T = \frac{DCG@T}{\max DCG@T}\]

where $\max DCG@T$ is the DCG of the ideal (label-sorted) ordering, so that $NDCG \in [0, 1]$.
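A direct transcription of these definitions (natural log in the discount, as written above; swap in $\log_2$ if preferred; names are illustrative):

```python
import numpy as np

def dcg_at_t(labels_in_rank_order, T):
    """DCG@T; labels_in_rank_order[k] is the relevance label at rank k+1."""
    l = np.asarray(labels_in_rank_order[:T], dtype=float)
    return np.sum((2.0 ** l - 1.0) / np.log(1.0 + np.arange(1, len(l) + 1)))

def ndcg_at_t(labels_in_rank_order, T):
    """NDCG@T: DCG normalized by the DCG of the ideal (label-sorted) order."""
    ideal = np.sort(labels_in_rank_order)[::-1]
    max_dcg = dcg_at_t(ideal, T)
    return dcg_at_t(labels_in_rank_order, T) / max_dcg if max_dcg > 0 else 0.0
```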

4 LambdaRank

NDCG, viewed as a function of the model scores, is everywhere either flat or discontinuous (it depends on the scores only through the sorted order), so it cannot be optimized by gradient descent directly. The key observation of LambdaRank is that, in order to train a model, we do not need the costs themselves: we only need the gradients of the costs with respect to the model scores. LambdaRank writes these gradients down directly, scaling the RankNet gradient by the change in NDCG:

\[\lambda_{ij} = \frac{\partial C(s_i - s_j)}{\partial s_i} = \frac{-\sigma}{1+e^{\sigma(s_i - s_j)}}|\Delta_{NDCG}|\]

where the pair is taken with document $i$ more relevant than document $j$ (so $S_{ij} = 1$), and $|\Delta_{NDCG}|$ is the change in NDCG obtained by swapping the rank positions of documents $i$ and $j$ while leaving every other document fixed.
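Because a swap of documents $i$ and $j$ changes only their two terms in the DCG sum, $|\Delta_{NDCG}|$ has a simple closed form (the $-1$ in the gains cancels in the difference). A sketch, where `max_dcg` is the DCG of the ideal ordering and helper names are illustrative:

```python
import numpy as np

def delta_ndcg(labels, ranks, i, j, max_dcg):
    """|delta NDCG| if documents i and j swapped their current rank positions."""
    gain_diff = (2.0 ** labels[i]) - (2.0 ** labels[j])
    disc_diff = 1.0 / np.log(1.0 + ranks[i]) - 1.0 / np.log(1.0 + ranks[j])
    return abs(gain_diff * disc_diff) / max_dcg

def lambdarank_lambda_ij(s_i, s_j, dndcg, sigma=1.0):
    """Pairwise LambdaRank gradient for a pair with document i more relevant."""
    return -sigma * dndcg / (1.0 + np.exp(sigma * (s_i - s_j)))
```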

5 MART

MART (Multiple Additive Regression Trees), also known as GBDT (Gradient Boosted Decision Trees), models the score as a weighted sum of regression trees:

\[F_N(x) = \sum_{i=1}^{N} \alpha_i f_i(x)\]

where each $f_i$ is a single regression tree and $\alpha_i \in \mathbb{R}$ is its weight. Each tree is fit, by least squares, to the negative gradient of the cost with respect to the current model scores, so MART performs gradient descent in function space.
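To make the boosting step concrete, a minimal least-squares sketch (for squared error the negative gradient is just the residual); it leans on scikit-learn's DecisionTreeRegressor as the weak learner, and all names and hyperparameters are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_mart_ls(X, y, n_trees=100, lr=0.1, max_depth=3):
    """Least-squares MART: F_N(x) = F_0 + lr * sum_n f_n(x)."""
    F0 = float(np.mean(y))
    F = np.full(len(y), F0)
    trees = []
    for _ in range(n_trees):
        residual = y - F                                   # negative gradient
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        F += lr * tree.predict(X)                          # shrunk additive step
        trees.append(tree)
    return F0, lr, trees

def predict_mart(model, X):
    F0, lr, trees = model
    return F0 + lr * sum(tree.predict(X) for tree in trees)
```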

6 MART for Two Class Classification
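The paper's two-class case follows Friedman's gradient boosting with the binomial log-loss: fit each tree to the pseudo-residuals of the log-odds, then set each leaf value with an approximate Newton step. A sketch of one round under that formulation (labels $y \in \{-1, +1\}$, `F` holds the current log-odds; the leaf formula is Friedman's approximation, and all names are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_round_two_class(X, y, F, lr=0.1, max_depth=3):
    """One boosting round for the cost sum_i log(1 + exp(-2 y_i F(x_i)))."""
    r = 2.0 * y / (1.0 + np.exp(2.0 * y * F))   # pseudo-residuals (neg. gradient)
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)
    leaf = tree.apply(X)                        # leaf index of each example
    for l in np.unique(leaf):
        idx = leaf == l
        # Approximate Newton step per leaf: gamma = sum(r) / sum(|r| (2 - |r|)).
        denom = np.sum(np.abs(r[idx]) * (2.0 - np.abs(r[idx])))
        F[idx] += lr * r[idx].sum() / max(denom, 1e-12)
    return tree
```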

7 LambdaMART

LambdaMART = the LambdaRank gradients plugged into MART: each boosting round fits a regression tree to the per-document $\lambda_i$, and sets the leaf values with a Newton step using the second derivatives $w_i = \partial \lambda_i / \partial s_i$.
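Putting the pieces together, a compact single-query sketch of one LambdaMART round: compute per-document $\lambda_i$ and the second-order weights $w_i$ (differentiating $\lambda_{ij}$ again gives $\sigma^2 |\Delta_{NDCG}| \rho (1 - \rho)$ with $\rho = 1/(1+e^{\sigma(s_i - s_j)})$), fit a regression tree to the $\lambda$'s, and set each leaf by the Newton step $\sum \lambda / \sum w$. Note the sign convention: here $\lambda_i$ points in the direction that increases NDCG (the negative of the Section 4 cost gradient), so the tree's contribution is added. All names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lambdas_and_weights(scores, labels, sigma=1.0):
    """Per-document lambda_i and Newton weights w_i for one query."""
    n = len(scores)
    order = np.argsort(-scores)                  # current ranking by score
    ranks = np.empty(n)
    ranks[order] = np.arange(1, n + 1)
    ideal = np.sort(labels)[::-1]
    max_dcg = np.sum((2.0 ** ideal - 1.0) / np.log(1.0 + np.arange(1, n + 1)))
    lam = np.zeros(n)
    w = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:            # pair with i more relevant
                delta = abs((2.0 ** labels[i] - 2.0 ** labels[j])
                            * (1.0 / np.log(1.0 + ranks[i])
                               - 1.0 / np.log(1.0 + ranks[j]))) / max_dcg
                rho = 1.0 / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
                lam[i] += sigma * delta * rho    # push i up, j down
                lam[j] -= sigma * delta * rho
                hess = sigma ** 2 * delta * rho * (1.0 - rho)
                w[i] += hess
                w[j] += hess
    return lam, w

def lambdamart_round(X, scores, labels, lr=0.1, max_depth=3):
    """One boosting round for one query; updates float array `scores` in place."""
    lam, w = lambdas_and_weights(scores, labels)
    tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, lam)
    leaf = tree.apply(X)
    for l in np.unique(leaf):
        idx = leaf == l
        gamma = lam[idx].sum() / max(w[idx].sum(), 1e-12)   # Newton leaf value
        scores[idx] += lr * gamma
    return tree
```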