From RankNet to LambdaRank to LambdaMART: An Overview (2010)
1 Introduction

Burges's 2010 report describes three ranking models developed in sequence: RankNet, LambdaRank, and LambdaMART.
2 RankNet
For a given query, RankNet scores each document with a model $f$ (originally a neural net):
\[s_i = f(x_i), \qquad s_j = f(x_j)\]
The learned probability that document $i$ should be ranked higher than document $j$ is a logistic function of the score difference:
\[P_{ij} = P(U_i > U_j) = \frac{1}{1 + e^{-\sigma(s_i - s_j)}}\]
The cost is the cross entropy between this and the ground-truth probability $\bar{P}_{ij}$:
\[C = -\bar{P}_{ij} \log P_{ij} - (1 - \bar{P}_{ij}) \log(1 - P_{ij})\]
Writing $S_{ij} \in \{-1, 0, +1\}$ for the label ($+1$ if $i$ is more relevant than $j$, $-1$ if less, $0$ if equally relevant) and $\bar{P}_{ij} = \frac{1}{2}(1 + S_{ij})$, the gradient of the cost with respect to the scores is
\[\frac{\partial C}{\partial s_i} = \sigma\left(\frac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma(s_i - s_j)}}\right) = -\frac{\partial C}{\partial s_j}\]
and the model weights are updated by gradient descent:
\[w_k \rightarrow w_k - \eta \frac{\partial C}{\partial w_k}\]
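To make the pair-level computation concrete, here is a small NumPy sketch of the cost and its score gradient; the function name and the default $\sigma = 1$ are illustrative, not from the paper:

```python
import numpy as np

def ranknet_pair(s_i, s_j, S_ij, sigma=1.0):
    """Pairwise RankNet cost and gradient dC/ds_i.

    s_i, s_j : model scores for documents i and j
    S_ij     : +1 if i is more relevant than j, -1 if less, 0 if equal
    """
    diff = sigma * (s_i - s_j)
    # Cross entropy in the equivalent closed form
    # C = 0.5*(1 - S_ij)*sigma*(s_i - s_j) + log(1 + exp(-sigma*(s_i - s_j)))
    cost = 0.5 * (1.0 - S_ij) * diff + np.log1p(np.exp(-diff))
    # dC/ds_i = sigma*(0.5*(1 - S_ij) - 1/(1 + exp(sigma*(s_i - s_j))))
    grad_i = sigma * (0.5 * (1.0 - S_ij) - 1.0 / (1.0 + np.exp(diff)))
    return cost, grad_i  # dC/ds_j = -grad_i

# Example: i scored 2.0, j scored 1.0, and i is indeed more relevant
cost, grad = ranknet_pair(2.0, 1.0, S_ij=1)
```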
2.1 Factoring RankNet: Speeding Up RankNet Training
For a pair $\{i, j\}$ the chain rule gives
\[\frac{\partial C}{\partial w_k} = \frac{\partial C}{\partial s_i} \frac{\partial s_i}{\partial w_k} + \frac{\partial C}{\partial s_j} \frac{\partial s_j}{\partial w_k} = \lambda_{ij}\left(\frac{\partial s_i}{\partial w_k} - \frac{\partial s_j}{\partial w_k}\right)\]
where
\[\lambda_{ij} = \frac{\partial C(s_i - s_j)}{\partial s_i} = \sigma\left(\frac{1}{2}(1 - S_{ij}) - \frac{1}{1 + e^{\sigma(s_i - s_j)}}\right)\]
Summing over the set $I$ of pairs $\{i, j\}$ in which document $i$ is more relevant than document $j$, the batch update becomes
\[\delta w_k = -\eta \sum_{\{i,j\} \in I}\left(\lambda_{ij} \frac{\partial s_i}{\partial w_k} - \lambda_{ij} \frac{\partial s_j}{\partial w_k}\right) = -\eta \sum_{i} \lambda_i \frac{\partial s_i}{\partial w_k}\]
where each document accumulates one $\lambda$ from all the pairs it appears in:
\[\lambda_i = \sum_{j:\{i,j\} \in I} \lambda_{ij} - \sum_{j:\{j,i\} \in I} \lambda_{ji}\]
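This factoring is the speed-up: the $\lambda_{ij}$ are first collapsed into a single $\lambda_i$ per document, so the network does one weight update per document rather than one per pair. A sketch of the accumulation step, assuming every listed pair $(i, j)$ has $S_{ij} = 1$ (the pair representation is mine, not the paper's):

```python
import numpy as np

def accumulate_lambdas(scores, pairs, sigma=1.0):
    """Collapse pairwise gradients into one lambda per document.

    scores : array of model scores, one per document of a query
    pairs  : list of (i, j) index pairs where doc i is more relevant
             than doc j, so lambda_ij = -sigma/(1 + exp(sigma*(s_i - s_j)))
    """
    lambdas = np.zeros_like(scores, dtype=float)
    for i, j in pairs:
        lam_ij = -sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j])))
        lambdas[i] += lam_ij  # i appears as the more relevant document
        lambdas[j] -= lam_ij  # j appears as the less relevant document
    # The weight update is then delta_w_k = -eta * sum_i lambdas[i] * ds_i/dw_k
    return lambdas
```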
3 Information Retrieval Measures

The measure the paper focuses on is NDCG (Normalized Discounted Cumulative Gain). Like most IR measures, it depends on ranks rather than scores, so as a function of the model scores it is everywhere flat or discontinuous and cannot be optimized by gradient descent directly.
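For reference, the standard definition, with $l_i$ the relevance label of the document at rank position $i$ and $T$ the truncation level:
\[DCG@T = \sum_{i=1}^{T} \frac{2^{l_i} - 1}{\log_2(1 + i)}, \qquad NDCG@T = \frac{DCG@T}{maxDCG@T}\]
where $maxDCG@T$ is the $DCG@T$ of the ideal (sorted-by-label) ordering, so that $NDCG@T \in [0, 1]$.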
4 LambdaRank
The key observation of LambdaRank is thus that in order to train a model, we don’t need the costs themselves: we only need the gradients (of the costs with respect to the model scores).
LambdaRank keeps the RankNet pair weighting but scales each $\lambda_{ij}$ by the change in NDCG obtained by swapping the rank positions of the two documents:
\[\lambda_{ij} = \frac{\partial C(s_i - s_j)}{\partial s_i} = \frac{-\sigma}{1 + e^{\sigma(s_i - s_j)}}\left|\Delta_{NDCG}\right|\]
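A minimal sketch of the resulting per-document gradients for a single query; the helper name lambdarank_lambdas and the brute-force pair loop are mine, and real implementations compute the swap deltas far more efficiently:

```python
import numpy as np

def lambdarank_lambdas(scores, labels, sigma=1.0):
    """Per-document lambdas: RankNet gradient scaled by |delta NDCG|."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(-scores)                   # current ranking, best first
    rank = np.empty_like(order)
    rank[order] = np.arange(1, len(scores) + 1)   # 1-based rank of each doc
    gain = 2.0 ** labels - 1.0
    disc = 1.0 / np.log2(1.0 + rank)
    ideal = np.sort(gain)[::-1]
    max_dcg = np.sum(ideal / np.log2(2.0 + np.arange(len(ideal))))
    lambdas = np.zeros_like(scores)
    if max_dcg == 0.0:                            # all labels zero: nothing to rank
        return lambdas
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] <= labels[j]:
                continue                          # only pairs where i beats j
            # |NDCG change| if documents i and j swapped rank positions
            delta = abs((gain[i] - gain[j]) * (disc[i] - disc[j])) / max_dcg
            lam = -sigma / (1.0 + np.exp(sigma * (scores[i] - scores[j]))) * delta
            lambdas[i] += lam
            lambdas[j] -= lam
    return lambdas
```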
5 MART

MART (Multiple Additive Regression Trees), also known as GBDT (Gradient Boosted Decision Trees), builds its model as a weighted sum of regression trees, where each new tree is fit to the gradient of the loss with respect to the current model's outputs.
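A bare-bones sketch of that loop for squared loss, where the negative gradient is just the residual; scikit-learn's DecisionTreeRegressor stands in for the paper's regression trees, and the hyperparameter values are placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_mart(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting for squared loss: each tree fits the residuals,
    i.e. the negative gradient of 0.5*(y - F)^2 with respect to F."""
    base = y.mean()
    F = np.full(len(y), base)          # initial constant model
    trees = []
    for _ in range(n_trees):
        residuals = y - F              # negative gradient at the current model
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees, base

def predict_mart(trees, base, X, learning_rate=0.1):
    return base + learning_rate * sum(t.predict(X) for t in trees)
```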
6 MART for Two Class Classification

The paper works MART through the two-class logistic-loss case in detail: gradients of the loss give the tree targets, and a Newton step sets the leaf values. This derivation is the template that LambdaMART instantiates with its own gradients.
7 LambdaMART

LambdaMART = LambdaRank gradient + MART: each regression tree is fit to the per-document $\lambda_i$ in place of the gradient of a differentiable cost, with the leaf values set by a Newton step using the corresponding second derivatives.
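Schematically, the training loop reuses lambdarank_lambdas from above as the gradient that each new tree fits (simplified to a single query, with plain least-squares leaf values instead of the paper's per-leaf Newton step):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_lambdamart(X, labels, n_trees=100, learning_rate=0.1, max_depth=3):
    """Boosted trees where each tree fits the LambdaRank lambdas."""
    F = np.zeros(len(labels), dtype=float)       # current model scores
    trees = []
    for _ in range(n_trees):
        lambdas = lambdarank_lambdas(F, labels)  # gradient of the ranking objective
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, -lambdas)                    # move scores against the gradient
        F += learning_rate * tree.predict(X)
        trees.append(tree)
    return trees
```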