In many scenarios, such as a google search or a product recommendation in an online shop, we have tons of data and limited space to display it. We cannot show all the products of an online shop to the user as a possible next best offer. Neither would a user want to scroll through all the pages indexed by a search engine to find the most relevant page that matches his search keywords. The most relevant content should be on top. Learning to rank (LTR) models are supervised machine learning models that attempt to optimize the order of items. So compared to classification or regression models, they do not care about exact scores or predictions, but the relative order. LTR models are typically applied in search engines, but gained popularity in other fields such as product recommendations as well.

There are 3 types of models: Pointwise, Pairwise and Listwise LTR models.

## Pointwise LTR

Pointwise LTR models optimize for **predicing a key metric**. For example, you rank product recommendations according to the highest probability that a user clicks on an item (classification models) or on the revenue a product creates (linear regression models).

The models you would use are the same as for classical classification or regression problems, e.g. a logistic regression or a support vector machine model. You can evaluate your model by looking at your usual metrics (accuracy, precision, recall, root mean squared error etc.) at a position k. For example how many of the relevant items made it in the top k?

## Pairwise LTR

Pairwise LTR models optimize the **relative order of pairs**. The idea is to compare the order of two items at a time. So as a first step, you have to create item pairs. The best model with the lowest loss has the maximum number of pairs in the correct order. So if the more relevant item is on the top, great! You would not add any loss. If the more relevant item is on the bottom position, this adds to the loss. You evaluate your model by looking at pairwise metrics, e.g. pairwise accuracy at position: The number of pairs in the correct order divided by the total number of pairs.

RankSVM is a model you could use. It applies the ideas of a classical Support Vector Machine Model and computes loss by minimizing the discordant pairs of the model with the perfect" order from the ground truth. In other words: it punishes item-pairs where the order is opposite from the ideal order.

## Listwise LTR

Listwise LTR models optimize the **total order of pairs**. So instead of looking at individual pairs, the entire resultset is evaluated. Listwise models are often applied, if there is a limited number of items that can be selected. For example, if you have only 50 products in your online shop and you want to display those in the best order. Listwise LTR models make sure that you find the optimal order for those 50 items.

For example you can use an XGboost model that optimizes for NDCG (Normalized Discounted Cumulative Gain). NDCG relies on three assumptions:

- The more revant an item, the more useful it is. So very relevant items are more useful than somewhat relevant items which are more useful than irrelevant items (cumulative gain).
- Highly relevant items are more useful when appearing in the higher positions.
- The result of the ranking should be irrelevant to the query performed (normalization).

Put differently, this approach makes sure, that the best items are on top and heavily penalize, if they are not.

Now you know the three basic approaches for solving ranking problems. The following table gives an overview with examples.

pointwise | pairiwse | listwise | ||
---|---|---|---|---|

Optimize | closeness to label | relative order of pairs | total order | |

Loss Function | Σ(y_pred - y)^2 | Σ max(0, 1 - score_1 - score_2 | e.g. DCG Σ y/(log_2(rank) + 1) | |

Evaluation metric | e.g. accuracy@k | e.g. pairwise accuracy | e.g. DCG@k, NDCG@k | |

Model | e.g. linear/logistic regression | e.g. RankSVM | e.g. XGboost |