Learning Affinities From Multiple Types of Feedbacks

On predicting sets coherence

The main objective of predicting sets coherence is to suggest a new item for a user to add to her cart, that is, to increase the value of users already well engaged with the brand; it can also be used to generate a set of items to be consumed together, like a music playlist. This particular recommender systems problem belongs to the more general framework of session-based recommender systems, which covers all recommender systems designed at the session level. Session-based recommender systems have received growing attention in recent years; for instance, the 2015 RecSys Challenge was dedicated to them. Unlike affinity-based algorithms, whose goal is to present one or several items that the user could buy independently, in basket completion the notion of a set of items is crucial: the system must present an item that is coherent with the items already in the basket, since the goal is that the whole set will be purchased together. Three main categories of methods exist to tackle the basket completion problem: associative classification, collaborative filtering variations, and Determinantal Point Processes. More recently, deep learning techniques based on recurrent neural networks have also been developed ([Hidasi et al., 2015]).

Associative Classification

Associative Classification methods ([Agrawal et al., 1993, Hipp et al., 2000, Liu et al., 1998]) are simple methods based on association rules that compute set supports to perform recommendation. The general idea is to count co-occurrences of two sets over all past observed baskets in order to find the most frequent associations relative to the occurrence of one of the two sets. If this occurrence ratio is high enough, the second set is considered a good recommendation when the first set is in a user's basket. However, relying only on such a ratio may reduce to recommending only the most popular items. Additional criteria are then defined to produce different rules that determine what to recommend. Such approaches have several strong limits: the different criteria are arbitrarily defined and require a manual choice of thresholds (like the minimal value of the ratio of the two sets' co-occurrence over one set's occurrence), they do not scale to large datasets, and all previously computed rules must be updated whenever a new basket is observed, which can be very time consuming.
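As a rough illustration of this co-occurrence counting, the sketch below (not taken from any referenced implementation; the `mine_pair_rules` helper, the toy baskets and the thresholds are made up) mines pairwise rules whose support and confidence exceed manually chosen thresholds:

```python
from collections import Counter
from itertools import combinations

def mine_pair_rules(baskets, min_support=2, min_confidence=0.5):
    """Count item and pair occurrences over past baskets and keep rules whose
    co-occurrence / occurrence ratio (confidence) is high enough."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), co in pair_counts.items():
        if co < min_support:
            continue
        for head, tail in ((a, b), (b, a)):
            confidence = co / item_counts[head]   # rule: head -> recommend tail
            if confidence >= min_confidence:
                rules.append((head, tail, confidence))
    return sorted(rules, key=lambda r: -r[2])

# Toy usage with made-up baskets.
print(mine_pair_rules([["bread", "butter"],
                       ["bread", "butter", "jam"],
                       ["bread", "milk"]]))
```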

Adaptation of Collaborative Filtering

In order to scale to large item catalogs, a solution can consist in using collaborative filtering by considering each basket as one unique user. However, in applications where baskets are very small (for instance in e-commerce it is very rare that people buy more than three products at the same time), the extreme sparsity of the matrix can prevent the model from learning correctly. Another solution consists in using the item latent factors obtained by the collaborative filtering algorithm to compute item-to-item similarities, for instance the cosine similarity between two items. Then the most similar item is proposed to the user. The limit of such an approach is that recommending similar items may not be relevant, since people are more likely to buy complementary items rather than two very similar ones. To that extent, people often add constraints to the recommendation, like forcing the additional recommended item to be in a different category than the initial item ([Teo et al., 2016]).
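A minimal sketch of this item-to-item step, assuming item latent factors are already available as rows of a NumPy array (the `most_similar` helper and the random factors are purely illustrative):

```python
import numpy as np

def most_similar(V, item_id, top_n=5):
    """Return the top_n items most similar to item_id by cosine similarity
    between item latent factors (rows of V)."""
    normed = V / np.clip(np.linalg.norm(V, axis=1, keepdims=True), 1e-12, None)
    sims = normed @ normed[item_id]      # cosine similarities to item_id
    sims[item_id] = -np.inf              # never recommend the item itself
    return np.argsort(-sims)[:top_n]

V = np.random.rand(100, 16)              # toy latent factors for 100 items
print(most_similar(V, item_id=3))
```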

Determinantal Point Processes

More advanced methods based on Determinantal Point Processes (DPPs) ([Gartrell et al., 2017, Gartrell et al., 2016]) have shown very good accuracy and improved diversity in the recommendations. Determinantal Point Processes are probability measures over sets of points defined by the determinant of a kernel matrix of the size of the item catalog, where each entry (i, j) of the kernel matrix encodes the similarity between products i and j. Based on past observed baskets, the kernel matrix of the DPP is learned, for instance by maximization of the likelihood, and items to recommend are sampled from the obtained probability distribution. In order to scale to large item catalogs, [Gartrell et al., 2017, Gartrell et al., 2016] assumed a low-rank constraint on the kernel matrix. This assumption proved to make both learning and prediction more efficient and also produced more accurate recommendations.
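The following sketch illustrates how a low-rank DPP kernel L = V V^T scores item sets and ranks basket completions. The matrix V here is a random placeholder rather than a learned kernel, and the helper names are made up:

```python
import numpy as np

def dpp_set_score(V, item_set):
    """Unnormalized DPP probability of a set: determinant of the kernel
    L = V V^T restricted to the rows/columns of the items in the set."""
    L_S = V[item_set] @ V[item_set].T
    return np.linalg.det(L_S)

def best_completion(V, basket, candidates):
    """Recommend the candidate whose addition yields the highest set score,
    i.e. the most coherent (and diverse) completion of the basket."""
    scores = [dpp_set_score(V, basket + [c]) for c in candidates]
    return candidates[int(np.argmax(scores))]

V = np.random.rand(50, 8)       # toy low-rank factors, not a learned kernel
print(best_completion(V, basket=[1, 4], candidates=[7, 12, 33]))
```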

Effective exploration-exploitation with bandit learning

When future observations directly depend on the output of the system, one must take care of the exploration vs. exploitation dilemma in order to collect new information while also providing accurate recommendations. Besides, in recommender systems applications, new users and new items arrive frequently, and exploring those users and items must be done appropriately. Recommender systems can then leverage results and solutions from the multi-arm bandit literature to solve this issue.
The exploration-exploitation dilemma is studied in reinforcement learning theory and more specifically in multi-arm bandit theory. Concretely, the problem is to select the next action a (called an arm, which in our case corresponds to an item) within a set of possible actions A in order to maximize the cumulative reward (i.e., the sum of the previously collected rewards) in the long term, rather than to maximize the immediate next-step reward. Each arm is characterized by an unknown distribution, and at each step the system observes the reward of the pulled arm. A simple solution to introduce exploration in the system is to pull an arm at random a small fraction of the time and pull the best expected arm the rest of the time. This strategy is called ε-greedy, where ε stands for the small fraction of time that random pulling is used; in practice ε ≈ 1%. A more advanced solution is to use the Upper Confidence Bound (UCB) algorithm ([Auer, 2003]), which follows the optimism in the face of uncertainty principle to select the next arm, increasing the chance of selecting an arm with a lot of uncertainty. Intuitively, when pulling the arm which in the best-case scenario has the highest reward, there are two possible outcomes: either we did select the best arm and so we are fine, or we did not and we gain knowledge on a good candidate. On the contrary, a greedy strategy could stop pulling the best arm because of a few bad results in the first attempts. More formally, the UCB algorithm pulls at time t the arm I_t given by equation 2.3:

$$I_t = \arg\max_{a \in \mathcal{A}} \; \hat{\mu}_{a,s} + \sqrt{\frac{3 \log t}{2s}} \qquad (2.3)$$

where s is the number of times arm a has been pulled before time t and μ̂_{a,s} is the empirical mean reward observed by pulling arm a. This equation comes from a high-probability upper bound on the expected reward of each arm. See Figure 2.2 for an illustration of the UCB state at a given time step.

A similar approach can be developed in a Bayesian framework using Thompson Sampling ([Kaufmann et al., 2012, Russo et al., 2017]). In such a framework one assumes that the reward obtained by pulling an arm follows a certain distribution with unknown parameters. At each time step, for each arm, a reward is sampled according to the current estimate of the distribution, and the arm with the highest sampled reward is pulled. Then its distribution parameters are updated following Bayes' rule, and one moves to the next step.
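A minimal sketch of the UCB rule of equation (2.3), run on made-up Bernoulli arms; the helper name and the incremental mean update are illustrative choices:

```python
import numpy as np

def ucb_select(means, pulls, t):
    """Pick the arm maximizing the empirical mean plus the exploration bonus
    sqrt(3 log t / (2 s)) of equation (2.3); unexplored arms are tried first."""
    if np.any(pulls == 0):
        return int(np.argmin(pulls))
    bonus = np.sqrt(3.0 * np.log(t) / (2.0 * pulls))
    return int(np.argmax(means + bonus))

# Toy loop with made-up Bernoulli reward probabilities.
true_p = np.array([0.2, 0.5, 0.4])
means, pulls = np.zeros(3), np.zeros(3)
for t in range(1, 1001):
    a = ucb_select(means, pulls, t)
    reward = float(np.random.rand() < true_p[a])
    pulls[a] += 1
    means[a] += (reward - means[a]) / pulls[a]   # incremental empirical mean
```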
In this multi-arm bandit framework all arms are independent from each other, thus pulling one arm does not provide any information on the other arms' performance, which can be an issue when dealing with a lot of arms. Linear bandits make it possible to overcome this problem.

Recommender Systems as matrix completion

State-of-the-art methods focus on the problem of filling the sparse input matrix R ∈ ℝ^{n×p}, with n the number of users and p the number of items (see Figure 3.1), according to the known entries of the matrix. Each row represents a user u ∈ U = {1, ..., n} = [n], each column an item i ∈ I = {1, ..., p} = [p], and each entry r_{u,i}, if available, the user-item affinity. We denote D = {(u, i) such that r_{u,i} is known} ⊆ U × I. In practice the matrix is highly sparse; for instance, the matrix of the Netflix challenge has only 1.2% available entries. The matrix completion problem in recommender systems is particularly difficult because, unlike image completion, one cannot use local information since the ordering of rows and columns is arbitrary and the available input is not uniformly distributed. Indeed some items are very popular and will have been rated by many users while some others will almost never be rated, which can result in blank areas that are very difficult to fill appropriately. This behaviour is called MNAR, for Missing Not At Random.

Learning is then performed by minimizing some distance between the model predictions and the available entries. However, some structure assumptions on the matrix are required, otherwise any matrix that has the exact same values at the known coordinates and random values elsewhere would have exactly zero error while no good completion could be expected. The more realistic the regularization is, the better the performance of the algorithm. A usual assumption is that the input matrix is low rank, which means that there exists some correlation among users and among items. Consequently the matrix can be factorized into two sub-matrices U and V such that R = U V^T, where U ∈ ℝ^{n×K} and V ∈ ℝ^{p×K} have as many columns as the assumed rank K of R (see Figure 3.2). U contains the users' latent descriptors and V the items' latent descriptors. The low-rank decomposition has the advantage of reducing memory, since only (n + p)K values (with K ≪ n, p) need to be stored instead of np. Indeed the full matrix never has to be computed: to decide what to recommend to one user, only her latent factors and the items' latent factors are required. The problem then falls back to learning the two matrices U and V.

Historically, the matrix factorization is learned through Singular Value Decomposition (SVD), where only the largest eigenvalues are kept, which also reduces the noise that can lie in the small eigenvalues. However such techniques are not really reliable for applications with sparse matrices: the input matrix has to be filled with arbitrary values, often zero or the mean of the available values, before applying SVD, which can lead to poor performance. A better approach consists in learning the factorization using only the values in D, like ALS-WR.
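Before turning to ALS-WR, here is a minimal sketch of how the learned factors are used at prediction time, assuming U and V have already been learned (random placeholders below); the full n × p matrix is never materialized:

```python
import numpy as np

n, p, K = 1000, 500, 20
U = np.random.randn(n, K)        # users' latent descriptors (placeholder)
V = np.random.randn(p, K)        # items' latent descriptors (placeholder)

def recommend(user_id, top_n=10):
    """Score all items for one user with the low-rank model r_hat = U V^T,
    without ever materializing the full n x p matrix."""
    scores = V @ U[user_id]
    return np.argsort(-scores)[:top_n]

print(recommend(user_id=42))
```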
ALS-WR [Zhou et al., 2008], Alternating Least Squares with Weighted-λ-regularization, is one of the most popular collaborative filtering algorithms thanks to its high scalability and the machine learning libraries that implement it, like Mahout or Spark. ALS-WR relies on two main ideas. First, since some users provided more feedback than others, and likewise some items received more feedback than others, the regularization should not be the same for each user and item but should rather depend on this amount of feedback. With n_u and n_i respectively the number of feedbacks provided by user u and received by item i, the objective function is then

$$\min_{U, V} \sum_{(u,i) \in D} \left( r_{u,i} - U_u V_i^T \right)^2 + \lambda \left( \sum_u n_u \|U_u\|^2 + \sum_i n_i \|V_i\|^2 \right).$$
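As a hedged sketch of the user half-step implied by this weighted regularization (the item update is symmetric), assuming users are indexed 0..n-1 and feedback is stored as a dictionary; the helper name and the data layout are illustrative, not the authors' implementation:

```python
import numpy as np

def update_users(feedback_by_user, V, lam, n_users):
    """One ALS-WR half-step: with item factors V fixed, each user's factors
    solve a ridge regression whose penalty is lam * n_u, where n_u is the
    number of feedbacks given by that user."""
    K = V.shape[1]
    U = np.zeros((n_users, K))
    for u, feedback in feedback_by_user.items():   # u assumed in 0..n_users-1
        items = [i for i, _ in feedback]
        ratings = np.array([r for _, r in feedback])
        V_u = V[items]                              # factors of the rated items
        n_u = len(items)
        A = V_u.T @ V_u + lam * n_u * np.eye(K)
        U[u] = np.linalg.solve(A, V_u.T @ ratings)
    return U

# Toy usage: 3 users, 5 items, rank 2.
V = np.random.rand(5, 2)
feedback = {0: [(0, 4.0), (2, 1.0)], 1: [(1, 5.0)], 2: [(3, 2.0), (4, 3.0)]}
U = update_users(feedback, V, lam=0.1, n_users=3)
```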

Higher Order Alternating Least Squares

HOALS is a tensor completion algorithm suitable for Tucker or CP decomposition. While the algorithm applies to any n-order tensor, we describe it for n = 3 for the sake of simplicity. The main idea of HOALS is that the decomposition of an n-order tensor can be parallelized over each of its modes independently, thus reducing it to a series of matrix decomposition operations. Furthermore, each matrix decomposition can be further parallelized across each component. The resulting process can easily be split over multiple machines, whose jobs are completely independent and whose results are merged only at the last step. Merging only at the last step also has the additional advantage of not requiring any synchronization: no machine has to wait for the others to finish before continuing.
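A rough sketch of this mode-wise splitting for a 3-order tensor; a truncated SVD of each unfolding stands in for the actual matrix-completion solver used in HOALS, and in the real algorithm each mode's job would run on a separate machine:

```python
import numpy as np

def factorize_mode(tensor, mode, rank):
    """Unfold the tensor along one mode and factorize the unfolding; a
    truncated SVD is used here as a stand-in for the matrix solver."""
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    U, s, _ = np.linalg.svd(unfolded, full_matrices=False)
    return U[:, :rank] * s[:rank]        # factor matrix for this mode

def hoals_sketch(tensor, rank):
    # The jobs below are fully independent: in the real algorithm each one is
    # dispatched to a different machine and results are merged only here.
    return [factorize_mode(tensor, mode, rank) for mode in range(tensor.ndim)]

R = np.random.rand(30, 20, 4)            # toy (user, item, action) tensor
user_f, item_f, action_f = hoals_sketch(R, rank=4)
```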
A recommender system is provided with a tensor R which is only partially filled with the feedback in D (i.e., feedback for triples (user, item, action)), and the objective is to construct a full tensor R̂ with entries that are as close as possible to the "true" ones (i.e., the feedback a user would give to an item for a specific action). Similarly to the case of matrix completion, this problem is ill-posed (i.e., all tensors R̂ with the same entries as R on D are equivalent) unless constraints on the structure of the tensor R̂ are introduced. In particular, we introduce rank constraints in the decomposition of R̂, following either Tucker or CP.

Table of contents :

1 Introduction 
1.1 Overview of Recommender Systems
1.2 Limitations and open questions
1.3 Overview of the dissertation
1.4 From an algorithm to a recommendation system
2 Recommender Systems 
2.1 Machine Learning in Recommender Systems
2.1.1 On predicting user-item affinity
2.1.2 On predicting sets coherence
2.1.3 Effective exploration-exploitation with bandit learning
2.2 Measure, performance and open datasets
2.2.1 Measure and performance
2.2.2 Available datasets
3 Learning Affinities From Multiple Types of Feedbacks
3.1 Introduction
3.2 Background
3.2.1 Recommender Systems as matrix completion
3.2.2 Recommender systems as tensor completion
3.3 Higher Order Alternating Least Squares
3.4 Algorithm guarantee
3.5 Experiments
3.5.1 Testing accuracy
3.5.2 Testing scalability
3.6 Related Works
3.6.1 Collaborative Filtering
3.6.2 Tensor Factorization
3.6.3 Transfer Learning
3.6.4 Cold-start
3.7 Industrial learnings
3.8 Conclusion
4 Basket completion with multi-task DPP 
4.1 Introduction
4.2 Background
4.3 Model
4.3.1 Logistic DPP
4.3.2 Multi-Task DPP
4.3.3 Prediction
4.4 Experiments
4.4.1 Accuracy
4.4.2 Capturing directed completion rules
4.5 Related Work
4.5.1 Basket Completion
4.5.2 Determinantal Point Processes
4.5.3 Diversity
4.6 Industrial learnings
4.7 Conclusion
5 Fighting Boredom in RS with linear RL 
5.1 Introduction
5.2 Background
5.2.1 Linear bandit
5.2.2 Reinforcement Learning
5.3 Problem Formulation
5.4 Model Validation on Real Data
5.5 Linear Upper-Confidence bound for Reinforcement Learning
5.6 Experiments
5.7 Related Work
5.7.1 Non stationarity in multi-arm bandit
5.7.2 Non stationarity in recommender systems
5.7.3 Reinforcement Learning
5.8 Industrial learnings
5.9 Conclusion
6 Conclusion 
6.1 Contributions
6.2 Future Works
A On Convergence of Matrix Completion 
A.1 Background
A.2 Nuclear norm minimization
A.3 Alternating Least Square
A.4 Other Approaches
A.5 Conclusion
