Recommendation systems in multi-stakeholder environments often need to optimize multiple objectives simultaneously to meet the demands of both suppliers and consumers. Serving recommendations in these settings depends on combining the objectives efficiently so that each stakeholder's expectations are addressed, typically through a scalarization function with predetermined, fixed weights. In practice, selecting these weights becomes a problem in its own right. Recent work has developed algorithms that adapt the weights to application-specific needs by training a model with reinforcement learning (RL). While this solves for automatic…