Optimal Design for Reward Modeling in RLHF

Preprint version, October 24, 2024

Authors: A. Scheid, E. Boursier, A. Durmus, M. Jordan, P. Menard, E. Moulines, M. Valko
