Optimal Design for Reward Modeling in RLHF
Preprint version, October 24, 2024
Authors: A. Scheid, E. Boursier, A. Durmus, M. Jordan, P. Menard, E. Moulines, M. Valko