Optimal Design for Reward Modeling in RLHF

