One method for gathering user feedback for reinforcement learning is to show a user two photos of states the agent achieved, then ask which state is closer to a goal. For instance, perhaps a robot's goal is to open a kitchen cabinet. One image might show the robot opening the cabinet, while the other might show it opening the microwave. A user would pick the photo of the "better" state.
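As a rough sketch of what one of these comparisons looks like as data (the class and field names here are illustrative, not the researchers' code), each query pairs two state images with a single binary label:

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class PreferenceQuery:
    """One pairwise comparison shown to a human labeler (illustrative)."""
    state_a: np.ndarray          # e.g., photo of the robot at the cabinet
    state_b: np.ndarray          # e.g., photo of the robot at the microwave
    label: Optional[int] = None  # 0 if state_a looks closer to the goal, 1 if state_b

def record_answer(query: PreferenceQuery, picked_b: bool) -> None:
    # Store the human's choice as a binary label for later reward learning.
    query.label = int(picked_b)
```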
Some previous methods use this crowdsourced, binary feedback to optimize a reward function that the agent would use to learn the task. However, because nonexperts are likely to make mistakes, the reward function can become very noisy, so the agent could get stuck and never reach its goal.
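For readers who want a concrete picture, here is a minimal sketch of how binary comparisons are commonly turned into a reward function, using a Bradley-Terry-style preference loss. The network architecture and names are assumptions for illustration, not the researchers' actual implementation:

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Small network scoring how goal-like a state looks (illustrative)."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states).squeeze(-1)

def preference_loss(reward: RewardNet, s_a: torch.Tensor, s_b: torch.Tensor,
                    labels: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry model: P(human picks b) = sigmoid(r(b) - r(a)).
    # Mislabeled comparisons from nonexperts corrupt these probabilities,
    # which is what makes the learned reward noisy.
    logits = reward(s_b) - reward(s_a)
    return nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
```

An agent that then maximizes this learned reward directly inherits all of that labeling noise, which is the failure mode Torne describes next.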
"Basically, the agent would take the reward function too seriously. It would try to match the reward function perfectly. So, instead of directly optimizing over the reward function, we just use it to tell the robot which areas it should be exploring," Torne says.
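A hedged sketch of that distinction: rather than maximizing the learned reward, the noisy scores are used only to rank previously visited states and bias which goals the agent explores next. The function below and its sampling rule are illustrative assumptions, not the team's published algorithm:

```python
import torch
import torch.nn as nn

def pick_exploration_goal(reward: nn.Module, visited: torch.Tensor,
                          top_k: int = 10) -> torch.Tensor:
    # Score previously visited states with the (noisy) learned reward,
    # e.g., a RewardNet like the one sketched above.
    with torch.no_grad():
        scores = reward(visited)
    # Keep only a ranking: sample uniformly among the top-k states, so label
    # noise nudges exploration toward promising regions without the agent
    # trying to match the reward function exactly.
    top = torch.topk(scores, k=min(top_k, scores.numel())).indices
    idx = top[torch.randint(top.numel(), (1,))]
    return visited[idx.item()]
```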