He and his coworkers separated the process into two distinct stages, with each stage being guided by its own algorithm. They call their new support learning strategy Immense (Human Directed Investigation).
On one side, an objective selector calculation is consistently refreshed with publicly supported human criticism. Instead of serving as a reward, the feedback serves to direct the agent’s exploration. In a way, the non-expert users leave breadcrumbs for the agent to follow as it progresses toward its goal.
On the opposite side, the specialist investigates all alone, in a self-regulated way directed by the objective selector. It gathers pictures or recordings of activities that it attempts, which are then shipped off people and used to refresh the objective selector.