Human Guided Exploration (HuGE) enables AI agents to learn quickly from crowdsourced human feedback, even when that feedback is noisy or error-prone.
Training an AI agent in tasks like opening a kitchen cabinet often employs reinforcement learning, where a human expert develops and updates a reward function to guide the agent’s trial-and-error learning. This process, though effective, can be time-consuming and complex, especially for multi-step tasks.
Researchers from MIT, Harvard University, and the University of Washington have developed a new reinforcement learning method that relies on crowdsourced feedback from non-expert users worldwide. This approach allows the AI agent to learn more efficiently, overcoming the challenges of error-prone data that often impede similar methods.
Noisy feedback
The Human Guided Exploration (HuGE) method takes a new approach to utilising user feedback in reinforcement learning. Users are shown two images of states an AI agent has reached and asked to pick the one closer to the goal, such as a robot opening a cabinet rather than a microwave. In earlier methods, this kind of binary, non-expert feedback directly optimised a reward function, so human errors propagated into the learned behaviour. HuGE instead separates the two roles. A goal selector algorithm, continually updated with human feedback, guides the agent's exploration rather than serving as a reward, while the agent independently explores and collects data that in turn refines the goal selector. This division narrows the exploration space and allows feedback to arrive asynchronously, so the agent keeps learning even without immediate feedback or in the presence of incorrect inputs.
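The mechanism above can be sketched in a few lines of code. The following is a minimal, illustrative sketch, not the authors' implementation: it assumes a toy 1-D state space with a hypothetical goal position, fits a one-parameter goal selector from simulated noisy binary comparisons (a Bradley-Terry-style logistic preference model), and uses the selector only to pick which visited state to explore from, never as a reward.

```python
# Illustrative sketch of HuGE-style goal selection from noisy pairwise
# feedback. The 1-D state space, GOAL, and all names are assumptions
# made for this example, not the authors' code.
import math
import random

random.seed(0)

GOAL = 0.9  # hypothetical true goal position in a 1-D state space

def human_label(state_a, state_b, error_rate=0.2):
    """Simulate a non-expert choosing the state closer to the goal,
    with occasional mistakes (noisy feedback). Returns 0 if state_a
    is chosen, 1 if state_b is chosen."""
    correct = 0 if abs(state_a - GOAL) < abs(state_b - GOAL) else 1
    return correct if random.random() > error_rate else 1 - correct

# Goal selector: a one-parameter score s(x) = w * x, trained with a
# Bradley-Terry-style logistic loss on pairwise comparisons.
w = 0.0
LR = 0.5
for _ in range(500):
    a, b = random.random(), random.random()
    label = human_label(a, b)  # 0 means a was preferred, 1 means b
    preferred, other = (a, b) if label == 0 else (b, a)
    # P(preferred beats other) = sigmoid(w * (preferred - other))
    p = 1.0 / (1.0 + math.exp(-w * (preferred - other)))
    # Gradient ascent on the log-likelihood of the observed preference.
    w += LR * (1.0 - p) * (preferred - other)

def select_goal(visited_states):
    """Pick the visited state the selector scores highest; the agent
    then explores around it. The selector guides exploration only;
    it is never used as a reward signal."""
    return max(visited_states, key=lambda x: w * x)

# Because feedback is only a guide, a wrong label shifts where the
# agent looks next rather than corrupting the learned policy itself.
print(select_goal([0.1, 0.4, 0.7, 0.95]))
```

In this toy setting the learned weight ends up positive, so the selector steers exploration toward states near the hypothetical goal despite the 20% label-error rate, which mirrors how HuGE tolerates noisy crowdsourced input.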
Faster learning
To test HuGE, the researchers ran simulations and real-world experiments, training robotic arms on complex tasks such as maze navigation and block stacking. They gathered input from 109 non-expert users across multiple continents and found that HuGE learned faster than comparable methods; crowdsourced data proved more effective than synthetic data, and non-expert users could label images or videos quickly. The research also underscores the importance of aligning AI with human values, a crucial consideration in developing AI learning strategies.
In the future, the team intends to upgrade HuGE to learn from natural language and physical interaction with robots and to apply this method for teaching multiple agents simultaneously.