MIT researchers are teaching robots to adapt using language models, enabling them to handle unexpected situations and complete tasks.
Robots are being trained to perform increasingly complex household tasks, from wiping up spills to serving food, often through imitation learning, in which they are programmed to copy motions demonstrated by a human. While robots are excellent mimics, they may not know how to handle unexpected bumps and nudges unless engineers explicitly program them to adjust, and without that programming a disturbance can force them to restart the task from the beginning.
MIT engineers have integrated robot motion data with the common-sense knowledge of large language models (LLMs) to help robots handle situations that push them off their trained path. The approach lets a robot break a task down into subtasks and adjust to disruptions within a subtask, without restarting from scratch or requiring explicit programming for every potential failure.
Language task
The researchers demonstrated the approach on a marble-scooping task made up of a sequence of subtasks: reaching for the marbles, scooping them up, and pouring them into another bowl. Without being programmed to recognize which subtask it is in, a robot nudged off course would have to start the task over. The team instead explored using LLMs, which can process text to generate a logical, ordered list of those subtasks, such as "reach," "scoop," and "pour," for a given task. This approach could enable robots to self-correct in real time without extensive additional programming.
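As a rough illustration of the idea, the sketch below asks a chat-style LLM to decompose a household task into an ordered list of one-word subtasks. It assumes the OpenAI Python client; the model name, prompt, and output parsing are illustrative guesses, not the researchers' actual setup.

```python
from openai import OpenAI  # any chat-capable LLM client would work; OpenAI is an assumption

client = OpenAI()

def plan_subtasks(task_description: str) -> list[str]:
    """Ask an LLM to break a task into a short, ordered list of subtask verbs."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "List the subtasks needed to perform the task, one per line, "
                        "as single lowercase verbs such as 'reach', 'scoop', 'pour'."},
            {"role": "user", "content": task_description},
        ],
    )
    text = response.choices[0].message.content
    return [line.strip().lower() for line in text.splitlines() if line.strip()]

# plan_subtasks("Scoop marbles from one bowl and pour them into another")
# might return something like ["reach", "scoop", "pour"]
```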
Mapping marbles
The team developed an algorithm that connects a robot's physical state or camera view with the natural language label of the subtask it is performing, a mapping known as "grounding." The algorithm learns to automatically identify which semantic subtask the robot is in, such as "reach" or "scoop," from its physical coordinates or image view. In experiments with a robotic arm trained on the marble-scooping task, the team demonstrated this approach by guiding the robot through the task and using a pre-trained LLM to list the steps involved.
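One plausible way to implement such a grounding classifier, assuming the robot's state is available as a small feature vector (for example, end-effector pose and gripper angle) with subtask labels collected from demonstrations, is a simple multiclass classifier. This is an illustrative sketch, not the team's published architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

SUBTASKS = ["reach", "scoop", "pour"]  # labels taken from the LLM's subtask plan

def train_grounding_classifier(states: np.ndarray, labels: list[str]) -> LogisticRegression:
    """Fit a classifier mapping a robot state vector (e.g., end-effector pose,
    gripper angle) to the subtask that state belongs to, using labeled
    demonstration data."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(states, [SUBTASKS.index(label) for label in labels])
    return clf

def current_subtask(clf: LogisticRegression, state: np.ndarray) -> str:
    """Report which semantic subtask the robot appears to be in right now."""
    return SUBTASKS[int(clf.predict(state.reshape(1, -1))[0])]
```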
The team then let the robot carry out the scooping task on its own, using the newly learned grounding classifiers. As the robot progressed, experimenters pushed and nudged it off its path and knocked marbles off its spoon. Rather than stopping and starting over, or continuing blindly as if nothing had happened, the robot was able to self-correct and complete each subtask before moving on to the next.
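A minimal sketch of what that closed-loop, self-correcting execution could look like is below; the four hooks (subtask detection, per-subtask control, state sensing, and the completion check) are hypothetical placeholders for whatever a real robot stack provides.

```python
def run_with_self_correction(detect_subtask, execute_step, read_state, is_complete,
                             max_steps=10_000):
    """Closed-loop execution of a subtask plan.

    At every control step the robot asks the grounding classifier which
    subtask it is actually in and runs that subtask's controller. A nudge or
    a spilled marble simply changes the classifier's answer, so the robot
    resumes from the right point in the plan instead of restarting.

    All four hooks are hypothetical stand-ins for a real robot stack:
      detect_subtask(state) -> subtask name (e.g., the classifier sketched above)
      execute_step(subtask) -> advance that subtask's controller by one step
      read_state()          -> current proprioceptive or image state
      is_complete(state)    -> True once the marbles have been poured
    """
    for _ in range(max_steps):
        state = read_state()
        if is_complete(state):
            return True
        execute_step(detect_subtask(state))  # self-correcting dispatch
    return False
```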