Thursday, November 21, 2024

Smarter Robots Learn To Adapt and Complete Tasks


MIT researchers are teaching robots to adapt using language models, enabling them to handle unexpected situations and complete tasks.

In this collaged image, a robotic hand tries to scoop up red marbles and put them into another bowl while a researcher's hand frequently disrupts it. The robot eventually succeeds.
Credits: Image: Jose-Luis Olivares, MIT. Stills courtesy of the researchers.

Robots are being trained to perform increasingly complex household tasks, from wiping up spills to serving food, often through imitation, where they are programmed to copy motions guided by a human. While robots are excellent mimics, they cannot handle unexpected bumps and nudges unless engineers have explicitly programmed them to adjust, and without such programming a disturbance can force them to restart a task from the beginning.

MIT engineers have integrated robot motion data with large language models (LLMs) to give robots common sense for off-path situations. This enables robots to break down tasks into subtasks and adjust to disruptions without restarting or needing explicit programming for every potential failure.


Language task

The researchers demonstrated the problem with a marble-scooping task that involves a sequence of subtasks: reaching toward the marbles, scooping them up, and pouring them into another bowl. Without explicit programming for every way each subtask could fail, a robot nudged off course would have to restart from the beginning. The team instead turned to LLMs, which can process a text description of a task and generate a logical, ordered list of its subtasks, such as "reach," "scoop," and "pour." This approach could let a robot self-correct in real time without extensive additional programming.
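The idea can be illustrated with a short sketch. This is not the researchers' code; query_llm is a hypothetical stand-in for whatever chat-completion service is available, and here it simply returns a canned reply so the example runs end to end.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; returns a canned reply
    here so the sketch runs without an API key."""
    return "reach\nscoop\npour"


def decompose_task(task_description: str) -> list[str]:
    """Ask the LLM to break a task into an ordered list of subtask labels."""
    prompt = (
        "Break the following robot task into a short, ordered list of "
        "one-word subtask labels, one per line.\n"
        f"Task: {task_description}"
    )
    reply = query_llm(prompt)
    return [line.strip().lower() for line in reply.splitlines() if line.strip()]


subtasks = decompose_task("Scoop marbles from one bowl and pour them into another.")
print(subtasks)  # ['reach', 'scoop', 'pour']
```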

Mapping marbles

The team developed an algorithm to connect a robot's physical position, or the state of its camera image, with a subtask's natural language label, a process known as "grounding." The algorithm learns to automatically identify which semantic subtask the robot is in, such as "reach" or "scoop," from its physical coordinates or image view. In experiments with a robotic arm trained on the marble-scooping task, the team demonstrated the approach by guiding the robot through the task and using a pre-trained LLM to list the steps involved.
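A minimal sketch of the grounding idea is shown below. It is not the MIT team's actual algorithm: for illustration, a simple scikit-learn classifier is trained to map made-up end-effector coordinates from a demonstration to the subtask label active at that moment.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy demonstration data: states recorded while the arm was reaching,
# scooping, and pouring (coordinates invented for illustration).
demo_states = np.array([
    [0.10, 0.40, 0.30],   # reach
    [0.15, 0.38, 0.28],   # reach
    [0.20, 0.35, 0.10],   # scoop
    [0.22, 0.34, 0.08],   # scoop
    [0.40, 0.20, 0.35],   # pour
    [0.42, 0.18, 0.36],   # pour
])
demo_labels = ["reach", "reach", "scoop", "scoop", "pour", "pour"]

# Fit the grounding classifier on the labeled demonstration.
grounding = LogisticRegression(max_iter=1000).fit(demo_states, demo_labels)

# At run time, the robot can ask which subtask its current state belongs to.
current_state = np.array([[0.21, 0.35, 0.09]])
print(grounding.predict(current_state))   # expected: ['scoop']
```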

The team then allowed the robot to carry out the scooping task using the newly learned grounding classifiers. As the robot progressed, experimenters pushed and nudged it off its path and knocked marbles off its spoon. Instead of stopping and starting over or continuing unquestioningly, the robot could self-correct and complete each subtask before moving on to the next.
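One way to picture this self-correcting behavior is the loop sketched below, which is an illustration rather than the actual MIT controller. The robot works through the LLM's subtask list in order, and after every control step it checks which subtask its current state is grounded to; if a disturbance knocks it back to an earlier phase, it resumes from that subtask instead of restarting. The functions observe_state and execute_step are hypothetical robot-interface hooks.

```python
def run_task(subtasks, grounding, observe_state, execute_step, max_steps=1000):
    """Execute subtasks in order, re-entering an earlier subtask if a
    disturbance pushes the robot's state back to that phase."""
    index = 0  # which subtask we believe we are on
    for _ in range(max_steps):
        if index >= len(subtasks):
            return True  # every subtask finished
        # One small motion toward the current subtask's goal; assumed to
        # return True once that goal is reached.
        done = execute_step(subtasks[index])
        state = observe_state()                  # e.g. end-effector coordinates
        detected = grounding.predict([state])[0]
        detected_index = subtasks.index(detected)
        if detected_index < index:
            # A nudge or dropped marbles pushed the robot back to an earlier
            # phase: resume from that subtask instead of restarting the task.
            index = detected_index
        elif done:
            index += 1  # subtask completed; move on to the next one
    return False  # ran out of steps
```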

Nidhi Agarwal
Nidhi Agarwal is a journalist at EFY. She is an Electronics and Communication Engineer with over five years of academic experience. Her expertise lies in working with development boards and IoT cloud. She enjoys writing as it enables her to share her knowledge and insights related to electronics, with like-minded techies.
