MIT engineers have created Clio, a method that lets robots make intuitive, task-relevant choices, much as humans do, and remember only the details they need to complete their tasks.
Imagine you need to clean a cluttered kitchen countertop strewn with assorted sauce packets. If your goal is simply to clear the counter, you might gather all the packets and sweep them off together. If you first want to separate out the mustard packets, you would sort them by sauce type. And if you are looking specifically for Grey Poupon mustard, you would search more carefully to find that exact brand.
MIT engineers have developed a method named Clio, after the Greek muse of history, that enables robots to make intuitive, task-relevant decisions much as humans do. Given a list of tasks described in natural language, Clio determines the level of detail the robot needs to interpret its surroundings. The robot then “remembers” only the parts of a scene that are pertinent to those tasks.
Open fields
Recent research has shifted toward “open-set” recognition. Rather than training on a fixed list of object categories, researchers use deep learning to build neural networks that process billions of internet-sourced images along with their descriptive text, such as a Facebook photo of a dog captioned “Meet my new puppy!” From millions of such image-text pairs, these networks learn to pinpoint the scene elements associated with specific terms, so a robot can identify a dog in an entirely new setting. Even so, it remains difficult to parse a scene at the level of detail that a particular task requires.
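To make the open-set idea concrete, here is a minimal sketch that labels image segments by comparing them with arbitrary text terms through cosine similarity in a shared embedding space. It assumes the embeddings were already produced by some vision-language model (CLIP is one well-known example). The function name `label_segments` and the three-dimensional vectors are hypothetical toy values, not output from a real model or from Clio.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def label_segments(
    segment_embs: dict[str, list[float]],
    term_embs: dict[str, list[float]],
) -> dict[str, tuple[str, float]]:
    """Give each image segment the text term whose embedding it most resembles."""
    labels = {}
    for seg_id, seg_vec in segment_embs.items():
        best_term = max(term_embs, key=lambda t: cosine(seg_vec, term_embs[t]))
        labels[seg_id] = (best_term, cosine(seg_vec, term_embs[best_term]))
    return labels


# Hypothetical toy embeddings; a real system would obtain these from a trained model.
term_embs = {"dog": [1.0, 0.0, 0.0], "sofa": [0.0, 1.0, 0.0], "lamp": [0.0, 0.0, 1.0]}
segment_embs = {"seg_a": [0.9, 0.1, 0.0], "seg_b": [0.1, 0.2, 0.95]}
print(label_segments(segment_embs, term_embs))
```

Because the vocabulary is just a dictionary of text embeddings, a new term such as "puppy" could be added at query time without retraining anything. That flexibility is what distinguishes open-set recognition from a classifier with a fixed label list.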
Information bottleneck
The team’s approach combines computer vision with neural networks that connect millions of open-source images with descriptive text, and it uses mapping tools to split each image into small segments for processing. The researchers then apply the “information bottleneck” principle from information theory to compress these segments, keeping only those most relevant to a given task. The team tested Clio in real-world settings, including organizing a cluttered apartment and running on Boston Dynamics’ robot Spot in an office environment. Running in real time on Spot’s onboard computer, Clio identified and mapped the target objects, allowing the robot to complete its tasks efficiently.
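One way to see how an information bottleneck can pick out task-relevant segments is the agglomerative form of the principle: start with every segment as its own cluster, repeatedly merge the pair whose merger loses the least mutual information about the tasks, and stop once any further merge would lose too much. The sketch below is a simplified illustration under stated assumptions, not Clio's actual implementation. It assumes each segment already carries a probability distribution over the task list plus a hypothetical "null" task meaning "relevant to nothing." The names `Cluster`, `merge_cost`, `agglomerative_ib`, `keep_relevant`, and the `max_loss` threshold are illustrative, and the probabilities are made-up toy numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    members: list[str]       # ids of the image segments merged into this cluster
    weight: float            # p(c): probability mass of the cluster
    task_probs: list[float]  # p(t | c); last entry is a hypothetical "null" task


def kl(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(a: Cluster, b: Cluster) -> float:
    """Mutual information I(C; T) lost by merging a and b (weighted Jensen-Shannon)."""
    w = a.weight + b.weight
    pa, pb = a.weight / w, b.weight / w
    mix = [pa * x + pb * y for x, y in zip(a.task_probs, b.task_probs)]
    return w * (pa * kl(a.task_probs, mix) + pb * kl(b.task_probs, mix))


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Combine two clusters, mixing their task distributions by weight."""
    w = a.weight + b.weight
    probs = [(a.weight * x + b.weight * y) / w for x, y in zip(a.task_probs, b.task_probs)]
    return Cluster(a.members + b.members, w, probs)


def agglomerative_ib(clusters: list[Cluster], max_loss: float) -> list[Cluster]:
    """Greedily merge the cheapest pair until every possible merge loses more than max_loss."""
    clusters = list(clusters)
    while len(clusters) > 1:
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if cost > max_loss:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def keep_relevant(clusters: list[Cluster], null_index: int) -> list[Cluster]:
    """Forget clusters whose most likely 'task' is the null task."""
    return [
        c for c in clusters
        if max(range(len(c.task_probs)), key=c.task_probs.__getitem__) != null_index
    ]


# Toy scene (hypothetical numbers). Tasks: ["grab mustard", "grab ketchup", null].
segments = [
    Cluster(["mustard_1"], 0.2, [0.80, 0.15, 0.05]),
    Cluster(["mustard_2"], 0.2, [0.75, 0.20, 0.05]),
    Cluster(["ketchup_1"], 0.2, [0.10, 0.80, 0.10]),
    Cluster(["ketchup_2"], 0.2, [0.10, 0.75, 0.15]),
    Cluster(["wall"], 0.2, [0.02, 0.03, 0.95]),
]
kept = keep_relevant(agglomerative_ib(segments, max_loss=0.01), null_index=2)
for c in kept:
    print(c.members, [round(p, 3) for p in c.task_probs])
```

The threshold sets the granularity. A larger `max_loss` allows more merges and therefore coarser objects. Under these assumptions, a task list that only cared about clearing the counter would give the mustard and ketchup segments similar distributions, so all the packets would collapse into a single cluster, much like the countertop example.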
Looking ahead, the team plans to extend Clio to handle more complex tasks and to incorporate recent advances in photorealistic visual scene representations.