A lot has happened since 2009—the year when Pranav Mistry, the Indian technology wizard who is currently working as a researcher at the MIT Media Lab, US, unveiled the ‘SixthSense’ technology.
‘SixthSense’ is a wearable gestural interface that allows users to project the digital information existing on the World Wide Web onto any surface around them and use natural hand gestures to interact with that information.
Vandana Sharma of EFY Bureau caught up with Mistry to find out where the future of computing is headed, what India needs to do to come up with path-breaking and life-transforming innovations, and a lot more…
FEBRUARY 2012

Q. What led you to work on the SixthSense?
A. From the very beginning, I wondered what the future of computing would be like and how we would interact with the digital information space, which has hitherto remained confined to the rectangular screens of our mobile phones, laptops and tablets. I always used to think: why can't this model be broken?
More than that, I thought it would be interesting to use the real world as the interaction space with the digital world. Before the SixthSense project, I undertook many other projects to achieve this end, but probably my approach was not right. Gradually, however, things began to fall into place.
Q. SixthSense looks no less than magic. Could you demystify it in simple words?
A. To develop the SixthSense interface, I used a combination of very simple hardware components comprising a camera, sensors, an Internet-enabled mobile device and a projector.
In the latest version of the interface, I am using a laser projector with a laser diode inside, which can project onto any surface. Technically, one interesting thing about a laser projector is that it never goes out of focus. Since the interface requires the user to wear the projector on the body, the laser projector becomes advantageous as the user doesn't have to adjust the focus.
So hardware-wise it is very simple. The plus point of these hardware components is that they are cheap and getting smaller and smaller every month, let alone every year.
If you view the video presentation I made at TED (http://bit.ly/2GDYFj), you will observe that I just make the gesture of taking a picture and the picture actually gets clicked. To do this, the system needs to understand the gesture the user is making. So the key intelligence that has gone into SixthSense is actually in understanding the scene and deciding what to project, where to project it, what is in front of the user, what kind of gesture the user is making, and so on.
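As a rough illustration of the vision code involved: Mistry's early prototypes tracked coloured marker caps worn on the fingertips. The sketch below, assuming OpenCV and a webcam, tracks one such marker in that spirit; the HSV thresholds and the gesture note are illustrative stand-ins, not SixthSense's actual code.

```python
import cv2
import numpy as np

# Illustrative HSV range for a red fingertip marker; real thresholds
# depend on the marker colour and the lighting conditions.
LOWER_RED = np.array([0, 120, 120])
UPPER_RED = np.array([10, 255, 255])

def find_marker(frame):
    """Return the (x, y) centroid of the largest red blob, or None."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_RED, UPPER_RED)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    c = max(contours, key=cv2.contourArea)
    m = cv2.moments(c)
    if m["m00"] == 0:
        return None
    return (int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"]))

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    pos = find_marker(frame)
    if pos:
        # A full system would track several markers over time and
        # classify their trajectories as gestures, e.g. thumbs and
        # index fingers forming a frame triggers 'take a picture'.
        cv2.circle(frame, pos, 8, (0, 255, 0), 2)
    cv2.imshow("marker", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```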
All this intelligence comes from the computer-vision software and machine-learning technology. The camera sees what the user sees. It captures not only the gestures but also the scene and the objects around the user. For example, if the user holds a book in his hands, the camera matches the cover of the book against the covers of thousands of books available online. Once a match is found, it can tell you the price, user reviews and whether your friends already have a copy of the book.
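The book-cover lookup he describes amounts to matching the camera image against a database of known covers. A minimal sketch of one standard approach, using OpenCV's ORB features; the file names and the match threshold are placeholders, and a real system would query an online index rather than local files.

```python
import cv2

# Placeholder images: the live camera frame and one candidate
# cover from a reference database of known books.
scene = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)
cover = cv2.imread("known_cover.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(cover, None)
kp2, des2 = orb.detectAndCompute(scene, None)

# Hamming distance suits ORB's binary descriptors; cross-checking
# keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

# Simple score: many close matches suggest this cover is in view.
good = [m for m in matches if m.distance < 40]
print(f"{len(good)} strong matches")
if len(good) > 25:   # illustrative threshold
    print("Cover recognised; fetch price and reviews for this title.")
```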
Q. How does the device search over the Internet?
A. The device is connected to the cloud. It uses a lot of search-engine application programming interfaces (APIs), such as Amazon's APIs. Since it connects you to the Internet, it gives you access to all the dynamic information and data while you remain in the physical world.
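Conceptually, such a cloud lookup is an ordinary web-service call. A minimal sketch using Python's requests library; the endpoint, parameters and response fields here are hypothetical, invented for illustration rather than drawn from any real Amazon API.

```python
import requests

# Hypothetical endpoint and response fields, for illustration only.
SEARCH_URL = "https://api.example.com/products/search"

def lookup_book(isbn):
    """Fetch price and review data for a recognised book cover."""
    resp = requests.get(SEARCH_URL,
                        params={"isbn": isbn, "fields": "price,reviews"},
                        timeout=5)
    resp.raise_for_status()
    data = resp.json()
    return data["price"], data["reviews"]

price, reviews = lookup_book("0000000000")  # placeholder ISBN
print(f"Price: {price}, reviews: {reviews}")
```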
Of course, the device doesn't always need the Internet. It also uses software available locally on the device and the mobile phone. For example, it can take pictures without going online, and it can save and modify pictures, zoom in, zoom out and do a lot more.
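The offline operations he mentions need only on-device image libraries. A minimal sketch of cropping-as-zoom using the Pillow library, with placeholder file names.

```python
from PIL import Image

img = Image.open("photo.jpg")

# 'Zoom in' by cropping the centre region and scaling it back
# up to the original size.
w, h = img.size
crop = img.crop((w // 4, h // 4, 3 * w // 4, 3 * h // 4))
zoomed = crop.resize((w, h))

zoomed.save("photo_zoomed.jpg")
```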