Device makers are focusing on making user-interfaces easy, clean and intuitive. As technology increasingly surrounds our lives, it should blend with the environment, be less obtrusive and mingle naturally with users.
Janani Gopalakrishnan Vikram
JULY 2012: It is easier to interact with humans than with machines. The reason is simple: Even if the person at the other end speaks a different language, we will exercise our intelligence to understand what he is trying to say, just as he tries to put forth his point using body language. So the communication effort is two-way. However, with machines you have to communicate in a precise manner that they will understand. You cannot say just anything to them and leave their software to do the interpretation. One wrong command can send things helter-skelter.
Hope is in sight. A lot of human-machine interaction experts around the world are working on simplifying our interactions with machines. They are coming up with innovative user-interfaces that are easier, more natural, intuitive and human.
It is becoming possible to give instructions verbally to computers and mobile phones, express your instructions as simple gestures, or even make your machines understand what you are thinking. Speech, gestures and thought are promising to be the future of human-machine interfacing.
While the medium of interface is one key trigger for innovation in the area of user-interfaces, there are other interesting motives as well. These include making user-interfaces that consume and dissipate less energy, are smaller, lighter and less harmful to the environment, use more natural materials, and meet the special needs of challenged users or specific applications. Designers are also trying to make user interfaces that are less conspicuous and blend with the environment. Otherwise, a future scenario filled with computers (the vision of ubiquitous computing) would look like a computer science lab!
“Good user interfaces are crucial for good user experience. A technology, no matter how good, will hardly reach a breakthrough if designers don’t manage to make the user-interface as intuitive and attractive as possible. To be interested in a new product or technology, users need to understand its advantages or find themselves impressed or involved,” says Rajeev Karwal, CEO and founder, Milagrow Business & Knowledge Solutions.
“At Milagrow, we are constantly studying the consumers in their own environment. Our product and R&D team has to spend at least two days every month with the consumers using our current or future products. We are focusing on interface with robotic products, convergence electronics and mobile products,” Karwal adds.
There is rising awareness about the importance of user-interfaces and many related innovations such as SixthSense, LG Magic Wand remote control, Microsoft Surface, Prezi, Air Glove, Samsung’s Smart TV range and Tan Le’s EPOC headset. In fact, user experience and new user interfaces were the underlying theme of Consumer Electronics Show 2012. Shawn DuBravac, research director of CES, predicted on the eve of the event that 2012 would be the ‘year of the interface.’ This fits in with his other predictions about technology moving into the background, becoming less visible to the user and morphing with the environment.
“In the last few years, we have seen apps like email, Twitter and Facebook on consumer electronics devices like TVs,” DuBravac said. “Manufacturers wanted to show these properties because it said to consumers that the product is connected. But the experience is generally poor. The next focus will be on improving the user experience.”
DuBravac also noted that companies will replace complexity with simplicity, and also that natural interfaces like gesture and voice control will show up in more and more devices such as tablets during 2012.
User interface is a key area of innovation today as it can add value to an existing device, meet the special needs of a new device and make life easier for users. Here let us look at some recent innovations in this space.
Wave a wand or chant a spell
Last year was the year of touch, with devices sporting touchscreens and multi-touch capabilities ruling the roost. Even computer operating systems such as Windows 8 and Mac OS X Lion feature a touch-inspired interface as against the traditional desktop environment. This shows that touch is gradually becoming the main interface not just for mobile devices but for primary computing devices too.
Now device makers seem to be going one step ahead, towards friction-less and more natural user interfaces such as gesture and voice. Microsoft Kinect and Apple’s Siri (on the iPhone 4S since October 2011) are brilliant examples. A lot of smartphone makers are including voice-enabled search and similar features in their phones. More recently, Samsung and other consumer electronics companies have also entered the smart-interaction game.
Samsung’s new PNE8000 plasma series, and the UNES7500 and UNES8000 LED-backlit LCD TVs, have a built-in camera and microphone, so you can control the television by speaking or gesturing to it. Two remotes are also included: one is a standard multi-button type, while the other is a touchpad with a few buttons and a microphone.
But voice training is a painful exercise with most such voice-enabled devices. The system takes too long to adapt to the user’s voice; one reviewer found that repeated instructions to raise the volume ended up opening Skype!
Gesture control appears to be a bit better. In the Samsung model, it is activated by holding a hand up in front of the TV. Then, you need to move your hand to guide a cursor, grab with your fist to make a selection, or make an anti-clockwise circular motion to undo. However, you need to have perfect, shadow-free lighting for gesture control.
LG has been showing off its magic remote controls for quite a few years now, every year adding more features. Last year, the company demonstrated gesture control for televisions. LG’s remote works similarly to the Nintendo Wii remote, letting users operate the onscreen menus with certain hand and arm gestures. For example, if you gesture a ‘V’ in the air, it brings up a list of recently watched videos. A wave of the remote can also cause the TV to switch from 2D to 3D and vice versa.
This year, the company has added voice recognition for text input, a scroll wheel and more magic gestures. The voice control system allows you to search for TV and Web content, and also access social media content. The voice features are powered by a platform from Nuance—the maker of Dragon Dictate, a popular Mac tool.
Both Samsung and LG have made a decent attempt at gesture and voice control but a lot more fine-tuning is needed before this goes mainstream.
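Under the hood, once the recognition engine has labelled a gesture, the remaining step is a simple lookup from gesture label to device action. A minimal sketch in Python; the gesture names and actions here are hypothetical stand-ins, not Samsung’s or LG’s actual command sets:

```python
# Map recognised gesture labels to TV actions (hypothetical command set).
GESTURE_ACTIONS = {
    "grab": "select",          # closed fist selects the highlighted item
    "circle_ccw": "undo",      # anti-clockwise circle undoes
    "v_shape": "show_recent_videos",
    "wave": "toggle_3d",
}

def dispatch(gesture):
    """Return the action for a recognised gesture; ignore unknown ones."""
    return GESTURE_ACTIONS.get(gesture, "no_op")
```

The design choice worth noting is that the hard part (recognising the gesture reliably under varying lighting) lives entirely upstream of this table; the table itself is trivial to extend.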
Speaking of remotes, Canonical has launched Ubuntu TV, dubbed the television for human beings, which does away with the remote altogether. Users can work with touch or gesture control, or use their smartphone as a remote. The TV runs on the Ubuntu Linux operating system. It includes most Internet TV features like DVR abilities, access to online movies and TV shows, streaming capabilities and intelligent search. Ubuntu One, Ubuntu’s own cloud, will also be built into the new TV so that users can watch movies and TV shows stored on other devices.
Last year, a group of four students in Ahmedabad developed an innovative device called the Air Glove. The device won great acclaim at an inter-collegiate contest. It is an infrared-based gesture interface device that can be used as a remote control for televisions, air-conditioners, computers or any other appliance fitted with the Air Glove infrared interface. The device has an advantage over radio frequency based devices and can operate multiple appliances simultaneously. Moreover, the infrared technology has pin-point accuracy, making it more viable.
While the team demonstrated the concept with a large form-factor device, they are now developing a wrist-watch model.
Betting big on touch and gesture user interfaces
Microsoft Surface has already become a famed technology. The 360-degree interface uses PixelSense technology to see and respond to touch, gestures and real-world objects placed on it. It accepts over 50 simultaneous inputs, and has been used in the 102cm (40-inch) Samsung SUR40, which can be embedded in tables and walls.
Kinect, another Microsoft technology, has also made a mark. This motion-sensing input device for Xbox 360 and Windows PCs uses a webcam-like peripheral and associated software that lets users control and interact with the device through gestures and spoken commands. Microsoft hopes that Kinect will play a huge role in industries like healthcare, apart from interactive television and computer systems.
During a talk at CES, Steve Ballmer commented that Windows 8’s Metro user interface would drive a new magic that will make “one plus one equal three.” Metro features a touch-enabled, tiled Start screen. Each tile represents an application and displays information related to it, such as the number of unread messages in an email application or the document currently open in a word processor. Metro applications can share information among themselves, and Microsoft hopes this will revolutionise cross-device user interfacing.
Indeed, there is a lot of focus on gesture-based computing these days, and hardware and software products are coming up to enable gesture-based user interfaces. San Francisco based Leap has introduced a small plastic device for enabling gesture applications on the Mac. You simply plug the Leap into a USB port, load the Leap Motion software and calibrate the device by waving your arms as instructed. Having done that, the device can track the motion of your hands or fingers very precisely (to around 1/100th of a millimetre), within a space of around 0.22 cubic metres (8 cubic feet).
Leap hopes that more developers will create software to take advantage of this gesture capability. It has provided sample applications and a software development kit on its website.
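An application built on such a tracker typically turns the stream of fingertip coordinates into discrete commands by looking at net displacement over a short window. A toy classifier in Python, assuming x positions in millimetres sampled from a tracker (this is purely illustrative and is not the Leap Motion SDK; the threshold is an assumed value):

```python
def classify_swipe(xs, threshold_mm=80.0):
    """Classify a sequence of fingertip x positions (mm) as a swipe.

    Returns 'swipe_right', 'swipe_left', or None if the net movement
    over the window is too small to count as a deliberate gesture.
    """
    if len(xs) < 2:
        return None
    dx = xs[-1] - xs[0]
    if dx > threshold_mm:
        return "swipe_right"
    if dx < -threshold_mm:
        return "swipe_left"
    return None
```

For example, `classify_swipe([0, 30, 90])` reports a rightward swipe, while small jitters around a resting hand are ignored.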
Pranav Mistry’s SixthSense is another interesting gesture computing technology. His work with MIT’s Media Lab involves converting natural hand movements into digital information, to interact with computer systems. The SixthSense prototype comprises a pocket projector, a mirror and a camera. The hardware components are coupled in a pendant-like device, and connected to the mobile computing device in the user’s pocket.
The projector projects visual information enabling surfaces, walls and physical objects around us to be used as interfaces, while the camera recognises and tracks the user’s hand gestures and physical objects using computer-vision based techniques. The software program processes the video stream data captured by the camera and tracks the locations and movements of the coloured markers at the tip of the user’s fingers. This information is interpreted into gestures, which, in turn, act as instructions to an application.
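The marker-tracking step can be sketched in a few lines: threshold the pixels near a marker colour, then take the centroid of the matching pixels as the fingertip position. A stdlib-only Python illustration on a toy RGB frame (a real implementation such as SixthSense’s would work on live video, typically in HSV colour space with a library like OpenCV; the tolerance value here is an assumption):

```python
def track_marker(frame, target, tol=40):
    """Find the centroid of pixels within `tol` of the `target` colour.

    frame: 2D list of (r, g, b) tuples; returns (row, col) or None
    when no pixel matches the marker colour.
    """
    hits = []
    for r, row in enumerate(frame):
        for c, (pr, pg, pb) in enumerate(row):
            if (abs(pr - target[0]) <= tol and
                    abs(pg - target[1]) <= tol and
                    abs(pb - target[2]) <= tol):
                hits.append((r, c))
    if not hits:
        return None
    return (sum(p[0] for p in hits) / len(hits),
            sum(p[1] for p in hits) / len(hits))
```

Tracking the centroid frame-to-frame yields the fingertip trajectory that the gesture interpreter then matches against known movements.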
SixthSense supports multi-touch and multi-user interaction. For more information about it, read the interview with Pranav Mistry published in EFY’s February 2012 issue (http://bit.ly/yTj8Vg).
Sparsh, another of Mistry’s projects, is equally interesting. In an interview with EFY, Mistry explains, “Sparsh is an interaction method that lets users conceptually transfer media from one digital device to their body, and pass it to another digital device through simple touch gestures. In Sparsh, we are playing with the perception of the user. When a user touches any data, it appears as if it becomes part of the user. But actually the user acts just as a token or ID, and the data gets copied to a particular folder that belongs to the user’s ID in the cloud, and stays there.
“The data is matched with the token of the user in the cloud. Now when the user touches another media or device where he wishes to paste the data, the device accesses the data corresponding to the user’s ID in the cloud.
“It is something like e-mail in concept, which can be checked from anywhere. But if I show something as simple as an e-mail to my grandma, she may wonder how I am able to check the same e-mail in office that I checked at home. She may ask if I brought it along from office. The concept of Sparsh is similar, where the user appears to have become a USB drive but the magic is happening in the cloud,” Mistry adds.
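Conceptually, then, Sparsh behaves like a cloud clipboard keyed by the user’s identity. A minimal Python sketch of that idea, with an in-memory dict standing in for the cloud store (the real system would authenticate the user and sync over the network; the function names are illustrative):

```python
# The user's ID is the only "token" that travels between devices;
# the data itself lives in the cloud store, keyed by that ID.
cloud_clipboard = {}

def touch_copy(user_id, data):
    """Copy: store the data in the folder belonging to the user's ID."""
    cloud_clipboard[user_id] = data

def touch_paste(user_id):
    """Paste: the target device fetches data matching the user's ID."""
    return cloud_clipboard.get(user_id)
```

Copying a photo on one device (`touch_copy("alice", "photo.jpg")`) and pasting it on another (`touch_paste("alice")`) involves no direct transfer between the devices, exactly as in Mistry’s e-mail analogy.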
Bend or twist, for a change
One of the software-based factors in improving user-interfaces is the zooming user interface (ZUI). Although the concept has been around for over a decade, the vision of creating the best ZUI continues to fire many an innovation.
Basically, a ZUI is a type of graphical user interface in which users can change the scale of the viewed area in order to see more or less detail, and browse through different documents. Instead of constraining the user’s view with fixed-size windows, the ZUI treats the screen as an infinitely large canvas with information elements placed across it. The user can pan the canvas, zoom in or out, and generally view things as they are arranged spatially, much as in a mental map.
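At its core, a ZUI keeps content in one infinite “world” coordinate space and maps it to the screen through a pan offset and a zoom factor. A minimal Python sketch of that mapping, including the standard trick of zooming about a point so the content under the cursor stays put (class and method names are illustrative):

```python
class ZuiView:
    """Maps world coordinates to screen coordinates via pan and zoom."""

    def __init__(self, pan_x=0.0, pan_y=0.0, zoom=1.0):
        self.pan_x, self.pan_y, self.zoom = pan_x, pan_y, zoom

    def world_to_screen(self, wx, wy):
        return ((wx - self.pan_x) * self.zoom,
                (wy - self.pan_y) * self.zoom)

    def zoom_at(self, sx, sy, factor):
        """Zoom about a screen point so that point stays fixed on screen."""
        # World point currently under (sx, sy):
        wx = sx / self.zoom + self.pan_x
        wy = sy / self.zoom + self.pan_y
        self.zoom *= factor
        # Re-derive the pan so (wx, wy) still lands on (sx, sy).
        self.pan_x = wx - sx / self.zoom
        self.pan_y = wy - sy / self.zoom
```

Panning is just a change to the offsets; zooming rescales the whole canvas, which is why a ZUI never needs fixed-size windows.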
The ZUI concept has powered the imagination of many device-makers including Sony, Apple, Google and Microsoft, leading to constant innovations. Every company has quite a few ZUI experiments dotting its timeline. If you take Microsoft, you will find that DeepFish, Canvas for OneNote and some Kinect applications are all examples of its trysts with ZUIs. ZUIs form the basis of successful tools like Google Maps and Microsoft Photosynth.
But the present ZUIs are nothing compared to what they will really be once we have flexible displays. In November 2011, Nokia demonstrated a prototype handheld device that lets the user bend and twist the screen to complete actions like scrolling and zooming. Samsung too will soon be debuting phones with flexible displays, followed by tablets.
Such flexible displays will be the ultimate ZUI technology. You will have durable, flexible and large displays. And, you would just have to bend the screen or twist it to zoom in or see along the sides. Or, perhaps you can fold it to focus on some areas and ignore the rest. You can roll up your screen and put it into your pocket while you travel. Once you reach a coffee shop, you can spread the screen atop your table and have fun!
Or, chant a spell!
Voice training is a problem with voice-controlled devices because it is very difficult to achieve the noise-free environment or constant voice tone and accent required by these nascent tools. A clogged nose can play havoc, causing all your instructions to be misunderstood by the systems.
A lot of research is happening across the world to make speech system training more natural and accurate. Several toolkits and developmental aids are also coming up to enable researchers, designers and developers. One recent success is the Open Source Kaldi toolkit.
2. Minimise user effort in understanding and using the device
3. Make the interaction more human-like
4. Reduce unnecessary interactions—be it input or output
5. Avoid clutter; keep it simple
6. Improve portability and energy-efficiency
7. Focus on health and environmental factors
8. Cater to users or applications with special needs
Kaldi is written in C++ and provides a speech recognition system based on finite-state transducers, using the freely available OpenFst, together with detailed documentation and scripts for building complete recognition systems. It is more full-fledged than other ASR toolkits like HTK, Julius (both written in C), Sphinx-4 (written in Java) and RWTH ASR (written in C++). It works on common platforms like UNIX, Linux and Windows.
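The finite-state transducer at the heart of such systems maps an input symbol sequence to an output sequence while moving through states. OpenFst itself is a C++ library, but the idea can be illustrated with a toy transducer in Python: a lexicon-style machine that accepts a phone-like sequence and emits a word label (the symbols and transitions here are made up for illustration):

```python
# Transitions: (state, input_symbol) -> (next_state, output_symbol).
# A toy lexicon transducer: the phone sequence k-ae-t emits "cat".
TRANSITIONS = {
    (0, "k"): (1, ""),
    (1, "ae"): (2, ""),
    (2, "t"): (3, "cat"),
}
FINAL_STATES = {3}

def transduce(symbols):
    """Run the FST; return the output string, or None if input is rejected."""
    state, out = 0, []
    for sym in symbols:
        if (state, sym) not in TRANSITIONS:
            return None
        state, emitted = TRANSITIONS[(state, sym)]
        if emitted:
            out.append(emitted)
    return " ".join(out) if state in FINAL_STATES else None
```

In a real recogniser such transducers also carry weights, and separate machines for acoustics, pronunciation and grammar are composed into one search graph.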
The advent of Siri instils a little more confidence in voice recognition. Apple’s Siri, on the iPhone 4S, lets you handle text, reminders, Web search and more using your voice. Siri and other new-generation technologies from the likes of Nuance are a bit more accurate than older voice recognition software. Plus, with a little bit of intelligence, Siri is often able to understand in just one or two attempts what you are trying to say. Much like a personal assistant, Siri can actually be told to “call the cook and tell her to make pasta for dinner.” You can dictate a whole e-mail or ask the phone to recommend restaurants nearby, all by simply talking to it.
Siri grew out of SRI International’s DARPA-funded CALO research project before being acquired by Apple, and its search capabilities draw on services such as Bing, Yahoo and Google. As a result, the natural language user interface is quite mature and intelligent.
Other voice recognition tools include Google’s Voice Actions and Microsoft’s TellMe speech interface. However, these are not really speech application platforms but simply programs that process your voice and carry out some instruction. There is no high-level integration.
Real speech interface
For a long time now, developers have been hoping for a whole speech platform that not only allows you to interact with specific built-in functions but is also extensible to third-party applications just like a real operating system’s graphical user interface. Imagine a speech user interface that gives you one single method of activating the device and then all its functions are accessible through the speech command of the user and the voice feedback of the device. There should be no need to use a screen, mouse or keyboard! Further, a real speech user interface should allow third-party applications to extend their functions to other programs through a series of application programming interfaces.
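Such an extensible speech platform amounts to a shared command registry: each application registers the phrases it can handle, and the system routes recognised speech to the right handler, so no app has to reinvent voice control. A minimal Python sketch of that routing layer (all names here are illustrative, not any vendor’s actual API):

```python
# Shared registry: spoken phrase -> (app name, handler function).
speech_registry = {}

def register_command(app, phrase, handler):
    """An application exposes a voice command to the whole system."""
    speech_registry[phrase.lower()] = (app, handler)

def handle_speech(phrase):
    """Route a recognised phrase to whichever app registered it."""
    entry = speech_registry.get(phrase.lower())
    if entry is None:
        return "Sorry, no app handles that."
    app, handler = entry
    return handler()

# Two hypothetical third-party apps plug into the same interface:
register_command("player", "play video", lambda: "playing")
register_command("mail", "read new mail", lambda: "2 new messages")
```

The key property is the single activation and routing path: adding a new voice-controlled app means one `register_command` call, not a new recognition engine.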
In a recent review, mobile expert Adam Z. Lein commented that the latest Xbox 360 offered something close to this—if not a complete platform, at least an extensible speech interface. The device appears to have a consistent speech interface that is shared by all applications and games that have been programmed to make use of it.
“No matter if I’m in the Dashboard, the YouTube app, Netflix, Crackle, MSNBC or Star Wars Kinect, if I say ‘Xbox’ it will start listening to me and highlight the relevant commands that I can say. I can tell the TV what I want it to do instead of picking up a plastic controller and pressing buttons. Plus, since the speech user interface is shared throughout the system, apps like YouTube and Netflix don’t have to reinvent the wheel in order to get proper voice control support,” writes Lein.
“Unfortunately, the speech interface of Xbox still requires your eyes on the screen in many situations since it does not provide voice feedback for any command. Hopefully, we will finally see some real forward movement in the area of speech user interfaces for Windows Phone 8 and Windows 8 since that seems like the next step for human-computer interactions of the future—considering we already have 3D gesture recognition with Kinect,” adds Lein.
The new models of Cadillac focus on the so-called Cadillac user experience or CUE. Now Cadillac users can put away their phones or tablets, yet channel the devices’ capability and media into the car using an elegant and intuitive user experience.
• The CUE innovations are primarily focused on proximity sensing, natural voice recognition, haptic feedback and a unique capacitive touchscreen.
• Proximity sensing minimises the display when not in use.
• The haptic feedback gives users a pulsing sensation when they select menu items.
• The 20.3cm (8-inch) capacitive touchscreen allows swipe and pinch gestures to move and resize items on the main screen.
• The built-in 3D navigation includes turn-by-turn natural speech recognition and auto-fill location input.
The CUE designer team has also opened up the Linux-based CUE platform to developers, to ensure that the CUE stays ahead of tech trends always.
Why not simply think?
While voice and gesture control are still at a nascent stage, products for thought-based user interfaces are already hitting the market. These too are very much in their infancy, but they nevertheless offer an option for applications or users with special needs.
One interesting product in this spectrum is inventor Tan Le’s EPOC headset, developed and marketed by Emotiv Systems. The Emotiv EPOC is a wireless neuro-headset that taps the latest developments in neuroscience to enable high-resolution neuro-signal acquisition and processing. It is a decent-sized wearable headset that uses electroencephalogram (EEG) readings to collect and decipher the user’s thoughts and translate them into instructions.
Users can control and influence the virtual environment with their mind, access applications and play games developed specifically for the EPOC. Or, they can use the EmoKey to connect to current PC games and experience them in a completely new way. The EEG readings can be used by researchers for neuro-therapy, biofeedback, and brain computer interface study and design. A software development kit is also available for developers to come out with interesting applications using EPOC.
The Sekati brain-computer application programming interface is another interesting tool. The framework allows you to interface with an EEG device via a socket (provided by the open ThinkGearConnector server; part of NeuroSky’s open development SDK) and allows you to read connection status, signal strength, NeuroSky’s proprietary algorithmic eSense values as well as raw EEG values. The tool may be used for medical observation, games or applications.
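The ThinkGear Connector streams newline-delimited JSON packets over a local socket, and each packet may carry NeuroSky’s eSense values. A Python sketch of just the parsing step on a sample packet (the `eSense`/`attention`/`meditation` field names follow NeuroSky’s published socket protocol, but treat the exact packet shape as an assumption; the socket connection itself is omitted):

```python
import json

def parse_esense(line):
    """Extract (attention, meditation) from one ThinkGear JSON packet.

    Returns None if the packet carries no eSense block (for example,
    a packet containing only raw EEG samples or signal quality).
    """
    packet = json.loads(line)
    esense = packet.get("eSense")
    if esense is None:
        return None
    return esense.get("attention"), esense.get("meditation")

# A sample packet of the kind the connector emits, one per line:
sample = '{"eSense":{"attention":53,"meditation":61},"poorSignalLevel":0}'
```

An application would read lines off the socket in a loop, feed each through `parse_esense`, and map the 0-100 eSense values onto game or therapy controls.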
While NeuroSky and Emotiv are the two main commercial competitors, there are other players such as Cyberkinetics, Mindball and Starlab too which make brain-computer interface products.
In short, there is a lot of interesting work happening in the space of user interfaces—whether based on speech, touch, gesture or thought. However, designers have to keep in mind that it is not just the technology but the usability and user-engagement that are important for any user interface.
“A user interface must address what first-century BC architect Vitruvius stated—architecture (read ‘all creative design’) must address commodity, firmness and delight: value, structure and aesthetics,” reminds Ranjit Makkuni, president of the Sacred World Foundation (www.sacredworld.com), multimedia researcher, designer and musician.
The author is a technically-qualified freelance writer, editor and hands-on mom based in Chennai