Leap hopes that more developers will create software to take advantage of this gesture capability. It has provided sample applications and a software development kit on its website.
Pranav Mistry’s SixthSense is another interesting gesture computing technology. His work with MIT’s Media Lab involves converting natural hand movements into digital information, to interact with computer systems. The SixthSense prototype comprises a pocket projector, a mirror and a camera. The hardware components are coupled in a pendant-like device, and connected to the mobile computing device in the user’s pocket.
The projector displays visual information on surfaces, walls and physical objects around us, turning them into interfaces, while the camera recognises and tracks the user’s hand gestures and physical objects using computer-vision techniques. The software processes the video stream captured by the camera, tracking the locations and movements of the coloured markers on the tips of the user’s fingers. These movements are interpreted as gestures, which in turn act as instructions to an application.
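To get a feel for this approach, here is a minimal sketch (in Python, using OpenCV) of the kind of colour-marker tracking SixthSense describes. It is not SixthSense’s actual code; the camera index and the HSV colour range for the fingertip marker are assumptions you would tune for your own set-up.

```python
# Minimal sketch of colour-marker fingertip tracking (not SixthSense's actual code).
# Assumes a webcam at index 0 and a blue marker; the HSV range is illustrative.
import cv2
import numpy as np

LOWER_BLUE = np.array([100, 150, 50])    # assumed HSV range for the fingertip marker
UPPER_BLUE = np.array([130, 255, 255])

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_BLUE, UPPER_BLUE)         # isolate the coloured marker
    # OpenCV 4.x signature: returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        c = max(contours, key=cv2.contourArea)              # largest blob = marker
        m = cv2.moments(c)
        if m["m00"] > 0:
            cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
            cv2.circle(frame, (cx, cy), 8, (0, 255, 0), 2)  # marker position this frame
            # A gesture recogniser would feed the (cx, cy) trajectory to a classifier here.
    cv2.imshow("marker tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```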
SixthSense supports multi-touch and multi-user interaction. For more information about it, read the interview with Pranav Mistry published in EFY’s February 2012 issue (http://bit.ly/yTj8Vg).
Sparsh, another of Mistry’s projects, is equally interesting. In an interview with EFY, Mistry explains, “Sparsh is an interaction method that lets users conceptually transfer media from one digital device to their body, and pass it to another digital device through simple touch gestures. In Sparsh, we are playing with the perception of the user. When a user touches any data, it appears as if it becomes part of the user. But actually the user acts just as a token or ID, and the data gets copied to a particular folder that belongs to the user’s ID in the cloud, and stays there.
“The data is matched with the token of the user in the cloud. Now, when the user touches another medium or device where he wishes to paste the data, that device accesses the data corresponding to the user’s ID in the cloud.
[stextbox id=”info” caption=”Design tip”]In a recent speech in Austin, Kay Hofmeester and Daniel Wigdor, both user experience experts and members of the Microsoft Surface team, commented that input is a language: one that must be created by the designer, learned by the user and taught through a new type of user interface. So whether it is based on touch, gesture or speech, a user interface must also include the features needed to teach the user how to use it properly. This learning should be smooth and natural. A good technology with a steep learning curve will be more of a horror than a pleasure![/stextbox]
“It is something like e-mail in concept, which can be checked from anywhere. But if I show something as simple as an e-mail to my grandma, she may wonder how I am able to check the same e-mail in office that I checked at home. She may ask if I brought it along from office. The concept of Sparsh is similar, where the user appears to have become a USB drive but the magic is happening in the cloud,” Mistry adds.
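Stripped of the hardware, the copy-and-paste-through-the-cloud idea Mistry describes can be sketched in a few lines of Python. The dictionary below merely stands in for the user’s cloud folder, and the token scheme is an illustration, not Sparsh’s implementation.

```python
# Toy sketch of Sparsh-style copy/paste through the cloud (illustrative only,
# not Mistry's implementation). A dictionary stands in for the user's cloud folder.
cloud_storage = {}   # maps a user's token/ID to the data "held" by that user

def copy_to_cloud(user_token, data):
    """Device A: user touches the data; it is filed under the user's ID in the cloud."""
    cloud_storage.setdefault(user_token, []).append(data)

def paste_from_cloud(user_token):
    """Device B: user touches the target device, which fetches data matching the token."""
    return cloud_storage.get(user_token, [])

# The user acts only as a token; the data never actually travels "through" the body.
copy_to_cloud("user-42", {"type": "photo", "name": "sunset.jpg"})
print(paste_from_cloud("user-42"))   # [{'type': 'photo', 'name': 'sunset.jpg'}]
```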
Bend or twist, for a change
One of the software-based approaches to improving user interfaces is the zooming user interface (ZUI). Although the concept has been around for over a decade, the vision of creating the best ZUI continues to fire many an innovation.
Basically, a ZUI is a type of graphical user interface in which users can change the scale of the viewed area to see more or less detail, and browse through different documents. Instead of constraining the user’s view with fixed-size windows, the ZUI treats the screen as an infinitely large canvas on which information elements are laid out. The user can pan across this canvas, zoom in or zoom out, and generally view things much as they are arranged in the mind.
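The arithmetic behind such an “infinite canvas” is simple: a view is just a pan offset plus a zoom factor, and zooming is usually done about the cursor so the point under it stays put. The Python sketch below is illustrative; the class and method names are invented.

```python
# Minimal sketch of ZUI viewport maths: an unbounded "world" viewed through a
# pan offset and a zoom factor. Names and structure are illustrative.
class ZoomableView:
    def __init__(self):
        self.offset_x = 0.0   # world coordinate at the left edge of the screen
        self.offset_y = 0.0   # world coordinate at the top edge of the screen
        self.zoom = 1.0       # screen pixels per world unit

    def world_to_screen(self, wx, wy):
        return (wx - self.offset_x) * self.zoom, (wy - self.offset_y) * self.zoom

    def screen_to_world(self, sx, sy):
        return sx / self.zoom + self.offset_x, sy / self.zoom + self.offset_y

    def pan(self, dx_pixels, dy_pixels):
        self.offset_x -= dx_pixels / self.zoom
        self.offset_y -= dy_pixels / self.zoom

    def zoom_about(self, sx, sy, factor):
        """Zoom while keeping the world point under the cursor (sx, sy) fixed on screen."""
        wx, wy = self.screen_to_world(sx, sy)
        self.zoom *= factor
        self.offset_x = wx - sx / self.zoom
        self.offset_y = wy - sy / self.zoom

view = ZoomableView()
view.zoom_about(400, 300, 2.0)      # zoom in 2x about the screen centre
view.pan(50, 0)                     # drag the canvas 50 pixels to the right
print(view.world_to_screen(100, 100))
```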
The ZUI concept has powered the imagination of many device-makers including Sony, Apple, Google and Microsoft, leading to constant innovations. Every company has quite a few ZUI experiments dotting its timeline. If you take Microsoft, you will find that DeepFish, Canvas for OneNote and some Kinect applications are all examples of its trysts with ZUIs. ZUIs form the basis of successful tools like Google Maps and Microsoft Photosynth.
But the present ZUIs are nothing compared to what they will really be once we have flexible displays. In November 2011, Nokia demonstrated a prototype handheld device that lets the user bend and twist the screen to complete actions like scrolling and zooming. Samsung too will soon be debuting phones with flexible displays, followed by tablets.
Such flexible displays will be the ultimate ZUI technology. You will have durable, flexible and large displays. And, you would just have to bend the screen or twist it to zoom in or see along the sides. Or, perhaps you can fold it to focus on some areas and ignore the rest. You can roll up your screen and put it into your pocket while you travel. Once you reach a coffee shop, you can spread the screen atop your table and have fun!
Or, chant a spell!
Voice training is a problem with voice-controlled devices because it is very difficult to achieve the noise-free environment and the constant voice tone and accent that these nascent tools require. A clogged nose can play havoc, causing all your instructions to be misunderstood by the system.
A lot of research is happening across the world to make speech system training more natural and accurate. Several toolkits and development aids are also coming up to support researchers, designers and developers. One recent success is the open source Kaldi toolkit.
[stextbox id=”info” caption=”Common goals of user-interface designers”]1. Make the technology or equipment easy and enjoyable to use
2. Minimise user effort in understanding and using the device
3. Make the interaction more human-like
4. Reduce unnecessary interactions—be it input or output
5. Avoid clutter; keep it simple
6. Improve portability and energy-efficiency
7. Focus on health and environmental factors
8. Cater to users or applications with special needs[/stextbox]
Kaldi is written in C++ and provides a speech recognition system based on finite-state transducers, using the freely available OpenFst, together with detailed documentation and scripts for building complete recognition systems. It is more full-fledged than other ASR toolkits like HTK, Julius (both written in C), Sphinx-4 (written in Java) and RWTH ASR (written in C++). It works on common platforms like UNIX, Linux and Windows.
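At the heart of such FST-based recognisers is a search for the cheapest path through a large weighted graph of possible word sequences. The toy Python snippet below illustrates that idea only; it is not Kaldi’s API, and the states, arcs and costs are invented for the example.

```python
# Toy illustration of the weighted finite-state search used by FST-based
# recognisers such as Kaldi. This is NOT Kaldi's API; the graph and costs
# are invented to show how the best (lowest-cost) path is picked.
import heapq

# Each arc: (next_state, output_word, cost). Costs act like negative log probabilities.
GRAPH = {
    0: [(1, "call", 1.2), (1, "fall", 2.3)],
    1: [(2, "the", 0.5)],
    2: [(3, "cook", 0.9), (3, "book", 1.8)],
    3: [],   # final state
}
FINAL_STATE = 3

def best_path(graph, start=0, final=FINAL_STATE):
    """Dijkstra-style search for the cheapest word sequence through the graph."""
    heap = [(0.0, start, [])]
    best_cost = {}
    while heap:
        cost, state, words = heapq.heappop(heap)
        if state == final:
            return words, cost
        if best_cost.get(state, float("inf")) <= cost:
            continue
        best_cost[state] = cost
        for nxt, word, arc_cost in graph[state]:
            heapq.heappush(heap, (cost + arc_cost, nxt, words + [word]))
    return None, float("inf")

words, cost = best_path(GRAPH)
print(words, round(cost, 2))   # ['call', 'the', 'cook'] 2.6
```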
The advent of Siri instils a little more confidence in voice recognition. Apple’s brainchild, Siri on the iPhone 4S, lets you handle text, reminders, Web search and more using your voice. Siri and other new-generation technologies, such as those from Nuance, are a bit more accurate than older voice recognition software. Plus, with a little intelligence built in, Siri is often able to understand in just one or two attempts what you are trying to say. Much as you would with a personal assistant, you can actually tell Siri to “call the cook and tell her to make pasta for dinner.” You can dictate a whole e-mail or ask the phone to recommend restaurants nearby, all by simply talking to it.
Siri is the outcome of research that began with DARPA-funded projects, and it draws on services from Bing, Yahoo and Google for its answers. As a result, its natural language user interface is quite mature and intelligent.
Other voice recognition tools include Google’s Voice Actions and Microsoft’s Tellme speech interface. However, these are not really speech application platforms but simply programs that process your voice and carry out some instruction; there is no high-level integration.
Real speech interface
For a long time now, developers have been hoping for a complete speech platform that not only lets you interact with specific built-in functions but is also extensible to third-party applications, just like a real operating system’s graphical user interface. Imagine a speech user interface with a single method of activating the device, after which all its functions are accessible through the user’s spoken commands and the device’s voice feedback. There should be no need to use a screen, mouse or keyboard! Further, a real speech user interface should allow third-party applications to plug their own functions into it through a set of application programming interfaces.
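As a thought experiment, such an extensible speech platform might expose an API along the following lines: applications register command phrases and handlers, and the platform routes recognised utterances to them and returns text to be spoken back. Every name in this Python sketch is hypothetical; no real product exposes exactly this interface.

```python
# Hypothetical sketch of an extensible speech-platform API: third-party apps
# register command phrases and handlers; the platform routes recognised speech
# to them and returns spoken feedback. None of this is a real product's API.
class SpeechPlatform:
    def __init__(self, wake_word="device"):
        self.wake_word = wake_word
        self.commands = {}          # phrase -> handler

    def register_command(self, phrase, handler):
        """Third-party apps extend the interface by registering their own phrases."""
        self.commands[phrase.lower()] = handler

    def handle_utterance(self, utterance):
        """Route a recognised utterance; return text for the device to speak aloud."""
        text = utterance.lower().strip()
        if not text.startswith(self.wake_word):
            return None                             # ignore speech without the wake word
        command = text[len(self.wake_word):].strip()
        handler = self.commands.get(command)
        if handler is None:
            return "Sorry, I did not understand that."
        return handler()                            # voice feedback, no screen needed

platform = SpeechPlatform(wake_word="xbox")
platform.register_command("play music", lambda: "Playing your music.")
platform.register_command("open youtube", lambda: "Opening YouTube.")
print(platform.handle_utterance("Xbox open YouTube"))   # Opening YouTube.
```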
In a recent review, mobile expert Adam Z. Lein commented that the latest Xbox 360 offered something close to this—if not a complete platform, at least an extensible speech interface. The device appears to have a consistent speech interface that is shared by all applications and games that have been programmed to make use of it.
“No matter if I’m in the Dashboard, the YouTube app, Netflix, Crackle, MSNBC or Star Wars Kinect, if I say ‘Xbox’ it will start listening to me and highlight the relevant commands that I can say. I can tell the TV what I want it to do instead of picking up a plastic controller and pressing buttons. Plus, since the speech user interface is shared throughout the system, apps like YouTube and Netflix don’t have to reinvent the wheel in order to get proper voice control support,” writes Lein.
“Unfortunately, the speech interface of Xbox still requires your eyes on the screen in many situations since it does not provide voice feedback for any command. Hopefully, we will finally see some real forward movement in the area of speech user interfaces for Windows Phone 8 and Windows 8 since that seems like the next step for human-computer interactions of the future—considering we already have 3D gesture recognition with Kinect,” adds Lein.
[stextbox id=”info” caption=”Carmakers also focus on user interface”]
The new Cadillac models focus on the so-called Cadillac User Experience, or CUE. Cadillac users can now put away their phones or tablets, yet channel the devices’ capabilities and media into the car through an elegant and intuitive user experience.
• The CUE innovations are primarily focused on proximity sensing, natural voice recognition, haptic feedback and a unique capacitive touchscreen.
• Proximity sensing minimises the display when not in use.
• The haptic feedback gives users a pulsing sensation when they select menu items.
• The 8-inch capacitive touchscreen allows for swipe and pinching gestures to move and resize items on the main screen.
• The built-in 3D navigation includes turn-by-turn natural speech recognition and auto-fill location input.
The CUE design team has also opened up the Linux-based CUE platform to developers, to ensure that CUE always stays ahead of tech trends.
[/stextbox]
Why not simply think?
While voice and gesture control are still at a nascent stage, products for thought-based user interfaces are already hitting the market. These too are very nascent, but they nevertheless offer an option for applications and users with special needs.
One interesting product in this spectrum is inventor Tan Le’s EPOC headset, developed and marketed by Emotiv Systems. The Emotiv EPOC is a wireless neuro-headset that taps the latest developments in neuroscience to enable high-resolution neuro-signal acquisition and processing. It is a decent-sized wearable headset that uses electroencephalogram (EEG) readings to collect and decipher the user’s thoughts and translate them into instructions.
Users can control and influence the virtual environment with their mind, access applications and play games developed specifically for the EPOC. Or, they can use the EmoKey to connect to current PC games and experience them in a completely new way. The EEG readings can be used by researchers for neuro-therapy, biofeedback, and brain computer interface study and design. A software development kit is also available for developers to come out with interesting applications using EPOC.
The Sekati brain-computer application programming interface is another interesting tool. The framework lets you interface with an EEG device via a socket (provided by the open ThinkGearConnector server, part of NeuroSky’s open development SDK) and read the connection status, signal strength, NeuroSky’s proprietary algorithmic eSense values and raw EEG values. The tool may be used for medical observation, games or other applications.
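As an illustration of how simple such socket-based access can be, here is a minimal Python sketch that reads eSense and signal-quality values from a locally running ThinkGearConnector. The port number (13854) and JSON field names follow commonly documented defaults but may differ across SDK versions, so treat them as assumptions rather than the Sekati API itself.

```python
# Minimal sketch of reading eSense and signal-quality values from NeuroSky's
# ThinkGearConnector over its local socket. Port and field names are assumed
# from common documentation; this is not the Sekati API.
import json
import socket

HOST, PORT = "127.0.0.1", 13854          # ThinkGearConnector's usual local socket (assumed)

sock = socket.create_connection((HOST, PORT))
# Ask the server to stream JSON, including raw EEG samples.
sock.sendall(json.dumps({"enableRawOutput": True, "format": "Json"}).encode("utf-8"))

buffer = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break                              # connection closed by the server
    buffer += chunk
    # Packets are typically delimited by carriage returns in the ThinkGear stream.
    while b"\r" in buffer:
        line, buffer = buffer.split(b"\r", 1)
        if not line.strip():
            continue
        try:
            packet = json.loads(line)
        except ValueError:
            continue                       # skip malformed or partial packets
        if "poorSignalLevel" in packet:
            print("signal quality (0 = good):", packet["poorSignalLevel"])
        if "eSense" in packet:
            print("attention:", packet["eSense"].get("attention"),
                  "meditation:", packet["eSense"].get("meditation"))
        # Raw EEG samples ("rawEeg") arrive far more often and can be processed here.
sock.close()
```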
While NeuroSky and Emotiv are the two main commercial competitors, there are other players too, such as Cyberkinetics, Mindball and Starlab, which make brain-computer interface products.
In short, there is a lot of interesting work happening in the space of user interfaces—whether based on speech, touch, gesture or thought. However, designers have to keep in mind that it is not just the technology but the usability and user-engagement that are important for any user interface.
“A user interface must address what first-century BC architect Vitruvius stated—architecture (read ‘all creative design’) must address commodity, firmness and delight: value, structure and aesthetics,” reminds Ranjit Makkuni, president of the Sacred World Foundation (www.sacredworld.com), multimedia researcher, designer and musician.
The author is a technically-qualified freelance writer, editor and hands-on mom based in Chennai