So what’s so special about voice and these conversational experiences? How did we end up here?
Well, if you think about it, voice communication has several advantages over other media:
- it is faster than typing or writing
- it allows for hands-free interactions (cooking, driving, working from across the room)
- it is more intuitive (everyone learns how to talk at a very young age and does so many times a day)
- it facilitates empathy, as it carries tone, volume, and intonation, which make our meaning much clearer than written communication (like email, SMS, etc.) can
And technologists have long been fascinated by it. In 1952, Bell Labs created the “Audrey” system, which recognized digits spoken by a single voice. Then in 1962, IBM demonstrated its “Shoebox” machine, capable of understanding 16 words spoken in English.
With every decade since, the pace of evolution has quickened: from the first commercial speech recognition system to prediction models, larger vocabularies, dictation systems, and voice-activated menus. That progress brought us to the first great era of Voice User Interfaces in the 2000s, represented in large part by the Interactive Voice Response (IVR) systems that have become so prevalent over the telephone.
In particular, smart speakers have exploded in popularity in just a few years. Now 16% of the US population owns a smart speaker, with an estimated 44 million units sold. Amazon Echo (55%), Google Home (23%), and Apple HomePod (3%) devices together account for 81% of the market. By 2022, 55% of US homes are expected to have a smart speaker.
Have we reached a tipping point where everything will be driven by voice? Are we now able to meet or even exceed users’ expectations when it comes to conversational interactions and the ability of this technology to mimic human conversation?
Companies like Amazon, Google, and Microsoft are betting that we have, and many others are joining them. With that in mind, here are three key things you have to consider as you venture into this new world:
1) It’s all about making it easy – most users and customers don’t pick one technology over another just because it is cool or trendy. They normally do it because it either makes things easier/faster for them or because it solves a problem/pain they have. Therefore, as you think about creating a voice experience, you have to find the reason why someone would prefer to use your product or service over an alternative (which may very well be a non-technological way in which they achieve their goal or solve their problem today).
2) Speech is not the Holy Grail – context matters. As natural as talking might be, there are certain contexts in which speech may not be the best way of interacting with a product or service. For example, if you’re in a public place, you may not want to use your voice if you’re going to annoy others or if someone else might overhear your conversation. There’s also the discomfort some people still feel about talking to a computer, their preference for texting, and of course concerns over privacy.
3) It takes skill – conversations have evolved over thousands of years and are based on multiple psychological principles, social behaviors, and the unique linguistic attributes that come with every language. Therefore, just because we can talk and communicate with others doesn’t mean we can (or should) be designing these types of conversational interactions.
As we continue to explore this new field and discover together what it means for your product or business, don’t hesitate to reach out, and share these principles with others before their products become the next creepy AI news story.
About the speaker
Phillip Hunter is the VP of Product at Pulse Labs, which creates user experience solutions for voice-driven products. Prior to joining Pulse Labs, Phillip led the user experience team for Alexa at Amazon. In addition, Phillip managed the user experience team for Amazon Web Services. Before his career at Amazon, Phillip worked on design teams in Microsoft’s application services group, including Bing, Office, and Skype.