Voice First or Voice And? Dispatches from Voice Summit

The inaugural Voice Summit was held last week in New Jersey, with the hashtag #voicefirst. At Wellpepper, we’re actually in the “Voice and” camp. We love voice interfaces for their convenience, their promotion of empathy and connection, and their natural engagement. However, there are times when voice isn’t the best interface for the task, and others when voice plus another interface is even better. That view is reflected in some of our work with the Alexa Diabetes Challenge, which I spoke about at the conference.

People can only remember about 5 things at a time, which is a challenge for delivering complex instructions, education, or information through voice. Add to this the fact that voice is “headless” navigation. That is, there are often no cues to help you figure out where you are or where you are going. Most of us are visual creatures, and visual cues together with voice or text often provide a richer experience. And believe it or not, many of the sessions at this inaugural voice conference also seemed to reinforce this idea, in particular many of the consumer sessions, in addition to the healthcare sessions.

Talks by two very different consumer organizations, Comcast and Lego, both showed how early we are in voice design. They suggested that when voice is more seamless and ubiquitous we may see the promise of “voice first,” but also that “voice and” is possibly the better path forward.

When you think of the giants of voice, you may immediately think of Amazon and Google, but did you know that Comcast processed over 6B voice queries last year? My first thought on attending this session was that it was going to be about using interactive voice response trees before you get to a customer service agent, but Comcast has been quietly infusing voice into their entertainment experiences.

Did you know that your Comcast remote has a “voice” interface? You can talk to your TV to find programs, change the channel, or start a show. This is probably one of the best examples of “voice and.” First, voice search is actually found on a physical device. The Comcast design team had originally created a mobile app for the remote voice experience, but found that downloads were a small fraction of their entire subscriber base, so they added a “voice button” to the remote, which encouraged more searches. Second, when you use voice to search, the results are shown on your television screen. This is a “voice and” experience that wouldn’t make a lot of sense as standalone voice. Imagine searching for a movie to watch. Say you’re looking for something starring Harrison Ford, and you’ve got to keep in mind all the titles from his varied career and then choose one. It’s a lot to remember, and isn’t it easier to browse titles when you can see pictures and a description to jog your memory? I spoke briefly with the Comcast presenters about why they chose to put voice on the remote rather than directly in the cable box, and they said that it helped their users find the option. That was a big takeaway from the conference for me: although voice is a natural interface, the end-user still needs guidance. (A nice side benefit of the button on the remote is that it’s not always on and listening.)

Lego was another unlikely consumer company playing in the voice arena. Lego “Duplo Stories” is an Alexa skill that tells stories that children can then build using Duplo blocks. While the video was heartwarming, this session in particular highlighted both opportunities for “Voice And” using augmented reality, and also the current discovery limitations of voice.

In the video, a child playing with Duplo blocks asks his mother to start a story. The mother asks Alexa to play a Duplo story. Think about this: the skill had to be discovered and activated before any of this could take place. How would you learn about the skill without something printed on the box that the Duplo blocks came in? While it’s clever, imagine a new scenario where voice and augmented reality are built right into the blocks: a virtual Duplo Minecraft. The child builds something with Duplo, and then a voice and visual interface projects the story onto the child’s creation.

It’s still early days, and the potential for “Voice And” is still huge. In fact, a lot of the content at this conference reminded me of the early days of web interfaces. There was lots of talk about taxonomy of information, and “chunking” information into manageable pieces. (I used to teach a course on writing for the web, where we practiced this, which is funny as we now are so accustomed to screens that long-form journalism is making a real comeback.)

Similar to the early days of the web, there seemed to be slightly more focus on publishing than on end-user goals: the question should be what the end-user actually wants to accomplish, not what the end goal of the content publisher is. What’s different, though, is that while during Web 1.0 the answer to the question of whether every business needed a website was a resounding yes, it’s not clear that everyone needs a voice skill. With 30,000 skills already available for Alexa, and new features coming online weekly, the irony is that the Alexa team sends a weekly newsletter to keep us up to date. So, even Alexa knows it’s a “Voice And” world.

Posted in: Behavior Change, Voice
